Project #2 - Concurrent Index - CMU 15-721 :: Database Systems (Spring 2016)

Overview

For this project, you are required to implement a latch-free Bw-Tree in the Peloton system. The primary goal is to understand the nuances of concurrency control in a modern, order-preserving indexing data structure. Go over this style guide for information on writing code in the Peloton DBMS.

This is a group project. Each group must have exactly three members.

Release Date: Feb 08, 2016
Due Date: Mar 09, 2016 @ 11:59pm

Instructions

Updating Your Project Repository

We assume that you have already setup the project repository and the development environment following the instructions in the first assignment. We have made a lot of changes. You should ensure that the latest changes in the online repository are propagated to your local repository by using these commands :

git remote add upstream git://github.com/cmu-db/peloton.git
git fetch upstream
git reset --hard upstream/master  
git push origin master --force

After updating the local repository, you can build Peloton by following these installation instructions.

Implementation

You must implement the concurrent ordered index using a Bw-Tree. Here's a list of useful references:

In this assignment, you will need to modify only these five files:

The interface for the index is provided here. Your Bw-Tree implementation should implement this interface. Peloton contains a reference implementation that uses STX B+tree with coarse-grained locking.

Your Bw-Tree implementation must abstract away the details of the key data type and associated comparator and equality checker, like this :

template <typename KeyType, typename ValueType, class KeyComparator, class KeyEqualityChecker>
class BWTreeIndex {
};

Here's more information on template classes in C++.

Your index must optionally support duplicate keys with different values (i.e., this should be a configurable parameter). Go over the interface of the STX B+tree to understand how this can be accomplished in a clean way.

If it is configured to not support duplicate keys, then the InsertEntry function should return false when we try to insert a duplicate key into the index. On the other hand, if it configured to support duplicate keys, then it should not return false when we try to insert a duplicate key into the index.
The DeleteEntry function should erase only the index entry matching the specific <key, value> pair.

Your index must support consolidation when the delta chain gets too long. It is up to you to decide what to use as the consolidation threshold.

Your index must support splitting and merging without the use of locks. Be sure to watch out for concurrent threads trying to insert and delete the same separator key in an internal index node.

You should add multiple unit test cases in the index testing suite to test the functionality of your index data structure. For instance, one such unit test case can verify that node splitting works as expected under different scenarios.

You should add an utility function to validate the structural integrity of the index data structure. If any of the integrity checks fail, then this function should fix the integrity issue. This functionality is similar to what fsck provides for a file system.

You must periodically garbage collect deleted index entries. You will need to implement a cooperative garbage collection approach using epochs. We will compute the memory footprint after inserting a set of entries into the index. Then, we will delete a subset of those entries. Finally, after invoking the Cleanup function, we will recompute the memory footprint. If the memory footprint does not drop as expected, then your submission will be penalized.

You must support reverse iterators that work correctly with concurrent mutators.

IMPORTANT: This is an open-ended assignment. There are several design choices to be made while implementing a concurrent data structure. Do not expect us to guide you in making every one of these design choices. In general, try to export an interface in your Bw-Tree that is similar to that of the STX B+tree.

Testing

Peloton Testing Framework : You can test the concurrent index by making use of the index test suite. The testing suite contains a set of multi-threaded test cases designed to validate the functionality and evaluate the performance of the index. Uncomment this line to test the Bw-Tree index. To specifically run the join test case, follow these testing instructions. To specifically run the index test suite, follow these testing instructions.
```
cd build/tests
make check-build
valgrind --leak-check=yes --trace-children=yes --track-origins=yes ./index_test
```

Development Hints

In any large software system, it is critical to follow some conventions to allow everyone to build on top of each other's work. Here's a short list of rules that you should follow while hacking on Peloton. For instance, use class UpperCaseCamelCase for type names, int lower_case_with_underscores for variable/method/function names. Always use descriptive names and comment generously.
We use ClangFormat to ensure that everyone follows the same code formatting style. Before submitting your assignment, ensure that you follow these guidelines by running this command:
```
clang-format-3.6 --style=file ./src/backend/index/bwtree_index.cpp | diff ./src/backend/backend/index/bwtree_index.cpp -
```
ClangFormat supports two ways to provide custom style options: directly specify the style configuration in the -style= command line option or use -style=file and put the style configuration file in the .clang-format file in the project directory. We follow the second approach. The style file is located here in the Peloton repository. To modify the source code file to adhere to the style file, you can use the following command as describe in the manpage:
```
clang-format-3.6 --style=file -i ./src/backend/index/bwtree_index.cpp
```
Instead of using printf statements for debugging, use the LOG_* macros for logging information like this :
```
LOG_INFO("Nested loop join executor -- %d children", num_children);
LOG_DEBUG("Advance the right buffer iterator.");
```
To enable logging in peloton, you will need to reconfigure it like this :
```
../configure CXXFLAGS="-O0 -g -ggdb" --enable-debug
make -j4
```
More information is available here. The different logging levels are defined in this header file. After enabling logging, the logging level defaults to LOG_LEVEL_INFO. So, any logging method with a logging level that equal to or higher than LOG_LEVEL_INFO, like LOG_INFO, LOG_WARN, and LOG_ERROR will emit logging information in the peloton debugging log file. The debugging log file is, by default, stored inside the data directory -- ~/git/peloton/build/data/pg_log/pg.log.

Grading Rubric

Each project submission will be graded based on the following criteria:

Does the submission successfully execute all of the test cases and produce the correct answer?
Does the submission execute without any memory leaks according to Valgrind?
Does the submission periodically performs garbage collection?
Does the submission contain functions that perform unit testing and validate structural integrity ?
Does the submission use the proper formatting and documentation for its source code?

Note that we will use additional test cases that are more complex and go beyond the sample test cases that we provide you.

Bonus points will be given to the top 2 groups with the fastest implementations. The amount of bonus points given to these groups with the best implementations is left up to the discretion of the instructor and contingent on an additional evaluation of the source code.

Late Policy

10% will deducted from the final score of the project for every 24 hour period that the assignment is late.

Only in extreme circumstances (e.g., medical emergencies) no-penalty extensions will be granted. The student is required to provide written documentation from the University health center. Please contact the instructor if you have any questions.

Submission

Once you have completed the assignment, compress the five modified files as a tar file named handin.tar, and submit it to Autolab.

Let's assume that you have copied the two modified files into a directory named handin. Compress the handin directory into a tar file like this :

tar -cvf handin.tar handin

You can now submit handin.tar file to Autolab. Our grading script will decompress the tar file like this and copy them over into our peloton project repository before running our tests:

tar -xvf handin.tar
cd handin

You may submit your answers as many times as you like. You will receive instant feedback on your autograded assignment. Your score will be sent via email to your andrew account within a few minutes after your submission.

Collaboration Policy

Everyone has to work in a team of three people for this assignment.
Groups are allowed to discuss high-level details about the project with others.
Groups are not allowed to copy the contents of a white-board after a group meeting with other students.
Groups are not allowed to copy the solutions from other colleagues.

WARNING: All of the code for this project must be your own. You may not copy source code from other groups or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.