Overview
For this project, you are required to implement a latch-free Bw-Tree in the Peloton system. The primary goal is to understand the nuances of concurrency control in a modern, order-preserving indexing data structure. Go over this style guide for information on writing code in the Peloton DBMS.
This is a group project. Each group must have exactly three members.
- Release Date: Feb 08, 2016
- Due Date: Mar 09, 2016 @ 11:59pm
Instructions
Updating Your Project Repository
We assume that you have already setup the project repository and the development environment following the instructions in the first assignment. We have made a lot of changes. You should ensure that the latest changes in the online repository are propagated to your local repository by using these commands :
git remote add upstream git://github.com/cmu-db/peloton.git git fetch upstream git reset --hard upstream/master git push origin master --force
After updating the local repository, you can build Peloton by following these installation instructions.
Implementation
You must implement the concurrent ordered index using a Bw-Tree. Here's a list of useful references:
- The Bw-Tree: A B-tree for New Hardware Platforms, Justin Levandoski et al., ICDE 2013 (Slide deck)
- Techniques for Implementing Concurrent Data Structures on Modern Multicore Machines, Stephen Tu
- Lock-Free Linked List
In this assignment, you will need to modify only these five files:
- Bw-Tree-based Index source file
- Bw-Tree-based Index header file
- Bw-Tree Data Structure source file
- Bw-Tree Data Structure header file
- Index testing suite
The interface for the index is provided here. Your Bw-Tree implementation should implement this interface. Peloton contains a reference implementation that uses STX B+tree with coarse-grained locking.
Your Bw-Tree implementation must abstract away the details of the key data type and associated comparator and equality checker, like this :
template <typename KeyType, typename ValueType, class KeyComparator, class KeyEqualityChecker> class BWTreeIndex { };Here's more information on template classes in C++.
Your index must optionally support duplicate keys with different values (i.e., this should be a configurable parameter). Go over the interface of the STX B+tree to understand how this can be accomplished in a clean way.
- If it is configured to not support duplicate keys, then the
InsertEntry
function should returnfalse
when we try to insert a duplicate key into the index. On the other hand, if it configured to support duplicate keys, then it should not returnfalse
when we try to insert a duplicate key into the index. - The
DeleteEntry
function should erase only the index entry matching the specific <key, value> pair.
Your index must support consolidation when the delta chain gets too long. It is up to you to decide what to use as the consolidation threshold.
Your index must support splitting and merging without the use of locks. Be sure to watch out for concurrent threads trying to insert and delete the same separator key in an internal index node.
You should add multiple unit test cases in the index testing suite to test the functionality of your index data structure. For instance, one such unit test case can verify that node splitting works as expected under different scenarios.
You should add an utility function to validate the structural integrity of the index data structure. If any of the integrity checks fail, then this function should fix the integrity issue. This functionality is similar to what fsck
provides for a file system.
You must periodically garbage collect deleted index entries. You will need to implement a cooperative garbage collection approach using epochs. We will compute the memory footprint after inserting a set of entries into the index. Then, we will delete a subset of those entries. Finally, after invoking the Cleanup
function, we will recompute the memory footprint. If the memory footprint does not drop as expected, then your submission will be penalized.
You must support reverse iterators that work correctly with concurrent
mutators.
IMPORTANT: This is an open-ended assignment. There are several design choices to be made while implementing a concurrent data structure. Do not expect us to guide you in making every one of these design choices. In general, try to export an interface in your Bw-Tree that is similar to that of the STX B+tree.
Testing
- Peloton Testing Framework : You can test the concurrent index by making use of the index test suite. The testing suite contains a set of multi-threaded test cases designed to validate the functionality and evaluate the performance of the index. Uncomment this line to test the Bw-Tree index. To specifically run the join test case, follow these testing instructions. To specifically run the
index test suite
, follow these testing instructions.cd build/tests make check-build valgrind --leak-check=yes --trace-children=yes --track-origins=yes ./index_test
Development Hints
-
In any large software system, it is critical to follow some conventions to allow everyone to build on top of each other's work. Here's a short list of rules that you should follow while hacking on Peloton. For instance, use
class UpperCaseCamelCase
for type names,int lower_case_with_underscores
for variable/method/function names. Always use descriptive names and comment generously.We use ClangFormat to ensure that everyone follows the same code formatting style. Before submitting your assignment, ensure that you follow these guidelines by running this command:
clang-format-3.6 --style=file ./src/backend/index/bwtree_index.cpp | diff ./src/backend/backend/index/bwtree_index.cpp -
ClangFormat supports two ways to provide custom style options: directly specify the style configuration in the
-style=
command line option or use-style=file
and put the style configuration file in the.clang-format
file in the project directory. We follow the second approach. The style file is located here in the Peloton repository. To modify the source code file to adhere to the style file, you can use the following command as describe in the manpage:clang-format-3.6 --style=file -i ./src/backend/index/bwtree_index.cpp
-
Instead of using
printf
statements for debugging, use theLOG_*
macros for logging information like this :LOG_INFO("Nested loop join executor -- %d children", num_children); LOG_DEBUG("Advance the right buffer iterator.");
To enable logging in peloton, you will need to reconfigure it like this :
../configure CXXFLAGS="-O0 -g -ggdb" --enable-debug make -j4
More information is available here. The different logging levels are defined in this header file. After enabling logging, the logging level defaults to
LOG_LEVEL_INFO
. So, any logging method with a logging level that equal to or higher thanLOG_LEVEL_INFO
, likeLOG_INFO
,LOG_WARN
, andLOG_ERROR
will emit logging information in the peloton debugging log file. The debugging log file is, by default, stored inside the data directory --~/git/peloton/build/data/pg_log/pg.log
.
Grading Rubric
Each project submission will be graded based on the following criteria:
- Does the submission successfully execute all of the test cases and produce the correct answer?
- Does the submission execute without any memory leaks according to Valgrind?
- Does the submission periodically performs garbage collection?
- Does the submission contain functions that perform unit testing and validate structural integrity ?
- Does the submission use the proper formatting and documentation for its source code?
Note that we will use additional test cases that are more complex and go beyond the sample test cases that we provide you.
Bonus points will be given to the top 2 groups with the fastest implementations. The amount of bonus points given to these groups with the best implementations is left up to the discretion of the instructor and contingent on an additional evaluation of the source code.
Late Policy
10% will deducted from the final score of the project for every 24 hour period that the assignment is late.
Only in extreme circumstances (e.g., medical emergencies) no-penalty extensions will be granted. The student is required to provide written documentation from the University health center. Please contact the instructor if you have any questions.
Submission
Once you have completed the assignment, compress the five modified files as a tar file named handin.tar
, and submit it to Autolab.
Let's assume that you have copied the two modified files into a directory named handin
. Compress the handin
directory into a tar file like this :
tar -cvf handin.tar handin
You can now submit handin.tar
file to Autolab. Our grading script will decompress the tar file like this and copy them over into our peloton project repository before running our tests:
tar -xvf handin.tar cd handin
You may submit your answers as many times as you like. You will receive instant feedback on your autograded assignment. Your score will be sent via email to your andrew account within a few minutes after your submission.
Collaboration Policy
- Everyone has to work in a team of three people for this assignment.
- Groups are allowed to discuss high-level details about the project with others.
- Groups are not allowed to copy the contents of a white-board after a group meeting with other students.
- Groups are not allowed to copy the solutions from other colleagues.
WARNING: All of the code for this project must be your own. You may not copy source code from other groups or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.