Overview
For this project, you are required to implement a latch-free Skip List in the Peloton database management system. The primary goal is to understand the nuances of concurrency control in a modern, order-preserving indexing data structure. Go over this style guide for information on writing code in Peloton.
Here's a list of useful references on latch-free data structures:
- Techniques for Implementing Concurrent Data Structures on Modern Multicore Machines
- The Bw-Tree: A B-tree for New Hardware Platforms (Slide deck)
- Lock-Free Linked List
- The Story Behind MemSQL's Skiplist Indexes
IMPORTANT: This is an open-ended assignment. There are several design choices to be made while implementing a concurrent data structure. Do not expect us to guide you in making every one of these design choices. In general, try to export an interface in your Skip List that is similar to that of the STX B+tree.
This is a group project. Each group must have exactly three members unless given prior authorization by the instructor.
- Release Date: Jan 31, 2018
- Due Date: Mar 12, 2018 @ 11:59pm
Instructions
Updating Your Project Repository
We assume that you have already setup the project repository and the development environment following the instructions in the first assignment. We have made a lot of changes. You should ensure that the latest changes in the online repository are propagated to your local repository by using these commands:
git remote add upstream git://github.com/cmu-db/peloton.git git fetch upstream git rebase -i upstream/master git push --force
Implementation
In this assignment, you will need to modify only these five files:
- Skip List Data Structure source file
- Skip List Data Structure header file
- Skip List Index source file
- Skip List Index header file
- Index testing suite
IMPORTANT: The "Data Structure" files are the actual skip list implementation. The "Index" files are a wrapper that implements Peloton's index API.
Peloton also contains a Bw-Tree index you can use as a reference to better understand how the different arguments work.
Your Skip List implementation must abstract away the details of the key data type and associated comparator and equality checker, like this:
template <typename KeyType, typename ValueType, typename KeyComparator, typenameKeyEqualityChecker, typename ValueEqualityChecker> class SkipListIndex { };
These classes are already implemented for you:
KeyType
: The type of each key in the index. This will either be a GenericKey, CompactIntsKey, or TupleKeyValueType
: The type of each value in the index. This will usually be a 64-bit ItemPointer.KeyComparator
: The class used to compare whether twoKeyType
instances are less/greater-than each other. These will be included in theKeyType
implementation files.KeyEqualityChecker
: The class used to compare whether twoKeyType
instances are equivalent. These will be included in theKeyType
implementation files.ValueEqualityChecker
: The class used to compare whether twoValueType
instances are equivalent. This is needed to support non-unique indexes.
Requirements
Your index must optionally support duplicate keys with different values (i.e., this should be a configurable parameter).
You must support both forward and reverse iterators that work correctly with concurrent
mutators.
The functionality of your index implementation must match the specification given in Peloton's index API:
- The
InsertEntry
function inserts a key-value pair into the index. If it is configured to not support duplicate keys, then theInsertEntry
function should returnfalse
when we try to insert a duplicate key into the index. On the other hand, if it is configured to support duplicate keys, then it should not returnfalse
when we try to insert a duplicate key into the index. - The
DeleteEntry
function should erase only the index entry matching the specific <key, value> pair. - The
CondInsertEntry
function finds all key-value pairs that match the givenkey
and tests whether any of them satisfy the given predicate. If none of the key-value pairs satisfy the predicate, then we insert the given key-value pair into the index, setpredicate_satisfied
to true, and return true. Otherwise, we setpredicate_satisfied
to false and return false. - The
Scan
function scans a range inside the index and returns all elements in theresult
vector. The type of scan to perform is specified by the given scan predicate,csp_p
, and is one of the following: a full scan, a range scan, or a point query. Whether to perform a forward or backward scan is determined by the givenscan_direction
. - The
ScanLimit
function has the same behavior as theScan
function except that the scan stops after it reads offset + limit elements. Therefore, maximum of limit elements are returned in theresult
vector. - The
ScanAllKeys
function scans all keys in the index and stores the values in theresult
vector. - The
ScanKey
function finds all values in the index with the givenkey
and returns them in theresult
vector. If the index is not configured to store duplicate keys then at most one value will be returned. - The
PerformGC
function garbage collects deleted index entries. - The
NeedGC
function returns true returns true if garbage collection is needed on the index. - The
GetMemoryFootprint
function returns the amount of heap space used by the index.
IMPORTANT: Your skip list implementation should mimic the existing Bw-Tree index. If you are not sure about what the correct output or behavior should be for your index, then execute the Bw-Tree and see what data it returns. Your index implementation should be a drop-in replacement for it.
You should add an utility function to validate the structural integrity of the index data structure. If any of the integrity checks fail, then this function should fix the integrity issue. This functionality is similar to what fsck
provides for a file system.
You should implement a latch-free memory tracker (e.g., epoch-based or RCU). We will discuss different methods next week.
You must periodically garbage collect deleted index entries. The way our API works is that a separate thread will invoke NeedGC() to check whether the index needs to be garbage collected. To test your garbage collection implementation, we will compute the memory footprint after inserting a set of entries into the index. Then, we will delete a subset of those entries. Finally, after invoking the PerformGC
function on your index, we will recompute the memory footprint. If the memory footprint does not drop as expected, then your submission will be penalized.
Your implementation must maintain proper statistics by invoking the stats methods provided by the index API when appropriate.
You should add multiple unit test cases in the index testing suite to test the functionality of your index data structure.
Testing
You can test the concurrent index by making use of the index test suite. The implementation of these different test cases is found in TestingIndexUtil The testing suite contains a set of multi-threaded test cases designed to validate the functionality and evaluate the performance of the index. To specifically run the join test case, follow these testing instructions. To specifically run the index test suite
, follow these testing instructions.
cd build make skiplist_index_test valgrind --trace-children=yes \ --leak-check=full \ --track-origins=yes \ --soname-synonyms=somalloc=*jemalloc* \ --error-exitcode=1 \ --suppressions=../third_party/valgrind/valgrind.supp \ ./test/skiplist_index_test
Development Hints
-
We use ClangFormat to ensure that everyone follows the same code formatting style. Before submitting your assignment, ensure that you follow these guidelines by running this command:
clang-format-3.6 --style=file ./src/index/skiplist_index.cpp | diff ./src/index/skiplist_index.cpp -
ClangFormat supports two ways to provide custom style options: directly specify the style configuration in the
-style=
command line option or use-style=file
and put the style configuration file in the.clang-format
file in the project directory. We follow the second approach. The style file is located here in the Peloton repository. To modify the source code file to adhere to the style file, you can use the following command as describe in the manpage:clang-format-3.6 --style=file -i ./src/index/skiplist_index.cpp
-
Instead of using
printf
statements for debugging, use theLOG_*
macros for logging information like this:LOG_INFO("Nested loop join executor -- %d children", num_children); LOG_DEBUG("Advance the right buffer iterator.");
To enable logging in peloton, you will need to reconfigure it like this :
cd build cmake -DCMAKE_BUILD_TYPE=DEBUG .. make
More information is available here. The different logging levels are defined in this header file. After enabling logging, the logging level defaults to
LOG_LEVEL_INFO
. So, any logging method with a logging level that equal to or higher thanLOG_LEVEL_INFO
, likeLOG_INFO
,LOG_WARN
, andLOG_ERROR
will emit logging information in the peloton debugging log file. The debugging log file is written tostdout
.Note: Our source validator pre-commit hook will throw an error if you try to push code that includes forbidden commands (e.g.,
printf
,cout
,cerr
). See these instructions on how to set it up.
Grading Rubric
Each project submission will be graded based on the following criteria:
- Does the submission successfully execute all of the test cases and produce the correct answer?
- Does the submission execute without any memory leaks according to Valgrind?
- Does the submission implement the garbage collection API correctly?
- Does the submission contain functions that perform unit testing and validate structural integrity?
- Does the submission use the proper formatting and documentation for its source code?
Note that we will use additional test cases that are more complex and go beyond the sample test cases that we provide you.
Bonus points will be given to the top 3 groups with the fastest implementations. The amount of bonus points given to these groups with the best implementations is left up to the discretion of the instructor and contingent on an additional evaluation of the source code.
Late Policy
10% will deducted from the final score of the project for every 24 hour period that the assignment is late.
Only in extreme circumstances (e.g., medical emergencies) no-penalty extensions will be granted. The student is required to provide written documentation from the University health center. Please contact the instructor if you have any questions.
Submission
Once you have completed the assignment, compress the five modified files as a tar file named handin.tar
, and submit it to Autolab:
Let's assume that you have copied the five modified files into a directory named handin
. Compress the handin
directory into a tar file like this :
tar -cvf handin.tar handin
You can now submit handin.tar
file to Autolab. Our grading script will decompress the tar file like this and copy them over into our peloton project repository before running our tests:
tar -xvf handin.tar cd handin
You may submit your answers as many times as you like. You will receive instant feedback on your autograded assignment. Your score will be sent via email to your andrew account within a few minutes after your submission.
Collaboration Policy
- Everyone has to work in a team of three people for this assignment.
- Groups are allowed to discuss high-level details about the project with others.
- Groups are not allowed to copy the contents of a white-board after a group meeting with other students.
- Groups are not allowed to copy the solutions from other colleagues.
WARNING: All of the code for this project must be your own. You may not copy source code from other groups or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.