The first programming assignment is to implement the SQL string functions
CONCAT. The primary goal of this assignment is to become familiar with the low-level implementation details of the DBMS and to understand how a SQL query gets executed. All the code in this programming assignment must be written in C++ (specifically C++11). If you have not used C++ before, here's a short tutorial on the language. Even if you are familiar with C++, go over this guide for additional information on writing code in Peloton.
This is a single-person project that will be completed individually (i.e., no groups).
- Release Date: Jan 17, 2018
- Due Date: Jan 29, 2018 @ 11:59pm
You can refer to the Postgres documentation on what the SQL functions should return for different types of inputs (e.g., NULLs). You can also reference an existing function implementation (e.g.,
SUBSTR) to make sure that you complete the right steps.
In this assignment, you will need to modify the following files:
You will not need to make any changes to any other file in the system.
There are three steps to adding a new SQL function to Peloton:
- Implement the String Functions
- Register the Functions in the Catalog
- Add the Function Proxy to the LLVM Engine
Step #1 - Implement the String Functions
The first step is to implement the new SQL Functions. All of Peloton's built-in SQL string functions are in the
peloton::function::StringFunctions utility class. You need to create a new static functions in this class for each SQL function that you want to support. Each of these static functions will take in an
ExecutorContext object as its first argument. This object stores information about the query being executed and provides a memory pool for allocating memory if it is needed.
The functions will then have additional input arguments and different return types:
LOWER, the input arguments are (1) a
const char*for the string to be modified and (2) a
const uint32_tfor the string length. The return type should be a
char*. You should check for NULL in the operator implementation, not the function.
CONCAT, the input arguments are (1) an array of strings to concatenate together (
const char **concat_strs), (2) an array of the lengths of those strings (
const uint32_t*), and (3) the number of strings in the array (
const uint32_t). The return type must be a
StringFunctions::StrWithLenstruct, since the code that called your method does not know how long the string will be that you return.
HINT: Use the
PL_MEMCPY macro to copy memory between arrays as needed. You should not invoke
Step #2 - Register the Function in the Catalog
You need next to modify the catalog initialization methods so that the system knows that it now supports your new functions. Without this, the binder will not be able to map the function name placeholder in the query plan to the actual implementation. You know that you are missing this part if you get the error message "Failed to find function" when you invoke your method.
Catalog::InitializeFunctions() to register the three new built-in SQL functions that you are adding. You will need to also add the new entries to the
Step #3 - Add the Function Proxy to the LLVM Engine
The last step is to add the proxy in the LLVM engine. This is where the engine converts the query plan into LLVM instructions that invoke your function.
LOWER, you will need to create separate
UnaryOperatorHandleNullstructs in codegen/type/varchar_type.cpp. You will need to implement three methods:
SupportsType(): Returns true if the given input type is a VARCHAR.
ResultType(): Returns the VARCHAR type.
Impl(): Uses the
StringFunctionsProxyobject to invoke the real function in
StringFunctions. You do not need to worry about NULL inputs. The
UnaryOperatorHandleNullbase class will handle that for you. The return type should be a
CONCAT, we are already providing you with the
BinaryOperatorstruct implementation in codegen/type/varchar_type.cpp. This is because the operator implementation for this function is more complicated that the previous two (e.g., you have to allocate memory on the stack). You will need to uncomment the code once you complete steps #1 and #2 above.
Make sure you also add the three new functions to the function look-up tables in codegen/type/varchar_type.cpp. Add new entries for
kUnaryOperatorTable. Add a new entry for
You then need to connect your operator handlers to the catalog. You add entries for the three functions in both codegen/proxy/string_functions_proxy.cpp and include/codegen/proxy/string_functions_proxy.h. Use the
DEFINE_METHOD macro like the other entries in these two files. It will take care everything for you.
Creating Your Own Project Repository
Go to the Peloton Github page and click the Fork button to create a copy of the repository under your account.
You can then clone your private repository on your local machine in the filesystem, say under the
~/git directory. If you have not used
git before, here's some information on source code management using git.
Setting Up Your Development Environment
The next step is to setup your development environment for the project on your machine. We currently only support development environments in Linux and OSX operating systems. We also provide a Vagrant VM images that will automatically configure itself to work with Peloton.
- If you are using Linux, then follow these instructions.
- If you are using OSX, then follow these instructions. You can also follow this tutorial on using Vagrant in OSX Yosemite.
- If you are using Windows, then install VirtualBox and then use Vagrant to setup an Ubuntu 14.04 LTS Desktop VM. Detailed instructions are provided here.
Peloton Testing Framework : You can test the string functions by making use of the string functions test case. We will use a more comprehensive testing script to grade your assignment.
cd build make string_functions_test
PSQL Command-line Interface: We are also providing a SQL script for testing your implementation and also getting used to the Peloton's command line interface. Follow these instructions to launch Peloton's
psqlterminal, and then load in this SQL script to run it. You can cross check your output with the expected output.
psql "sslmode=disable" -U postgres -h localhost -p 15721 < string_functions.sql > string_functions.mine diff string_functions.mine string_functions.out
In any large software system, it is critical to follow some conventions to allow everyone to build on top of each other's work. Here's a short list of rules that you should follow while hacking on Peloton. For instance, use
class UpperCaseCamelCasefor type names/methods/functions,
int lower_case_with_underscoresfor variable names. Always use descriptive names and comment generously.
We use ClangFormat to ensure that everyone follows the same code formatting style. Before submitting your assignment, ensure that you follow these guidelines by running this command:
clang-format-3.6 --style=file ./src/function/string_functions.cpp | diff ./src/function/string_functions.cpp -
ClangFormat supports two ways to provide custom style options: directly specify the style configuration in the
--style=command line option or use
--style=fileand put the style configuration file in the
.clang-formatfile in the project directory. We follow the second approach. The style file is located here in the Peloton repository. To modify the source code file to adhere to the style file, you can use the following command as describe in the manpage:
clang-format-3.6 --style=file -i ./src/function/string_functions.cpp
Instead of using
printfstatements for debugging, use the
LOG_*macros for logging information like this:
LOG_INFO("Nested loop join executor -- %d children", num_children); LOG_DEBUG("Advance the right buffer iterator.");
To enable logging in peloton, you will need to reconfigure it like this :
cd build cmake -DCMAKE_BUILD_TYPE=DEBUG .. make
More information is available here. The different logging levels are defined in this header file. After enabling logging, the logging level defaults to
LOG_LEVEL_INFO. So, any logging method with a logging level that equal to or higher than
LOG_ERRORwill emit logging information in the peloton debugging log file. The debugging log file is written to
To speed up compilation, you can use the
-jcommand-line flag for
maketo tell it how many threads. For example, if you want to use four cores to build the system, you would execute
make -j 4.
Each project submission will be graded based on the following criteria:
- Does the submission successfully execute all of the test cases and produce the correct answer?
- Does the submission execute without any memory leaks according to Valgrind?
- Does the submission use the proper formatting for its source code?
Note that we will use additional test cases that are more complex and go beyond the sample test cases that we provide you.
10% will deducted from the final score of the project for every 24 hour period that the assignment is late.
Only in extreme circumstances (e.g., medical emergencies) no-penalty extensions will be granted. The student is required to provide written documentation from the University health center. Please contact the instructor if you have any questions.
After completing the assignment, you can submit your implementation of these files to Autolab (https://autolab.andrew.cmu.edu/courses/15721-s18):
You can submit your answers as many times as you like and get immediate feedback. Your score will be sent via email to your andrew account within a few minutes after your submission.
- Every student has to work individually on this assignment.
- Students are allowed to discuss high-level details about the project with others.
- Students are not allowed to copy the contents of a white-board after a group meeting with other students.
- Students are not allowed to copy the solutions from another colleague.
WARNING: All of the code for this project must be your own. You may not copy source code from other students or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.