Project #1 - EXTRACT Function

Overview

The first programming assignment is to implement the EXTRACT SQL function. The primary goal of this assignment is to become familiar with the low-level implementation details of the DBMS and to understand how a SQL query gets executed. All the code in this programming assignment must be written in C++ (specifically C++11). If you have not used C++ before, here's a short tutorial on the language. Even if you are familiar with C++, go over this guide for additional information on writing code in Peloton.

This is a single-person project that will be completed individually (i.e., no groups).

  • Release Date: Jan 19, 2017
  • Due Date: Jan 31, 2017 @ 11:59pm

Instructions

Creating Your Own Project Repository

Go to the Peloton Github page and click the Fork button to create a copy of the repository under your account.

You can then clone your private repository on your local machine in the filesystem, say under the ~/git directory. If you have not used git before, here's some information on source code management using git.

Setting Up Your Development Environment

The next step is to setup your development environment for the project on your machine. Peloton currently only be built on 64-bit Linux. We currently only support development environments in Linux and OSX operating systems:

  • If you are using Linux, then follow these instructions.
  • If you are using OSX, then install VirtualBox and then use Vagrant to setup an Ubuntu 14.04 LTS Desktop VM. Detailed instructions are provided here. You can also follow this tutorial on using Vagrant in OSX Yosemite.

Building Peloton

After setting up your development environment, you can now build Peloton by following these installation instructions. We use the Cmake build system in Peloton.

Implementation

In this assignment, you will need to modify two files (src/expression/date_functions.cpp and test/expression/date_functions_test.cpp). The former contains a stub for the EXTRACT function. You will not need to make any changes to any other file in the system.

Your EXTRACT implementation must support all of the operators defined in the DatePartType (see definition in types.h):

  • CENTURY
  • DAY
  • DECADE
  • DOW
  • DOY
  • HOUR
  • MICROSECOND
  • MILLENNIUM
  • MILLISECOND
  • MINUTE
  • MONTH
  • QUARTER
  • SECOND
  • WEEK
  • YEAR

You can refer to the Postgres documentation on what the EXTRACT function should return for each of these parts. Note that you do not have to implement all of the parts that Postgres supports because Peloton currently does not support timezones (e.g., EPOCH).

Your implementation must also handle NULL timestamps. Peloton includes a C++11 date library to make it easier for you to extract the different values that you need. The source code is located in third_party/date.

Testing

  1. Peloton Testing Framework : You can test the EXTRACT function by making use of the date function test case. This test case uses a single date as the input and invokes the EXTRACT function for several different DatePartType values. You will need to make sure that your implementation passes these tests and then expand it to include the remaining DatePartType values.

    cd build
    make check
    valgrind --trace-children=yes \
       --leak-check=full \
       --track-origins=yes \
       --soname-synonyms=somalloc=*jemalloc* \
       --error-exitcode=1 \
       --suppressions=../third_party/valgrind/valgrind.supp \
       ./test/date_functions_test
    

  2. PSQL Command-line Interface: We are also providing a SQL script for testing your implementation and also getting used to the Peloton's command line interface. Follow these instructions to launch Peloton's psql terminal, and then load in this SQL script to run it. You can cross check your output with the expected output.

Development Hints

  1. In any large software system, it is critical to follow some conventions to allow everyone to build on top of each other's work. Here's a short list of rules that you should follow while hacking on Peloton. For instance, use class UpperCaseCamelCase for type names, int lower_case_with_underscores for variable/method/function names. Always use descriptive names and comment generously.

    We use ClangFormat to ensure that everyone follows the same code formatting style. Before submitting your assignment, ensure that you follow these guidelines by running this command:

    clang-format-3.6 --style=file ./src/expression/date_functions.cpp | diff ./src/expression/date_functions.cpp -

    ClangFormat supports two ways to provide custom style options: directly specify the style configuration in the --style= command line option or use --style=file and put the style configuration file in the .clang-format file in the project directory. We follow the second approach. The style file is located here in the Peloton repository. To modify the source code file to adhere to the style file, you can use the following command as describe in the manpage:

    clang-format-3.6 --style=file -i ./src/expression/date_functions.cpp

  2. Instead of using printf statements for debugging, use the LOG_* macros for logging information like this:

    LOG_INFO("Nested loop join executor -- %d children", num_children);
    LOG_DEBUG("Advance the right buffer iterator.");

    To enable logging in peloton, you will need to reconfigure it like this :

    cd build
    cmake -DCMAKE_BUILD_TYPE=DEBUG ..
    make -j 4

    More information is available here. The different logging levels are defined in this header file. After enabling logging, the logging level defaults to LOG_LEVEL_INFO. So, any logging method with a logging level that equal to or higher than LOG_LEVEL_INFO, like LOG_INFO, LOG_WARN, and LOG_ERROR will emit logging information in the peloton debugging log file. The debugging log file is written to stdout.

    Note: Our source validator pre-commit hook will throw an error if you try to push code that includes forbidden commands (e.g., printf, cout, cerr). See these instructions on how to set it up.

Grading Rubric

Each project submission will be graded based on the following criteria:

  1. Does the submission successfully execute all of the test cases and produce the correct answer?
  2. Does the submission execute without any memory leaks according to Valgrind?
  3. Does the submission use the proper formatting for its source code?

Note that we will use additional test cases that are more complex and go beyond the sample test cases that we provide you.

Late Policy

10% will deducted from the final score of the project for every 24 hour period that the assignment is late.

Only in extreme circumstances (e.g., medical emergencies) no-penalty extensions will be granted. The student is required to provide written documentation from the University health center. Please contact the instructor if you have any questions.

Submission

After completing the assignment, you can submit your implementation of date_functions.cpp (only one file) to Autolab:

You can submit your answers as many times as you like and get immediate feedback. Your score will be sent via email to your andrew account within a few minutes after your submission.

Collaboration Policy

  • Every student has to work individually on this assignment.
  • Students are allowed to discuss high-level details about the project with others.
  • Students are not allowed to copy the contents of a white-board after a group meeting with other students.
  • Students are not allowed to copy the solutions from another colleague.

WARNING: All of the code for this project must be your own. You may not copy source code from other students or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.