Syllabus

Educational Objectives

This is a graduate-level course on the internals of database management systems. This course has a heavy emphasis on programming projects. There are also readings assigned for each class and a final exam. Upon successful completion of this course, the student should be able to:

  • Understand the state of the art in the implementation and design of single-node database management systems.
  • Interpret and critically analyze research papers on solving problems with data-intensive workloads and applications.
  • Apply database concepts to solve high-velocity and high-volume data problems.
  • Be familiar with modern database system coding practices.

All programming projects will be completed in the Peloton database management system.

Grading Scheme

The final grade for the course will be based on the following weights:

Reading Assignments & Reviews

For each class, there is a set of assigned readings. Each student is required to turn in a one-paragraph synopsis of the mandatory paper (denoted by the symbol on the course schedule). Students are encouraged to peruse the supplemental readings to enhance their knowledge about a particular topic, but this is not required and these papers will not be covered in the final exam. Students are allowed to miss reading review submissions for four classes during the semester. Late submissions will not be accepted without prior approval from the instructor.

Each review must include the following information:

  • An overview of the main idea and contributions (Two sentences).
  • What system was used in the implementation (One sentence).
  • The workloads that they used for their evaluation (One sentence).

Students will submit their synopsis using this Google Form before class begins. Late submissions will not be accepted.

WARNING: These reading reviews must be your own writing. You may not copy from the papers or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.

Programming Project #1 — Hash Join Operator

The first programming assignment is to implement an in-memory hash join. This is an individual project (i.e., no groups). Students will be provided with instructions on what files to modify in the DBMS and test cases to evaluate their implementation. Grading will be based on both correctness and performance.

Programming Project #2 — Concurrent Index

Students will organize into groups of three and implement a thread-safe, concurrent balanced tree index. Once again, students will be provided with instructions on what files to modify in the DBMS and test cases to evaluate their implementation. Grading will be based on both correctness and performance.

Programming Project #3 — Group Project

The main component of this course will be the final group project. Students will organize into groups and choose to implement a project that (1) is relevant to the materials discussed in class, (2) requires a significant programming effort from all team members, and (3) is unique (i.e., two groups may not choose the same project topic). The projects will vary in both scope and topic, but they must satisfy these criteria. We will discuss this in more depth during class, though students are encouraged to begin thinking early on about projects that interest them. If a group is unable to come up with its own project idea, the instructor will provide suggestions on interesting topics.

Final Exam

There will be a written exam in the last class at the end of the semester. The exam will consist of long-form questions based on the mandatory readings and topics discussed in class. It will be closed notes.

It's going to be raw, son.

Extra Credit

The Carnegie Mellon Database Research Group is writing an online encyclopedia of database management systems (both commercial and academic). Each student can earn extra credit by writing an article for one DBMS. The article must be high-quality with proper citations and attributions.