Showcase - CMU 15-721 :: Advanced Database Systems (Spring 2020)

Common Table Expressions

Students: Preetansh Goyal, Rohan Aggarwal, Gautam Jain
Source Code: https://github.com/cmu-db/terrier/pull/885

This project adds support for Common Table Expressions (CTEs). The advantage of CTEs is that they enable the DBMS to store the output of a complex query in a temporary table that can be directly used by many nodes elsewhere in the physical plan. To tackle the different points of access to the complex query, we extended the query optimizer to choose the most appropriate plan. We also added the support for the creation, population, and destruction of a temporary table structure in the execution engine.

Checkpoints & Recovery

Students: Xuanxuan Ge, Kaige Liu, Tianlei Pan
Source Code: https://github.com/cmu-db/terrier/pull/857

We proposed and implemented a checkpoint recovery mechanism that builds upon the existing log recovery infrastructure. The checkpoint contains information of all the tables, which can be used to directly recover the database to its previous state without replaying the previous logs one by one. The database system finds the most recent checkpoint and applies log records to reach the full state. Our implementation provides 3x faster recovery speed in benchmarks and scales linearly with table size and transaction numbers.

Optimizer Cost Model

Students: Masha Oreshko, Kevin Geng, Vivian Huang
Source Code: https://github.com/cmu-db/terrier/pull/889

This project implements the ANALYZE operator and a new cost model. ANALYZE uses existing analytic functions to calculate useful statistics for tables, which are then stored. The cost model which calculations are based on Postgres's cost model, uses the results from ANALYZE to primarily decide between joins.

Nested Queries

Students: Xinzhu Cai, Guancheng Li, Ian Romines
Source Code: https://github.com/cmu-db/terrier/pull/864

This provides support for nested queries in the DBMS. It implements unnesting algorithms as rewriting rules in the query optimizer and new operators in the query execution engine. Four types of query nesting are supported.

NUMA-Aware Thread Pool

Students: Emmanuel Eppinger, Deepayan Patra, Ricky Zhou
Source Code: https://github.com/cmu-db/terrier/pull/851

This project implements a NUMA-aware, resumable task execution framework for CMU's self-driving database. Our implementation includes a thread pool that allows for tasks to be executed on cores local to the data accessed, and integrates with system's latches to allow for tasks to be paused and later resumed, preventing busy-waiting. Our benchmarks observed performance improvements of up to 600x on worst-case high-contention workloads, and general-case performance improvements of 3-4x.

Schema Changes

Students: Sheng Xu, Ling Zhang, Ricky Xu
Source Code: https://github.com/cmu-db/terrier/pull/862

This project adds the schema change functionality and ALTER TABLE command to the DBMS. We adapted the whole pipeline (parser, binder, optimizer, planner, execution, and storage) to support simple ALTER TABLES commands such as add and drop column. We implemented schema change in a non-blocking manner, allowing for arbitrary number of schema changes that are executed concurrently with sql queries on the same table. We also implemented schema change in a lazy manner, by maintaining seperate physical tables for each schema version, and only migrating tuples to the latest version when they are updated. We wrote benchmarks that show concurrent scheme changes slow down sql queries on the same table by at most 10%.

Multi-threaded Queries

Students: Yinuo Pan, Zecheng He, Yuhong Zhang
Source Code: https://github.com/cmu-db/terrier/pull/860

The project works on providing intra-operator multi-threaded functionalities starting with the sequential scan for CMU's database terrier. The new execution engine can generate tpl codes which invokes multiple threads to concurrently produce scan outputs for the upper-level operators within its pipeline.

Apache Arrow Block Compaction

Students: Vilas Bhat, Arvind Sai Krishnan, Abhijith Anilkumar, Paulina Davison
Source Code: https://github.com/cmu-db/terrier/pull/858

This project adds index updates to the block compaction process and support the ability of the execution engine to operate on compressed data. To do so, it migrates core block compactor functionality from directly modifying data table blocks to accessing data table functions through the execution engine using the DBMS's internal DSL.

Sequences

Students: Tianhan Hu, Zian Ke, Adrian Lo-Yu Chang
Source Code: https://github.com/cmu-db/terrier/pull/897

This project adds the implementation of Postgres-style sequence to the CMU DBMS. It supports nextval and currval, implemented as built-in functions, with session-specific behaviors. Parameters like range and increment values can also be specified on a sequence.

Add/Drop Indexes

Students: Kunal Jobanputra, Alex Stanescu, Cal Lavicka
Source Code: https://github.com/cmu-db/terrier/pull/863

This project implements CREATE INDEX, which entailed adding, populating, and/or dropping the index in a transactionally consistent manner. Transactions that attempt to modify the table as the index is building using a sequential scan are blocked.

Constraints

Students: Yingjing Lu, Wuwen Wang, Yi Zhou
Source Code: https://github.com/cmu-db/terrier/pull/861

This project implements PRIMARY KEY, UNIQUE, and FOREIGN KEY functionality for the new tin-memory DBMS.Out implementation supports constraint definition at table creation time for single column and multi column constraints. Constraints are verified during INSERT, DELETION, and UPDATE operations. There is additional support for CASCADE propagation per constraint definition. We also added future expandability for other constraints, such as EXCLUSION and CHECK.