Project Showcase

Logging, Checkpoints, and Recovery

Students: Eric Haibin Lin, Matt Perron, Abhishek Joshi
Repository: https://github.com/EccentricLoggers/peloton

We implemented correct single-threaded logging and recovery, as well as single-threaded checkpointing and recovery from checkpoints. We then extended the logging module to support multi-threaded logging and multi-threaded recovery, with a back-pressure mechanism.
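
The back-pressure idea can be sketched as a bounded queue between worker threads and a logger thread. When the queue is full, workers block until the logger catches up. This is a minimal C++ sketch under stated assumptions: BoundedLogQueue and RunLogging are hypothetical names, a single shared queue stands in for whatever buffering the real logging module uses, and appending to a vector stands in for writing and flushing the log file.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue between worker threads (producers) and a single
// logger thread (consumer). When the queue is full, Append() blocks: this is
// the back-pressure that slows workers down to the rate the logger can flush.
class BoundedLogQueue {
 public:
  explicit BoundedLogQueue(std::size_t capacity) : capacity_(capacity) {
    assert(capacity_ > 0);  // a zero-capacity queue would block forever
  }

  // Called by worker threads; waits while the queue is full.
  void Append(std::string record) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
    queue_.push_back(std::move(record));
    not_empty_.notify_one();
  }

  // Called by the logger thread; waits for a record. Returns false once the
  // queue has been closed and fully drained.
  bool Pop(std::string &out) {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock, [this] { return !queue_.empty() || closed_; });
    if (queue_.empty()) return false;
    out = std::move(queue_.front());
    queue_.pop_front();
    not_full_.notify_one();
    return true;
  }

  // Signals that no more records will be appended.
  void Close() {
    std::lock_guard<std::mutex> lock(mu_);
    closed_ = true;
    not_empty_.notify_all();
  }

 private:
  const std::size_t capacity_;
  bool closed_ = false;
  std::deque<std::string> queue_;
  std::mutex mu_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
};

// Hypothetical driver: `workers` threads each append `per_worker` records
// while one logger thread drains the queue. Returns the "flushed" records.
std::vector<std::string> RunLogging(int workers, int per_worker, std::size_t capacity) {
  BoundedLogQueue queue(capacity);
  std::vector<std::string> flushed;
  std::thread logger([&] {
    std::string record;
    while (queue.Pop(record)) flushed.push_back(record);  // stand-in for write + fsync
  });
  std::vector<std::thread> producers;
  for (int w = 0; w < workers; ++w) {
    producers.emplace_back([&queue, w, per_worker] {
      for (int i = 0; i < per_worker; ++i)
        queue.Append("txn " + std::to_string(w) + ":" + std::to_string(i));
    });
  }
  for (auto &t : producers) t.join();
  queue.Close();
  logger.join();
  return flushed;
}
```

With a small capacity such as 8, the producers spend much of their time blocked in Append(), which is exactly the throttling effect back-pressure is meant to provide.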

User-defined Functions

Students: Seunghyun Lee, Yang Zhang, Zheyuan Bu
Repository: https://github.com/shlee0605/peloton

We ported Postgres’s user-defined function (UDF) support into Peloton. Both PL/pgSQL and C UDFs are supported. To implement the UDF feature, we modified Postgres’s function manager and Server Programming Interface (SPI), and extended Peloton’s expression system. We also provided some C functions and SQL scripts for testing.

Garbage Collection

Students: Rajat Kateja, Saurabh Kadekodi, Tianyuan Ding
Repository: https://github.com/saurabhkadekodi/peloton-gc

For our project, we implemented garbage collection at the tuple level. When a tuple in a table is deleted or updated, our garbage collection schemes ensure that the memory occupied by that tuple can be recycled and reused to store new tuples. Since MVCC is used, our schemes ensure that recycling is done only after no running transaction can access the older (or deleted) tuple data. We implemented three garbage collection schemes. In the first scheme, a vacuum thread is responsible for cleaning up the garbage on each invocation. In the second scheme, each thread cleans up some garbage after it finishes its assigned task. In the third scheme, garbage is maintained on a per-epoch basis, and each thread cleans up some number of epochs. Our preliminary evaluations show that the second scheme, cooperative garbage collection, provides the largest benefit for an update-heavy YCSB workload.
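
The MVCC safety rule can be illustrated with a small sketch of the cooperative (second) scheme: a finishing transaction retires the tuple slots it made obsolete and then reclaims a bounded number of retired slots whose retirement timestamp is older than the oldest still-running transaction. CooperativeGC, the single shared counter for start and commit timestamps, and the per-call cleanup budget are illustrative assumptions, not Peloton's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <limits>
#include <mutex>
#include <set>
#include <vector>

using TxnId = std::uint64_t;   // start and commit timestamps share one counter
using SlotId = std::uint32_t;  // index of a tuple slot in a table

// Hypothetical sketch of cooperative, MVCC-aware tuple-slot recycling.
class CooperativeGC {
 public:
  TxnId Begin() {
    std::lock_guard<std::mutex> guard(mu_);
    TxnId id = next_ts_++;
    active_.insert(id);
    return id;
  }

  // Ends transaction `id`. `obsolete` holds slots it deleted or superseded.
  // The finishing thread then reclaims up to `budget` safe slots itself.
  void Finish(TxnId id, const std::vector<SlotId> &obsolete, std::size_t budget) {
    std::lock_guard<std::mutex> guard(mu_);
    active_.erase(id);
    TxnId commit_ts = next_ts_++;
    for (SlotId slot : obsolete) retired_.push_back({commit_ts, slot});
    ReclaimLocked(budget);
  }

  // Hands out a recycled slot for a new tuple; false if none is free yet.
  bool AcquireFreeSlot(SlotId &out) {
    std::lock_guard<std::mutex> guard(mu_);
    if (free_.empty()) return false;
    out = free_.back();
    free_.pop_back();
    return true;
  }

 private:
  struct Retired {
    TxnId retired_at;
    SlotId slot;
  };

  // A slot retired at time T is safe once every active transaction began
  // after T, because only transactions that started before T can still see it.
  void ReclaimLocked(std::size_t budget) {
    TxnId oldest = active_.empty() ? std::numeric_limits<TxnId>::max() : *active_.begin();
    // retired_ is ordered by retirement time, so stop at the first unsafe entry.
    while (budget > 0 && !retired_.empty() && retired_.front().retired_at < oldest) {
      free_.push_back(retired_.front().slot);
      retired_.pop_front();
      --budget;
    }
  }

  std::mutex mu_;
  TxnId next_ts_ = 0;
  std::set<TxnId> active_;  // ordered, so begin() is the oldest active transaction
  std::deque<Retired> retired_;
  std::vector<SlotId> free_;
};
```

For example, if a reader starts before a writer that supersedes slot 7, the slot cannot be reused while the reader is still running; once the reader finishes, its own cleanup pass frees the slot.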

Query Compilation

Students: Prashanth Menon
Repository: https://github.com/pmenon/peloton

The project investigates how to perform query compilation in the context of a hybrid main-memory database. We build on techniques developed in the HyPer system to implement most common SQL operators. The two primary goals are to offer dramatically improved performance over the existing Volcano-style iterator model and to enable layout-agnostic operators even when generating code.
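
To make the contrast with the Volcano model concrete, the sketch below evaluates SELECT SUM(a) FROM t WHERE a > threshold twice: once through pull-based operators with a virtual Next() call per tuple, and once as the single fused loop that a HyPer-style compiler aims to produce. It is a hand-written illustration of the target code shape, not generated code, and the operator classes and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Volcano-style interface (hypothetical): one tuple per virtual Next() call.
struct Operator {
  virtual ~Operator() = default;
  virtual std::optional<std::int64_t> Next() = 0;
};

struct ScanOp : Operator {
  explicit ScanOp(const std::vector<std::int64_t> &column) : column_(column) {}
  std::optional<std::int64_t> Next() override {
    if (pos_ == column_.size()) return std::nullopt;
    return column_[pos_++];
  }
  const std::vector<std::int64_t> &column_;
  std::size_t pos_ = 0;
};

struct FilterOp : Operator {
  FilterOp(std::unique_ptr<Operator> child, std::int64_t threshold)
      : child_(std::move(child)), threshold_(threshold) {}
  std::optional<std::int64_t> Next() override {
    // Keep pulling from the child until a tuple passes the predicate.
    while (auto v = child_->Next())
      if (*v > threshold_) return v;
    return std::nullopt;
  }
  std::unique_ptr<Operator> child_;
  std::int64_t threshold_;
};

// SELECT SUM(a) FROM t WHERE a > threshold, interpreted tuple at a time.
std::int64_t SumVolcano(const std::vector<std::int64_t> &column, std::int64_t threshold) {
  FilterOp plan(std::make_unique<ScanOp>(column), threshold);
  std::int64_t sum = 0;
  while (auto v = plan.Next()) sum += *v;
  return sum;
}

// The same query as one fused loop: scan, filter, and aggregate with no
// per-tuple virtual calls, which is the shape compiled query code aims for.
std::int64_t SumCompiled(const std::vector<std::int64_t> &column, std::int64_t threshold) {
  std::int64_t sum = 0;
  for (std::int64_t a : column)
    if (a > threshold) sum += a;
  return sum;
}
```

Both functions return the same result; the difference is that the fused loop keeps values in registers and avoids an indirect call per tuple per operator.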

Multi-threaded Query Execution

Students: Lu Zhang, Wendong Li, Rui Wang
Repository: https://github.com/TailofJune/peloton

This project enables multi-threaded query execution in Peloton by parallelizing existing executors. We implemented exchange_scan_executor, exchange_hash_executor, and exchange_hash_join_executor, and plugged these executors into the executor tree so that single-threaded executors and exchange executors can be switched easily. The exchange_scan_executor scales almost linearly, while the other two scale less well.
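
A scan parallelizes cleanly because its partitions are independent, which is one plausible reason the scan exchange scales best. The sketch below shows the general exchange pattern under simplifying assumptions: the table is a single in-memory column, ExchangeScan is a hypothetical name, each thread scans one contiguous partition, and the exchange step concatenates the per-thread results in partition order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical exchange scan: split the table into contiguous partitions,
// scan each partition with `pred` on its own thread, then merge the outputs.
std::vector<std::int64_t> ExchangeScan(const std::vector<std::int64_t> &table,
                                       const std::function<bool(std::int64_t)> &pred,
                                       std::size_t num_threads) {
  num_threads = std::max<std::size_t>(1, num_threads);
  // One output buffer per thread, so workers never write to shared state.
  std::vector<std::vector<std::int64_t>> partial(num_threads);
  const std::size_t chunk = (table.size() + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      std::size_t begin = std::min(table.size(), t * chunk);
      std::size_t end = std::min(table.size(), begin + chunk);
      for (std::size_t i = begin; i < end; ++i)
        if (pred(table[i])) partial[t].push_back(table[i]);
    });
  }
  for (auto &w : workers) w.join();
  // Exchange step: gather per-thread results in partition order.
  std::vector<std::int64_t> out;
  for (const auto &p : partial) out.insert(out.end(), p.begin(), p.end());
  return out;
}
```

Because partitions are merged in order, the parallel result matches a single-threaded scan, which makes it easy to swap the two executors in a plan.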

Statistics Collection

Students: Lin Ma, Dana Van Aken, Aaron Harlap
Repository: https://github.com/malin1993ml/peloton

In this project, we designed and implemented the statistics collection framework in the Peloton DBMS. We designed interfaces for several different stats metrics and use a hierarchical, per-thread stats context to store the statistics of every table, index, and database. A dedicated thread running a StatsAggregator periodically aggregates the stats from all threads and then writes them to a log file or the command line.
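
The per-thread design can be sketched as follows: each worker thread records into its own context, so the hot path contends only with the periodic aggregation pass, and an aggregator sums all contexts into one view. This is a minimal sketch with assumptions: ThreadStatsContext, Register, and RecordRead are hypothetical names, and a single read counter keyed by "database.table" stands in for the real metric interfaces and the database/table/index hierarchy. Writing the totals to a log file is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical per-thread stats context. Its owning worker thread updates it;
// the aggregator thread reads it during each aggregation round.
struct ThreadStatsContext {
  std::mutex mu;  // uncontended except while the aggregator is reading
  std::map<std::string, std::uint64_t> reads_per_table;  // key: "database.table"

  void RecordRead(const std::string &table) {
    std::lock_guard<std::mutex> guard(mu);
    ++reads_per_table[table];
  }
};

// Hypothetical aggregator: owns all per-thread contexts and merges them.
class StatsAggregator {
 public:
  // Creates a context for a new worker thread. The pointer stays valid for
  // the aggregator's lifetime because contexts are never removed.
  ThreadStatsContext *Register() {
    std::lock_guard<std::mutex> guard(mu_);
    contexts_.push_back(std::make_unique<ThreadStatsContext>());
    return contexts_.back().get();
  }

  // One aggregation round: sums every thread's counters.
  std::map<std::string, std::uint64_t> Aggregate() {
    std::lock_guard<std::mutex> guard(mu_);
    std::map<std::string, std::uint64_t> totals;
    for (const auto &ctx : contexts_) {
      std::lock_guard<std::mutex> ctx_guard(ctx->mu);
      for (const auto &[table, count] : ctx->reads_per_table) totals[table] += count;
    }
    return totals;
  }

 private:
  std::mutex mu_;
  std::vector<std::unique_ptr<ThreadStatsContext>> contexts_;
};
```

In use, each worker thread would call Register() once and keep its pointer, while the dedicated aggregator thread calls Aggregate() on a timer.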

Constraints

Students: Shimin Wang, Ruirui (Mavis) Xiang, Lei Qi
Repository: https://github.com/yudun/peloton

We implemented constraints in Peloton. We support constraint declarations in both CREATE TABLE (static) and ALTER TABLE (dynamic) statements. In CREATE TABLE statements, a user can define PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE, and CHECK constraints, while for ALTER TABLE we support SET/DROP NOT NULL and SET/DROP UNIQUE. More information can be found in the Peloton wiki.
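
The enforcement semantics can be sketched as checks that run before each insert. This is a hypothetical, simplified model, not Peloton's implementation: rows are vectors of nullable integers, UNIQUE is checked with a linear scan instead of an index, PRIMARY KEY and FOREIGN KEY are omitted, and SetNotNull stands in for ALTER TABLE ... SET NOT NULL, which must first verify the rows already in the table. Following SQL semantics, UNIQUE ignores NULLs, and a CHECK comparison on a NULL value evaluates to unknown, which does not reject the row.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical row model: each column holds a nullable integer.
using Value = std::optional<int>;
using Row = std::vector<Value>;

enum class ConstraintType { kNotNull, kUnique, kCheck };

struct Constraint {
  ConstraintType type;
  std::size_t column;
  std::function<bool(int)> check;  // only used by kCheck
};

class Table {
 public:
  explicit Table(std::vector<Constraint> constraints) : constraints_(std::move(constraints)) {}

  // Validates all constraints first; on any violation the row is rejected
  // and the table is left unchanged.
  bool Insert(const Row &row) {
    for (const auto &c : constraints_) {
      const Value &v = row.at(c.column);
      switch (c.type) {
        case ConstraintType::kNotNull:
          if (!v) return false;
          break;
        case ConstraintType::kUnique:
          // SQL allows several NULLs in a UNIQUE column.
          if (v && Contains(c.column, *v)) return false;
          break;
        case ConstraintType::kCheck:
          // A predicate on NULL is unknown, which does not reject the row.
          if (v && !c.check(*v)) return false;
          break;
      }
    }
    rows_.push_back(row);
    return true;
  }

  // Analogue of ALTER TABLE ... SET NOT NULL: fails if existing rows violate it.
  bool SetNotNull(std::size_t column) {
    for (const auto &r : rows_)
      if (!r.at(column)) return false;
    constraints_.push_back({ConstraintType::kNotNull, column, nullptr});
    return true;
  }

 private:
  // Linear scan for illustration; a real system would consult an index.
  bool Contains(std::size_t column, int value) const {
    for (const auto &r : rows_)
      if (r.at(column) == value) return true;
    return false;
  }

  std::vector<Constraint> constraints_;
  std::vector<Row> rows_;
};
```

The dynamic case is the interesting one: adding NOT NULL to a populated table must validate existing data, whereas dropping a constraint only removes a check.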

Query Planner

Students: Alex Poms, Ravi Teja Mullapudi, Ziqi Wang
Repository: https://github.com/cmu-db/peloton

In the Query Planner project, we developed a cost-guided, dynamic-programming-based framework for relational query optimization. The optimizer framework, equipped with easily configurable transformation rules on customizable plan operators (e.g., join, project), explores the space of plans that are logically equivalent to an input query and produces a cost-optimal physical execution plan as output. This search is made efficient by a memoization table (as in the Cascades framework) that captures the redundant exploration, optimization, and rule application inherent in our recursive search. Our optimizer implementation can execute queries containing multiple joins and filters end-to-end within the Peloton DBMS.
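
The benefit of memoization can be sketched with a dynamic program over sets of relations in which the best plan cost for each set is computed once and cached, so sub-plans shared by many candidate plans are optimized only once. This is a simplified stand-in, not the Cascades-style rule engine described above: JoinOptimizer is a hypothetical name, relation sets are bitmasks, every split (including cross products) is considered, and the cost model, which charges each join for the rows it outputs using one fixed selectivity per join, is an assumption for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical memoized join-order search. Relation i is bit i of a mask.
class JoinOptimizer {
 public:
  JoinOptimizer(std::vector<double> cardinalities, double selectivity)
      : card_(std::move(cardinalities)), sel_(selectivity) {}

  // Cheapest cost to join every relation in `set`; each set is solved once.
  double BestCost(std::uint32_t set) {
    if (auto it = memo_.find(set); it != memo_.end()) return it->second;
    double best;
    if ((set & (set - 1)) == 0) {
      best = 0.0;  // a single base relation: scanning it is treated as free
    } else {
      best = std::numeric_limits<double>::infinity();
      // Enumerate every split of `set` into two non-empty halves.
      for (std::uint32_t left = (set - 1) & set; left != 0; left = (left - 1) & set) {
        std::uint32_t right = set & ~left;
        double cost = BestCost(left) + BestCost(right) + OutputRows(set);
        if (cost < best) best = cost;
      }
    }
    memo_[set] = best;
    return best;
  }

 private:
  // Toy cost model (assumption): product of cardinalities times one
  // selectivity factor per join needed to combine the set.
  double OutputRows(std::uint32_t set) const {
    double rows = 1.0;
    int n = 0;
    for (std::size_t i = 0; i < card_.size(); ++i)
      if (set & (1u << i)) {
        rows *= card_[i];
        ++n;
      }
    for (int k = 1; k < n; ++k) rows *= sel_;
    return rows;
  }

  std::vector<double> card_;
  double sel_;
  std::unordered_map<std::uint32_t, double> memo_;
};
```

With cardinalities 10, 1000, and 100 and selectivity 0.5, joining relations 0 and 2 first is cheapest, and the memo table ensures each two-relation subset is costed only once even though it appears under several three-way splits.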

Networking

Students: Siddharth Santurkar, Nitin Chandra Badam, Di Xiao
Repository: https://github.com/sid1607/peloton-1

The project consisted of two independent components:

The Peloton Wire Protocol: This is a prototype C++ implementation of the Postgres wire protocol that allows the Postgres shell (psql) and JDBC clients to communicate with Peloton. To test the correctness of the implementation, the protocol is backed by SQLite. This provides the database support needed to fully execute the YCSB and TPC-C benchmarks and to check whether this protocol design can support these benchmarks once it is integrated with Peloton's frontend. The protocol abstracts the client-server interaction for every query through packet formats. Full details about the protocol can be found in the Postgres documentation.
Integrated Memcache: Memcached is an in-memory key-value cache that evicts records when the data does not fit into memory, and it is typically deployed in front of a slower, disk-based data store. Memcached exposes a simple CRUD-like API, and many applications have adopted it for their high-performance data storage needs. However, for an in-memory database like Peloton, running Memcached alongside it would be counter-productive: Memcached is designed to create an illusion of unbounded memory over a disk store and would evict records from memory, even though the main database is already in memory. Hence, the goal of this project was to develop a Memcached API layer for Peloton that translates Memcached queries into prepared statements that are executed on Peloton to fetch the desired results. The layer was evaluated for correctness and performance using the YCSB-Memcached-Benchmark over Peloton.
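
As an illustration of the packet formats used by the wire protocol component, the sketch below encodes and decodes the simple Query message of the PostgreSQL frontend/backend protocol: a one-byte type 'Q', a four-byte big-endian length that counts itself and the payload but not the type byte, and the NUL-terminated SQL string. EncodeQuery and DecodeQuery are hypothetical helper names; a real server also handles startup, authentication, the extended-query messages, and the backend's responses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: builds a simple-query ('Q') message as a client would.
std::vector<std::uint8_t> EncodeQuery(const std::string &sql) {
  std::vector<std::uint8_t> buf;
  // Length covers the 4 length bytes, the SQL text, and its NUL terminator.
  std::uint32_t len = 4 + static_cast<std::uint32_t>(sql.size()) + 1;
  buf.push_back('Q');
  for (int shift = 24; shift >= 0; shift -= 8)  // network (big-endian) byte order
    buf.push_back(static_cast<std::uint8_t>(len >> shift));
  buf.insert(buf.end(), sql.begin(), sql.end());
  buf.push_back(0);
  return buf;
}

// Hypothetical helper: server-side parse of one 'Q' message. Returns false if
// `buf` does not hold a complete, well-formed Query message.
bool DecodeQuery(const std::vector<std::uint8_t> &buf, std::string &sql) {
  if (buf.size() < 5 || buf[0] != 'Q') return false;
  std::uint32_t len = 0;
  for (int i = 1; i <= 4; ++i) len = (len << 8) | buf[i];
  if (len < 5 || buf.size() < 1 + static_cast<std::size_t>(len)) return false;
  if (buf[len] != 0) return false;  // the last payload byte must be the terminator
  sql.assign(buf.begin() + 5, buf.begin() + len);
  return true;
}
```

For "SELECT 1", the message is 14 bytes long: the type byte, a length field of 13, the 8 characters of SQL, and the terminator.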
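
The Memcached translation step can be sketched as a parser that maps a memcached text-protocol command line onto a call to a server-side prepared statement. Everything beyond the memcached command syntax is an assumption: the statement names kv_get, kv_set, and kv_delete, their parameter order, and the idea that they were prepared in advance against a key-value table (for example, kv_get as SELECT value FROM kv WHERE key = $1) are hypothetical, and only single-key get, set, and delete are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// A call to a server-side prepared statement: its name and bound parameters.
struct PreparedCall {
  std::string statement;            // hypothetical names: kv_get, kv_set, kv_delete
  std::vector<std::string> params;  // bound in order to $1, $2, ...
};

// Translates one memcached text-protocol command into a prepared-statement
// call. `line` is the command line without its trailing "\r\n"; `data` is the
// data block that follows a storage command such as "set".
std::optional<PreparedCall> Translate(const std::string &line, const std::string &data = "") {
  std::istringstream in(line);
  std::string cmd, key;
  if (!(in >> cmd >> key)) return std::nullopt;
  if (cmd == "get") return PreparedCall{"kv_get", {key}};  // extra keys ignored here
  if (cmd == "delete") return PreparedCall{"kv_delete", {key}};
  if (cmd == "set") {
    // Syntax: set <key> <flags> <exptime> <bytes> [noreply]
    std::string flags, exptime;
    std::size_t bytes = 0;
    if (!(in >> flags >> exptime >> bytes) || bytes != data.size()) return std::nullopt;
    return PreparedCall{"kv_set", {key, data, flags, exptime}};
  }
  return std::nullopt;  // other commands are omitted in this sketch
}
```

Routing every command through a small, fixed set of prepared statements means Peloton parses and plans each statement once, which matters for a cache-style workload dominated by tiny point lookups.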