Publications by Ken Eguro | Publications

2019

ColumnML: Column Store Machine Learning with On-the-Fly Data Transformation

Kaan Kara, Ken Eguro, Ce Zhang, and Gustavo Alonso

Proceedings of the VLDB 2019, Los Angeles, CA, USA, August 2019

The ability to perform machine learning (ML) tasks in a database management system (DBMS) provides the data analyst with a powerful tool. Unfortunately, integration of ML into a DBMS is challenging for reasons varying from differences in execution model to data layout requirements. In this paper, we assume a column-store main-memory DBMS, optimized for online analytical processing, as our initial system. On this system, we explore the integration of coordinate-descent based methods working natively on columnar format to train generalized linear models. We use a cache-efficient, partitioned stochastic coordinate descent algorithm providing linear throughput scalability with the number of cores while preserving convergence quality, up to 14 cores in our experiments. Existing column oriented DBMS rely on compression and even encryption to store data in memory. When those features are considered, the performance of a CPU based solution suffers. Thus, in the paper we also show how to exploit hardware acceleration as part of a hybrid CPU+FPGA system to provide on-the-fly data transformation combined with an FPGA-based coordinate-descent engine. The resulting system is a column-store DBMS with its important features preserved (e.g., data compression) that offers high performance machine learning capabilities.

@inproceedings{abc,
	abstract = {The ability to perform machine learning (ML) tasks in a database
management system (DBMS) provides the data analyst with a powerful
tool. Unfortunately, integration of ML into a DBMS is challenging
for reasons varying from differences in execution model to
data layout requirements. In this paper, we assume a column-store
main-memory DBMS, optimized for online analytical processing,
as our initial system. On this system, we explore the integration of
coordinate-descent based methods working natively on columnar
format to train generalized linear models. We use a cache-efficient,
partitioned stochastic coordinate descent algorithm providing linear
throughput scalability with the number of cores while preserving
convergence quality, up to 14 cores in our experiments.
Existing column oriented DBMS rely on compression and even
encryption to store data in memory. When those features are considered,
the performance of a CPU based solution suffers. Thus,
in the paper we also show how to exploit hardware acceleration
as part of a hybrid CPU+FPGA system to provide on-the-fly data
transformation combined with an FPGA-based coordinate-descent
engine. The resulting system is a column-store DBMS with its important
features preserved (e.g., data compression) that offers high
performance machine learning capabilities.},
	author = {Kaan Kara and Ken Eguro and Ce Zhang and Gustavo Alonso},
	booktitle = {Proceedings of the VLDB 2019},
	title = {ColumnML: Column Store Machine Learning with On-the-Fly Data Transformation},
	address = {Los Angeles, CA, USA},
	month = aug,
	year = {2019}
}