Publications by Kaan Kara
2019
Proceedings of the VLDB 2019, Los Angeles, CA, USA, August 2019
The ability to perform machine learning (ML) tasks in a database
management system (DBMS) provides the data analyst with a powerful
tool. Unfortunately, integration of ML into a DBMS is challenging
for reasons varying from differences in execution model to
data layout requirements. In this paper, we assume a column-store
main-memory DBMS, optimized for online analytical processing,
as our initial system. On this system, we explore the integration of
coordinate-descent based methods working natively on columnar
format to train generalized linear models. We use a cache-efficient,
partitioned stochastic coordinate descent algorithm providing linear
throughput scalability with the number of cores while preserving
convergence quality, up to 14 cores in our experiments.
Existing column oriented DBMS rely on compression and even
encryption to store data in memory. When those features are considered,
the performance of a CPU based solution suffers. Thus,
in the paper we also show how to exploit hardware acceleration
as part of a hybrid CPU+FPGA system to provide on-the-fly data
transformation combined with an FPGA-based coordinate-descent
engine. The resulting system is a column-store DBMS with its important
features preserved (e.g., data compression) that offers high
performance machine learning capabilities.
@inproceedings{abc,
  abstract = {The ability to perform machine learning (ML) tasks in a database management system (DBMS) provides the data analyst with a powerful tool. Unfortunately, integration of ML into a DBMS is challenging for reasons varying from differences in execution model to data layout requirements. In this paper, we assume a column-store main-memory DBMS, optimized for online analytical processing, as our initial system. On this system, we explore the integration of coordinate-descent based methods working natively on columnar format to train generalized linear models. We use a cache-efficient, partitioned stochastic coordinate descent algorithm providing linear throughput scalability with the number of cores while preserving convergence quality, up to 14 cores in our experiments. Existing column oriented DBMS rely on compression and even encryption to store data in memory. When those features are considered, the performance of a CPU based solution suffers. Thus, in the paper we also show how to exploit hardware acceleration as part of a hybrid CPU+FPGA system to provide on-the-fly data transformation combined with an FPGA-based coordinate-descent engine. The resulting system is a column-store DBMS with its important features preserved (e.g., data compression) that offers high performance machine learning capabilities.},
  author = {Kaan Kara and Ken Eguro and Ce Zhang and Gustavo Alonso},
  booktitle = {Proceedings of the VLDB 2019},
  title = {ColumnML: Column Store Machine Learning with On-the-Fly Data Transformation},
  venue = {Los Angeles, CA, USA},
  year = {2019}
}
Proceedings of the VLDB 2019, Los Angeles, CA, USA, August 2019
Learning from the data stored in a database is an important function
increasingly available in relational engines. Methods using
lower precision input data are of special interest given their overall
higher efficiency. However, in databases, these methods have a
hidden cost: the quantization of the real value into a smaller number
is an expensive step. To address this issue, we present MLWeaving,
a data structure and hardware acceleration technique intended
to speed up learning of generalized linear models over low
precision data. MLWeaving provides a compact in-memory representation
that enables the retrieval of data at any level of precision.
MLWeaving also provides a highly efficient implementation
of stochastic gradient descent on FPGAs and enables the dynamic
tuning of precision, instead of using a fixed precision level during
learning. Experimental results show that MLWeaving converges
up to 16× faster than low-precision implementations of first-order
methods on CPUs.
@inproceedings{abc,
  abstract = {Learning from the data stored in a database is an important function increasingly available in relational engines. Methods using lower precision input data are of special interest given their overall higher efficiency. However, in databases, these methods have a hidden cost: the quantization of the real value into a smaller number is an expensive step. To address this issue, we present MLWeaving, a data structure and hardware acceleration technique intended to speed up learning of generalized linear models over low precision data. MLWeaving provides a compact in-memory representation that enables the retrieval of data at any level of precision. MLWeaving also provides a highly efficient implementation of stochastic gradient descent on FPGAs and enables the dynamic tuning of precision, instead of using a fixed precision level during learning. Experimental results show that MLWeaving converges up to 16$\times$ faster than low-precision implementations of first-order methods on CPUs.},
  author = {Zeke Wang and Kaan Kara and Gustavo Alonso and Onur Mutlu and Ce Zhang},
  booktitle = {Proceedings of the VLDB 2019},
  title = {Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning},
  url = {https://dl.acm.org/doi/10.14778/3317315.3317322},
  venue = {Los Angeles, CA, USA},
  year = {2019}
}