Publication

Systems Group Master's Thesis, no. ETH Zürich; Department of Computer Science, April 2009
Supervised by: Prof. Gustavo Alonso
Database Management Systems (DBMS) rely on highly optimized engines. However, the optimization process was driven by the context of self-contained systems. In the current trend of multicore CPUs and internet-scale (highly parallelized) applications, their monolithic design is becoming more and more of a scalability issue. While current approaches try to achieve performance by replicating or partitioning data in different ways to minimize concurrent access, we take a different approach by modularizing an existing RDBMS (Apache Derby) into functional units that are going to be distributed across several physical machines. This thesis shows that it is possible to modularize and distribute the phases of SQL processing of an existing RDBMS (Apache Derby) without changing its functional behavior. We describe the required analysis, refactorings, and methodology used. Additionally, we identify and describe several key software engineering principles necessary for such refactoring. This thesis focuses on restructuring an existing RDBMS engine to run in a distributed manner. We also suggest concrete paths to achieve improved scalability and further modularization.
@mastersthesis{abc,
	abstract = {Database Management Systems (DBMS) rely on highly optimized engines. However,
the optimization process was driven by the context of self-contained systems.
In the current trend of multicore CPUs and internet-scale (highly parallelized)
applications, their monolithic design is becoming more and more of a
scalability issue. While current approaches try to achieve performance by replicating
or partitioning data in different ways to minimize concurrent access, we
take a different approach by modularizing an existing RDBMS (Apache Derby)
into functional units that are going to be distributed across several physical
machines. This thesis shows that it is possible to modularize and
distribute the phases of SQL processing of an existing RDBMS (Apache Derby)
without changing its functional behavior. We describe the required analysis,
refactorings, and methodology used. Additionally, we identify and describe several
key software engineering principles necessary for such refactoring. This thesis
focuses on restructuring an existing RDBMS engine to run in a distributed
manner. We also suggest concrete paths to achieve improved scalability and
further modularization.},
	author = {Pavel Kowalski},
	school = {ETH Z{\"u}rich},
	title = {Modular Derby},
	year = {2009}
}