Publications by Ce Zhang
2019
Proceedings of the SysML'19, Stanford, CA, USA, April 2019
Continuous integration is an indispensable step of modern software engineering practices to systematically
manage the life cycles of system development. Developing a machine learning model is no different: it is an
engineering process with a life cycle, including design, implementation, tuning, testing, and deployment. However,
most, if not all, existing continuous integration engines do not support machine learning as first-class citizens.
In this paper, we present ease.ml/ci, to the best of our knowledge the first continuous integration system for machine
learning. The challenge of building ease.ml/ci is to provide rigorous guarantees, e.g., single accuracy point
error tolerance with 0.999 reliability, with a practical amount of labeling effort, e.g., 2K labels per test. We design
a domain-specific language that allows users to specify integration conditions with reliability constraints, and
develop simple novel optimizations that can lower the number of labels required by up to two orders of magnitude
for test conditions popularly used in real production systems.
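
To give a sense of why the labeling effort is the hard part, the sketch below computes how many labeled test examples a textbook two-sided Hoeffding bound would require for a given error tolerance and reliability. This is only an illustrative baseline under the assumption that test examples are i.i.d. and the metric is 0/1 accuracy; it is not the optimization described in the paper, and the function name `hoeffding_sample_size` is hypothetical.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Labels needed so that the empirical accuracy is within `epsilon`
    of the true accuracy with probability at least 1 - delta.

    Assumption: i.i.d. test examples and a two-sided Hoeffding bound,
    2 * exp(-2 * n * epsilon**2) <= delta, solved for n.
    """
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # One accuracy point of tolerance (0.01) with 0.999 reliability (delta = 0.001).
    print(hoeffding_sample_size(epsilon=0.01, delta=0.001))
```

Under these assumptions the bound asks for 38005 labels for a single test, which is well above the roughly 2K labels per test that the abstract cites as practical.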
@inproceedings{abc,
  abstract = {Continuous integration is an indispensable step of modern software engineering practices to systematically manage the life cycles of system development. Developing a machine learning model is no different: it is an engineering process with a life cycle, including design, implementation, tuning, testing, and deployment. However, most, if not all, existing continuous integration engines do not support machine learning as first-class citizens. In this paper, we present ease.ml/ci, to the best of our knowledge the first continuous integration system for machine learning. The challenge of building ease.ml/ci is to provide rigorous guarantees, e.g., single accuracy point error tolerance with 0.999 reliability, with a practical amount of labeling effort, e.g., 2K labels per test. We design a domain-specific language that allows users to specify integration conditions with reliability constraints, and develop simple novel optimizations that can lower the number of labels required by up to two orders of magnitude for test conditions popularly used in real production systems.},
  author = {Cedric Renggli and Bojan Karlas and Bolin Ding and Feng Liu and Kevin Schawinski and Wentao Wu and Ce Zhang},
  booktitle = {Proceedings of the SysML{\textquoteright}19},
  title = {Continuous Integration of Machine Learning Models: A Rigorous Yet Practical Treatment},
  venue = {Stanford, CA, USA},
  year = {2019}
}