Publications by Christopher Ré

The Publications site is currently under construction; as a result, some publications may be missing.

2017

VLDB J., January 2017
Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration. Recent names used for this problem include dealing with dark data and knowledge base construction (KBC). In this work, we describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems, and we present techniques to make the KBC process more efficient. We observe that the KBC process is iterative, and we develop techniques to incrementally produce inference results for KBC systems. We propose two methods for incremental inference, based respectively on sampling and variational techniques. We also study the tradeoff space of these methods and develop a simple rule-based optimizer. DeepDive includes all of these contributions, and we evaluate DeepDive on five KBC systems, showing that it can speed up KBC inference tasks by up to two orders of magnitude with negligible impact on quality.
@article{desa2017incremental,
	abstract = {Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration. Recent names used for this problem include dealing with dark data and knowledge base construction (KBC). In this work, we describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems, and we present techniques to make the KBC process more efficient. We observe that the KBC process is iterative, and we develop techniques to incrementally produce inference results for KBC systems. We propose two methods for incremental inference, based respectively on sampling and variational techniques. We also study the tradeoff space of these methods and develop a simple rule-based optimizer. DeepDive includes all of these contributions, and we evaluate DeepDive on five KBC systems, showing that it can speed up KBC inference tasks by up to two orders of magnitude with negligible impact on quality.},
	author = {Christopher De Sa and Alexander Ratner and Christopher R{\'e} and Jaeho Shin and Feiran Wang and Sen Wu and Ce Zhang},
	journal = {VLDB J.},
	title = {Incremental knowledge base construction using DeepDive.},
	url = {http://dx.doi.org/10.1007/s00778-016-0437-2},
	year = {2017}
}
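DeepDive grounds a KBC program into a factor graph and performs marginal inference by Gibbs sampling, which is what the sampling-based incremental method above builds on. As a point of reference, here is a minimal Gibbs sweep over binary variables in Python; this is an illustrative sketch, not DeepDive's implementation, and the data layout and names are hypothetical.

import math
import random

def gibbs_sweeps(assignment, factors_of, num_sweeps=100):
    # assignment: dict mapping each binary variable to its current 0/1 value.
    # factors_of: dict mapping each variable to the factors that touch it;
    # each factor takes the full assignment and returns a log-weight.
    for _ in range(num_sweeps):
        for var in assignment:
            log_w = {}
            for value in (0, 1):
                assignment[var] = value
                log_w[value] = sum(f(assignment) for f in factors_of[var])
            # P(var = 1 | rest) from the two unnormalized log-weights,
            # clamped for numerical safety before exponentiating.
            diff = max(min(log_w[0] - log_w[1], 50.0), -50.0)
            p_one = 1.0 / (1.0 + math.exp(diff))
            assignment[var] = 1 if random.random() < p_one else 0
    return assignment

The incremental techniques in the paper avoid re-running such sweeps from scratch on every KBC iteration, either by adapting stored samples or by updating a variational approximation of the distribution, per the two methods the abstract names.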
Commun. ACM, January 2017
The dark data extraction or knowledge base construction (KBC) problem is to populate a SQL database with information from unstructured data sources including emails, webpages, and pdf reports. KBC is a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration. We describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems. The key idea in DeepDive is that statistical inference and machine learning are key tools to attack classical data problems in extraction, cleaning, and integration in a unified and more effective manner. DeepDive programs are declarative in that one cannot write probabilistic inference algorithms; instead, one interacts by defining features or rules about the domain. A key reason for this design choice is to enable domain experts to build their own KBC systems. We present the applications, abstractions, and techniques of DeepDive employed to accelerate construction of KBC systems.
@article{zhang2017deepdive,
	abstract = {The dark data extraction or knowledge base construction (KBC) problem is to populate a SQL database with information from unstructured data sources including emails, webpages, and pdf reports. KBC is a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration. We describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems. The key idea in DeepDive is that statistical inference and machine learning are key tools to attack classical data problems in extraction, cleaning, and integration in a unified and more effective manner. DeepDive programs are declarative in that one cannot write probabilistic inference algorithms; instead, one interacts by defining features or rules about the domain. A key reason for this design choice is to enable domain experts to build their own KBC systems. We present the applications, abstractions, and techniques of DeepDive employed to accelerate construction of KBC systems.},
	author = {Ce Zhang and Christopher R{\'e} and Michael J. Cafarella and Jaeho Shin and Feiran Wang and Sen Wu},
	journal = {Commun. ACM},
	title = {DeepDive: declarative knowledge base construction.},
	url = {http://doi.acm.org/10.1145/3060586},
	year = {2017}
}
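To make "defining features or rules" concrete: a DeepDive developer typically supplies small feature-extraction UDFs and declarative rules, never inference code. The sketch below mimics the style of DeepDive's stream-oriented extractors, which read candidate tuples on stdin and emit feature tuples on stdout; the column layout and the feature name here are hypothetical.

import sys

# Hypothetical input: one tab-separated candidate pair per line, e.g.
#   sentence_id <TAB> person1 <TAB> person2 <TAB> words_between
for line in sys.stdin:
    sentence_id, p1, p2, words_between = line.rstrip("\n").split("\t")
    # Emit a feature string for the candidate; during training the
    # system learns one weight per distinct feature.
    if "married" in words_between.lower():
        print("\t".join([sentence_id, p1, p2, "WORDS_CONTAIN_married"]))

This division of labor is the design choice the abstract describes: the domain expert contributes signals, and probabilistic inference stays entirely inside the system.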

2016

54th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2016, Monticello, IL, USA, September 2016
@inproceedings{mitliagkas2016asynchrony,
	author = {Ioannis Mitliagkas and Ce Zhang and Stefan Hadjis and Christopher R{\'e}},
	booktitle = {54th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2016, Monticello, IL, USA},
	title = {Asynchrony begets momentum, with an application to deep learning.},
	url = {http://dx.doi.org/10.1109/ALLERTON.2016.7852343},
	year = {2016}
}
Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 2016
DeepDive is a system for extracting relational databases from dark data: the mass of text, tables, and images that are widely collected and stored but which cannot be exploited by standard relational tools. If the information in dark data - scientific papers, Web classified ads, customer service notes, and so on - were instead in a relational database, it would give analysts a massive and valuable new set of "big data." DeepDive is distinctive when compared to previous information extraction systems in its ability to obtain very high precision and recall at reasonable engineering cost; in a number of applications, we have used DeepDive to create databases with accuracy that meets that of human annotators. To date we have successfully deployed DeepDive to create data-centric applications for insurance, materials science, genomics, paleontologists, law enforcement, and others. The data unlocked by DeepDive represents a massive opportunity for industry, government, and scientific researchers. DeepDive is enabled by an unusual design that combines large-scale probabilistic inference with a novel developer interaction cycle. This design is enabled by several core innovations around probabilistic training and inference.
@inproceedings{zhang2016extracting,
	abstract = {DeepDive is a system for extracting relational databases from dark data: the mass of text, tables, and images that are widely collected and stored but which cannot be exploited by standard relational tools. If the information in dark data - scientific papers, Web classified ads, customer service notes, and so on - were instead in a relational database, it would give analysts a massive and valuable new set of "big data." DeepDive is distinctive when compared to previous information extraction systems in its ability to obtain very high precision and recall at reasonable engineering cost; in a number of applications, we have used DeepDive to create databases with accuracy that meets that of human annotators. To date we have successfully deployed DeepDive to create data-centric applications for insurance, materials science, genomics, paleontologists, law enforcement, and others. The data unlocked by DeepDive represents a massive opportunity for industry, government, and scientific researchers. DeepDive is enabled by an unusual design that combines large-scale probabilistic inference with a novel developer interaction cycle. This design is enabled by several core innovations around probabilistic training and inference.},
	author = {Ce Zhang and Jaeho Shin and Christopher R{\'e} and Michael J. Cafarella and Feng Niu},
	booktitle = {Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016},
	title = {Extracting Databases from Dark Data with DeepDive.},
	url = {http://doi.acm.org/10.1145/2882903.2904442},
	address = {San Francisco, CA, USA},
	year = {2016}
}
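For reference, the precision and recall that the abstract reports are the standard quantities over extracted tuples, with TP counting correct extractions, FP spurious ones, and FN missed ones:

\[ \mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \frac{TP}{TP + FN} \]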
Commun. ACM, January 2016
@article{abadi2016beckman,
	author = {Daniel J. Abadi and Rakesh Agrawal and Anastasia Ailamaki and Magdalena Balazinska and Philip A. Bernstein and Michael J. Carey and Surajit Chaudhuri and Jeffrey Dean and AnHai Doan and Michael J. Franklin and Johannes Gehrke and Laura M. Haas and Alon Y. Halevy and Joseph M. Hellerstein and Yannis E. Ioannidis and H. V. Jagadish and Donald Kossmann and Samuel Madden and Sharad Mehrotra and Tova Milo and Jeffrey F. Naughton and Raghu Ramakrishnan and Volker Markl and Christopher Olston and Beng Chin Ooi and Christopher R{\'e} and Dan Suciu and Michael Stonebraker and Todd Walter and Jennifer Widom},
	journal = {Commun. ACM},
	title = {The Beckman report on database research.},
	url = {http://doi.acm.org/10.1145/2845915},
	year = {2016}
}
ACM Trans. Database Syst., January 2016
@article{zhang2016materialization,
	author = {Ce Zhang and Arun Kumar and Christopher R{\'e}},
	journal = {ACM Trans. Database Syst.},
	title = {Materialization Optimizations for Feature Selection Workloads.},
	url = {http://doi.acm.org/10.1145/2877204},
	year = {2016}
}
CoRR, January 2016
@article{mitliagkas2016asynchronyarxiv,
	author = {Ioannis Mitliagkas and Ce Zhang and Stefan Hadjis and Christopher R{\'e}},
	journal = {CoRR},
	title = {Asynchrony begets Momentum, with an Application to Deep Learning.},
	url = {http://arxiv.org/abs/1605.09774},
	year = {2016}
}
CoRR, January 2016
@article{pan2016cyclades,
	author = {Xinghao Pan and Maximilian Lam and Stephen Tu and Dimitris S. Papailiopoulos and Ce Zhang and Michael I. Jordan and Kannan Ramchandran and Christopher R{\'e} and Benjamin Recht},
	journal = {CoRR},
	title = {CYCLADES: Conflict-free Asynchronous Machine Learning.},
	url = {http://arxiv.org/abs/1605.09721},
	year = {2016}
}
SIGMOD Record, January 2016
The dark data extraction or knowledge base construction (KBC) problem is to populate a SQL database with information from unstructured data sources including emails, webpages, and pdf reports. KBC is a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration. We describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems. The key idea in DeepDive is that statistical inference and machine learning are key tools to attack classical data problems in extraction, cleaning, and integration in a unified and more effective manner. DeepDive programs are declarative in that one cannot write probabilistic inference algorithms; instead, one interacts by defining features or rules about the domain. A key reason for this design choice is to enable domain experts to build their own KBC systems. We present the applications, abstractions, and techniques of DeepDive employed to accelerate construction of KBC systems.
@article{desa2016deepdive,
	abstract = {The dark data extraction or knowledge base construction (KBC) problem is to populate a SQL database with information from unstructured data sources including emails, webpages, and pdf reports. KBC is a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration. We describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems. The key idea in DeepDive is that statistical inference and machine learning are key tools to attack classical data problems in extraction, cleaning, and integration in a unified and more effective manner. DeepDive programs are declarative in that one cannot write probabilistic inference algorithms; instead, one interacts by defining features or rules about the domain. A key reason for this design choice is to enable domain experts to build their own KBC systems. We present the applications, abstractions, and techniques of DeepDive employed to accelerate construction of KBC systems.},
	author = {Christopher De Sa and Alexander Ratner and Christopher R{\'e} and Jaeho Shin and Feiran Wang and Sen Wu and Ce Zhang},
	journal = {SIGMOD Record},
	title = {DeepDive: Declarative Knowledge Base Construction.},
	url = {http://doi.acm.org/10.1145/2949741.2949756},
	year = {2016}
}
CoRR, January 2016
@article{hadjis2016omnivore,
	author = {Stefan Hadjis and Ce Zhang and Ioannis Mitliagkas and Christopher R{\'e}},
	journal = {CoRR},
	title = {Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs.},
	url = {http://arxiv.org/abs/1606.04487},
	year = {2016}
}
ETH Zürich, January 2016
@techreport{mitliagkas2016asynchronytr,
	author = {Ioannis Mitliagkas and Ce Zhang and Stefan Hadjis and Christopher R{\'e}},
	institution = {ETH Z{\"u}rich},
	title = {Asynchrony begets Momentum, with an Application to Deep Learning},
	year = {2016}
}
Bioinformatics, January 2016
@article{mallory2016largescale,
	author = {Emily K. Mallory and Ce Zhang and Christopher R{\'e} and Russ B. Altman},
	journal = {Bioinformatics},
	title = {Large-scale extraction of gene interactions from full-text literature using DeepDive.},
	url = {http://dx.doi.org/10.1093/bioinformatics/btv476},
	year = {2016}
}

2015

Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, Quebec, Canada, December 2015
@inproceedings{desa2015rapidly,
	author = {Christopher De Sa and Ce Zhang and Kunle Olukotun and Christopher R{\'e}},
	booktitle = {Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015},
	title = {Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width.},
	url = {http://papers.nips.cc/paper/5757-rapidly-mixing-gibbs-sampling-for-a-class-of-factor-graphs-using-hierarchy-width},
	address = {Montreal, Quebec, Canada},
	year = {2015}
}
Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, Quebec, Canada, December 2015
@inproceedings{desa2015taming,
	author = {Christopher De Sa and Ce Zhang and Kunle Olukotun and Christopher R{\'e}},
	booktitle = {Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015},
	title = {Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms.},
	url = {http://papers.nips.cc/paper/5717-taming-the-wild-a-unified-analysis-of-hogwild-style-algorithms},
	address = {Montreal, Quebec, Canada},
	year = {2015}
}
Proceedings of the Fourth Workshop on Data analytics in the Cloud, DanaC 2015, Melbourne, VIC, Australia, May 2015
@inproceedings{hadjis2015caffe,
	author = {Stefan Hadjis and Firas Abuzaid and Ce Zhang and Christopher R{\'e}},
	booktitle = {Proceedings of the Fourth Workshop on Data analytics in the Cloud, DanaC 2015, Melbourne, VIC, Australia},
	title = {Caffe con Troll: Shallow Ideas to Speed Up Deep Learning.},
	url = {http://doi.acm.org/10.1145/2799562.2799641},
	year = {2015}
}
CoRR, January 2015
@article{abuzaid2015caffe,
	author = {Firas Abuzaid and Stefan Hadjis and Ce Zhang and Christopher R{\'e}},
	journal = {CoRR},
	title = {Caffe con Troll: Shallow Ideas to Speed Up Deep Learning.},
	url = {http://arxiv.org/abs/1504.04343},
	year = {2015}
}
CoRR, January 2015
@article{desa2015tamingarxiv,
	author = {Christopher De Sa and Ce Zhang and Kunle Olukotun and Christopher R{\'e}},
	journal = {CoRR},
	title = {Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms.},
	url = {http://arxiv.org/abs/1506.06438},
	year = {2015}
}
PVLDB, January 2015
@article{shin2015incremental,
	author = {Jaeho Shin and Sen Wu and Feiran Wang and Christopher De Sa and Ce Zhang and Christopher R{\'e}},
	journal = {PVLDB},
	title = {Incremental Knowledge Base Construction Using DeepDive.},
	url = {http://www.vldb.org/pvldb/vol8/p1310-shin.pdf},
	year = {2015}
}
CoRR, January 2015
@article{zhu2015building,
	author = {Yuke Zhu and Ce Zhang and Christopher R{\'e} and Li Fei-Fei},
	journal = {CoRR},
	title = {Building a Large-scale Multimodal Knowledge Base for Visual Question Answering.},
	url = {http://arxiv.org/abs/1507.05670},
	year = {2015}
}
CoRR, January 2015
@article{desa2015rapidlyarxiv,
	author = {Christopher De Sa and Ce Zhang and Kunle Olukotun and Christopher R{\'e}},
	journal = {CoRR},
	title = {Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width.},
	url = {http://arxiv.org/abs/1510.00756},
	year = {2015}
}
CoRR, January 2015
@article{wu2015incremental,
	author = {Sen Wu and Ce Zhang and Feiran Wang and Christopher R{\'e}},
	journal = {CoRR},
	title = {Incremental Knowledge Base Construction Using DeepDive.},
	url = {http://arxiv.org/abs/1502.00731},
	year = {2015}
}

2014

Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, Quebec, Canada, December 2014
@inproceedings{zhou2014parallel,
	author = {Yingbo Zhou and Utkarsh Porwal and Ce Zhang and Hung Q. Ngo and Long Nguyen and Christopher R{\'e} and Venu Govindaraju},
	booktitle = {Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014},
	title = {Parallel Feature Selection Inspired by Group Testing.},
	url = {http://papers.nips.cc/paper/5296-parallel-feature-selection-inspired-by-group-testing},
	address = {Montreal, Quebec, Canada},
	year = {2014}
}
Proceedings of the 2nd International Workshop on In Memory Data Management and Analytics, IMDM 2014, Hangzhou, China, September 2014
@inproceedings{bittorf2014tradeoffs,
	author = {Victor Bittorf and Marcel Kornacker and Christopher R{\'e} and Ce Zhang},
	booktitle = {Proceedings of the 2nd International Workshop on In Memory Data Management and Analytics, IMDM 2014, Hangzhou, China},
	title = {Tradeoffs in Main-Memory Statistical Analytics from Impala to DimmWitted.},
	url = {http://www-db.in.tum.de/hosted/imdm2014/papers/bittorf.pdf},
	year = {2014}
}
International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 2014
@inproceedings{zhang2014materialization,
	author = {Ce Zhang and Arun Kumar and Christopher R{\'e}},
	booktitle = {International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA},
	title = {Materialization optimizations for feature selection workloads.},
	url = {http://doi.acm.org/10.1145/2588555.2593678},
	year = {2014}
}
CoRR, January 2014
@article{zhang2014dimmwittedarxiv,
	author = {Ce Zhang and Christopher R{\'e}},
	journal = {CoRR},
	title = {DimmWitted: A Study of Main-Memory Statistical Analytics.},
	url = {http://arxiv.org/abs/1403.7550},
	year = {2014}
}
CoRR, January 2014
@article{peters2014machine,
	author = {Shanan Peters and Ce Zhang and Miron Livny and Christopher R{\'e}},
	journal = {CoRR},
	title = {A machine-compiled macroevolutionary history of Phanerozoic life.},
	url = {http://arxiv.org/abs/1406.2963},
	year = {2014}
}
CoRR, January 2014
@article{zhang2014feature,
	author = {Ce Zhang and Christopher R{\'e} and Amir Abbas Sadeghian and Zifei Shan and Jaeho Shin and Feiran Wang and Sen Wu},
	journal = {CoRR},
	title = {Feature Engineering for Knowledge Base Construction.},
	url = {http://arxiv.org/abs/1407.6439},
	year = {2014}
}
PVLDB, January 2014
@article{zhang2014dimmwitted,
	author = {Ce Zhang and Christopher R{\'e}},
	journal = {PVLDB},
	title = {DimmWitted: A Study of Main-Memory Statistical Analytics.},
	url = {http://www.vldb.org/pvldb/vol7/p1283-zhang.pdf},
	year = {2014}
}
IEEE Data Eng. Bull., January 2014
@article{re2014feature,
	author = {Christopher R{\'e} and Amir Abbas Sadeghian and Zifei Shan and Jaeho Shin and Feiran Wang and Sen Wu and Ce Zhang},
	journal = {IEEE Data Eng. Bull.},
	title = {Feature Engineering for Knowledge Base Construction.},
	url = {http://sites.computer.org/debull/A14sept/p26.pdf},
	year = {2014}
}

2013

Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, Nevada, United States, December 2013
@inproceedings{sridhar2013approximate,
	author = {Srikrishna Sridhar and Stephen J. Wright and Christopher R{\'e} and Ji Liu and Victor Bittorf and Ce Zhang},
	booktitle = {Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States.},
	title = {An Approximate, Efficient LP Solver for LP Rounding.},
	url = {http://papers.nips.cc/paper/4990-an-approximate-efficient-lp-solver-for-lp-rounding},
	address = {Lake Tahoe, Nevada, United States},
	year = {2013}
}
Proceedings of The Twenty-Second Text REtrieval Conference, TREC 2013, Gaithersburg, Maryland, USA, November 2013
@inproceedings{frank2013evaluating,
	author = {John R. Frank and Steven J. Bauer and Max Kleiman-Weiner and Daniel A. Roberts and Nilesh Tripuraneni and Ce Zhang and Christopher R{\'e} and Ellen M. Voorhees and Ian Soboroff},
	booktitle = {Proceedings of The Twenty-Second Text REtrieval Conference, TREC 2013, Gaithersburg, Maryland, USA},
	title = {Evaluating Stream Filtering for Entity Profile Updates for TREC 2013.},
	url = {http://trec.nist.gov/pubs/trec22/papers/KBA.OVERVIEW.pdf},
	year = {2013}
}
Proceedings of The Twenty-Second Text REtrieval Conference, TREC 2013, Gaithersburg, Maryland, USA, November 2013
@inproceedings{khot2013bootstrapping,
	author = {Tushar Khot and Ce Zhang and Jude W. Shavlik and Sriraam Natarajan and Christopher R{\'e}},
	booktitle = {Proceedings of The Twenty-Second Text REtrieval Conference, TREC 2013, Gaithersburg, Maryland, USA},
	title = {Bootstrapping Knowledge Base Acceleration.},
	url = {http://trec.nist.gov/pubs/trec22/papers/wisc-kba.pdf},
	year = {2013}
}
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, Sofia, Bulgaria, Volume 2: Short Papers, August 2013
@inproceedings{govindaraju2013understanding,
	author = {Vidhya Govindaraju and Ce Zhang and Christopher R{\'e}},
	booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, Volume 2: Short Papers},
	title = {Understanding Tables in Context Using Standard NLP Toolkits.},
	url = {http://aclweb.org/anthology/P/P13/P13-2116.pdf},
	address = {Sofia, Bulgaria},
	year = {2013}
}
Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 2013
@inproceedings{abc,
	author = {Ce Zhang and Christopher R{\'e}},
	booktitle = {Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA},
	title = {Towards high-throughput {Gibbs} sampling at scale: a study across storage managers.},
	url = {http://doi.acm.org/10.1145/2463676.2463702},
	year = {2013}
}
Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 2013
@inproceedings{abc,
	author = {Ce Zhang and Vidhya Govindaraju and Jackson Borchardt and Tim Foltz and Christopher R{\'e} and Shanan Peters},
	booktitle = {Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA},
	title = {GeoDeepDive: statistical inference using familiar data-processing languages.},
	url = {http://doi.acm.org/10.1145/2463676.2463680},
	year = {2013}
}
CIDR 2013, Sixth Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 2013
@inproceedings{anderson2013brainwash,
	author = {Michael Anderson and Dolan Antenucci and Victor Bittorf and Matthew Burgess and Michael J. Cafarella and Arun Kumar and Feng Niu and Yongjoo Park and Christopher R{\'e} and Ce Zhang},
	booktitle = {CIDR 2013, Sixth Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA},
	title = {Brainwash: A Data System for Feature Engineering.},
	url = {http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper82.pdf},
	year = {2013}
}
CoRR, January 2013
Many problems in machine learning can be solved by rounding the solution of an appropriate linear program (LP). This paper shows that we can recover solutions of comparable quality by rounding an approximate LP solution instead of the exact one. These approximate LP solutions can be computed efficiently by applying a parallel stochastic-coordinate-descent method to a quadratic-penalty formulation of the LP. We derive worst-case runtime and solution quality guarantees of this scheme using novel perturbation and convergence analysis. Our experiments demonstrate that on such combinatorial problems as vertex cover, independent set and multiway-cut, our approximate rounding scheme is up to an order of magnitude faster than Cplex (a commercial LP solver) while producing solutions of similar quality.
@article{sridhar2013approximatearxiv,
	abstract = {Many problems in machine learning can be solved by rounding the solution of an appropriate linear program (LP). This paper shows that we can recover solutions of comparable quality by rounding an approximate LP solution instead of the exact one. These approximate LP solutions can be computed efficiently by applying a parallel stochastic-coordinate-descent method to a quadratic-penalty formulation of the LP. We derive worst-case runtime and solution quality guarantees of this scheme using novel perturbation and convergence analysis. Our experiments demonstrate that on such combinatorial problems as vertex cover, independent set and multiway-cut, our approximate rounding scheme is up to an order of magnitude faster than Cplex (a commercial LP solver) while producing solutions of similar quality.},
	author = {Srikrishna Sridhar and Victor Bittorf and Ji Liu and Ce Zhang and Christopher R{\'e} and Stephen J. Wright},
	journal = {CoRR},
	title = {An Approximate, Efficient Solver for LP Rounding.},
	url = {http://arxiv.org/abs/1311.2661},
	year = {2013}
}
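For concreteness, a standard quadratic-penalty treatment of an LP with constraints $Ax \le b$ over the box $[0,1]^n$ replaces the hard constraints with a smooth penalty term; this is a sketch of the general technique, and the paper's exact formulation and guarantees may differ:

\[ \min_{x \in [0,1]^n} \; c^\top x + \frac{\beta}{2} \sum_i \max\bigl(a_i^\top x - b_i,\, 0\bigr)^2 \]

Stochastic coordinate descent minimizes this objective one coordinate $x_j$ at a time; since an update touches only the constraints in which $x_j$ appears, updates on sparse combinatorial LPs such as vertex cover or multiway cut rarely conflict and parallelize well, consistent with the speedups the abstract reports.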

2012

12th IEEE International Conference on Data Mining, ICDM 2012, Brussels, Belgium, December 2012
@inproceedings{niu2012scaling,
	author = {Feng Niu and Ce Zhang and Christopher R{\'e} and Jude W. Shavlik},
	booktitle = {12th IEEE International Conference on Data Mining, ICDM 2012, Brussels, Belgium},
	title = {Scaling Inference for Markov Logic via Dual Decomposition.},
	url = {http://dx.doi.org/10.1109/ICDM.2012.96},
	year = {2012}
}
Proceedings of The Twenty-First Text REtrieval Conference, TREC 2012, Gaithersburg, Maryland, USA, November 2012
@inproceedings{frank2012building,
	author = {John R. Frank and Max Kleiman-Weiner and Daniel A. Roberts and Feng Niu and Ce Zhang and Christopher R{\'e} and Ian Soboroff},
	booktitle = {Proceedings of The Twenty-First Text REtrieval Conference, TREC 2012, Gaithersburg, Maryland, USA},
	title = {Building an Entity-Centric Stream Filtering Test Collection for TREC 2012.},
	url = {http://trec.nist.gov/pubs/trec21/papers/KBA.OVERVIEW.pdf},
	year = {2012}
}
Proceedings of the Second International Workshop on Searching and Integrating New Web Data Sources, Istanbul, Turkey, August 2012
@inproceedings{niu2012deepdive,
	author = {Feng Niu and Ce Zhang and Christopher R{\'e} and Jude W. Shavlik},
	booktitle = {Proceedings of the Second International Workshop on Searching and Integrating New Web Data Sources, Istanbul, Turkey},
	title = {DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference.},
	url = {http://ceur-ws.org/Vol-884/VLDS2012_p25_Niu.pdf},
	year = {2012}
}
The 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Jeju Island, Korea - Volume 1: Long Papers, July 2012
@inproceedings{zhang2012bigdata,
	author = {Ce Zhang and Feng Niu and Christopher R{\'e} and Jude W. Shavlik},
	booktitle = {The 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Volume 1: Long Papers},
	title = {Big Data versus the Crowd: Looking for Relationships in All the Right Places.},
	url = {http://www.aclweb.org/anthology/P12-1087},
	address = {Jeju Island, Korea},
	year = {2012}
}
Int. J. Semantic Web Inf. Syst., January 2012
@article{niu2012elementary,
	author = {Feng Niu and Ce Zhang and Christopher R{\'e} and Jude W. Shavlik},
	journal = {Int. J. Semantic Web Inf. Syst.},
	title = {Elementary: Large-Scale Knowledge-Base Construction via Machine Learning and Statistical Inference.},
	url = {http://dx.doi.org/10.4018/jswis.2012070103},
	year = {2012}
}

2011

CoRR, January 2011
@article{niu2011felix,
	author = {Feng Niu and Ce Zhang and Christopher R{\'e} and Jude W. Shavlik},
	journal = {CoRR},
	title = {Felix: Scaling Inference for Markov Logic with an Operator-based Approach},
	url = {http://arxiv.org/abs/1108.0294},
	year = {2011}
}