Publications by Sen Wu

×

Status message

The Publications site is currently under construction, as a result some publications might be missing.

2017

VLDB J., January 2017
Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration. Recent names used for this problem include dealing with dark data and knowledge base construction (KBC). In this work, we describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems, and we present techniques to make the KBC process more efficient. We observe that the KBC process is iterative, and we develop techniques to incrementally produce inference results for KBC systems. We propose two methods for incremental inference, based respectively on sampling and variational techniques. We also study the tradeoff space of these methods and develop a simple rule-based optimizer. DeepDive includes all of these contributions, and we evaluate DeepDive on five KBC systems, showing that it can speed up KBC inference tasks by up to two orders of magnitude with negligible impact on quality.
@article{abc,
	abstract = {Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration. Recent names used for this problem include dealing with dark data and knowledge base construction (KBC). In this work, we describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems, and we present techniques to make the KBC process more efficient. We observe that the KBC process is iterative, and we develop techniques to incrementally produce inference results for KBC systems. We propose two methods for incremental inference, based respectively on sampling and variational techniques. We also study the tradeoff space of these methods and develop a simple rule-based optimizer. DeepDive includes all of these contributions, and we evaluate DeepDive on five KBC systems, showing that it can speed up KBC inference tasks by up to two orders of magnitude with negligible impact on quality.},
	author = {Christopher De Sa and Alexander Ratner and Christopher R{\'e} and Jaeho Shin and Feiran Wang and Sen Wu and Ce Zhang},
	journal = {VLDB J.},
	title = {Incremental knowledge base construction using DeepDive.},
	url = {http://dx.doi.org/10.1007/s00778-016-0437-2},
	year = {2017}
}
Commun. ACM, -, January 2017
The dark data extraction or knowledge base construction (KBC) problem is to populate a SQL database with information from unstructured data sources including emails, webpages, and pdf reports. KBC is a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration. We describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems. The key idea in DeepDive is that statistical inference and machine learning are key tools to attack classical data problems in extraction, cleaning, and integration in a unified and more effective manner. DeepDive programs are declarative in that one cannot write probabilistic inference algorithms; instead, one interacts by defining features or rules about the domain. A key reason for this design choice is to enable domain experts to build their own KBC systems. We present the applications, abstractions, and techniques of DeepDive employed to accelerate construction of KBC systems.
@inproceedings{abc,
	abstract = {The dark data extraction or knowledge base construction (KBC) problem is to populate a SQL database with information from unstructured data sources including emails, webpages, and pdf reports. KBC is a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration. We describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems. The key idea in DeepDive is that statistical inference and machine learning are key tools to attack classical data problems in extraction, cleaning, and integration in a unified and more effective manner. DeepDive programs are declarative in that one cannot write probabilistic inference algorithms; instead, one interacts by defining features or rules about the domain. A key reason for this design choice is to enable domain experts to build their own KBC systems. We present the applications, abstractions, and techniques of DeepDive employed to accelerate construction of KBC systems.},
	author = {Ce Zhang and Christopher R{\'e} and Michael J. Cafarella and Jaeho Shin and Feiran Wang and Sen Wu},
	booktitle = {Commun. ACM},
	title = {DeepDive: declarative knowledge base construction.},
	url = {http://doi.acm.org/10.1145/3060586},
	venue = {-},
	year = {2017}
}

2016

SIGMOD Record, January 2016
The dark data extraction or knowledge base construction (KBC) problem is to populate a SQL database with information from unstructured data sources including emails, webpages, and pdf reports. KBC is a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration. We describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems. The key idea in DeepDive is that statistical inference and machine learning are key tools to attack classical data problems in extraction, cleaning, and integration in a unified and more effective manner. DeepDive programs are declarative in that one cannot write probabilistic inference algorithms; instead, one interacts by defining features or rules about the domain. A key reason for this design choice is to enable domain experts to build their own KBC systems. We present the applications, abstractions, and techniques of DeepDive employed to accelerate construction of KBC systems.
@article{abc,
	abstract = {The dark data extraction or knowledge base construction (KBC) problem is to populate a SQL database with information from unstructured data sources including emails, webpages, and pdf reports. KBC is a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration. We describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems. The key idea in DeepDive is that statistical inference and machine learning are key tools to attack classical data problems in extraction, cleaning, and integration in a unified and more effective manner. DeepDive programs are declarative in that one cannot write probabilistic inference algorithms; instead, one interacts by defining features or rules about the domain. A key reason for this design choice is to enable domain experts to build their own KBC systems. We present the applications, abstractions, and techniques of DeepDive employed to accelerate construction of KBC systems.},
	author = {Christopher De Sa and Alexander Ratner and Christopher R{\'e} and Jaeho Shin and Feiran Wang and Sen Wu and Ce Zhang},
	journal = {SIGMOD Record},
	title = {DeepDive: Declarative Knowledge Base Construction.},
	url = {http://doi.acm.org/10.1145/2949741.2949756},
	year = {2016}
}

2015

PVLDB, January 2015
@inproceedings{abc,
	author = {Jaeho Shin and Sen Wu and Feiran Wang and Christopher De Sa and Ce Zhang and Christopher R{\'e}},
	booktitle = {PVLDB},
	title = {Incremental Knowledge Base Construction Using DeepDive.},
	url = {http://www.vldb.org/pvldb/vol8/p1310-shin.pdf},
	year = {2015}
}
CoRR, January 2015
@article{abc,
	author = {Sen Wu and Ce Zhang and Feiran Wang and Christopher R{\'e}},
	journal = {CoRR},
	title = {Incremental Knowledge Base Construction Using DeepDive.},
	url = {http://arxiv.org/abs/1502.00731},
	year = {2015}
}

2014

CoRR, January 2014
@article{abc,
	author = {Ce Zhang and Christopher R{\'e} and Amir Abbas Sadeghian and Zifei Shan and Jaeho Shin and Feiran Wang and Sen Wu},
	journal = {CoRR},
	title = {Feature Engineering for Knowledge Base Construction.},
	url = {http://arxiv.org/abs/1407.6439},
	year = {2014}
}
IEEE Data Eng. Bull., January 2014
@inproceedings{abc,
	author = {Christopher R{\'e} and Amir Abbas Sadeghian and Zifei Shan and Jaeho Shin and Feiran Wang and Sen Wu and Ce Zhang},
	booktitle = {IEEE Data Eng. Bull.},
	title = {Feature Engineering for Knowledge Base Construction.},
	url = {http://sites.computer.org/debull/A14sept/p26.pdf},
	year = {2014}
}