Publications by Florian Widmer

2012

Systems Group Master's Thesis, no. 38; Department of Computer Science, February 2012
Supervised by: Prof. Donald Kossmann
Databases generally adhere to the "closed-world" assumption. If data is not in the database, the database treats it as non-existent. This model works well for things like financial data or inventory. For other data types such as addresses, the data may exist but not be in the database. New systems called crowd-sourced databases now assume an "open-world" and allow a schema to contain columns or even entire tables that are filled with information that is crowd-sourced. Crowd-sourcing relations between entities is the next step in this development. This allows joins and orderings of data that is difficult to compare computationally but easily compared by humans. This master thesis investigates data-structures and algorithms to make the most out of crowd-sourced relations by exploiting the transitivity inherent to the equality and order relations. Along the way, ambiguities in the data have to be tolerated and resolved. After all, humans are far from perfect and so is the data that crowd-sourcing provides.
@mastersthesis{widmer2012memoization,
	abstract = {Databases generally adhere to the ``closed-world'' assumption. If data
is not in the database, the database treats it as non-existent. This model
works well for things like financial data or inventory. For other data types
such as addresses, the data may exist but not be in the database. New
systems called crowd-sourced databases now assume an ``open-world'' and
allow a schema to contain columns or even entire tables that are filled with
information that is crowd-sourced.
Crowd-sourcing relations between entities is the next step in this
development. This allows joins and orderings of data that is difficult to
compare computationally but easily compared by humans. This master
thesis investigates data-structures and algorithms to make the most out
of crowd-sourced relations by exploiting the transitivity inherent to the
equality and order relations. Along the way, ambiguities in the data have
to be tolerated and resolved. After all, humans are far from perfect and
so is the data that crowd-sourcing provides.},
	author = {Florian Widmer},
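	school = {Department of Computer Science},
	note = {Systems Group Master's Thesis, no. 38; supervised by Prof. Donald Kossmann},
	month = feb,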
	title = {Memoization of Crowd-sourced Comparisons},
	year = {2012}
}
January 2012
@techreport{gruenheid2012crowdsourcing,
	author = {Anja Gruenheid and Donald Kossmann and Sukriti Ramesh and Florian Widmer},
	title = {Crowdsourcing Entity Resolution: When is A=B?},
	month = jan,
	year = {2012}
}