Publication

Systems Group Master's Thesis, ETH Zürich; Department of Computer Science, January 2008
Supervised by: Prof. Donald Kossmann
XQuery, a powerful language for querying XML data, and XML Schema, a language for expressing a schema (a set of rules describing the structure and data content of XML documents), are established XML technologies and W3C standards. The W3C Recommendation for XQuery describes the option of importing and using XML schemas in XQuery queries, allowing for the assignment of types to data and the validation of both input XML and query results. Schema-awareness provides the XQuery processor with information about the definitions included in schemas, resulting in better optimized and tested queries. This information can be exploited to support predictability, early detection of errors, optimizations, type-based special processing, and the validity of query results. At the same time, data stream processing has become a hot topic. One of the issues the research community is trying to resolve is that streaming, by its very nature, deals with infinite data sources, and in most cases a streamed element can be seen and accessed only once. A recent research proposal provides a formal specification of a schema for describing the properties of streams. The availability of such stream schema information can enable optimizations and early detection of problems arising with infinite data. For the special case of XML streams, the proposal also describes an XML Schema extension. The main goal of this Master's thesis is the integration of XML Schema-awareness into the MXQuery engine and the exploration of the implications that the availability of such a feature has for query processing, streaming execution, and optimizations. To meet the needs of stream processing, further goals of this thesis are the implementation of the XML Schema extension for streams and support for a tool for XML stream parsing and validation. MXQuery is a Java-based XQuery and XQueryP engine that has a low memory footprint and supports streaming execution.
@mastersthesis{abc,
	abstract = {XQuery, a powerful language for querying XML data, and XML Schema, a language for expressing a schema (a set of rules describing the structure and data content of XML documents), are established XML technologies and W3C standards.
The W3C Recommendation for XQuery describes the option of importing and using XML schemas in XQuery queries, allowing for the assignment of types to data and the validation of both input XML and query results.
Schema-awareness provides the XQuery processor with information about the definitions included in schemas, resulting in better optimized and tested queries. This information can be exploited to support predictability, early detection of errors, optimizations, type-based special processing, and the validity of query results.
At the same time, data stream processing has become a hot topic. One of the issues the research community is trying to resolve is that streaming, by its very nature, deals with infinite data sources, and in most cases a streamed element can be seen and accessed only once.
A recent research proposal provides a formal specification of a schema for describing the properties of streams. The availability of such stream schema information can enable optimizations and early detection of problems arising with infinite data. For the special case of XML streams, the proposal also describes an XML Schema extension.
The main goal of this Master's thesis is the integration of XML Schema-awareness into the MXQuery engine and the exploration of the implications that the availability of such a feature has for query processing, streaming execution, and optimizations. To meet the needs of stream processing, further goals of this thesis are the implementation of the XML Schema extension for streams and support for a tool for XML stream parsing and validation. MXQuery is a Java-based XQuery and XQueryP engine that has a low memory footprint and supports streaming execution.},
	author = {Kostis Tsoulos},
	school = {ETH Z{\"u}rich},
	title = {XML Schema Support in MXQuery},
	year = {2008}
}