Publication

Systems Group Master's Thesis, no. 7; Department of Computer Science, April 2011
Supervised by: Prof. Donald Kossmann
The web has become a real-time communication medium, used by a large number of people in ever-increasing parts of their daily lives. This new usage pattern gives advertisers and marketeers great opportunities to learn about their customers' temporal interests, thoughts and current context. Although it is well known that this information is extremely valuable for advertising and product recommendation, online advertising is adapting only slowly. This is mainly because it is unclear which information is valuable to use, because of the huge amount of data produced, and because of the lack of efficient models to process this data. This thesis describes an approach to implement Scalable Real-Time Product Recommendation based on Users Activity in a Social Network. The products are taken from Amazon.com, and the social network used is the microblogging platform Twitter. The thesis presents an implementation of this approach on top of the key-value database Cassandra, using a system called Triggy. Triggy extends Cassandra with incremental Map-Reduce tasks for push-style data processing. Use cases that require high-performance analysis of large amounts of data are the showpiece of every stream processing engine. These engines are built to process massive amounts of data in a very short time. Therefore, this thesis contains a comparison between four state-of-the-art distributed stream processing engines and the implementation with Triggy. It is shown that the analyzed use case has various properties that make its implementation in a stream processing engine impossible. Finally, a demo application is presented to demonstrate the described approach.
@mastersthesis{abc,
	abstract = {The web has become a real-time communication medium, used by a large
number of people in ever-increasing parts of their daily lives. This new usage
pattern gives advertisers and marketeers great opportunities to learn about
their customers{\textquoteright} temporal interests, thoughts and current context.
Although it is well known that this information is extremely valuable for
advertising and product recommendation, online advertising is adapting only
slowly. This is mainly because it is unclear which information is valuable to
use, because of the huge amount of data produced, and because of the lack of
efficient models to process this data.
This thesis describes an approach to implement Scalable Real-Time Product
Recommendation based on Users Activity in a Social Network. The products
are taken from Amazon.com, and the social network used is the microblogging
platform Twitter. The thesis presents an implementation of this approach on top of the
key-value database Cassandra, using a system called Triggy. Triggy extends
Cassandra with incremental Map-Reduce tasks for push-style data processing.
Use cases that require high-performance analysis of large amounts of data
are the showpiece of every stream processing engine. These engines are built to
process massive amounts of data in a very short time. Therefore, this thesis contains
a comparison between four state-of-the-art distributed stream processing
engines and the implementation with Triggy. It is shown that the analyzed use
case has various properties that make its implementation in a stream processing
engine impossible.
Finally, a demo application is presented to demonstrate the described approach.},
	author = {Michael Haspra},
	school = {Department of Computer Science},
	number = {7},
	month = apr,
	title = {Scalable Real-Time Product Recommendation based on Users Activity in a Social Network},
	year = {2011}
}