Publication

Systems Group Master's Thesis, no. ETH Zürich; Department of Computer Science, September 2008
Supervised by: Prof. Gustavo Alonso
Nowadays spam and virus mails get delivered daily to probably every user on the internet and almost everybody makes use of anti-spam software. Some of these anti-spam systems employ a collaborative component to include the users themselves in the decision process. Normally a server-based architecture is used for the collaborative part. In this thesis we assess the feasibility of using a decentralized peer-to-peer component instead. To this end, we look at the problem of receiving spam mails from the users' point of view, where we are interested in the relations between the users. We start by analyzing spam log data and construct a graph as follows: Each user receiving unsolicited bulk email constitutes a vertex. If two particular users receive the same spam mail, they are considered to be related to each other and the corresponding vertices are connected by an edge. In this graph we try to find some sort of structure, especially clustering. To achieve this goal we use different analysis methods which confirm each other and reveal different properties of the hidden structure, as they analyze the problem from different points of view. Having all the tools in place, we additionally take a look at virus-infected emails and apply the same analysis steps to virus log data. In both cases we can detect some clustering but the results differ significantly. After the analysis we evaluate collaborative filters. First we look at a server-based model to obtain a reference detection rate, and afterwards we explore various ways of decentralized collaborative filtering. This way, we assess to what extent exploiting the clustering yields better detection rates.
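The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the log format (pairs of user and mail identifier), the function name build_recipient_graph, and the idea that "the same spam mail" is recognized by a shared identifier are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_recipient_graph(log):
    """Build an undirected graph from (user, mail_id) log entries.

    Hypothetical input format: each entry says that `user` received the
    spam mail identified by `mail_id`. Returns a dict mapping every user
    to the set of users who received at least one of the same mails.
    """
    # Group recipients by the mail they received.
    recipients = defaultdict(set)
    for user, mail_id in log:
        recipients[mail_id].add(user)

    graph = defaultdict(set)
    for users in recipients.values():
        # Every user who received spam is a vertex, even without edges.
        for user in users:
            graph[user]
        # Users who received the same mail are pairwise connected.
        for u, v in combinations(sorted(users), 2):
            graph[u].add(v)
            graph[v].add(u)
    return dict(graph)


# Made-up example log, not data from the thesis.
example_log = [
    ("alice", "m1"),
    ("bob", "m1"),
    ("carol", "m2"),
    ("bob", "m3"),
    ("dave", "m3"),
]
example_graph = build_recipient_graph(example_log)
```

In this made-up log, bob ends up adjacent to both alice and dave, while carol is an isolated vertex. Clustering analysis of the kind the abstract mentions would then operate on such an adjacency structure.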
@mastersthesis{abc,
	abstract = {Nowadays spam and virus mails get delivered daily to probably every user on the internet
and almost everybody makes use of anti-spam software. Some of these anti-spam
systems employ a collaborative component to include the users themselves in the decision
process.
Normally a server-based architecture is used for the collaborative part. In this thesis
we assess the feasibility of using a decentralized peer-to-peer component instead.
To this end, we look at the problem of receiving spam mails from the users' point of
view, where we are interested in the relations between the users. We start by analyzing
spam log data and construct a graph as follows: Each user receiving unsolicited bulk
email constitutes a vertex. If two particular users receive the same spam mail, they are
considered to be related to each other and the corresponding vertices are connected by an
edge.
In this graph we try to find some sort of structure, especially clustering. To achieve
this goal we use different analysis methods which confirm each other and reveal different
properties of the hidden structure, as they analyze the problem from different points of
view.
Having all the tools in place, we additionally take a look at virus-infected emails and
apply the same analysis steps to virus log data. In both cases we can detect some clustering
but the results differ significantly.
After the analysis we evaluate collaborative filters. First we look at a server-based
model to obtain a reference detection rate, and afterwards we explore various ways
of decentralized collaborative filtering. This way, we assess to what extent exploiting
the clustering yields better detection rates.
},
	author = {Simon Tobler},
	school = {ETH Z{\"u}rich},
	title = {Exploring Decentralized Collaborative Filtering against Spam Mail},
	year = {2008}
}