Preventing abuse of online communities
Online communities are growing at a phenomenal rate, and the large number of users they contain draws attackers seeking to exploit them. Denial of information (DoI) attacks and information leakage attacks are two popular attacks that target users of online communities. These information-based attacks are linked by their opposing views on low-quality information. On the one hand, denial of information attacks, which primarily use low-quality information (such as spam and phishing), are a nuisance for information consumers. On the other hand, information leakage attacks, which use inadvertently leaked information, are less effective when low-quality information is used, and thus leakage of low-quality information is preferred by private information producers.
In this dissertation, I introduce techniques for preventing abuse from these attacks in online communities using meta-model classification and information unification approaches, respectively. The meta-model classification approach involves classifying the "connected payload" associated with the information and using the classification result for the determination. This approach allows for detection of DoI attacks in emerging domains where the amount of information may be constrained. My information unification approach allows for modeling and mitigating information leakage attacks. Unifying information across domains, followed by a quantification of the information leaked, provides one of the first studies on users' susceptibility to information leakage attacks. Further, the modeling introduced allows me to quantify the reduced threat of information leakage attacks after applying information cloaking.
PhD
Committee Chair: Pu, Calton; Committee Member: Ahamad, Mustaque; Committee Member: Giffin, Jonathon; Committee Member: Li, Kang; Committee Member: Liu, Lin
Cosmos: A Wiki Data Management System
Wiki applications are becoming increasingly important for knowledge sharing between large numbers of users. To protect against vandalism and recover from destructive edits, wiki applications need to maintain the revision histories of all documents. Due to the large amounts of data and traffic, a wiki application needs to store the data economically and retrieve documents efficiently. Current Wiki Data Management Systems (WDMSs) make a trade-off between storage requirements and access time for document update and retrieval. We introduce a new data management system, Cosmos, to balance this trade-off. To compare Cosmos with the other WDMSs, we use a 68GB data sample from English Wikipedia. Our experiments show that Cosmos uses one-fifth of the disk space when compared to MediaWiki (Wikipedia’s backend) and performs faster than other WDMSs at document retrieval.
Is Email Business Dying?: A Study on Evolution of Email Spam Over Fifteen Years
With the increasing dedication and sophistication of spammers, email spam is a persistent problem even today. Popular social network sites such as Facebook, Twitter, and Google+ are not exempt from email spam, as they all interface with email systems. Some reports predict that the email spam business is dying because the volume of email spam is decreasing. Whether the email spam business is really dying is an interesting question. In this paper, we analyze email spam trends on the Spam Archive dataset, which contains 5.5 million spam emails over 15 years (1998–2013). We statistically analyze email contents, including header information (e.g., content type) and embedded items (e.g., URL links). We also investigate topic drift using topic modeling. Moreover, we perform network analysis on sender-to-receiver IP routing networks. Our study shows the dynamic nature of email spam over one and a half decades and demonstrates that the email spam business is not dying but becoming more capricious.
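As a rough illustration of the header statistics mentioned in this abstract, the following minimal sketch tallies content types per year using Python's standard `email` package. The function name `content_types_by_year` and the input format (a list of raw message strings) are assumptions made for this example; the abstract does not describe the authors' actual tooling.

```python
from collections import Counter, defaultdict
from email import message_from_string
from email.utils import parsedate_to_datetime


def content_types_by_year(raw_messages):
    """Count Content-Type values per year (hypothetical helper, not from the paper)."""
    counts = defaultdict(Counter)
    for raw in raw_messages:
        msg = message_from_string(raw)
        date_header = msg["Date"]
        if date_header is None:
            continue  # skip messages without a Date header
        try:
            year = parsedate_to_datetime(date_header).year
        except (TypeError, ValueError):
            continue  # skip messages whose Date header cannot be parsed
        # get_content_type() falls back to "text/plain" when the header is absent
        counts[year][msg.get_content_type()] += 1
    return counts


sample = [
    "Date: Mon, 05 Jan 2004 10:00:00 +0000\nContent-Type: text/html\n\n<p>hi</p>",
    "Date: Tue, 06 Jan 2004 11:00:00 +0000\n\nplain body",
]
result = content_types_by_year(sample)
```

Comparing such per-year counts is one simple way to see how spam formatting shifts over time.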
Evolutionary Study of Phishing
We study the evolution of phishing email messages in a corpus of over 380,000 phishing messages collected from August 2006 to December 2007. Our first result is a classification of phishing messages into two groups: flash attacks and non-flash attacks. Phishing message producers try to extend the usefulness of a phishing message by reusing the same message. In some cases this is done by sending a large volume of phishing messages over a short period of time (flash attacks), and in others by spreading the same phishing message over a relatively longer period (non-flash attacks). Our second result is a corresponding classification of phishing features into two groups: transitory features and pervasive features. Features which are present in a few attacks and have a relatively short life span (transitory) are generally strong indicators of phishing, whereas features which are present in most of the attacks and have a long life span (pervasive) are generally weak selectors of phishing. One explanation of this is that phishing message producers limit the utility of transitory features in time (by avoiding them in future generations of phishing) and limit the utility of pervasive features by choosing features that also appear in legitimate messages. While useful in improving the understanding of phishing messages, our results also show the need for further study.
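The flash versus non-flash distinction can be sketched in a few lines. The example below is only an illustration under stated assumptions: the one-day window, the minimum volume of three messages, and the function name `classify_attacks` are all hypothetical, since the abstract gives no concrete thresholds or implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def classify_attacks(messages, flash_window=timedelta(days=1), min_volume=3):
    """Label each reused message as 'flash' or 'non-flash'.

    messages: iterable of (message_id, timestamp) pairs, where message_id
    identifies a reused phishing message (for example, a hash of its body).
    flash_window and min_volume are assumed thresholds, not values from the paper.
    """
    times = defaultdict(list)
    for message_id, timestamp in messages:
        times[message_id].append(timestamp)

    labels = {}
    for message_id, stamps in times.items():
        span = max(stamps) - min(stamps)
        # Many copies within a short window count as a flash attack
        if len(stamps) >= min_volume and span <= flash_window:
            labels[message_id] = "flash"
        else:
            labels[message_id] = "non-flash"
    return labels


observations = [
    ("a", datetime(2007, 1, 1, 0)),
    ("a", datetime(2007, 1, 1, 1)),
    ("a", datetime(2007, 1, 1, 2)),
    ("b", datetime(2007, 1, 1)),
    ("b", datetime(2007, 2, 1)),
    ("b", datetime(2007, 3, 1)),
]
labels = classify_attacks(observations)
```

Here message "a" arrives three times within two hours and message "b" is spread over two months, so they fall on opposite sides of the assumed thresholds.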