Advanced Machine Learning Techniques and Meta-Heuristic Optimization for the Detection of Masquerading Attacks in Social Networks
According to the report published by the online protection firm Iovation in 2012,
cyber fraud ranged from 1 percent of Internet transactions in North America
to 7 percent in Africa. Most of it involved credit card fraud, identity theft,
and account takeover or hacking attempts. This kind of crime is still growing
because of the advantages offered by a non-face-to-face channel, where an increasing
number of unsuspecting victims divulge sensitive information. Interpol classifies
these illegal activities into three types:
• Attacks against computer hardware and software.
• Financial crimes and corruption.
• Abuse, in the form of grooming or “sexploitation”.
Most research efforts have focused on the target of the crime, developing different
strategies case by case. For the well-known phishing attacks, for example, stored
blacklists or crime signals in the text are employed. This eventually yields ad hoc
detectors that hardly transfer to other scenarios, even when the background is widely
shared. Identity theft, or masquerading, can be described as a criminal activity oriented
towards the misuse of stolen credentials to obtain goods or services by
deception. On March 4, 2005, white-hat hackers at Seattle University collected a million
items of personal and sensitive information, such as credit card and social security
numbers. They simply surfed the Web for less than 60 minutes using the Google search
engine. They thereby demonstrated the vulnerability and lack of protection involved.
A small group of sophisticated search terms typed into the engine was enough, because
its large data warehouse still showed temporarily cached data from company or
government websites.
As mentioned above, platforms that connect distant people through indirect interaction
offer a point of forcible entry for unauthorized third parties. These parties impersonate
the legitimate user in an attempt to go unnoticed while pursuing malicious, not necessarily
economic, interests. In fact, the last item in the list above, abuse, has become a
major and terrible risk. So has bullying, whether by means of threats, harassment, or even
self-incrimination, which can drive someone to suicide, depression,
or helplessness. California Penal Code Section 528.5 states:
“Notwithstanding any other provision of law, any person who knowingly
and without consent credibly impersonates another actual person through
or on an Internet Web site or by other electronic means for purposes of
harming, intimidating, threatening, or defrauding another person is guilty
of a public offense punishable pursuant to subdivision [...]”.
Therefore, impersonation consists of any criminal activity in which someone assumes
a false identity and acts as his or her assumed character with the intent to obtain
a pecuniary benefit or cause harm. User profiling, in turn, is the process of
harvesting user information in order to construct a rich template with all the relevant
attributes in the field at hand, for specific purposes. User profiling is
often employed as a mechanism for recommending items or useful information
that the client has not yet considered. Nevertheless, the derived user tendencies or
preferences can also be exploited to define the user's inherent behavior. They can then
address the problem of impersonation by detecting outliers or unusual deviations likely
to indicate a potential attack.
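The profiling idea above can be illustrated with a minimal sketch. A template of a user's habitual behavior is built, and strong deviations from it are flagged. The sketch assumes each observation is a fixed-length numeric feature vector. It uses a per-feature z-score with a hypothetical threshold of 3. This is a generic outlier test chosen for illustration, not the method developed in the dissertation. The feature choice (login hour, session length) is also hypothetical.

```python
from statistics import mean, stdev


def build_profile(history):
    """Per-feature (mean, sample standard deviation) over a user's past observations."""
    columns = list(zip(*history))
    return [(mean(col), stdev(col)) for col in columns]


def deviation_score(profile, observation):
    """Largest absolute z-score of a new observation against the profile."""
    worst = 0.0
    for (mu, sigma), x in zip(profile, observation):
        if sigma > 0:
            z = abs(x - mu) / sigma
        else:
            # Constant feature in the history: any change is maximally unusual.
            z = 0.0 if x == mu else float("inf")
        worst = max(worst, z)
    return worst


def is_suspicious(profile, observation, threshold=3.0):
    """Flag a potential impersonation when the deviation exceeds the (hypothetical) threshold."""
    return deviation_score(profile, observation) > threshold


# Usage: each row is (login hour, session length in minutes) for past sessions.
history = [[20.0, 35.0], [21.0, 40.0], [19.0, 30.0], [20.5, 38.0]]
profile = build_profile(history)
print(is_suspicious(profile, [20.0, 36.0]))  # False: usual behaviour
print(is_suspicious(profile, [3.0, 120.0]))  # True: 3 a.m., two-hour session
```

A plain z-score treats the hour of day as a linear quantity. Logins around midnight would therefore need a circular encoding in practice.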
This dissertation elaborates on impersonation attacks from a profiling
perspective. It eventually develops a two-stage environment that embraces
two levels of privacy intrusion, and it provides the following contributions:
• The inference of behavioral patterns from connection time traces, aiming to
avoid the usurpation of more confidential information. Compared with
previous approaches, this procedure refrains from intruding on user privacy
by accessing message content. It relies only on time statistics
of the user sessions rather than on their content.
• The application and subsequent discussion of two selected algorithms to address
the previous point:
– A commonly employed supervised algorithm run as a binary classifier.
This forced us to devise a method for dealing with the absence of labeled
instances representing an identity theft.
– A meta-heuristic algorithm that searches for the most suitable parameters
to arrange the instances within a high-dimensional space into properly
delimited clusters, so that an unsupervised clustering
algorithm can finally be applied.
• The analysis of message content, which encroaches on more private information but
eases user identification by mining discriminative features with Natural
Language Processing (NLP) techniques. Texts often yield a massive quantity of
features. Motivated by this, a new feature extraction algorithm based on linguistic
theories has been developed.
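As an illustration of the first contribution, the sketch below summarizes a user's connection trace using timing information only. Message content is never read. The chosen statistics are hypothetical examples of "time statistics of the user sessions", not the dissertation's actual feature set: session duration, login hour, weekend ratio, and gap between sessions. The function name `session_features` is likewise an assumption.

```python
from datetime import datetime
from statistics import mean, pstdev


def session_features(sessions):
    """Summarise a connection trace from (login, logout) datetime pairs only.

    Hypothetical feature set; no message content is involved.
    """
    durations = [(end - start).total_seconds() / 60 for start, end in sessions]
    login_hours = [start.hour + start.minute / 60 for start, _ in sessions]
    # Idle time between the end of one session and the start of the next.
    ordered = sorted(sessions)
    gaps = [(nxt[0] - cur[1]).total_seconds() / 60 for cur, nxt in zip(ordered, ordered[1:])]
    return {
        "mean_duration_min": mean(durations),
        "std_duration_min": pstdev(durations),
        "mean_login_hour": mean(login_hours),
        # weekday() returns 5 for Saturday and 6 for Sunday.
        "weekend_ratio": sum(start.weekday() >= 5 for start, _ in sessions) / len(sessions),
        "mean_gap_min": mean(gaps) if gaps else 0.0,
    }


# Usage: a Saturday evening session and a Monday evening session.
trace = [
    (datetime(2024, 1, 6, 20, 0), datetime(2024, 1, 6, 20, 30)),
    (datetime(2024, 1, 8, 21, 0), datetime(2024, 1, 8, 22, 0)),
]
print(session_features(trace))
```

Vectors like these could then be fed to either a binary classifier or a clustering step, as outlined in the list above.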
In summary, this dissertation aims to go beyond the typical ad hoc approaches
adopted by previous identity theft and authorship attribution research. Specifically,
while proposing solutions tailored to this particular and extensively studied paradigm,
it aims to introduce a generic approach from a profiling view that is not tightly
bound to a single application field. In addition, technical contributions have been
made in the course of formulating the solution. These are intended to adapt familiar
methods for better versatility with respect to the problem at hand. Overall, this Thesis
establishes an encouraging research basis for unveiling subtle impersonation
attacks in Social Networks by means of intelligent learning techniques.
A novel approach of mining write-prints for authorship attribution in e-mail forensics
There is an alarming increase in the number of cyber-crime incidents carried out through anonymous e-mails. The problem of e-mail authorship attribution is to identify the most plausible author of an anonymous e-mail from a group of potential suspects. Most previous contributions employed a traditional classification approach, such as decision trees and Support Vector Machines (SVM), to identify the author. They also studied the effects of different writing-style features on classification accuracy. However, little attention has been given to ensuring the quality of the evidence. In this paper, we introduce an innovative data mining method that captures the write-print of every suspect. It models the write-print as combinations of features that occur frequently in the suspect's e-mails. This notion is called a frequent pattern. It has proven effective in many data mining applications, but this is the first time it has been applied to the problem of authorship attribution. Unlike in the traditional approach, the write-print extracted by our method is unique among the suspects. It therefore provides convincing and credible evidence for presentation in a court of law. Experiments on real-life e-mails suggest that the proposed method can effectively identify the author and that the results are supported by strong evidence.
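The write-print idea can be sketched as follows. The sketch assumes each e-mail has already been reduced to a set of discretized style features; labels such as `greeting=hi` are hypothetical. Frequent feature combinations are found by brute-force enumeration up to a small size. Apriori or FP-growth would prune this search on real data. Each suspect then keeps only the patterns that no other suspect shares, mirroring the uniqueness property stated in the abstract. The `attribute` scoring rule, which counts matched write-print patterns, is an assumption for illustration. The paper's own scoring may differ.

```python
from itertools import combinations


def frequent_patterns(emails, min_support, max_size=2):
    """Feature combinations present in at least a `min_support` fraction of the e-mails.

    Brute-force enumeration of every combination of up to `max_size` features.
    """
    counts = {}
    for email in emails:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(email), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    threshold = min_support * len(emails)
    return {pattern for pattern, count in counts.items() if count >= threshold}


def write_prints(emails_by_suspect, min_support=0.5, max_size=2):
    """Keep each suspect's frequent patterns that are not frequent for any other suspect."""
    frequent = {
        suspect: frequent_patterns(emails, min_support, max_size)
        for suspect, emails in emails_by_suspect.items()
    }
    prints = {}
    for suspect, patterns in frequent.items():
        others = set().union(*(p for s, p in frequent.items() if s != suspect))
        prints[suspect] = patterns - others
    return prints


def attribute(anonymous_email, prints):
    """Hypothetical scoring: the suspect with the most write-print patterns contained in the e-mail."""
    def score(suspect):
        return sum(1 for pattern in prints[suspect] if pattern <= anonymous_email)
    return max(prints, key=score)


# Usage with made-up suspects and feature labels.
suspects = {
    "alice": [{"greeting=hi", "sign=cheers", "len=short"}, {"greeting=hi", "sign=cheers", "len=long"}],
    "bob": [{"greeting=dear", "sign=regards", "len=short"}, {"greeting=dear", "sign=regards", "len=short"}],
}
prints = write_prints(suspects, min_support=1.0)
print(attribute({"greeting=hi", "sign=cheers", "len=short"}, prints))  # alice
```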