Analyzing the Social Structure and Dynamics of E-mail and Spam in Massive Backbone Internet Traffic
E-mail is probably the most popular application on the Internet, with
everyday business and personal communications dependent on it. Spam or
unsolicited e-mail has been estimated to cost businesses significant amounts of
money. However, our understanding of the network-level behavior of legitimate
e-mail traffic and how it differs from spam traffic is limited. In this study,
we have passively captured SMTP packets from a 10 Gbit/s Internet backbone link
to construct a social network of e-mail users based on their exchanged e-mails.
The focus of this paper is on the graph metrics indicating various structural
properties of e-mail networks and how they evolve over time. This study also
looks into the differences in the structural and temporal characteristics of
spam and non-spam networks. Our analysis of the collected data reveals several
differences between the behavior of spam and legitimate e-mail traffic, which
can help us understand the behavior of spammers and statistically model spam
traffic at the network level in order to complement current spam detection
techniques.
Comment: 15 pages, 20 figures, technical repor
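As a rough illustration of what building a social network from exchanged e-mails can look like, the following minimal sketch constructs a directed graph from (sender, recipient) pairs and computes a few simple structural metrics. The function names, the sample addresses, and the choice of metrics (degree and reciprocity) are assumptions made for this example; the abstract does not state which metrics the study uses.

```python
def build_email_graph(messages):
    """Map every address to the set of addresses it has sent mail to."""
    graph = {}
    for sender, recipient in messages:
        graph.setdefault(sender, set()).add(recipient)
        graph.setdefault(recipient, set())  # recipients are nodes too
    return graph


def out_degree(graph):
    """Number of distinct recipients per sender."""
    return {node: len(targets) for node, targets in graph.items()}


def in_degree(graph):
    """Number of distinct senders per recipient."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


def reciprocity(graph):
    """Fraction of directed edges whose reverse edge also exists."""
    edges = [(src, dst) for src, targets in graph.items() for dst in targets]
    if not edges:
        return 0.0
    mutual = sum(1 for src, dst in edges if src in graph[dst])
    return mutual / len(edges)


# Hypothetical sample data: one address mails everyone and gets no replies.
messages = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("spammer", "alice"),
    ("spammer", "bob"),
    ("spammer", "carol"),
]
graph = build_email_graph(messages)
```

In this toy data the address that sends to everyone has a high out-degree and an in-degree of zero, the kind of structural asymmetry a network-level analysis might look for.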
An Empirical Study on Android for Saving Non-shared Data on Public Storage
With millions of apps that can be downloaded from official or third-party
markets, Android has become one of the most popular mobile platforms today.
These apps help people in all kinds of ways and thus have access to a great
deal of user data, which generally falls into three categories: sensitive data,
data to be shared with other apps, and non-sensitive data not to be shared with
others. For the first and second types of data, Android has provided very good
storage models: an app's private sensitive data are saved to its private folder
that can only be accessed by the app itself, and the data to be shared are saved to
public storage (either the external SD card or the emulated SD card area on
internal FLASH memory). But for the last type, i.e., an app's non-sensitive and
non-shared data, there is a big problem in Android's current storage model
which essentially encourages an app to save its non-sensitive data to shared
public storage that can be accessed by other apps. At first glance, this seems
unproblematic, as those data are non-sensitive after all, but it implicitly
assumes that app developers could correctly identify all sensitive data and
prevent all possible information leakage from private-but-non-sensitive data.
In this paper, we will demonstrate that this is an invalid assumption with a
thorough survey on information leaks of those apps that had followed Android's
recommended storage model for non-sensitive data. Our studies showed that
highly sensitive information from billions of users can be easily hacked by
exploiting the mentioned problematic storage model. Although our empirical
studies are based on a limited set of apps, the identified problems are never
isolated or accidental bugs of those apps being investigated. On the contrary,
the problem is rooted in the vulnerable storage model recommended by Android.
To mitigate the threat, we also propose a defense framework
Machine Learning Aided Static Malware Analysis: A Survey and Tutorial
Malware analysis and detection techniques have been evolving during the last
decade in response to the development of malware techniques that evade
network-based and host-based security protections. The fast growth in the
variety and number of malware species has made it very difficult for forensic
investigators to provide a timely response. Therefore, Machine Learning (ML)
aided malware analysis became a necessity to automate different aspects of
static and dynamic malware investigation. We believe that machine learning
aided static analysis can be used as a methodological approach in technical
Cyber Threats Intelligence (CTI) rather than resource-consuming dynamic malware
analysis that has been thoroughly studied before. In this paper, we address
this research gap by conducting an in-depth survey of different machine
learning methods for classification of static characteristics of 32-bit
malicious Portable Executable (PE32) Windows files and develop a taxonomy for
better understanding of these techniques. Afterwards, we offer a tutorial on
how different machine learning techniques can be utilized in the extraction and
analysis of a variety of static characteristics of PE binaries and evaluate
the accuracy and practical generalization of these techniques. Finally, we
present the results of an experimental study of all the methods on common data
to demonstrate their accuracy and complexity. This paper may serve as a stepping
stone for future researchers in the cross-disciplinary field of machine learning
aided malware forensics.
Comment: 37 Page
git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories
Data from software repositories have become an important foundation for the
empirical study of software engineering processes. A recurring theme in the
repository mining literature is the inference of developer networks capturing
e.g. collaboration, coordination, or communication from the commit history of
projects. Most of the studied networks are based on the co-authorship of
software artefacts defined at the level of files, modules, or packages. While
this approach has led to insights into the social aspects of software
development, it neglects detailed information on code changes and code
ownership, e.g. which exact lines of code have been authored by which
developers, that is contained in the commit log of software projects.
Addressing this issue, we introduce git2net, a scalable Python tool that
facilitates the extraction of fine-grained co-editing networks in large git
repositories. It uses text mining techniques to analyse the detailed history of
textual modifications within files. This information allows us to construct
directed, weighted, and time-stamped networks, where a link signifies that one
developer has edited a block of source code originally written by another
developer. Our tool is applied in case studies of an Open Source and a
commercial software project. We argue that it opens up a massive new source of
high-resolution data on human collaboration patterns.
Comment: MSR 2019, 12 pages, 10 figure
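To make the idea of a directed, weighted, time-stamped co-editing network concrete, here is a minimal sketch that tracks line ownership across a toy commit history. It is not git2net's API or algorithm: the `(timestamp, author, line_ids)` commit format, the function names, and the sample data are all hypothetical, and a real tool would derive line-level changes from the repository itself.

```python
from collections import Counter


def co_editing_edges(commits):
    """Return (editor, original_author, timestamp) edges.

    `commits` is a chronologically ordered list of
    (timestamp, author, line_ids) tuples, a simplified stand-in for the
    line-level changes that a real tool would extract from git history.
    """
    owner = {}   # line id -> author who last wrote that line
    edges = []
    for timestamp, author, line_ids in commits:
        for line_id in line_ids:
            previous = owner.get(line_id)
            # A link is created only when someone edits another person's line.
            if previous is not None and previous != author:
                edges.append((author, previous, timestamp))
            owner[line_id] = author
    return edges


def edge_weights(edges):
    """Aggregate time-stamped edges into weights per (editor, author) pair."""
    return Counter((editor, original) for editor, original, _ in edges)


# Hypothetical history of a single file.
commits = [
    (1, "ann", [1, 2, 3]),  # ann writes lines 1-3
    (2, "ben", [2, 3]),     # ben rewrites two of ann's lines
    (3, "ann", [3]),        # ann rewrites a line now owned by ben
    (4, "cat", [1, 4]),     # cat edits ann's line 1 and adds line 4
]
edges = co_editing_edges(commits)
weights = edge_weights(edges)
```

Keeping the raw time-stamped edges separate from the aggregated weights preserves the temporal information, so the same data can be viewed either as a static weighted graph or as a sequence of events.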
BigEAR: Inferring the Ambient and Emotional Correlates from Smartphone-based Acoustic Big Data
This paper presents a novel BigEAR big data framework that employs a
psychological audio processing chain (PAPC) to process smartphone-based
acoustic big data collected when the user engages in social conversations in
naturalistic scenarios. The overarching goal of BigEAR is to identify moods of
the wearer from various activities such as laughing, singing, crying, arguing,
and sighing. These annotations are based on ground truth relevant for
psychologists who intend to monitor/infer the social context of individuals
coping with breast cancer. We pursued a case study on couples coping with
breast cancer to learn how conversations affect emotional and social
well-being. With state-of-the-art methods, psychologists and their teams have
to listen to the audio recordings and make these inferences through subjective
evaluations that are not only time-consuming and costly, but also demand manual data coding
for thousands of audio files. The BigEAR framework automates the audio
analysis. We computed the accuracy of BigEAR with respect to the ground truth
obtained from a human rater. Our approach yielded an overall average accuracy of
88.76% on real-world data from couples coping with breast cancer.
Comment: 6 pages, 10 equations, 1 Table, 5 Figures, IEEE International
Workshop on Big Data Analytics for Smart and Connected Health 2016, June 27,
2016, Washington DC, US
Applications of Machine Learning to Threat Intelligence, Intrusion Detection and Malware
Artificial Intelligence (AI) and Machine Learning (ML) are emerging technologies with applications to many fields. This paper is a survey of use cases of ML for threat intelligence, intrusion detection, and malware analysis and detection. Threat intelligence, especially attack attribution, can benefit from the use of ML classification. False positives from rule-based intrusion detection systems can be reduced with the use of ML models. Malware analysis and classification can be made easier by developing ML frameworks to distill similarities between the malicious programs. Adversarial machine learning will also be discussed, because while ML can be used to solve problems or reduce analyst workload, it also introduces new attack surfaces