311 research outputs found
Recommended from our members
Hardware and software fingerprinting of mobile devices
This dissertation presents novel and practical algorithms to identify the software and hardware components on mobile devices. In particular, we make significant contributions in two challenging areas: library fingerprinting, to identify third-party software libraries, and device fingerprinting, to identify individual hardware components. Our work has significant implications for the privacy and security of mobile platforms.
Software-based library fingerprinting can be used to detect vulnerable libraries and uncover large-scale data collection activities. We develop a novel Android library finger-printing tool, LibID, to reliably identify specific versions of in-app third-party libraries. LibID is more effective against code obfuscation than prior art. When comparing LibID with other tools in identifying the correct library version using obfuscated F-Droid apps, LibID achieves an F1 score of more than 0.5 in all cases while prior work is below 0.25. We also demonstrate the utility of LibID by detecting the use of a vulnerable version of the OkHttp library in nearly 10% of the 3 958 popular apps on the Google Play Store.
Hardware-based device fingerprinting allows apps and websites to invade user privacy by tracking user activity online as the user moves between apps or websites. In particular, we present a new type of device fingerprinting attack, the factory calibration fingerprinting attack, that recovers embedded per-device factory calibration data from motion sensors in a smartphone. We investigate the calibration behaviour of each sensor and show that the calibration fingerprint is fast to generate, does not change over time or after a factory reset, and can be obtained without any special user permissions.
We estimate the entropy of the calibration fingerprint and find the fingerprint is very likely to be globally unique for iOS devices (~67 bits of entropy for iPhone 6S) and recent Google Pixel devices (~57 bits of entropy for Pixel 4/4 XL). By comparison, the fingerprint generated by previous work has at most 13 bits of entropy. Following our disclosures, Apple deployed a fix in iOS 12.2 and Google in Android 11.
Both code obfuscation and factory calibration help to hide software and hardware idiosyncrasies from third-parties, but this dissertation demonstrates that reliable software and hardware fingerprints can still be generated given sufficient knowledge and a suitable approach. Our work has significant practical implications and can be used to improve platform security and protect user privacy.China Scholarship Council
The Boeing Company
Microsoft Researc
Three Decades of Deception Techniques in Active Cyber Defense -- Retrospect and Outlook
Deception techniques have been widely seen as a game changer in cyber
defense. In this paper, we review representative techniques in honeypots,
honeytokens, and moving target defense, spanning from the late 1980s to the
year 2021. Techniques from these three domains complement with each other and
may be leveraged to build a holistic deception based defense. However, to the
best of our knowledge, there has not been a work that provides a systematic
retrospect of these three domains all together and investigates their
integrated usage for orchestrated deceptions. Our paper aims to fill this gap.
By utilizing a tailored cyber kill chain model which can reflect the current
threat landscape and a four-layer deception stack, a two-dimensional taxonomy
is developed, based on which the deception techniques are classified. The
taxonomy literally answers which phases of a cyber attack campaign the
techniques can disrupt and which layers of the deception stack they belong to.
Cyber defenders may use the taxonomy as a reference to design an organized and
comprehensive deception plan, or to prioritize deception efforts for a budget
conscious solution. We also discuss two important points for achieving active
and resilient cyber defense, namely deception in depth and deception lifecycle,
where several notable proposals are illustrated. Finally, some outlooks on
future research directions are presented, including dynamic integration of
different deception techniques, quantified deception effects and deception
operation cost, hardware-supported deception techniques, as well as techniques
developed based on better understanding of the human element.Comment: 19 page
Recommended from our members
TOWARDS RELIABLE CIRCUMVENTION OF INTERNET CENSORSHIP
The Internet plays a crucial role in today\u27s social and political movements by facilitating the free circulation of speech, information, and ideas; democracy and human rights throughout the world critically depend on preserving and bolstering the Internet\u27s openness. Consequently, repressive regimes, totalitarian governments, and corrupt corporations regulate, monitor, and restrict the access to the Internet, which is broadly known as Internet \emph{censorship}. Most countries are improving the internet infrastructures, as a result they can implement more advanced censoring techniques. Also with the advancements in the application of machine learning techniques for network traffic analysis have enabled the more sophisticated Internet censorship. In this thesis, We take a close look at the main pillars of internet censorship, we will introduce new defense and attacks in the internet censorship literature.
Internet censorship techniques investigate users’ communications and they can decide to interrupt a connection to prevent a user from communicating with a specific entity. Traffic analysis is one of the main techniques used to infer information from internet communications. One of the major challenges to traffic analysis mechanisms is scaling the techniques to today\u27s exploding volumes of network traffic, i.e., they impose high storage, communications, and computation overheads. We aim at addressing this scalability issue by introducing a new direction for traffic analysis, which we call \emph{compressive traffic analysis}. Moreover, we show that, unfortunately, traffic analysis attacks can be conducted on Anonymity systems with drastically higher accuracies than before by leveraging emerging learning mechanisms. We particularly design a system, called \deepcorr, that outperforms the state-of-the-art by significant margins in correlating network connections. \deepcorr leverages an advanced deep learning architecture to \emph{learn} a flow correlation function tailored to complex networks. Also to be able to analyze the weakness of such approaches we show that an adversary can defeat deep neural network based traffic analysis techniques by applying statistically undetectable \emph{adversarial perturbations} on the patterns of live network traffic.
We also design techniques to circumvent internet censorship. Decoy routing is an emerging approach for censorship circumvention in which circumvention is implemented with help from a number of volunteer Internet autonomous systems, called decoy ASes. We propose a new architecture for decoy routing that, by design, is significantly stronger to rerouting attacks compared to \emph{all} previous designs. Unlike previous designs, our new architecture operates decoy routers only on the downstream traffic of the censored users; therefore we call it \emph{downstream-only} decoy routing. As we demonstrate through Internet-scale BGP simulations, downstream-only decoy routing offers significantly stronger resistance to rerouting attacks, which is intuitively because a (censoring) ISP has much less control on the downstream BGP routes of its traffic. Then, we propose to use game theoretic approaches to model the arms races between the censors and the censorship circumvention tools. This will allow us to analyze the effect of different parameters or censoring behaviors on the performance of censorship circumvention tools. We apply our methods on two fundamental problems in internet censorship.
Finally, to bring our ideas to practice, we designed a new censorship circumvention tool called \name. \name aims at increasing the collateral damage of censorship by employing a ``mass\u27\u27 of normal Internet users, from both censored and uncensored areas, to serve as circumvention proxies
The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files
In many forensic investigations, questions linger regarding the identity of the authors of the software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done around analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult, time consuming, and in some cases, may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input into a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and provide predictions on who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods to compare prediction accuracies with prediction accuracies gathered from this new, dynamic analysis based method. Experiments indicate that this new method can provide better accuracy of author attribution for files of unknown provenance, especially in the case where the specimen file has been obfuscated
On the Generation of Cyber Threat Intelligence: Malware and Network Traffic Analyses
In recent years, malware authors drastically changed their course on the subject of threat design and implementation. Malware authors, namely, hackers or cyber-terrorists perpetrate new forms of cyber-crimes involving more innovative hacking techniques. Being motivated by financial or political reasons, attackers target computer systems ranging from personal computers to organizations’ networks to collect and steal sensitive data
as well as blackmail, scam people, or scupper IT infrastructures. Accordingly, IT security experts face new challenges, as they need to counter cyber-threats proactively. The challenge takes a continuous allure of a fight, where cyber-criminals are obsessed by the idea of outsmarting security defenses. As such, security experts have to elaborate an effective strategy to counter cyber-criminals. The generation of cyber-threat intelligence is of a paramount importance as stated in the following quote: “the field is owned by who owns the intelligence”. In this thesis, we address the problem of generating timely and relevant cyber-threat intelligence for the purpose of detection, prevention and mitigation
of cyber-attacks. To do so, we initiate a research effort, which falls into: First, we analyze prominent cyber-crime toolkits to grasp the inner-secrets and workings of advanced threats. We dissect prominent malware like Zeus and Mariposa botnets to uncover
their underlying techniques used to build a networked army of infected machines. Second, we investigate cyber-crime infrastructures, where we elaborate on the generation of a cyber-threat intelligence for situational awareness. We adapt a graph-theoretic approach to study infrastructures used by malware to perpetrate malicious activities. We build a scoring mechanism based on a page ranking algorithm to measure the badness of
infrastructures’ elements, i.e., domains, IPs, domain owners, etc. In addition, we use the min-hashing technique to evaluate the level of sharing among cyber-threat infrastructures during a period of one year. Third, we use machine learning techniques to fingerprint malicious IP traffic. By fingerprinting, we mean detecting malicious network flows and their attribution to malware families. This research effort relies on a ground truth collected
from the dynamic analysis of malware samples. Finally, we investigate the generation of cyber-threat intelligence from passive DNS streams. To this end, we design and implement
a system that generates anomalies from passive DNS traffic. Due to the tremendous nature of DNS data, we build a system on top of a cluster computing framework, namely, Apache Spark [70]. The integrated analytic system has the ability to detect anomalies
observed in DNS records, which are potentially generated by widespread cyber-threats
Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned
Binary code similarity analysis (BCSA) is widely used for diverse security
applications such as plagiarism detection, software license violation
detection, and vulnerability discovery. Despite the surging research interest
in BCSA, it is significantly challenging to perform new research in this field
for several reasons. First, most existing approaches focus only on the end
results, namely, increasing the success rate of BCSA, by adopting
uninterpretable machine learning. Moreover, they utilize their own benchmark
sharing neither the source code nor the entire dataset. Finally, researchers
often use different terminologies or even use the same technique without citing
the previous literature properly, which makes it difficult to reproduce or
extend previous work. To address these problems, we take a step back from the
mainstream and contemplate fundamental research questions for BCSA. Why does a
certain technique or a feature show better results than the others?
Specifically, we conduct the first systematic study on the basic features used
in BCSA by leveraging interpretable feature engineering on a large-scale
benchmark. Our study reveals various useful insights on BCSA. For example, we
show that a simple interpretable model with a few basic features can achieve a
comparable result to that of recent deep learning-based approaches.
Furthermore, we show that the way we compile binaries or the correctness of
underlying binary analysis tools can significantly affect the performance of
BCSA. Lastly, we make all our source code and benchmark public and suggest
future directions in this field to help further research.Comment: 22 pages, under revision to Transactions on Software Engineering
(July 2021
On the Implementation of Location Obfuscation in openwifi and Its Performance
Wi-Fi sensing as a side-effect of communications is opening new opportunities for smart services integrating communications with environmental properties, first and foremost the position of devices and people. At the same time, this technology represents an unprecedented threat to people’s privacy, as personal information can be collected directly at the physical layer without any possibility to hide or protect it. Several works already discussed the possibility of safeguarding users’ privacy without hampering communication performance. Usually, some signal pre-processing at the transmitter side is needed to introduce pseudo-random (artificial) patterns in the channel response estimated at the receiver, preventing the extraction of meaningful information from the channel state. However, there is currently just one implementation of such techniques in a real system (openwifi), and it has never been tested for performance. In this work, we present the implementation of a location obfuscation technique within the openwifi project that enables fine manipulation of the radio signal at transmitter side and yields acceptable, if not good, performance. The paper discusses the implementation of the obfuscation subsystem, its performance, possible improvements, and further steps to allow authorized devices to “de-obfuscate” the signal and retrieve the sensed information
Machine learning techniques for identification using mobile and social media data
Networked access and mobile devices provide near constant data generation and collection. Users, environments, applications, each generate different types of data; from the voluntarily provided data posted in social networks to data collected by sensors on mobile devices, it is becoming trivial to access big data caches. Processing sufficiently large amounts of data results in inferences that can be characterized as privacy invasive. In order to address privacy risks we must understand the limits of the data exploring relationships between variables and how the user is reflected in them. In this dissertation we look at data collected from social networks and sensors to identify some aspect of the user or their surroundings. In particular, we find that from social media metadata we identify individual user accounts and from the magnetic field readings we identify both the (unique) cellphone device owned by the user and their course-grained location. In each project we collect real-world datasets and apply supervised learning techniques, particularly multi-class classification algorithms to test our hypotheses. We use both leave-one-out cross validation as well as k-fold cross validation to reduce any bias in the results. Throughout the dissertation we find that unprotected data reveals sensitive information about users. Each chapter also contains a discussion about possible obfuscation techniques or countermeasures and their effectiveness with regards to the conclusions we present. Overall our results show that deriving information about users is attainable and, with each of these results, users would have limited if any indication that any type of analysis was taking place
- …