SyzTrust: State-aware Fuzzing on Trusted OS Designed for IoT Devices
Trusted Execution Environments (TEEs) embedded in IoT devices provide a
deployable solution to secure IoT applications at the hardware level. By
design, in TEEs, the Trusted Operating System (Trusted OS) is the primary
component. It enables the TEE to use security-based design techniques, such as
data encryption and identity authentication. Once a Trusted OS has been
exploited, the TEE can no longer ensure security. However, Trusted OSes for IoT
devices have received little security analysis; analyzing them is challenging from
several perspectives: (1) Trusted OSes are closed-source and have an
unfavorable environment for sending test cases and collecting feedback. (2)
Trusted OSes have complex data structures and require a stateful workflow,
which limits existing vulnerability detection tools. To address the challenges,
we present SyzTrust, the first state-aware fuzzing framework for vetting the
security of resource-limited Trusted OSes. SyzTrust adopts a hardware-assisted
framework to enable fuzzing Trusted OSes directly on IoT devices as well as
tracking state and code coverage non-invasively. SyzTrust utilizes composite
feedback to guide the fuzzer to effectively explore more states as well as to
increase the code coverage. We evaluate SyzTrust on Trusted OSes from three
major vendors: Samsung, Tsinglink Cloud, and Ali Cloud. These systems run on
Cortex M23/33 MCUs, which provide the necessary abstraction for embedded TEEs.
We discovered 70 previously unknown vulnerabilities in their Trusted OSes,
receiving 10 new CVEs so far. Furthermore, compared to the baseline, SyzTrust
has demonstrated significant improvements, including 66% higher code coverage,
651% higher state coverage, and 31% improved vulnerability-finding capability.
We report all discovered vulnerabilities to the vendors and open-source SyzTrust.
Comment: To appear in the IEEE Symposium on Security and Privacy (IEEE S&P)
2024, San Francisco, CA, US
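The composite-feedback idea (rewarding a mutant for reaching either a new OS state or new code coverage) can be sketched in a few lines. The `execute` callback and the state/edge sets below are hypothetical stand-ins, not SyzTrust's actual hardware-assisted tracing interfaces:

```python
import random

def mutate(data: bytes) -> bytes:
    """Flip one random byte of the test case (a deliberately simple mutator)."""
    if not data:
        return data
    i = random.randrange(len(data))
    return data[:i] + bytes([data[i] ^ random.randrange(1, 256)]) + data[i + 1:]

def fuzz(execute, seeds, rounds=1000):
    """Composite-feedback loop: keep a mutant as a new seed whenever it
    reaches a previously unseen state OR previously unseen coverage."""
    corpus = list(seeds)
    seen_states, seen_edges = set(), set()
    for seed in corpus:  # establish a baseline from the initial seeds
        states, edges = execute(seed)
        seen_states |= states
        seen_edges |= edges
    for _ in range(rounds):
        child = mutate(random.choice(corpus))
        states, edges = execute(child)  # non-invasive trace, abstracted here
        if not states <= seen_states or not edges <= seen_edges:
            seen_states |= states
            seen_edges |= edges
            corpus.append(child)
    return corpus, seen_states, seen_edges
```

In the real system the states and edges would come from non-invasive hardware tracing of the Trusted OS; here they are whatever `execute` reports.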
Federated learning for distributed intrusion detection systems in public networks
Abstract. The rapid integration of technologies such as IoT devices, cloud, and edge computing has led to a progressively interconnected network of intelligent environments, services, and public infrastructures. This evolution highlights the critical need for sophisticated and self-governing Intrusion Detection Systems (IDS) to enhance trust and ensure the security and integrity of these interconnected environments. Furthermore, the advancement of AI-based Intrusion Detection Systems hinges on the effective utilization of high-quality data for model training. A considerable number of datasets created in controlled lab environments have recently been released, which has significantly facilitated researchers in developing and evaluating resilient Machine Learning models. However, a substantial portion of the available architectures and datasets are now considered outdated. As a result, the principal aim of this thesis is to contribute to the enhancement of knowledge concerning the creation of contemporary testbed architectures specifically designed for defense systems. The main objective of this study is to propose an innovative testbed infrastructure design, capitalizing on the broad connectivity of the panOULU public network, to facilitate the analysis and evaluation of AI-based security applications within a public network setting. The testbed incorporates a variety of distributed computing paradigms, including edge, fog, and cloud computing. It simplifies the adoption of technologies like Software-Defined Networking, Network Function Virtualization, and Service Orchestration by leveraging the capabilities of the VMware vSphere platform. In the learning phase, a custom-developed application uses information from the attackers to automatically classify incoming data as either normal or malicious. This labeled data is then used for training machine learning models within a federated learning framework (FED-ML).
The trained models are validated using previously unseen network data (test data). The entire procedure, from collecting network traffic and labeling data to training models within the federated architecture, operates autonomously, removing the need for human involvement. The development and implementation of FED-ML models in this thesis may contribute towards laying the groundwork for future-forward, AI-oriented cybersecurity measures. The dataset and testbed configuration showcased in this research could improve our understanding of the challenges associated with safeguarding public networks, especially those with heterogeneous environments comprising various technologies.
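The aggregation step at the heart of such a federated learning (FED-ML) setup is typically federated averaging, in which a server combines locally trained models without ever seeing the raw traffic. The sketch below is generic FedAvg over per-layer NumPy weight arrays, not the thesis's implementation:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """One FedAvg aggregation step: average per-layer model weights across
    clients, weighted by each client's local dataset size."""
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    return [sum((n / total) * w[i] for w, n in zip(client_weights, client_sizes))
            for i in range(n_layers)]
```

Weighting by dataset size means a client that labeled three times as much traffic pulls the global model three times as hard, which matches the usual FedAvg formulation.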
Systematically Detecting Packet Validation Vulnerabilities in Embedded Network Stacks
Embedded Network Stacks (ENS) enable low-resource devices to communicate with
the outside world, facilitating the development of the Internet of Things and
Cyber-Physical Systems. Some defects in ENS are thus high-severity
cybersecurity vulnerabilities: they are remotely triggerable and can impact the
physical world. While prior research has shed light on the characteristics of
defects in many classes of software systems, no study has described the
properties of ENS defects nor identified a systematic technique to expose them.
The most common automated approach to detecting ENS defects is feedback-driven
randomized dynamic analysis ("fuzzing"), a costly and unpredictable technique.
This paper provides the first systematic characterization of cybersecurity
vulnerabilities in ENS. We analyzed 61 vulnerabilities across 6 open-source
ENS. Most of these ENS defects are concentrated in the transport and network
layers of the network stack, require reaching different states in the network
protocol, and can be triggered by only 1-2 modifications to a single packet. We
therefore propose a novel systematic testing framework that focuses on the
transport and network layers, uses seeds that cover a network protocol's
states, and systematically modifies packet fields. We evaluated this framework
on 4 ENS and replicated 12 of the 14 reported IP/TCP/UDP vulnerabilities. On
recent versions of these ENSs, it discovered 7 novel defects (6 assigned CVEs)
during a bounded systematic test that covered all protocol states and made up
to 3 modifications per packet. We found defects in 3 of the 4 ENS we tested
that had not been found by prior fuzzing research. Our results suggest that
fuzzing should be deferred until after systematic testing is employed.
Comment: 12 pages, 3 figures, to be published in the 38th IEEE/ACM
International Conference on Automated Software Engineering (ASE 2023)
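The paper's testing strategy (systematically enumerating up to k field modifications per packet rather than mutating at random) can be illustrated as follows; the field names and candidate values below are invented for illustration and are not the framework's actual fault model:

```python
from itertools import combinations, product

# Hypothetical field model: field name -> candidate boundary/invalid values.
FIELDS = {
    "ip_version": [0, 5, 15],
    "ip_ihl": [0, 4, 15],
    "tcp_flags": [0x00, 0x3F],
    "tcp_data_offset": [0, 15],
}

def mutated_packets(base, max_mods=3):
    """Yield packets (as field dicts) with up to max_mods modified fields,
    enumerating every field subset and value combination systematically."""
    names = list(FIELDS)
    for k in range(1, max_mods + 1):
        for subset in combinations(names, k):
            for values in product(*(FIELDS[f] for f in subset)):
                pkt = dict(base)
                pkt.update(zip(subset, values))
                yield pkt
```

Unlike fuzzing, this enumeration is bounded and exhaustive for a given field model, which is why the test budget ("up to 3 modifications per packet") can be stated precisely.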
Adonis: Practical and Efficient Control Flow Recovery through OS-Level Traces
Control flow recovery is critical to ensuring software quality, especially for large-scale software in production environments.
However, the efficiency of most current control flow recovery techniques is compromised by their runtime overheads along with
deployment and development costs. To tackle this problem, we propose a novel solution, Adonis, which harnesses OS-level traces,
such as dynamic library calls and system call traces, to efficiently and safely recover control flows in practice. Adonis operates in
two steps: it first identifies the call-sites of trace entries, then it executes a pair-wise symbolic execution to recover valid execution
paths. This technique has several advantages. First, Adonis does not require the insertion of any probes into existing applications,
thereby minimizing runtime cost. Second, given that OS-level traces are hardware-independent, Adonis can be implemented across
various hardware configurations without the need for hardware-specific engineering efforts, thus reducing deployment cost. Third, as
Adonis is fully automated and does not depend on manually created logs, it circumvents additional development cost. We conducted an
evaluation of Adonis on representative desktop applications and real-world IoT applications. Adonis can faithfully recover the control
flow with 86.8% recall and 81.7% precision. Compared to the state-of-the-art log-based approach, Adonis not only covers all the
execution paths that approach recovers, but also recovers 74.9% of the statements it cannot cover. In addition, the runtime cost of Adonis is
18.3× lower than that of the instrumentation-based approach; the analysis time and storage cost (indicative of the deployment cost) of Adonis are
50× and 443× smaller, respectively, than those of the hardware-based approach. To facilitate future replication and extension of this
work, we have made the code and data publicly available.
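The first of Adonis's two steps, identifying which call sites in the binary could have emitted each OS-level trace entry, reduces to a reverse lookup from callee name to call-site address. The data shapes below are hypothetical, and the pair-wise symbolic execution that prunes infeasible paths between consecutive entries is not shown:

```python
def candidate_call_sites(call_sites, trace):
    """For each trace entry (a dynamic library or system call name), collect
    the call-site addresses in the binary that could have produced it.
    `call_sites` maps address -> callee name, as recovered by static analysis."""
    return [sorted(addr for addr, callee in call_sites.items() if callee == entry)
            for entry in trace]
```

Because several call sites may invoke the same library routine, each trace entry yields a candidate set, and it is the subsequent symbolic-execution step that selects a feasible path through one candidate per entry.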
THE SCALABLE AND ACCOUNTABLE BINARY CODE SEARCH AND ITS APPLICATIONS
The past decade has witnessed an explosion of applications and devices.
This big-data era challenges existing security technologies: new analysis techniques
should scale to handle a “big data”-scale codebase; they should become smart
and proactive by using the data to understand what the vulnerable points are and where
they are located; and effective protection should be provided for the dissemination and analysis of data
involving sensitive information on an unprecedented scale.
In this dissertation, I argue that code search techniques can boost existing security analysis techniques (vulnerability identification and memory analysis) in terms of scalability and accuracy. To demonstrate these benefits, I address two issues of code search using code analysis: scalability and accountability. I further demonstrate the benefit of code search by applying it to the scalable vulnerability identification [57] and
cross-version memory analysis problems [55, 56].
Firstly, I address the scalability problem of code search by learning “higher-level” semantic
features from code [57]. Instead of conducting fine-grained testing on a single device
or program, it becomes much more crucial to achieve quick vulnerability scanning
across devices or programs at a “big data” scale. However, discovering vulnerabilities in “big
code” is like finding a needle in a haystack, even when dealing with known vulnerabilities. This new challenge demands a scalable code search approach. To this end, I leverage successful techniques from image search in the computer vision community and propose a novel code encoding method for scalable vulnerability search in binary code. The evaluation results show that this approach achieves comparable or even better accuracy and efficiency than the baseline techniques.
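A minimal sketch of similarity-based binary code search follows, using a toy bag-of-features encoding and cosine similarity in place of the learned higher-level semantic features described above; all names and token sets are invented for illustration:

```python
import math
from collections import Counter

def encode(tokens):
    """Bag-of-features encoding of a function's instruction/condition tokens
    (a toy stand-in for learned semantic feature vectors)."""
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query_tokens, corpus, top_k=3):
    """Rank corpus functions by similarity to the (vulnerable) query function."""
    q = encode(query_tokens)
    scored = sorted(((cosine(q, encode(t)), name) for name, t in corpus.items()),
                    reverse=True)
    return [name for _, name in scored[:top_k]]
```

At "big code" scale the linear scan over the corpus would be replaced by an approximate nearest-neighbor index, which is precisely where the image-search machinery comes in.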
Secondly, I tackle the accountability issues left in the vulnerability search problem
by designing vulnerability-oriented raw features [58]. Similar code does not always
represent the same vulnerability, so feature engineering for code
search should focus on semantic-level features rather than syntactic ones. I propose to
extract conditional formulas as higher-level semantic features from the raw binary code to
conduct the code search. A conditional formula explicitly captures two cardinal factors
of a vulnerability: 1) erroneous data dependencies and 2) missing or invalid condition
checks. As a result, the binary code search on conditional formulas produces significantly
higher accuracy and provides meaningful evidence for human analysts to further examine
the search results. The evaluation results show that this approach can further improve
the search accuracy of existing bug search techniques with very reasonable performance
overhead.
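The notion of a conditional formula capturing a missing or invalid condition check can be modeled very simply; the representation below (a set of guard conditions per guarded operation) is a toy stand-in for the extracted formulas, not the dissertation's actual feature format:

```python
def missing_guard(formulas, op, required_guard):
    """A vulnerability signature here reads: operation `op` is reachable
    without `required_guard` in its path condition. `formulas` is a list of
    (guard_set, operation) pairs extracted from one function."""
    return any(g_op == op and required_guard not in guards
               for guards, g_op in formulas)
```

Matching on guards rather than raw instructions is what lets the search distinguish a patched function (check present) from a vulnerable clone (check absent), and the offending formula itself is the evidence handed to the human analyst.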
Finally, I demonstrate the potential of code search techniques in the memory analysis
field, applying them to address the cross-version issue in memory forensics
[55, 56]. Memory analysis techniques for COTS software usually rely on
so-called “data structure profiles” of their binaries. Constructing such profiles requires
expert knowledge about the internal workings of a specific software version, and
it remains a cumbersome manual effort most of the time. I propose to leverage the code search
technique to enable a notion named “cross-version memory analysis”, which can update a
profile for a new version of a software package by transferring knowledge from the model
already trained on its old version. The evaluation results show that the code search-based approach advances existing memory analysis methods by reducing
manual effort while maintaining reasonable accuracy. With the help of collaborators, I
further developed two plugins for the Volatility memory forensic framework [2], and show
that each plugin can construct a localized profile to perform specific memory
forensic tasks on the same memory dump, without manual effort in creating the corresponding profile.
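Cross-version profile updating can be sketched as matching each old-version field to the new-version candidate whose accessor code is most similar; the field names and signature sets below are hypothetical, and real signatures would be the semantic code features described earlier:

```python
def transfer_profile(old_fields, old_sigs, new_sigs):
    """Re-locate each field of an old version's data-structure profile in a
    new version by matching per-field accessor-code 'signatures' (sets of
    code features) via greatest overlap."""
    mapping = {}
    for field in old_fields:
        sig = old_sigs[field]
        mapping[field] = max(new_sigs, key=lambda cand: len(sig & new_sigs[cand]))
    return mapping
```

The point of the matching is that even when a structure is reordered or a field renamed between versions, the code that touches it changes far less, so the old profile's knowledge carries over without re-doing the manual reverse engineering.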