12 research outputs found
Tracking Cyber Adversaries with Adaptive Indicators of Compromise
A forensics investigation after a breach often uncovers network and host
indicators of compromise (IOCs) that can be deployed to sensors to allow early
detection of the adversary in the future. Over time, the adversary will change
tactics, techniques, and procedures (TTPs), which will also change the data
generated. If the IOCs are not kept up-to-date with the adversary's new TTPs,
the adversary will no longer be detected once all of the IOCs become invalid.
Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular
expressions (regexes), up-to-date with a dynamic adversary. Our framework
solves the TTK problem in an automated, cyclic fashion to bracket a previously
discovered adversary. This tracking is accomplished through a data-driven
approach of self-adapting a given model based on its own detection
capabilities.
In our initial experiments, we found that the true positive rate (TPR) of the
adaptive solution degrades much less significantly over time than the naive
solution, suggesting that self-updating the model allows the continued
detection of positives (i.e., adversaries). The cost for this performance is in
the false positive rate (FPR), which increases over time for the adaptive
solution, but remains constant for the naive solution. However, the difference
in overall detection performance, as measured by the area under the curve
(AUC), between the two methods is negligible. This result suggests that
self-updating the model over time should be done in practice to continue to
detect known, evolving adversaries.Comment: This was presented at the 4th Annual Conf. on Computational Science &
Computational Intelligence (CSCI'17) held Dec 14-16, 2017 in Las Vegas,
Nevada, US
Tracking Cyber Adversaries with Adaptive Indicators of Compromise
A forensics investigation after a breach often uncovers network and host
indicators of compromise (IOCs) that can be deployed to sensors to allow early
detection of the adversary in the future. Over time, the adversary will change
tactics, techniques, and procedures (TTPs), which will also change the data
generated. If the IOCs are not kept up-to-date with the adversary's new TTPs,
the adversary will no longer be detected once all of the IOCs become invalid.
Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular
expressions (regexes), up-to-date with a dynamic adversary. Our framework
solves the TTK problem in an automated, cyclic fashion to bracket a previously
discovered adversary. This tracking is accomplished through a data-driven
approach of self-adapting a given model based on its own detection
capabilities.
In our initial experiments, we found that the true positive rate (TPR) of the
adaptive solution degrades much less significantly over time than the naive
solution, suggesting that self-updating the model allows the continued
detection of positives (i.e., adversaries). The cost for this performance is in
the false positive rate (FPR), which increases over time for the adaptive
solution, but remains constant for the naive solution. However, the difference
in overall detection performance, as measured by the area under the curve
(AUC), between the two methods is negligible. This result suggests that
self-updating the model over time should be done in practice to continue to
detect known, evolving adversaries.Comment: This was presented at the 4th Annual Conf. on Computational Science &
Computational Intelligence (CSCI'17) held Dec 14-16, 2017 in Las Vegas,
Nevada, US
Symbolic Register Automata
Symbolic Finite Automata and Register Automata are two orthogonal extensions
of finite automata motivated by real-world problems where data may have
unbounded domains. These automata address a demand for a model over large or
infinite alphabets, respectively. Both automata models have interesting
applications and have been successful in their own right. In this paper, we
introduce Symbolic Register Automata, a new model that combines features from
both symbolic and register automata, with a view on applications that were
previously out of reach. We study their properties and provide algorithms for
emptiness, inclusion and equivalence checking, together with experimental
results
Techniques for efficient regular expression matching across hardware architectures
Regular expression matching is a central task for many networking and bioinformatics applications. For example, network intrusion detection systems, which perform deep packet inspection to detect malicious network activities, often encode signatures of malicious traffic through regular expressions. Similarly, several bioinformatics applications perform regular expression matching to find common patterns, called motifs, across multiple gene or protein sequences. Hardware implementations of regular expression matching engines fall into two categories: memory-based and logic-based solutions. In both cases, the design aims to maximize the processing throughput and minimize the resources requirements, either in terms of memory or of logic cells. Graphical Processing Units (GPUs) offer a highly parallel platform for memory-based implementations, while Field Programmable Gate Arrays (FPGAs) support reconfigurable, logic-based solutions. In addition, Micron Technology has recently announced its Automata Processor, a memory-based, reprogrammable hardware device. From an algorithmic standpoint, regular expression matching engines are based on finite automata, either in their non-deterministic or in their deterministic form (NFA and DFA, respectively). Micron's Automata Processor is based on a proprietary Automata Network, which extends classical NFA with counters and boolean elements. In this work, we aim to implement highly parallel memory-based and logic-based regular expression matching solutions. Our contributions are summarized as follows. First, we implemented regular expression matching on GPU. In this process, we explored compression techniques and regular expression clustering algorithms to alleviate the memory pressure of DFA-based GPU implementations. Second, we developed a parser for Automata Networks defined through Micron's Automata Network Markup Language (ANML), a XML-based high-level language designed to program the Automata Processor. Specifically, our ANML parser first maps the Automata Networks to an
Register Set Automata (Technical Report)
We present register set automata (RsAs), a register automaton model over data
words where registers can contain sets of data values and the following
operations are supported: adding values to registers, clearing registers, and
testing (non-)membership. We show that the emptiness problem for RsAs is
decidable and complete for the class. Moreover, we show that a large
class of register automata can be transformed into deterministic RsAs, which
can serve as a basis for (i) fast matching of a family of regular expressions
with back-references and (ii) language inclusion algorithm for a sub-class of
register automata. RsAs are incomparable in expressive power to other popular
automata models over data words, such as alternating register automata and
pebble automata
Evaluating Languages for Bioinformatics: Performance, Expressiveness and Energy
One of the fastest growing concerns in the technology sector is the increased demand for power in the world's data centers. Global data center electricity use in 2021 was estimated as between 220 and 320 terawatt-hours (TWh), as much as 1.3% of global electricity demand. As the data center industry continues to expand, so too will power usage, and therefore the need for increased energy efficiency in software development.
This thesis introduces a methodology that evaluates a set of programming languages based on three key metrics: performance, expressiveness, and energy use, demonstrating a fair consideration of each language's strengths and weaknesses. The framework presented creates a collection of string-matching algorithms used on DNA sequences to demonstrate the capabilities of each language, and draw out their distinctiveness.
DNA sequencing was chosen due to its growing uses and applications as technology evolves and makes such sequencing faster and less expensive. This in turn has lead to a growing percentage of compute-time being spent on this field. Using the methodology presented here it will be shown that using a newer language, like Rust, has advantages that help it balance speed, ease of use, and power consumption when used for advanced scientific computing.
A key part of this work introduces a novel approximate-matching algorithm to aid in this evaluation process. This new algorithm differs from current algorithms in use, in its ability to hold the gap between nucleotides to a specific maximum while allowing other gaps to exist. It will offer an alternative technique to other current approximate-matching algorithms and hopes to offer researchers another tool to consider for sequence-matching problems.
The expectation is that this research will show how testing and evaluating via performance, expressiveness and energy use metrics allows for rating and ranking programming languages in a consistent and reproducible manner. This will enable developers to make educated choices when selecting a language for a project. The methods described here will be applicable to other languages as well, given similar data to work with. This research will benefit the programming field by providing methods and techniques that can be used in the language selection process, particularly when energy efficiency is as important as overall performance
Hardware acceleration for power efficient deep packet inspection
The rapid growth of the Internet leads to a massive spread of malicious attacks like viruses and malwares, making the safety of online activity a major concern. The use of Network Intrusion Detection Systems (NIDS) is an effective method to safeguard the Internet. One key procedure in NIDS is Deep Packet Inspection (DPI). DPI can examine the contents of a packet and take actions on the packets based on predefined rules. In this thesis, DPI is mainly discussed in the context of security applications. However, DPI can also be used for bandwidth management and network surveillance.
DPI inspects the whole packet payload, and due to this and the complexity of the inspection rules, DPI algorithms consume significant amounts of resources including time, memory and energy. The aim of this thesis is to design hardware accelerated methods for memory and energy efficient high-speed DPI.
The patterns in packet payloads, especially complex patterns, can be efficiently represented by regular expressions, which can be translated by the use of Deterministic Finite Automata (DFA). DFA algorithms are fast but consume very large amounts of memory with certain kinds of regular expressions. In this thesis, memory efficient algorithms are proposed based on the transition compressions of the DFAs.
In this work, Bloom filters are used to implement DPI on an FPGA for hardware acceleration with the design of a parallel architecture. Furthermore, devoted at a balance of power and performance, an energy efficient adaptive Bloom filter is designed with the capability of adjusting the number of active hash functions according to current workload. In addition, a method is given for implementation on both two-stage and multi-stage platforms. Nevertheless, false positive rates still prevents the Bloom filter from extensive utilization; a cache-based counting Bloom filter is presented in this work to get rid of the false positives for fast and precise matching.
Finally, in future work, in order to estimate the effect of power savings, models will be built for routers and DPI, which will also analyze the latency impact of dynamic frequency adaption to current traffic. Besides, a low power DPI system will be designed with a single or multiple DPI engines. Results and evaluation of the low power DPI model and system will be produced in future
Recommended from our members
A novel intrusion detection system (IDS) architecture. Attack detection based on snort for multistage attack scenarios in a multi-cores environment.
Recent research has indicated that although security systems are developing,
illegal intrusion to computers is on the rise. The research conducted here
illustrates that improving intrusion detection and prevention methods is
fundamental for improving the overall security of systems.
This research includes the design of a novel Intrusion Detection System (IDS)
which identifies four levels of visibility of attacks. Two major areas of security
concern were identified: speed and volume of attacks; and complexity of
multistage attacks. Hence, the Multistage Intrusion Detection and Prevention
System (MIDaPS) that is designed here is made of two fundamental elements:
a multistage attack engine that heavily depends on attack trees and a Denial of
Service Engine. MIDaPS were tested and found to improve current intrusion
detection and processing performances.
After an intensive literature review, over 25 GB of data was collected on
honeynets. This was then used to analyse the complexity of attacks in a series
of experiments. Statistical and analytic methods were used to design the novel
MIDaPS.
Key findings indicate that an attack needs to be protected at 4 different levels.
Hence, MIDaPS is built with 4 levels of protection. As, recent attack vectors use
legitimate actions, MIDaPS uses a novel approach of attack trees to trace the
attacker¿s actions. MIDaPS was tested and results suggest an improvement to
current system performance by 84% whilst detecting DDOS attacks within 10
minutes