PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-Security Concepts
Public disclosure of important security information, such as knowledge of
vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and
other online sources months before proper classification into structured
databases. In order to facilitate timely discovery of such knowledge, we
propose a novel semi-supervised learning algorithm, PACE, for identifying and
classifying relevant entities in text sources. The main contribution of this
paper is an enhancement of the traditional bootstrapping method for entity
extraction by employing a time-memory trade-off that simultaneously circumvents
a costly corpus search while strengthening pattern nomination, which should
increase accuracy. An implementation in the cyber-security domain is discussed
as well as challenges to Natural Language Processing imposed by the security
domain.
Comment: 6 pages, 3 figures, ieeeTran conference. International Conference on
Machine Learning and Applications 201
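The time-memory trade-off described in the abstract above can be illustrated with a minimal sketch. This is not the PACE algorithm itself: it is a generic bootstrapping loop in which the corpus is scanned once to build a context index (spending memory), so that each later round nominates patterns by lookup instead of searching the corpus again. The function names, the (left word, right word) pattern shape, the thresholds, and the toy corpus are all assumptions made for illustration.

```python
from collections import defaultdict


def build_context_index(sentences):
    """Scan the corpus once, mapping each (left, right) context to the tokens seen in it."""
    pattern_to_tokens = defaultdict(set)
    token_to_patterns = defaultdict(set)
    for sentence in sentences:
        padded = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(padded) - 1):
            pattern = (padded[i - 1], padded[i + 1])
            pattern_to_tokens[pattern].add(padded[i])
            token_to_patterns[padded[i]].add(pattern)
    return pattern_to_tokens, token_to_patterns


def bootstrap(sentences, seeds, rounds=2, min_precision=0.5, min_support=2):
    """Grow the seed set using patterns that mostly extract already-known entities."""
    pattern_to_tokens, token_to_patterns = build_context_index(sentences)
    known = set(seeds)
    for _ in range(rounds):
        # Candidate patterns come from index lookups, not a fresh corpus search.
        candidates = set()
        for token in known:
            candidates |= token_to_patterns.get(token, set())
        new_entities = set()
        for pattern in candidates:
            extracted = pattern_to_tokens[pattern]
            hits = len(extracted & known)
            # Nominate a pattern only if enough of its extractions are known entities.
            if hits >= min_support and hits / len(extracted) >= min_precision:
                new_entities |= extracted - known
        if not new_entities:
            break
        known |= new_entities
    return known


# Hypothetical toy corpus and seeds.
corpus = [
    "exploit for heartbleed released",
    "exploit for shellshock released",
    "exploit for poodle released",
    "patch for windows delayed",
]
found = bootstrap(corpus, seeds={"heartbleed", "shellshock"})
```

In this toy run the context ("for", "released") is supported by both seeds, so it is nominated and contributes "poodle", while "windows" appears in a different context and is left out.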
Developing and Deploying Security Applications for In-Vehicle Networks
Radiological material transportation is primarily facilitated by heavy-duty
on-road vehicles. Modern vehicles have dozens of electronic control units or
ECUs, which are small, embedded computers that communicate with sensors and
each other for vehicle functionality. ECUs use a standardized network
architecture--Controller Area Network or CAN--which presents grave security
concerns that have been exploited by researchers and hackers alike. For
instance, ECUs can be impersonated by adversaries who have infiltrated an
automotive CAN and disable or invoke unintended vehicle functions such as
brakes, acceleration, or safety mechanisms. Further, the quality of security
approaches varies wildly between manufacturers. Thus, research and development
of after-market security solutions have grown remarkably in recent years. Many
researchers are exploring deployable intrusion detection and prevention
mechanisms using machine learning and data science techniques. However, there
is a gap between developing security system algorithms and deploying prototype
security appliances in-vehicle. In this paper, we, a research team at Oak Ridge
National Laboratory working in this space, highlight challenges in the
development pipeline, and provide techniques to standardize methodology and
overcome technological hurdles.
Comment: 10 pages, PATRAM 2
AI ATAC 1: An Evaluation of Prominent Commercial Malware Detectors
This work presents an evaluation of six prominent commercial endpoint malware
detectors, a network malware detector, and a file-conviction algorithm from a
cyber technology vendor. The evaluation was administered as the first of the
Artificial Intelligence Applications to Autonomous Cybersecurity (AI ATAC)
prize challenges, funded by / completed in service of the US Navy. The
experiment employed 100K files (50/50% benign/malicious) with a stratified
distribution of file types, including ~1K zero-day program executables
(increasing experiment size two orders of magnitude over previous work). We
present an evaluation process of delivering a file to a fresh virtual machine
donning the detection technology, waiting 90s to allow static detection, then
executing the file and waiting another period for dynamic detection; this
allows greater fidelity in the observational data than previous experiments, in
particular, resource and time-to-detection statistics. To execute all 800K
trials (100K files × 8 tools), a software framework is designed to
choreograph the experiment into a completely automated, time-synced, and
reproducible workflow with substantial parallelization. A cost-benefit model
was configured to integrate the tools' recall, precision, time to detection,
and resource requirements into a single comparable quantity by simulating costs
of use. This provides a ranking methodology for cyber competitions and a lens
through which to reason about the varied statistical viewpoints of the results.
These statistical and cost-model results provide insights on the state of
commercial malware detection.
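The idea of folding detection counts, time to detection, and resource use into one comparable quantity can be sketched as follows. This is only an illustrative toy, not the cost-benefit model used in the evaluation: the cost terms, the dollar figures, the tool names, and the result counts are all hypothetical assumptions. The true-positive, false-positive, and false-negative counts carry the same information as recall and precision.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    name: str
    true_pos: int               # malicious files detected
    false_pos: int              # benign files flagged
    false_neg: int              # malicious files missed
    mean_detect_seconds: float  # average time to detection
    resource_cost: float        # hypothetical cost of CPU/memory/licensing


def simulated_cost(result, miss_cost=1000.0, triage_cost=50.0, delay_cost_per_second=0.5):
    """Combine misses, false alerts, detection delay, and resources into one number."""
    return (
        result.false_neg * miss_cost
        + result.false_pos * triage_cost
        + result.true_pos * result.mean_detect_seconds * delay_cost_per_second
        + result.resource_cost
    )


# Hypothetical results for two made-up tools.
tools = [
    ToolResult("tool_a", true_pos=90, false_pos=5, false_neg=10,
               mean_detect_seconds=20.0, resource_cost=500.0),
    ToolResult("tool_b", true_pos=60, false_pos=0, false_neg=40,
               mean_detect_seconds=2.0, resource_cost=100.0),
]

# Lower simulated cost ranks first.
ranking = sorted(tools, key=simulated_cost)
```

Under these assumed weights, the faster and cheaper tool_b still ranks behind tool_a because its missed detections dominate the total; different weights would give a different ranking, which is why the cost parameters need to be chosen to match the deployment being modeled.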
Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning-Based Malware Detection
There is a lack of scientific testing of commercially available malware
detectors, especially those that boast accurate classification of
never-before-seen (i.e., zero-day) files using machine learning (ML). The
result is that the efficacy and gaps among the available approaches are opaque,
inhibiting end users from making informed network security decisions and
researchers from targeting gaps in current detectors. In this paper, we present
a scientific evaluation of four market-leading malware detection tools to
assist an organization with two primary questions: (Q1) To what extent do
ML-based tools accurately classify never-before-seen files without sacrificing
detection ability on known files? (Q2) Is it worth purchasing a network-level
malware detector to complement host-based detection? We tested each tool
against 3,536 total files (2,554 or 72% malicious, 982 or 28% benign) including
over 400 zero-day malware, and tested with a variety of file types and
protocols for delivery. We present statistical results on detection time and
accuracy, consider complementary analysis (using multiple tools together), and
provide two novel applications of a recent cost-benefit evaluation procedure by
Iannacone & Bridges that incorporates all the above metrics into a single
quantifiable cost. While the ML-based tools are more effective at detecting
zero-day files and executables, the signature-based tool may still be an
overall better option. Both network-based tools provide substantial (simulated)
savings when paired with either host tool, yet both show poor detection rates
on protocols other than HTTP or SMTP. Our results show that all four tools have
near-perfect precision but alarmingly low recall, especially on file types
other than executables and office files -- 37% of malware tested, including all
polyglot files, were undetected.
Comment: Includes Actionable Takeaways for SOC
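The file counts reported in the abstract above, and the notion of complementary analysis (using multiple tools together), can be checked and illustrated with a short sketch. The dataset figures come from the abstract; the file identifiers and the per-tool detection sets are hypothetical and are not results from the paper.

```python
# Dataset split as reported in the abstract.
total_files, malicious_files, benign_files = 3536, 2554, 982
malicious_share = round(100 * malicious_files / total_files)
benign_share = round(100 * benign_files / total_files)


def recall(detected, malicious):
    """Fraction of malicious files that were flagged."""
    return len(detected & malicious) / len(malicious)


def precision(detected, malicious):
    """Fraction of flagged files that were actually malicious."""
    return len(detected & malicious) / len(detected)


# Hypothetical ground truth and detections for a host tool and a network tool.
malicious = {"m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"}
host_detected = {"m1", "m2", "m3"}
network_detected = {"m3", "m4", "m5"}

# Complementary analysis: a file counts as detected if either tool flags it.
combined_detected = host_detected | network_detected

host_recall = recall(host_detected, malicious)
combined_recall = recall(combined_detected, malicious)
combined_precision = precision(combined_detected, malicious)
```

In this made-up example each tool alone has perfect precision but low recall, and pairing them raises recall because their detections only partly overlap.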