64 research outputs found

    Detection and Classification of Malicious Processes Using System Call Analysis

    Despite efforts to mitigate the malware threat, malware continues to proliferate, with record-setting numbers of samples discovered each quarter. Malware is any intentionally malicious software, including software designed for extortion, sabotage, and espionage. Traditional malware defenses are primarily signature-based and heuristic-based, and include firewalls, intrusion detection systems, and antivirus software. Such defenses are reactive: they perform well against known threats but struggle against new malware variants and zero-day threats. Together, the reactive nature of traditional defenses and the continuing spread of malware motivate the development of new techniques to detect such threats. One promising set of techniques uses features extracted from system call traces to infer malicious behaviors. This thesis studies the problem of detecting and classifying malicious processes using system call trace analysis. The goal of this study is to identify techniques that are "lightweight" enough and exhibit a low enough false positive rate to be deployed in production environments. The major contributions of this work are (1) a study of the effects of feature extraction strategy on malware detection performance; (2) a comparison of signature-based and statistical analysis techniques for malware detection and classification; (3) the use of sequential detection techniques to identify malicious behaviors as quickly as possible; (4) a study of malware detection performance at very low false positive rates; and (5) an extensive empirical evaluation, in which the performance of the malware detection and classification systems is evaluated against data collected from production hosts and from the execution of recently discovered malware samples.
The outcome of this study is a proof-of-concept system that detects the execution of malicious processes in production environments and classifies them according to their similarity to known malware. Ph.D., Electrical Engineering -- Drexel University, 201
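The abstract above does not say which features are extracted from the system call traces. As a minimal sketch, assuming overlapping n-grams of system call names (a common choice in the literature, not confirmed as this thesis's method), feature extraction could look like the following. The function names `syscall_ngrams` and `to_vector` and the sample trace are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def syscall_ngrams(trace: List[str], n: int = 3) -> Counter:
    """Count overlapping n-grams of system call names in one trace."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A trace shorter than n yields no n-grams and an empty Counter.
    grams = (tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))
    return Counter(grams)


def to_vector(counts: Counter, vocabulary: List[Tuple[str, ...]]) -> List[float]:
    """Project n-gram counts onto a fixed vocabulary as relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts.get(gram, 0) / total for gram in vocabulary]


# Hypothetical trace of one process (assumed data, for illustration only).
trace = ["open", "read", "write", "close", "open", "read", "write"]
counts = syscall_ngrams(trace, n=2)
features = to_vector(counts, [("open", "read"), ("read", "open")])
```

The resulting fixed-length vectors could then be fed to any detector. The choice of n and of the vocabulary is exactly the kind of feature extraction decision the abstract says affects detection performance.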

    Mining structural and behavioral patterns in smart malware

    International Mention in the doctoral degree. Funcas. Enrique Fuentes Quintana Prize 2016. Smart devices equipped with powerful sensing, computing and networking capabilities have proliferated lately, ranging from popular smartphones and tablets to Internet appliances, smart TVs, and others that will soon appear (e.g., watches, glasses, and clothes). One key feature of such devices is their ability to incorporate third-party apps from a variety of markets. This poses serious security and privacy issues for users and infrastructure operators, particularly through software of a malicious (or dubious) nature that can easily gain access to the services provided by the device and collect sensory data and personal information. Malware in current smart devices (mostly smartphones and tablets) has rocketed in the last few years, supported by sophisticated techniques (e.g., advanced obfuscation and targeted infection and activation engines) purposely designed to overcome the security architectures currently in use by such devices. This phenomenon is known as the proliferation of smart malware. Even though important advances have been made in malware analysis and detection on traditional personal computers during the last decades, adopting and adapting those techniques to smart devices is a challenging problem. For example, power consumption is one major constraint that makes it unaffordable to run traditional detection engines on the device, while externalized (i.e., cloud-based) techniques raise many privacy concerns. This thesis examines the problem of smart malware in such devices, aiming to design and develop new approaches that assist security analysts and end users in analyzing the security nature of apps. We first present a comprehensive analysis of how malware has evolved over the last years, as well as recent progress made in analyzing and detecting malware.
Additionally, we compile a suite of the most cutting-edge open source tools, and we design a versatile and multipurpose research laboratory for smart malware analysis and detection. Second, we propose a number of methods and techniques aimed at better analyzing smart malware in scenarios with a constant and large stream of apps that require security inspection. More precisely, we introduce Dendroid, an effective system based on text mining and information retrieval techniques. Dendroid uses static analysis to measure the similarity between malware samples, which is then used to automatically classify them into families with remarkable accuracy. Then, we present Alterdroid, a novel dynamic analysis technique for automatically detecting hidden or obfuscated malware functionality. Alterdroid introduces the notion of differential fault analysis for effectively mining obfuscated malware components distributed as parts of an app package. Next, we present an evaluation of the power-consumption trade-offs among different strategies for off-loading, or not, certain security tasks to the cloud. We develop a system called Meterdroid for testing several functional tasks and metering their power consumption. Based on the results obtained in this analysis, we then propose a cloud-based system, called Targetdroid, that addresses the problem of detecting targeted malware by relying on stochastic models of usage and context events derived from real user traces. Based on these models, we build an efficient automatic testing system capable of triggering targeted malware. Finally, based on the conclusions extracted from this thesis, we propose a number of open research problems and future directions where there is room for research.
Smart devices have in a few years become highly popular, with great computing, communication, and sensing capabilities. They include smartphones, smart TVs and, more recently, smart watches, glasses, and clothing. A key feature of this type of device is its ability to incorporate third-party applications from a wide variety of markets. This poses serious security and privacy problems for users and infrastructure operators, above all through malicious software (malware), which can easily access the services provided by the device and collect sensitive sensor data and personal information. In recent years there has been a sharp increase in malware attacking these smart devices, mainly smartphones, supported by sophisticated techniques designed to defeat the security systems the devices implement. This phenomenon has given rise to the proliferation of smart malware. Examples of these smart techniques include obfuscation methods, targeted infection strategies, and context-based activation engines. Although important advances have been made in recent decades in malware analysis and detection on personal computers, adapting and porting these techniques to smart devices is a difficult problem. In particular, energy consumption is one of the main limitations of these devices, and it makes traditional detection engines unaffordable. Conversely, externalized (that is, cloud-based) detection strategies pose a serious threat to user privacy.
This thesis analyzes the problem of smart malware afflicting these devices, with the aim of designing and developing new approaches that help security analysts and end users analyze applications. First, it presents an exhaustive analysis of how malware has evolved in recent years, along with the most recent advances in analyzing apps and detecting malware. In addition, we integrate and extend the most advanced open source tools used by the community, and we design a laboratory for analyzing smart malware in a versatile and multipurpose way. Second, we propose a set of techniques aimed at improving smart malware analysis in scenarios where large numbers of samples must be analyzed. Specifically, we propose Dendroid, a text-mining-based system for analyzing sets of apps effectively. Dendroid uses static code analysis to extract a measure of similarity between malware samples. That distance is then used to classify each sample into its corresponding malware family automatically and with high accuracy. We also propose a dynamic code analysis technique, called Alterdroid, that automatically detects hidden and/or obfuscated functionality. Alterdroid introduces a new analysis method based on fault injection and differential analysis of the associated behavior. Lastly, we present an evaluation of the energy consumption associated with different offloading strategies for moving certain security tasks to the cloud. To this end, we develop a system called Meterdroid that allows different functionalities to be tested and their consumption measured.
Based on the results of this analysis, we propose a system called Targetdroid that uses the cloud to address the problem of detecting targeted or specialized malware. The system uses stochastic models to model user behavior and the surrounding context. In this way, Targetdroid also makes it possible to detect targeted malware automatically by means of these models. Finally, from the conclusions drawn in this thesis, we identify a series of open research lines and future work.
Official Doctoral Program in Computer Science and Technology (Programa Oficial de Doctorado en Ciencia y Tecnología Informática). President: Francisco Javier López Muñoz. Secretary: Jesús García Herrero. Member: Nadarajah Asoka

    A novel approach for multimodal graph dimensionality reduction

    This thesis deals with the problem of multimodal dimensionality reduction (DR), which arises when the input objects, to be mapped on a low-dimensional space, consist of multiple vectorial representations, instead of a single one. Herein, the problem is addressed in two alternative manners. One is based on the traditional notion of modality fusion, but using a novel approach to determine the fusion weights. In order to optimally fuse the modalities, the known graph embedding DR framework is extended to multiple modalities by considering a weighted sum of the involved affinity matrices. The weights of the sum are automatically calculated by minimizing an introduced notion of inconsistency of the resulting multimodal affinity matrix. The other manner for dealing with the problem is an approach to consider all modalities simultaneously, without fusing them, which has the advantage of minimal information loss due to fusion. In order to avoid fusion, the problem is viewed as a multi-objective optimization problem. The multiple objective functions are defined based on graph representations of the data, so that their individual minimization leads to dimensionality reduction for each modality separately. The aim is to combine the multiple modalities without the need to assign importance weights to them, or at least postpone such an assignment as a last step. The proposed approaches were experimentally tested in mapping multimedia data on low-dimensional spaces for purposes of visualization, classification and clustering. The no-fusion approach, namely Multi-objective DR, was able to discover mappings revealing the structure of all modalities simultaneously, which cannot be discovered by weight-based fusion methods. However, it results in a set of optimal trade-offs, from which one needs to be selected, which is not trivial. 
The optimal-fusion approach, namely Multimodal Graph Embedding DR, is able to easily extend unimodal DR methods to multiple modalities, but depends on the limitations of the unimodal DR method used. Both the no-fusion and the optimal-fusion approaches were compared to state-of-the-art multimodal dimensionality reduction methods and the comparison showed performance improvement in visualization, classification and clustering tasks. The proposed approaches were also evaluated for different types of problems and data, in two diverse application fields, a visual-accessibility-enhanced search engine and a visualization tool for mobile network security data. The results verified their applicability in different domains and suggested promising directions for future advancements. Open Access
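The fusion step described in this abstract can be written out as a short worked formulation. This is a sketch under stated assumptions: the symbols $W^{(m)}$, $\alpha_m$, $D$, $L$ and $Y$ are introduced here for illustration, the simplex constraint on the weights is an assumption, and the embedding objective shown is the standard Laplacian-style graph embedding, which may differ from the exact form used in the thesis. The inconsistency measure that the thesis minimizes to choose the weights is not defined in the abstract and is left unspecified.

```latex
% Fused affinity matrix: weighted sum over M modalities (weights assumed to lie on the simplex)
W(\alpha) = \sum_{m=1}^{M} \alpha_m W^{(m)}, \qquad \alpha_m \ge 0, \quad \sum_{m=1}^{M} \alpha_m = 1

% Degree matrix and graph Laplacian of the fused graph
D_{ii} = \sum_{j} W_{ij}(\alpha), \qquad L = D - W(\alpha)

% Standard graph-embedding objective for the low-dimensional coordinates Y
Y^{*} = \arg\min_{Y^{\top} D Y = I} \operatorname{tr}\left( Y^{\top} L \, Y \right)
```

Under this reading, any unimodal method expressible through an affinity matrix extends to several modalities by swapping $W$ for $W(\alpha)$, which matches the abstract's remark that the approach inherits the limitations of the unimodal method used.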

    Feature Space Augmentation: Improving Prediction Accuracy of Classical Problems in Cognitive Science and Computer Vison

    Prediction accuracy in many classical problems across multiple domains has risen since computational tools such as multi-layer neural nets and complex machine learning algorithms became widely accessible to the research community. In this research, we take a step back and examine the feature space in two problems from very different domains. We show that novel augmentation of the feature space yields higher performance. Emotion Recognition in Adults from a Control Group: The objective is to quantify the emotional state of an individual at any time using data collected by wearable sensors. We define emotional state as a mixture of amusement, anger, disgust, fear, sadness, anxiety and neutral and their respective levels at any time. The generated model predicts an individual's dominant state and generates an emotional spectrum, a 1x7 vector indicating the levels of each emotional state and anxiety. We present an iterative learning framework that alters the feature space uniquely to an individual's emotion perception, and predicts the emotional state using the individual-specific feature space. Hybrid Feature Space for Image Classification: The objective is to improve the accuracy of existing image recognition by leveraging text features from the images. As humans, we perceive objects using colors, dimensions, geometry and any textual information we can gather. Current image recognition algorithms rely exclusively on the first three and do not use the textual information. This study develops and tests an approach that trains a classifier on a hybrid text-based feature space and achieves accuracy comparable to state-of-the-art CNNs while being significantly less expensive computationally. Moreover, when combined with CNNs, the approach yields a statistically significant boost in accuracy. Both models are validated using cross-validation and holdout validation, and are evaluated against the state of the art.

    Challenges and Open Questions of Machine Learning in Computer Security

    This habilitation thesis presents advancements in machine learning for computer security, arising from problems in network intrusion detection and steganography. The thesis puts an emphasis on explaining the traits shared by steganalysis, network intrusion detection, and other security domains, which make these domains different from computer vision, speech recognition, and other fields where machine learning is typically studied. Then, the thesis presents methods developed to at least partially solve the identified problems, with the overall goal of making machine-learning-based intrusion detection systems viable. Most of them are general in the sense that they can be used outside intrusion detection and steganalysis on problems with similar constraints. A common feature of all the methods is that they are generally simple, yet surprisingly effective. According to large-scale experiments, they almost always improve on the prior art, which is likely because they are tailored to security problems and designed for large volumes of data. Specifically, the thesis addresses the following problems: anomaly detection with low computational and memory complexity, such that efficient processing of large data is possible; multiple-instance anomaly detection, which improves the signal-to-noise ratio by classifying larger groups of samples; supervised classification of tree-structured data, simplifying their encoding in neural networks; clustering of structured data; supervised training with an emphasis on the precision in the top p% of returned data; and finally explanation of anomalies, to help humans understand the nature of an anomaly and speed up their decisions. Many of the algorithms and methods presented in this thesis are deployed in a real intrusion detection system protecting millions of computers around the globe.
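The "precision in the top p% of returned data" mentioned above can be illustrated as an evaluation metric. The sketch below is an assumption about how such a quantity might be computed. It does not reproduce the thesis's training procedure, which the abstract does not describe, and the function name and sample data are hypothetical.

```python
import math
from typing import Sequence


def precision_at_top_percent(scores: Sequence[float],
                             labels: Sequence[int],
                             p: float) -> float:
    """Fraction of true positives among the top p% highest-scored samples.

    labels are 1 for a true positive (e.g. malicious) and 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    # Number of samples an analyst would review; always at least one.
    k = max(1, math.ceil(len(scores) * p / 100))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


# Hypothetical anomaly scores and ground-truth labels (assumed data).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
top20 = precision_at_top_percent(scores, labels, 20)  # reviews 2 samples -> 0.5
```

A metric of this shape reflects the operational constraint that analysts can only inspect a small fraction of alerts, so errors outside the top p% matter much less than errors inside it.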

    Applying Machine Learning to Advance Cyber Security: Network Based Intrusion Detection Systems

    Many new devices, such as phones and tablets, as well as traditional computer systems, rely on wireless connections to the Internet and are susceptible to attacks. Two important types of attack are the use of malware and the exploitation of Internet protocol vulnerabilities in devices and network systems. These attacks pose a threat on many levels, so countering them requires several methods. In this research, we utilize machine learning to detect and classify malware; to visualize, detect and classify worms; and to detect deauthentication attacks, a form of Denial of Service (DoS). This work also includes two prevention mechanisms for DoS attacks, one based on a one-time password (OTP) and one based on machine learning. Furthermore, we focus on an exploit of the widely used IEEE 802.11 protocol for wireless local area networks (WLANs). The work proposed here presents a threefold approach to intrusion detection that remedies the effects of malware and an Internet protocol exploit, employing machine learning as the primary tool. We conclude with a comparison of dimensionality reduction methods to a deep learning classifier to demonstrate the effectiveness of these methods without compromising the accuracy of classification.

    Data-driven Approach to Information Sharing using Data Fusion and Machine Learning

    The number of security incidents worldwide is increasing, and the capability to detect and react is of utmost importance. Intrusion Detection Systems (IDSs) are employed in various locations in networks to identify malicious activity. These sensors produce large amounts of data, which must be fused and reduced. It is necessary to determine how to perform such fusion and reduction of data from heterogeneous sources. IDSs are known to produce a high number of false positives, which create a high workload for human analysts at a Security Operation Center (SOC). To ensure scalability, systems for reducing and streamlining the detection process are critical. The application of Threat Intelligence (TI) in information security for detection and prevention is widespread. When sharing TI, it must be ensured that the data is reliable and trustworthy. Further, it must be guaranteed that the sharing process does not leak sensitive data. This thesis proposes a process model describing the fusion and reduction of heterogeneous sensor data and TI in intrusion detection. Our work is based on a literature study and qualitative research interviews with security experts from law enforcement and public and private organisations. Further, an identification of reliable and trustworthy features in such fused and reduced data for use in Machine Learning (ML) is given. We have applied data-driven methods to a real-world dataset from a SOC for this identification, and we evaluate our results using well-known performance measures. Our results show that ML can be used for prediction and decision support in the operation of a SOC. We also provide an identification of sensitive features among the features selected by our data-driven experiments.
The number of security incidents in the world is increasing, and the capabilities for detection and response are critical. Intrusion Detection Systems (IDSs) are placed at various locations in networks and systems to identify malicious activity. These sensors produce large amounts of data that must be fused and reduced. It is therefore important to define how such data fusion and reduction should be done with a large number of heterogeneous sensors. IDSs are known to produce large numbers of false positives, which in turn create large amounts of unnecessary work for security analysts in a Security Operation Center (SOC). To enable scaling, systems that can reduce and streamline the detection process are critical. The use of threat intelligence for detection and prevention is widespread in the information security community. When threat intelligence is shared, it is essential that the shared information is reliable and that sharing sensitive information is avoided. This thesis proposes a process model that describes the fusion and reduction of data from heterogeneous sensors and threat intelligence sources. Our work is based on a literature study combined with qualitative research interviews with security experts from law enforcement and public and private organisations. Furthermore, we have identified attributes in such fused and reduced data that can be used in machine learning. This was done via a data-driven approach on a dataset from a SOC with real-world data. The results were then evaluated using well-known performance measurement methods. Our results show that using machine learning for prediction and decision support in the daily operation of a SOC is feasible. We have also identified sensitive attributes among the attributes selected by our data-driven experiments.

    Reducing Payment-Card Fraud

    Critical public data in the United States are vulnerable to theft, creating severe financial and legal implications for payment-card acceptors. When security analysts and managers who work for payment card processing organizations implement strategies to reduce or eliminate payment-card fraud, they protect their organizations, consumers, and the local and national economy. Grounded in Cressey’s fraud theory, the purpose of this qualitative single case study was to explore strategies business owners and card processors use to reduce or eliminate payment-card fraud. The participants were 3 data security analysts and 1 manager working for an international payment card processing organization with 10 years or more experience working with payment card fraud detection in the southeastern United States. The data collection process was face-to-face semistructured interviews and review of company documentation. Within-case analysis, pattern matching, and methodological triangulation were used to identify 4 themes. The key themes related to artificial intelligence, cardholder and acceptor education, enhanced security strategies, and Payment Card Industry Data Security Standard (PCI-DSS) rules and regulations to reduce or end card fraud. The key recommendations are enforcement of stricter PCI-DSS rules and regulations for accepting payment cards at the acceptor and processor levels to reduce the potential for fraud through the use of holograms and card reader clearance between customers. The implications for social change include the potential to reduce costs to consumers, reduce overhead costs for businesses, and provide price reductions for consumers. Additionally, consumers may gain a sense of security when using their payment-card for purchases