7 research outputs found

    Pre-filters in-transit malware packets detection in the network

    Get PDF
    Conventional malware detection systems cannot detect most of the new malware in the network without the availability of their signatures. In order to solve this problem, this paper proposes a technique to detect both metamorphic (mutated malware) and general (non-mutated) malware in the network using a combination of known malware sub-signature and machine learning classification. This network-based malware detection is achieved through a middle path for efficient processing of non-malware packets. The proposed technique has been tested and verified using multiple data sets (metamorphic malware, non-mutated malware, and UTM real traffic), this technique can detect most of malware packets in the network-based before they reached the host better than the previous works which detect malware in host-based. Experimental results showed that the proposed technique can speed up the transmission of more than 98% normal packets without sending them to the slow path, and more than 97% of malware packets are detected and dropped in the middle path. Furthermore, more than 75% of metamorphic malware packets in the test dataset could be detected. The proposed technique is 37 times faster than existing technique

    Método para la detección de intrusos mediante redes neuronales basado en la reducción de características

    Get PDF
    La aplicación de técnicas basadas en Inteligencia Artificial para la detección de intrusos (IDS), fundamentalmente las redes neuronales artificiales (ANN), están demostrando ser un enfoque muy adecuado para paliar muchos de los problemas abiertos en esta área. Sin embargo, el gran volumen de información que se requiere cada día para entrenar estos sistemas, junto con la necesidad exponencial de tiempo que requieren para asimilarlos, dificulta enormemente su puesta en marcha en escenarios reales. En este trabajo se propone un método basado en la aplicación de una técnica para la reducción de características, denominada Análisis de Componentes Principales (PCA). El PCA permite obtener un modelo para la reducción del tamaño de los vectores de entrada a la ANN, asegurando que la pérdida de información sea mínima y, en consecuencia, disminuyendo la complejidad del clasificador neuronal y manteniendo estables los tiempos de entrenamiento. Para validar la propuesta se ha diseñado un escenario de prueba mediante un IDS basado en ANN. Los resultados obtenidos a partir de las pruebas realizadas demuestran la validez de la propuesta y acreditan las líneas futuras de trabajo.Este trabajo ha sido financiado por el Ministerio de Educación y Ciencia de España bajo el proyecto de investigación TIN2006-04081 y por la Generalitat Valenciana bajo el proyecto de investigación GV/2007175

    iGen: Toward Automatic Generation and Analysis of Indicators of Compromise (IOCs) using Convolutional Neural Network

    Get PDF
    abstract: Field of cyber threats is evolving rapidly and every day multitude of new information about malware and Advanced Persistent Threats (APTs) is generated in the form of malware reports, blog articles, forum posts, etc. However, current Threat Intelligence (TI) systems have several limitations. First, most of the TI systems examine and interpret data manually with the help of analysts. Second, some of them generate Indicators of Compromise (IOCs) directly using regular expressions without understanding the contextual meaning of those IOCs from the data sources which allows the tools to include lot of false positives. Third, lot of TI systems consider either one or two data sources for the generation of IOCs, and misses some of the most valuable IOCs from other data sources. To overcome these limitations, we propose iGen, a novel approach to fully automate the process of IOC generation and analysis. Proposed approach is based on the idea that our model can understand English texts like human beings, and extract the IOCs from the different data sources intelligently. Identification of the IOCs is done on the basis of the syntax and semantics of the sentence as well as context words (e.g., ``attacked'', ``suspicious'') present in the sentence which helps the approach work on any kind of data source. Our proposed technique, first removes the words with no contextual meaning like stop words and punctuations etc. Then using the rest of the words in the sentence and output label (IOC or non-IOC sentence), our model intelligently learn to classify sentences into IOC and non-IOC sentences. Once IOC sentences are identified using this learned Convolutional Neural Network (CNN) based approach, next step is to identify the IOC tokens (like domains, IP, URL) in the sentences. This CNN based classification model helps in removing false positives (like IPs which are not malicious). Afterwards, IOCs extracted from different data sources are correlated to find the links between thousands of apparently unrelated attack instances, particularly infrastructures shared between them. Our approach fully automates the process of IOC generation from gathering data from different sources to creating rules (e.g. OpenIOC, snort rules, STIX rules) for deployment on the security infrastructure. iGen has collected around 400K IOCs till now with a precision of 95\%, better than any state-of-art method.Dissertation/ThesisMasters Thesis Computer Science 201

    Dynamic Protocol Reverse Engineering a Grammatical Inference Approach

    Get PDF
    Round trip engineering of software from source code and reverse engineering of software from binary files have both been extensively studied and the state-of-practice have documented tools and techniques. Forward engineering of protocols has also been extensively studied and there are firmly established techniques for generating correct protocols. While observation of protocol behavior for performance testing has been studied and techniques established, reverse engineering of protocol control flow from observations of protocol behavior has not received the same level of attention. State-of-practice in reverse engineering the control flow of computer network protocols is comprised of mostly ad hoc approaches. We examine state-of-practice tools and techniques used in three open source projects: Pidgin, Samba, and rdesktop . We examine techniques proposed by computational learning researchers for grammatical inference. We propose to extend the state-of-art by inferring protocol control flow using grammatical inference inspired techniques to reverse engineer automata representations from captured data flows. We present evidence that grammatical inference is applicable to the problem domain under consideration

    Effizientes Maschinelles Lernen für die Angriffserkennung

    Get PDF
    Detecting and fending off attacks on computer systems is an enduring problem in computer security. In light of a plethora of different threats and the growing automation used by attackers, we are in urgent need of more advanced methods for attack detection. In this thesis, we address the necessity of advanced attack detection and develop methods to detect attacks using machine learning to establish a higher degree of automation for reactive security. Machine learning is data-driven and not void of bias. For the effective application of machine learning for attack detection, thus, a periodic retraining over time is crucial. However, the training complexity of many learning-based approaches is substantial. We show that with the right data representation, efficient algorithms for mining substring statistics, and implementations based on probabilistic data structures, training the underlying model can be achieved in linear time. In two different scenarios, we demonstrate the effectiveness of so-called language models that allow to generically portray the content and structure of attacks: On the one hand, we are learning malicious behavior of Flash-based malware using classification, and on the other hand, we detect intrusions by learning normality in industrial control networks using anomaly detection. With a data throughput of up to 580 Mbit/s during training, we do not only meet our expectations with respect to runtime but also outperform related approaches by up to an order of magnitude in detection performance. The same techniques that facilitate learning in the previous scenarios can also be used for revealing malicious content, embedded in passive file formats, such as Microsoft Office documents. As a further showcase, we additionally develop a method based on the efficient mining of substring statistics that is able to break obfuscations irrespective of the used key length, with up to 25 Mbit/s and thus, succeeds where related approaches fail. These methods significantly improve detection performance and enable operation in linear time. In doing so, we counteract the trend of compensating increasing runtime requirements with resources. While the results are promising and the approaches provide urgently needed automation, they cannot and are not intended to replace human experts or traditional approaches, but are designed to assist and complement them.Die Erkennung und Abwehr von Angriffen auf Endnutzer und Netzwerke ist seit vielen Jahren ein anhaltendes Problem in der Computersicherheit. Angesichts der hohen Anzahl an unterschiedlichen Angriffsvektoren und der zunehmenden Automatisierung von Angriffen, bedarf es dringend moderner Methoden zur Angriffserkennung. In dieser Doktorarbeit werden Ansätze entwickelt, um Angriffe mit Hilfe von Methoden des maschinellen Lernens zuverlässig, aber auch effizient zu erkennen. Sie stellen der Automatisierung von Angriffen einen entsprechend hohen Grad an Automatisierung von Verteidigungsmaßnahmen entgegen. Das Trainieren solcher Methoden ist allerdings rechnerisch aufwändig und erfolgt auf sehr großen Datenmengen. Laufzeiteffiziente Lernverfahren sind also entscheidend. Wir zeigen, dass durch den Einsatz von effizienten Algorithmen zur statistischen Analyse von Zeichenketten und Implementierung auf Basis von probabilistischen Datenstrukturen, das Lernen von effektiver Angriffserkennung auch in linearer Zeit möglich ist. Anhand von zwei unterschiedlichen Anwendungsfällen, demonstrieren wir die Effektivität von Modellen, die auf der Extraktion von sogenannten n-Grammen basieren: Zum einen, betrachten wir die Erkennung von Flash-basiertem Schadcode mittels Methoden der Klassifikation, und zum anderen, die Erkennung von Angriffen auf Industrienetzwerke bzw. SCADA-Systeme mit Hilfe von Anomaliedetektion. Dabei erzielen wir während des Trainings dieser Modelle einen Datendurchsatz von bis zu 580 Mbit/s und übertreffen gleichzeitig die Erkennungsleistung von anderen Ansätzen deutlich. Die selben Techniken, um diese lernenden Ansätze zu ermöglichen, können außerdem für die Erkennung von Schadcode verwendet werden, der in anderen Dateiformaten eingebettet und mittels einfacher Verschlüsselungen obfuskiert wurde. Hierzu entwickeln wir eine Methode die basierend auf der statistischen Auswertung von Zeichenketten einfache Verschlüsselungen bricht. Der entwickelte Ansatz arbeitet unabhängig von der verwendeten Schlüssellänge, mit einem Datendurchsatz von bis zu 25 Mbit/s und ermöglicht so die erfolgreiche Deobfuskierung in Fällen an denen andere Ansätze scheitern. Die erzielten Ergebnisse in Hinsicht auf Laufzeiteffizienz und Erkennungsleistung sind vielversprechend. Die vorgestellten Methoden ermöglichen die dringend nötige Automatisierung von Verteidigungsmaßnahmen, sollen den Experten oder etablierte Methoden aber nicht ersetzen, sondern diese unterstützen und ergänzen

    The foundations of lesion-function inference in the human brain

    Get PDF
    Understanding the functional architecture of the brain has long been a challenge in neuroscience with a variety of techniques having been developed to explore this structure-function relationship. However, in order to be able to accurately identify the underlying system we require techniques that have the capabilities of describing the complexities therein. In order to perform lesion-function studies a cohort of brain scans with the location of the lesion identified must be collected. Utilising diffusion weighted magnetic resonance imaging, normally collected in the clinical setting, I propose a new unsupervised lesion segmentation routine. The cohort of brain scans also need to be spatially normalised such that homologous regions of the brain are brought into register with each other. However, this process can be perturbed by the presence of a lesion within the scan. Though a series of simulations I evaluate the performance of 12 different spatial normalisation routines on brains scans that possess a lesion. Historically lesion-function mapping studies have tended to use a univariate statistical approach, where different locations within the brain are treated as being spatially independent from each other. Here I show that biases within the structure of the data have the potential to distort the lesion-function inferences we draw. Though a series of simulations, I show that a mass univariate technique is vulnerable to these biases and assess three different multivariate methods (Support Vector Machines, Relevance Vector Machines and Flexible Bayesian Modelling) as potential solutions to this problem. Asides from making lesion-function inferences, these multivariate models can be used to predict future events. Using a data set of paired admission diffusion weighted magnetic resonance imaging scans and functional outcome scores I apply these techniques to the clinical scenario of predicting the functional outcome of patients after a cerebral vascular event

    Detecting Unknown Network Attacks using Language Models

    No full text
    We propose a method for network intrusion detection based on language models such as n-grams and words. Our method proceeds by extracting these models from TCP connection payloads and applying unsupervised anomaly detection. The essential part of our approach is linear-time computation of similarity measures between language models stored in trie data structures. Results of our experiments conducted on two datasets of network traffic demonstrate the importance of higher-order n-grams for detection of unknown network attacks. Our method is also suitable for language models based on words, which are more amenable in practical security applications. An implementation of our system achieved detection accuracy of over 80 % with no false positives on instances of recent attacks in HTTP, FTP and SMTP traffic
    corecore