
    Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques

    The problem of detecting malicious behavior in network traffic has become an extremely difficult challenge for the security community. Consequently, several intelligence-based tools have been proposed to generate models capable of understanding the information traveling through the network and of helping to identify suspicious connections as soon as possible. However, the lack of high-quality datasets has been one of the main obstacles to the development of reliable intelligence-based tools. A well-labeled dataset is fundamental not only for automatically learning models but also for testing their performance. Recently, RiskID emerged with the goal of providing the network security community with a collaborative tool to support the labeling process. Through visual and statistical techniques, RiskID helps the user generate labeled datasets from real connections. In this article, we present a machine learning extension for RiskID that assists the user in the malware identification process. A preliminary study shows that, as the size of the labeled data increases, machine learning models can be a valuable tool during the labeling of future traffic connections.
    VI Workshop de Seguridad Informática (WSI). Red de Universidades con Carreras en Informática (RedUNCI)
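    The abstract does not specify which learning algorithm the RiskID extension uses, but the core idea — suggesting labels for new connections from the connections a user has already labeled — can be sketched with a minimal nearest-centroid classifier. All names and feature choices below are hypothetical, for illustration only:

    ```python
    import math

    def centroid(rows):
        """Mean feature vector of a list of equal-length vectors."""
        n = len(rows)
        return [sum(col) / n for col in zip(*rows)]

    def suggest_label(labeled, unlabeled_conn):
        """Suggest the label whose class centroid is closest (Euclidean)."""
        by_class = {}
        for features, label in labeled:
            by_class.setdefault(label, []).append(features)
        centroids = {lab: centroid(rows) for lab, rows in by_class.items()}
        def dist(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        return min(centroids, key=lambda lab: dist(centroids[lab], unlabeled_conn))

    # Toy connection features: (bytes_per_packet, connections_per_minute)
    labeled = [
        ([900.0, 2.0], "normal"),
        ([850.0, 3.0], "normal"),
        ([60.0, 120.0], "botnet"),
        ([80.0, 140.0], "botnet"),
    ]
    # A high-rate, small-packet connection lands near the botnet centroid.
    print(suggest_label(labeled, [70.0, 130.0]))  # → botnet
    ```

    As more connections are labeled, the centroids stabilize, which matches the paper's observation that the model's usefulness grows with the size of the labeled data.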

    Towards efficient intrusion detection systems based on machine learning techniques

    Intrusion Detection Systems (IDS) have been key in the network manager's daily fight against continuous attacks. However, with the growth of the Internet, network security issues have become more difficult to handle. At the same time, Machine Learning (ML) techniques for traffic classification have been successful in terms of classification performance. Unfortunately, most of these techniques are extremely CPU-time consuming, making the whole approach unsuitable for real traffic situations. In this work, a simple software architecture for an ML-based IDS is described, together with first steps towards improving the efficiency of the algorithms in some of the proposed modules. A set of experiments on the 1998 DARPA dataset is conducted in order to evaluate two attribute selection algorithms, considering not only classification performance but also the required CPU time. Preliminary results show that the computational effort can be reduced by 50% while maintaining similar accuracy levels, progressing towards a real-world implementation of an ML-based IDS.
    Presented at the V Workshop Arquitectura, Redes y Sistemas Operativos (WARSO). Red de Universidades con Carreras en Informática (RedUNCI)
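    The abstract does not name the two attribute selection algorithms, but information gain is a common choice for this task, and the performance-versus-CPU-time trade-off can be illustrated by ranking attributes while timing the computation. The toy records below are hypothetical:

    ```python
    import math
    import time
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a label list."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(rows, labels, attr):
        """Entropy reduction obtained by splitting on one discrete attribute."""
        groups = {}
        for row, lab in zip(rows, labels):
            groups.setdefault(row[attr], []).append(lab)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return entropy(labels) - remainder

    # Toy discretized traffic records: (protocol, tcp_flag) -> attack / normal
    rows = [("tcp", "S0"), ("tcp", "SF"), ("udp", "SF"), ("tcp", "S0"), ("udp", "SF")]
    labels = ["attack", "normal", "normal", "attack", "normal"]

    t0 = time.perf_counter()
    ranking = sorted(range(2), key=lambda a: info_gain(rows, labels, a), reverse=True)
    elapsed = time.perf_counter() - t0   # the CPU-time side of the trade-off

    print(ranking[0])  # → 1 (the tcp_flag attribute separates the classes perfectly)
    ```

    Dropping the lowest-ranked attributes before training is what yields the reduced computational effort the abstract reports, at the cost of possibly discarding weak but useful signals.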

    An autonomous labeling approach to support vector machines algorithms for network traffic anomaly detection

    In past years, several support vector machine (SVM) novelty detection approaches have been applied in the network intrusion detection field. The main advantage of these approaches is that they can characterize normal traffic even when trained with datasets containing not only normal traffic but also a number of attacks. Unfortunately, these algorithms seem to be accurate only when the normal traffic vastly outnumbers the attacks present in the dataset, a situation that does not always hold. This work presents an approach for the autonomous labeling of normal traffic as a way of dealing with situations where the class distribution does not present the imbalance required by SVM algorithms. In this case, the autonomous labeling is performed by SNORT, a misuse-based intrusion detection system. Experiments conducted on the 1998 DARPA dataset show that the proposed autonomous labeling approach not only outperforms existing SVM alternatives but also, under some attack distributions, improves over SNORT itself.
    Fil: Catania, Carlos Adrian. Universidad Nacional de Cuyo; Argentina
    Fil: Bromberg, Facundo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentina. Universidad Tecnológica Nacional. Facultad Regional Mendoza. Departamento de Sistemas de Información. Laboratorio DHARMA; Argentina
    Fil: Garcia Garino, Carlos Gabriel. Universidad Nacional de Cuyo. Facultad de Ingeniería; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentina
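    The autonomous-labeling idea — let SNORT's alerts decide which connections count as "normal" training data for the novelty detector — can be sketched as follows. The SNORT output is stubbed, and a simple per-feature deviation test stands in for the one-class SVM; both are hypothetical simplifications:

    ```python
    import statistics

    # Hypothetical connection log: (connection_id, feature_vector)
    connections = [
        ("c1", [500.0, 1.0]),
        ("c2", [480.0, 2.0]),
        ("c3", [30.0, 200.0]),
        ("c4", [510.0, 1.5]),
    ]

    # Stand-in for SNORT output: ids of connections that triggered an alert.
    snort_alerts = {"c3"}

    # Autonomous labeling: anything SNORT did not flag is treated as "normal"
    # and becomes the training set for the novelty detector.
    normal_training = [feat for cid, feat in connections if cid not in snort_alerts]

    means = [statistics.mean(col) for col in zip(*normal_training)]
    stdevs = [statistics.stdev(col) for col in zip(*normal_training)]

    def is_anomalous(feat, k=3.0):
        """Crude one-class test: any feature far from the 'normal' profile."""
        return any(abs(x - m) > k * s for x, m, s in zip(feat, means, stdevs))

    print(is_anomalous([25.0, 180.0]))  # → True (resembles the alerted connection)
    ```

    The key point the paper makes is that this pipeline can end up more accurate than SNORT alone, because the learned model generalizes beyond the misuse signatures that produced its training labels.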

    Behavior Classification of A Grazing Goat in the Argentine Monte Desert by Using Inertial Sensors

    The knowledge generated by animal behavior studies has been gaining importance because it can be used to improve the efficiency of animal production systems. In recent years, sensor-based approaches to animal behavior classification have emerged as a promising alternative for analyzing animals' grazing patterns. In this article we propose a classification system based on inertial sensors for identifying a goat's grazing behavior in the Argentine Monte Desert. The data acquisition system is based on commercial off-the-shelf devices and is used to create a reliable dataset for predicting animal behavior. By fixing the system to the head of a goat, it was possible to log its movements while it grazed in a natural pasture. A preliminary version of the dataset is evaluated using a classical statistical learning algorithm. Results show that goat activities can be predicted with an average precision above 85% and a recall of 84%.
    Sociedad Argentina de Informática e Investigación Operativa
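    A typical pipeline for this kind of inertial-sensor classification is to cut the accelerometer trace into fixed windows, compute summary statistics per window, and feed those to a classifier. The sketch below uses a toy threshold rule instead of the paper's (unspecified) statistical learning algorithm; the trace, window size, and labels are all hypothetical:

    ```python
    import statistics

    def window_features(signal, size):
        """Mean and standard deviation of fixed, non-overlapping windows."""
        feats = []
        for i in range(0, len(signal) - size + 1, size):
            w = signal[i:i + size]
            feats.append((statistics.mean(w), statistics.stdev(w)))
        return feats

    def classify(feat, std_threshold=0.5):
        """Toy rule: high movement variance -> 'walking', low -> 'resting'."""
        return "walking" if feat[1] > std_threshold else "resting"

    # Hypothetical head-mounted accelerometer trace (one axis, g units)
    accel = [0.01, 0.02, 0.00, 0.01,   # quiet period
             0.9, -0.8, 1.1, -1.0]     # vigorous movement

    labels = [classify(f) for f in window_features(accel, 4)]
    print(labels)  # → ['resting', 'walking']
    ```

    In practice a learned model over many such window features replaces the single threshold, which is how precision/recall figures like those reported become measurable.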

    LLM in the Shell: Generative Honeypots

    Honeypots are essential tools in cybersecurity. However, most of them (even the high-interaction ones) lack the realism required to engage and fool human attackers. This limitation makes them easily discernible, hindering their effectiveness. This work introduces a novel method to create dynamic and realistic software honeypots based on Large Language Models. Preliminary results indicate that LLMs can create credible and dynamic honeypots capable of addressing important limitations of previous honeypots, such as deterministic responses and lack of adaptability. We evaluated the realism of each command by conducting an experiment with human attackers who had to say whether the honeypot's answer was fake or not. Our proposed honeypot, called shelLM, reached an accuracy rate of 0.92.
    Comment: 5 pages, 1 figure, 1 table
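    The structure of such a honeypot is a loop that feeds each attacker command, together with the session history, to a language model posing as a shell. The sketch below stubs the model call with canned answers (`fake_llm` is entirely hypothetical; the real system queries an actual LLM), but it shows the session-history mechanism that lets responses stay consistent across turns:

    ```python
    def fake_llm(history, command):
        """Stand-in for the LLM call; a real honeypot would prompt a model
        with the session history so later answers stay consistent."""
        canned = {
            "whoami": "root",
            "pwd": "/root",
            "ls": "backup.sh  notes.txt",
        }
        return canned.get(command, f"bash: {command}: command not found")

    def honeypot_session(commands):
        history = []
        for cmd in commands:
            reply = fake_llm(history, cmd)
            history.append((cmd, reply))  # keep context for future turns
        return history

    session = honeypot_session(["whoami", "pwd"])
    print(session[0][1])  # → root
    ```

    Keeping the full history in the prompt is what avoids the deterministic, stateless replies that make traditional honeypots easy to fingerprint.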

    Towards Better Understanding of Cybercrime: The Role of Fine-Tuned LLMs in Translation

    Understanding cybercrime communications is paramount for cybersecurity defence. This often involves translating communications into English for processing, interpreting, and generating timely intelligence. The problem is that translation is hard: human translation is slow, expensive, and scarce, while machine translation is often inaccurate and biased. We propose using fine-tuned Large Language Models (LLMs) to generate translations that can accurately capture the nuances of cybercrime language. We apply our technique to public chats from the NoName057(16) Russian-speaking hacktivist group. Our results show that our fine-tuned LLM model is better, faster, and more accurate, and is able to capture nuances of the language. Our method shows it is possible to achieve high-fidelity translations and to significantly reduce costs, by a factor ranging from 430 to 23,000 compared to a human translator.
    Comment: 9 pages, 4 figures

    An application of Deep Neural Networks for automatic detection of randomly generated Domain Names

    A domain generation algorithm (DGA) is used by malware to dynamically generate a large number of pseudo-random domain names and then select a small subset of these domains for the Command and Control (C&C) communication channel. The idea behind the dynamic nature of DGAs is to avoid including hard-coded domain names inside malware binaries, complicating the extraction of this information by reverse engineering. The C&C channel can then be used to instruct the botnet to take different malicious actions such as SPAM, click campaigns, DDoS, etc. The present project proposes the development of a DGA detection algorithm based on machine learning in general and Deep Neural Networks in particular. In the last 10 years, deep learning techniques have been the cause behind the major advances in the automatic recognition of images, audio, video and text. We expect the ability of deep neural networks to recognize patterns common to DGAs to facilitate the development of a detection tool that operates not only with a low false positive rate but also in real time. Both requirements are fundamental for dealing with today's security threats.
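    One of the simplest patterns a detector can exploit — and a useful baseline before reaching for a deep network — is that algorithmically generated names tend to spread their characters far more uniformly than human-chosen ones. The threshold below is an arbitrary illustration, not a tuned value:

    ```python
    import math
    from collections import Counter

    def char_entropy(domain):
        """Shannon entropy of the character distribution in a domain label."""
        name = domain.split(".")[0]
        counts = Counter(name)
        n = len(name)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def looks_generated(domain, threshold=3.5):
        """Crude heuristic: near-uniform character usage pushes entropy up."""
        return char_entropy(domain) > threshold

    print(looks_generated("google.com"))                # → False
    print(looks_generated("xj4kqpz1vb8wty2ndh5r.com"))  # → True
    ```

    A deep neural network learns far richer character-sequence patterns than this single statistic, which is what the project relies on to keep the false positive rate low against DGA families designed to mimic natural-looking names.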

    Predicting Harbertson-Adams Assay Phenolic Parameters In Red Wines Using Visible Spectra

    The Harbertson-Adams phenolic parameter assay is a well-known method to measure a panel of phenolic compounds in red wines. However, the multistep analyses required by the method fail to produce results on multiple parameters rapidly. In the present article, we analyze the benefits of applying a statistical model based on Principal Component Analysis (PCA) and a statistical learning technique known as Support Vector Regression (SVR) for correlating sample spectra to the Harbertson-Adams assay on each of the phenolic components. The resulting model showed a high correlation between the measured and predicted values for each of the phenolic parameters, despite the multicollinearity and high dimensionality of the dataset.
    Sociedad Argentina de Informática e Investigación Operativa
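    The calibration idea — regress a laboratory-measured phenolic value onto spectral data so future samples need only a spectrum — can be shown with plain least squares on a single wavelength. This is a drastic simplification of the paper's PCA+SVR pipeline, and the absorbance/anthocyanin pairs below are invented toy values:

    ```python
    def fit_line(xs, ys):
        """Ordinary least squares for y = a*x + b (closed form)."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
        return a, my - a * mx

    # Hypothetical calibration pairs: absorbance at 520 nm vs anthocyanin (mg/L)
    absorbance = [0.10, 0.20, 0.30, 0.40]
    anthocyanin = [50.0, 100.0, 150.0, 200.0]

    a, b = fit_line(absorbance, anthocyanin)
    print(round(a * 0.25 + b))  # → 125, predicted value for a new wine sample
    ```

    With a full visible spectrum per sample the predictors are many and highly collinear, which is why the paper projects onto principal components before fitting the (nonlinear) SVR instead of regressing on raw wavelengths.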

    LibreSense: software para análisis sensorial de alimentos

    The time required to collect data and process it statistically is a serious limitation for the sensory analysis of food. Although several commercial programs perform this work automatically, they carry a high cost due to their expensive annual licenses. Organizations that cannot afford this cost normally work with paper forms, which are decoded manually with the consequent expense of time and resources. LibreSense is an application developed in the R language using the Shiny and SensoMineR packages. It allows sensory data to be captured and statistically analyzed "in situ", covering both the results of the different treatments and the performance of the panelists. The application captures data through any device with a wireless connection and then performs the statistical processing. To evaluate the different samples, the panelists connect through their devices to a server running Shiny Server. Although LibreSense is still at a preliminary stage of development, it is already an indispensable tool for the wine sensory panel of INTA EEAA Mendoza, and several organizations in the field have also shown interest in using and acquiring it.
    Sociedad Argentina de Informática e Investigación Operativa