On the Detection Capabilities of Signature-Based Intrusion Detection Systems in the Context of Web Attacks
This work has been partly funded by the research grant PID2020-115199RB-I00 provided by the Spanish Ministry of Industry under the contract MICIN/AEI/10.13039/501100011033, and also by FEDER/Junta de Andalucia-Consejeria de Transformacion Economica, Industria, Conocimiento y Universidades under project PYC20-RE-087-USE.
Signature-based Intrusion Detection Systems (SIDS) play a crucial role within the arsenal
of security components of most organizations. They can find traces of known attacks in the network
traffic or host events for which patterns or signatures have been pre-established. SIDS include
standard packages of detection rulesets, but only those rules suited to the operational environment
should be activated for optimal performance. However, some organizations might skip this tuning
process and instead activate default off-the-shelf rulesets without understanding their implications and
trade-offs. In this work, we help gain insight into the consequences of using predefined rulesets on the
performance of SIDS. We experimentally explore the performance of three SIDS in the context of web
attacks. In particular, we gauge the detection rate obtained with predefined subsets of rules for Snort,
ModSecurity and Nemesida using seven attack datasets. We also determine the precision and rate of
alerts generated by each detector in a real-life case using a large trace from a public web server. Results
show that the maximum detection rate achieved by the SIDS under test is insufficient to protect
systems effectively and is lower than expected for known attacks. Our results also indicate that the
choice of predefined settings activated on each detector strongly influences its detection capability
and false alarm rate. Snort and ModSecurity scored either a very poor detection rate (activating
the least-sensitive predefined ruleset) or a very poor precision (activating the full ruleset). We also
found that using various SIDS for a cooperative decision can improve the precision or the detection
rate, but not both. Consequently, it is necessary to reflect upon the role of these open-source SIDS
with default configurations as core elements for protection in the context of web attacks. Finally, we
provide an efficient method for systematically determining which rules to deactivate from a ruleset to
significantly reduce the false alarm rate for a target operational environment. We tested our approach
using Snort’s ruleset in our real-life trace, increasing the precision from 0.015 to 1 in less than 16 h
of work.
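The abstract does not detail the tuning procedure itself; purely as an illustrative sketch (the greedy strategy, function name, and precision target below are assumptions, not the paper's method), selecting rules to deactivate from labeled alerts replayed on a trace could look like:

```python
from collections import Counter

def rules_to_deactivate(alerts, target_precision=0.99):
    """Greedy selection of rules to disable, given labeled alerts.

    alerts: list of (rule_id, is_true_positive) pairs obtained by
    replaying a trace through the SIDS and labeling each alert.
    Rules whose alerts are all false positives are disabled first,
    loudest first, until the precision target is reached.
    """
    fp = Counter()
    tp = Counter()
    for rule, is_tp in alerts:
        (tp if is_tp else fp)[rule] += 1

    disabled = []
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    for rule, n_fp in fp.most_common():
        if total_tp + total_fp == 0:
            break
        if total_tp / (total_tp + total_fp) >= target_precision:
            break
        if tp[rule] == 0:  # never contributed a true detection
            disabled.append(rule)
            total_fp -= n_fp
    return disabled
```

Restricting deactivation to rules with zero true positives keeps the detection rate intact while the false-alarm volume drops, which matches the trade-off the abstract describes.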
How Much Training Data Is Enough? A Case Study for HTTP Anomaly-Based Intrusion Detection
Most anomaly-based intrusion detectors rely on models that learn from training datasets whose
quality is crucial in their performance. Although the properties of suitable datasets have been formulated,
the influence of the dataset size on the performance of the anomaly-based detector has received scarce
attention so far. In this work, we investigate the optimal size of a training dataset. This size should be
large enough so that training data is representative of normal behavior, but after that point, collecting more
data may result in an unnecessary waste of time and computational resources, not to mention an increased
risk of overtraining. In this spirit, we provide a method to find out when the amount of data collected in
the production environment is representative of normal behavior in the context of a detector of HTTP URI
attacks based on 1-grams. Our approach is founded on a set of indicators related to the statistical properties
of the data. These indicators are periodically calculated during data collection, producing time series that
stabilize when more training data is not expected to translate to better system performance, which indicates
that data collection can be stopped. We present a case study with real-life datasets collected at the University
of Seville (Spain) and a public dataset from the University of Saskatchewan. The application of our method
to these datasets showed that more than 42% of one trace, and almost 20% of another were unnecessarily
collected, thereby showing that our proposed method can be an efficient approach for collecting training
data in the production environment.
This work was supported in part by the Corporación Tecnológica de Andalucía and the University of Seville through the Projects under
Grant CTA 1669/22/2017, Grant PI-1786/22/2018, and Grant PI-1736/22/2017.
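The abstract does not give the indicator formulas; as a minimal sketch (the window size, tolerance, and choice of indicator are assumptions for illustration), a stopping check on the time series of one indicator — say, the number of distinct 1-grams seen so far — could look like:

```python
def collection_can_stop(indicator_series, window=5, tol=0.01):
    """Decide whether a training-data indicator has stabilized.

    indicator_series: values of one statistical indicator computed
    periodically during collection (e.g. the count of distinct
    1-grams observed so far in the HTTP URIs). Collection can stop
    once the relative change across the last `window` steps stays
    below `tol`, i.e. more data is not expected to change the model.
    """
    if len(indicator_series) < window + 1:
        return False
    recent = indicator_series[-(window + 1):]
    base = abs(recent[0]) or 1.0
    return all(abs(b - a) / base <= tol
               for a, b in zip(recent, recent[1:]))
```

In practice one would require every indicator in the set to stabilize, not just one, before stopping collection.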
Fusing Information from Tickets and Alerts to Improve the Incident Resolution Process
In the context of network incident monitoring, alerts are useful notifications
that provide IT management staff with information about incidents. They are
usually triggered in an automatic manner by network equipment and monitoring systems, thus containing only technical information available to the systems
that are generating them. On the other hand, ticketing systems play a different
role in this context. Tickets represent the business point of view of incidents.
They are usually generated by human intervention and contain enriched semantic information about ongoing and past incidents. In this article, our main
hypothesis is that incorporating ticket information into the alert correlation
process will be beneficial to the incident resolution life-cycle in terms of accuracy, timing, and overall incident description. We propose a methodology to
validate this hypothesis and suggest a solution to the main challenges that appear. The proposed correlation approach is based on the time alignment of the
events (alerts and tickets) that affect common elements in the network. For this
we use real alert and ticket datasets obtained from a large telecommunications
network. The results have shown that using ticket information enhances the
incident resolution process, mainly by reducing and aggregating a higher percentage of alerts compared with standard alert correlation systems that only use
alerts as the main source of information. Finally, we also show the applicability
and usability of this model by applying it to a case study where we analyze the
performance of the management staff.
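The correlation approach is described only as a time alignment of alerts and tickets over common network elements; a toy sketch under that description (the data shapes, slack window, and function name are assumptions) could be:

```python
from datetime import datetime, timedelta

def correlate(alerts, tickets, slack=timedelta(minutes=30)):
    """Attach alerts to tickets affecting the same network element
    within an aligned time window.

    alerts:  list of (element, timestamp)
    tickets: list of (ticket_id, element, opened, closed)
    Returns {ticket_id: [alert indices]} for alerts that fall
    inside a ticket's lifetime, extended by `slack` on both sides.
    """
    grouped = {}
    for i, (elem, ts) in enumerate(alerts):
        for tid, t_elem, opened, closed in tickets:
            if elem == t_elem and opened - slack <= ts <= closed + slack:
                grouped.setdefault(tid, []).append(i)
                break  # assign each alert to at most one ticket
    return grouped
```

Alerts aggregated under a ticket inherit its enriched business description, which is the gain over alert-only correlation the abstract reports.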
Smart home anomaly-based IDS: Architecture proposal and case study
The complexity and diversity of the technologies involved in the Internet of Things (IoT)
challenge the generalization of security solutions based on anomaly detection, which should
fit the particularities of each context and deployment and allow for performance comparison.
In this work, we provide a flexible architecture based on building blocks suited for detecting
anomalies in the network traffic and the application-layer data exchanged by IoT devices in
the context of smart homes. Following this architecture, we have defined a particular Intrusion
Detection System (IDS) for a case study that uses a public dataset with the electrical consumption
of 21 home devices over one year. In particular, we have defined ten Indicators of Compromise
(IoC) to detect network attacks and two anomaly detectors to detect false command or data
injection attacks. We have also included a signature-based IDS (Snort) to extend the detection
range to known attacks. We have reproduced eight network attacks (e.g., DoS, scanning) and
four False Command or Data Injection attacks to test our IDS performance. The results show that
all attacks were successfully detected by our IoCs and anomaly detectors with a false positive
rate lower than 0.3%. Signature detection was able to detect only 4 out of 12 attacks. Our
architecture and the IDS developed can be a reference for developing future IDS suited to
different contexts or use cases. Given that we use a public dataset, our contribution can also
serve as a baseline for comparison with new techniques that improve detection performance.
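The ten Indicators of Compromise are not enumerated in the abstract; as a hypothetical example of the kind of check such an IoC performs (the name, threshold, and flow format below are assumptions, not the paper's indicators), a crude port-scan signal could be:

```python
def scan_ioc(flows, max_distinct_ports=20):
    """Toy Indicator of Compromise: flag any source that contacts
    an unusually large number of distinct destination ports within
    one observation window (a crude scanning signal).

    flows: list of (src, dst_port) pairs seen in the window.
    Returns the set of flagged source addresses.
    """
    ports_by_src = {}
    for src, port in flows:
        ports_by_src.setdefault(src, set()).add(port)
    return {src for src, ports in ports_by_src.items()
            if len(ports) > max_distinct_ports}
```

A battery of such lightweight predicates, alongside learned anomaly detectors and a signature engine, mirrors the building-block architecture the abstract proposes.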
Validación de un sistema de diálogo mediante el uso de diferentes umbrales de poda en el proceso de reconocimiento automático de voz
This paper presents a new technique to validate the performance of dialogue systems
focusing on two measures: response time and sentence understanding. Initially, the paper presents
a description of the input interface of the dialogue system used in the experiments, including a
classification of the recognition tasks considered. Later, it presents the basic features of the
proposed technique and describes how the technique has been applied to validate the
performance of the dialogue system. Later the paper presents the experimental results which, on
the one hand, show that six out of the nine recognition tasks employed by the system can be
considered validated, since the imposed restrictions on recognition time and sentence
understanding are kept. On the other hand, the results show that for improving the system it is
necessary to change the strategies used for the remaining three tasks. Finally, the paper shows
some possibilities for future work related to the new strategies employable to enhance the
performance of the dialogue system.
Resultados preliminares sobre SLHMM
This work proposes a new hybrid system for continuous speech recognition that integrates HMM and ANN. The system is composed of three classes of blocks (LVQ, SLHMM, and DP), all of them neural networks, although the so-called SLHMM can be interpreted and trained according to HMM. An SLHMM is, basically, the expansion into a network with a fixed number of layers of a recurrent neural network with a suitable topology. Some preliminary experimental results are presented which, compared with those obtained from a system based solely on HMM, show an increase in system performance simply due to the topology used.
Entrenamiento discriminativo para HMM utilizando redes neuronales recurrentes
This article presents the results obtained from a network structure for Hidden Markov Models applied to speech recognition. The network topology is that of a Recurrent Neural Network, in which each time step is identified with a layer. The network is trained using backpropagation techniques. Two types of error measures are used for training: maximum likelihood and discriminative training. Applying backpropagation techniques to re-estimate the HMM-RNN in the maximum-likelihood case yields the same re-estimation equations as the Baum-Welch algorithm used to train HMMs. Discriminative training is based on the probability of correct classification of the sequences, derived from the maximum-likelihood measure. The results have shown that the best procedure for training the RNN-HMM consists of a first estimation using the maximum-likelihood measure and subsequently retraining with the discriminative training algorithm.
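The maximum-likelihood criterion referred to above is the standard HMM sequence likelihood; as a generic textbook reference point (not the paper's network implementation), the forward algorithm that computes it is:

```python
def forward_likelihood(obs, pi, A, B):
    """Forward-algorithm likelihood P(obs | HMM), the quantity that
    maximum-likelihood (Baum-Welch) training increases.

    obs: sequence of observation symbol indices
    pi:  initial state distribution, length N
    A:   state transition probabilities, N x N
    B:   emission probabilities, N states x M symbols
    """
    n = len(pi)
    # alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)
```

Unrolling this recursion over time, one layer per step, is exactly the kind of fixed-depth network expansion the two abstracts above describe.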