7 research outputs found

    Decomposable Principal Component Analysis

    We consider principal component analysis (PCA) in decomposable Gaussian graphical models. We exploit the prior information in these models in order to distribute the computation of PCA. For this purpose, we reformulate the problem in the sparse inverse covariance (concentration) domain and solve the global eigenvalue problem using a sequence of local eigenvalue problems in each of the cliques of the decomposable graph. We demonstrate the application of our methodology in the context of decentralized anomaly detection in the Abilene backbone network. Based on the topology of the network, we propose an approximate statistical graphical model and distribute the computation of PCA.
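    The paper's own algorithm works in the concentration domain clique-by-clique; the sketch below illustrates only the simpler underlying point that the leading eigenvector of a sparse concentration matrix can be found by power iterations whose matrix-vector products decompose over local blocks, so no node needs the full matrix. The clique structure and block sizes here are assumptions of this sketch, not the Abilene model from the paper.

```python
# Sketch: power iteration on a sparse concentration matrix in which each
# matrix-vector product is assembled from local (per-clique) contributions.
# Illustrates the distribution idea only; it is not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(3)

# Two overlapping cliques over 7 variables: {0..3} and {3..6} (assumed structure).
cliques = [np.arange(0, 4), np.arange(3, 7)]
blocks = []
K = np.zeros((7, 7))
for c in cliques:
    A = rng.normal(size=(len(c), len(c)))
    B = A @ A.T + len(c) * np.eye(len(c))   # local positive-definite block
    blocks.append(B)
    K[np.ix_(c, c)] += B                    # global matrix = sum of local blocks

def distributed_matvec(v):
    """Each clique multiplies only its own block and its own slice of v."""
    out = np.zeros_like(v)
    for c, B in zip(cliques, blocks):
        out[c] += B @ v[c]
    return out                              # equals K @ v by construction

# Power iteration using only the local products.
v = rng.normal(size=7)
for _ in range(200):
    w = distributed_matvec(v)
    v = w / np.linalg.norm(w)
leading_eigenvalue = v @ distributed_matvec(v)

# Sanity check against the centralized eigendecomposition.
print(leading_eigenvalue, np.linalg.eigh(K)[0][-1])   # should agree closely
```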

    Fault diagnosis for IP-based network with real-time conditions

    BACKGROUND: Fault diagnosis techniques have been based on many paradigms, which derive from diverse areas and have different purposes: obtaining a representation model of the network for fault localization, selecting optimal probe sets for monitoring network devices, reducing fault detection time, and detecting faulty components in the network. Although there are several solutions for diagnosing network faults, challenges remain: a fault diagnosis solution needs to be always available and capable of processing data in a timely manner, because stale results degrade the quality and speed of informed decision-making. In addition, there is no non-invasive technique to continuously diagnose network symptoms without leaving the system vulnerable to failures, nor a technique resilient to the network's dynamic changes, which can cause new failures with different symptoms.

    AIMS: This thesis aims to propose a model for the continuous and timely diagnosis of faults in IP-based networks, independent of the network structure and based on data analytics techniques.

    METHOD(S): The point of departure for this research was the hypothesis of a fault propagation phenomenon that allows failure symptoms to be observed at a higher network level than the fault origin. To build the model, monitoring data were collected from an extensive campus network in which impactful link failures were induced at different instants and with different durations. These data correspond to parameters widely used in the actual management of a network. The collected data allowed us to understand the behavior of the faults and how they manifest at a peripheral level. Based on this understanding and a data analytics process, the first three modules of our model, named PALADIN, were proposed (Identify, Collection and Structuring); they define peripheral data collection and the pre-processing needed to obtain a description of the network's state at a given moment. These modules give the model the ability to structure the data while accounting for the delays of the multiple responses that the network delivers to a single monitoring probe and for the multiple network interfaces that a peripheral device may have. The result is a structured data stream ready to be analyzed. This analysis required an incremental learning framework that respects the dynamic nature of networks; it comprises three elements: an incremental learning algorithm, a data rebalancing strategy, and a concept drift detector. This framework is the fourth module of the PALADIN model, named Diagnosis. To evaluate the PALADIN model, the Diagnosis module was implemented with 25 different incremental algorithms, ADWIN as the concept-drift detector and SMOTE (adapted to the streaming scenario) as the rebalancing strategy. In addition, a dataset (the SOFI dataset) was built with the first modules of the PALADIN model; these data form the incoming data stream of the Diagnosis module used to evaluate its performance. The PALADIN Diagnosis module performs online classification of network failures, so it is a learning model that must be evaluated in a stream context. Prequential evaluation is the most widely used method for this task, so we adopt it to evaluate the model's performance over time through several stream evaluation metrics.
    RESULTS: This research first evidences the phenomenon of impact fault propagation, making it possible to detect fault symptoms at the peripheral level of a monitored network, which translates into non-invasive monitoring. Second, the PALADIN model is the major contribution in the fault detection context because it covers two aspects: an online learning model that continuously processes the network symptoms and detects internal failures, and the concept-drift detection and data stream rebalancing components that make resilience to dynamic network changes possible. Third, it is well known that the number of real-world datasets available for imbalanced stream classification is still small, and it is even smaller for the networking context; the SOFI dataset obtained with the first modules of the PALADIN model adds to that number and encourages work on imbalanced data streams and on network fault diagnosis.

    CONCLUSIONS: The proposed model contains the elements necessary for the continuous and timely diagnosis of faults in IP-based networks; it introduces the idea of periodic monitoring of peripheral network elements and uses data analytics techniques to process the collected data. Based on the analysis, processing and classification of peripherally collected data, it can be concluded that PALADIN achieves this objective. The results indicate that peripheral monitoring allows faults in the internal network to be diagnosed, and that the diagnosis process needs an incremental learning process, concept-drift detection elements and a rebalancing strategy. The experiments showed that PALADIN makes it possible to learn from the network's manifestations and diagnose internal network failures; this was verified with 25 different incremental algorithms, ADWIN as the concept-drift detector and SMOTE (adapted to the streaming scenario) as the rebalancing strategy. This research clearly illustrates that it is unnecessary to monitor all the internal network elements to detect a network's failures; it is enough to choose the peripheral elements to be monitored. Furthermore, with proper processing of the collected status and traffic descriptors, it is possible to learn from the arriving data using incremental learning in cooperation with data rebalancing and concept drift approaches. This proposal continuously diagnoses the network's symptoms without leaving the system vulnerable to failures while remaining resilient to the network's dynamic changes. Doctoral Program in Computer Science and Technology, Universidad Carlos III de Madrid. Chair: José Manuel Molina López. Secretary: Juan Carlos Dueñas López. Panel member: Juan Manuel Corchado Rodríguez.
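    PALADIN's Diagnosis module is not reproduced here; the sketch below is only a minimal illustration of the prequential (test-then-train) loop the abstract describes, using scikit-learn's SGDClassifier as a stand-in incremental learner, naive repetition of minority-class examples in place of streaming SMOTE, and a crude windowed error-rate comparison in place of ADWIN. All feature definitions, thresholds and the synthetic stream are assumptions of this sketch.

```python
# Minimal prequential (test-then-train) loop with rebalancing and a drift check.
# An illustrative stand-in for the Diagnosis stage, NOT the PALADIN implementation.
import numpy as np
from collections import deque
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
classes = np.array([0, 1])                  # 0 = normal, 1 = failure symptom
model = SGDClassifier(loss="log_loss")      # incremental learner (stand-in)
errors = deque(maxlen=500)                  # recent 0/1 prediction errors
correct = 0

def synthetic_stream(n=2000):
    """Hypothetical imbalanced stream of status/traffic descriptors."""
    for _ in range(n):
        y = int(rng.random() < 0.05)        # ~5% failure symptoms
        x = rng.normal(loc=3.0 * y, scale=1.0, size=8)
        yield x, y

for i, (x, y) in enumerate(synthetic_stream()):
    X = x.reshape(1, -1)
    # 1) Test: predict before the true label is used for training.
    if i > 0:
        y_hat = model.predict(X)[0]
        correct += int(y_hat == y)
        errors.append(int(y_hat != y))
    # 2) Rebalance (naive stand-in for streaming SMOTE): repeat minority samples.
    repeats = 5 if y == 1 else 1
    # 3) Train incrementally on the (possibly repeated) example.
    for _ in range(repeats):
        model.partial_fit(X, [y], classes=classes)
    # 4) Drift check (crude stand-in for ADWIN): recent vs. older error rate.
    if len(errors) == errors.maxlen:
        recent = np.mean(list(errors)[-100:])
        older = np.mean(list(errors)[:-100])
        if recent > older + 0.2:            # illustrative threshold
            model = SGDClassifier(loss="log_loss")   # reset learner on drift
            errors.clear()

print(f"prequential accuracy: {correct / i:.3f}")
```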

    Detection of network anomalies using artificial intelligence systems

    Master's thesis: 123 pages, 4 parts, 35 tables, 16 figures, 1 appendix, 48 references. The object of this research is network anomalies. The subject of the research is the use of network anomalies for intrusion detection. The purpose of the work is to develop a system for detecting network anomalies based on the studied algorithms and machine learning methods. Research methods: statistical methods, classification methods, clustering-based methods, knowledge-based methods, and combined methods. Relevance: intrusion detection and immediate notification of network administrators about a potential threat to the infrastructure; the system prevents intruders from gaining unauthorized access to the network through both known and unknown attacks. Novelty: in contrast to manual administration, an automated system saves resources and does not make mistakes due to the human factor. Results: a model was built to automatically detect network anomalies in order to prevent intrusion into the network or infrastructure.
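    The abstract lists several method families (statistical, classification-, clustering- and knowledge-based). As one hedged illustration of an unsupervised approach that can flag both known and unknown attacks, the sketch below applies scikit-learn's IsolationForest to hypothetical per-flow features; the feature set, synthetic data and contamination rate are assumptions of this sketch, not taken from the thesis.

```python
# Illustrative unsupervised anomaly detector for network flow records.
# A sketch only; the thesis itself evaluates several method families.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical features: bytes/s, packets/s, mean packet size, distinct dest ports.
normal = rng.normal(loc=[1e4, 50, 200, 3], scale=[2e3, 10, 40, 1], size=(1000, 4))
attack = rng.normal(loc=[5e4, 400, 60, 40], scale=[1e4, 80, 20, 10], size=(20, 4))
flows = np.vstack([normal, attack])

detector = IsolationForest(contamination=0.02, random_state=0).fit(normal)
labels = detector.predict(flows)          # +1 = normal, -1 = anomalous
print("flagged as anomalous:", int((labels == -1).sum()), "of", len(flows))
```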

    Quality of service analysis of internet links with minimal information

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, July 201

    Contributions based on biplot analysis to the design and management of telecommunication networks

    The importance of telecommunication networks in our society is undeniable: from fixed and mobile telephony to the Internet, they are present in most households, businesses and public administrations. Guaranteeing their correct operation is of key importance, and the fundamental tool for this purpose is proper network design and management. Biplot methods, formulated by Gabriel in 1971, represent a data matrix as a plot that uses individual markers for each row and each column of the original matrix while preserving certain properties of the original data. In network design and management, many types of matrices containing diverse data about network operation and configuration can be used; traffic matrices, topology matrices and combinations of both stand out among them. Graphical representations, in turn, allow network designers and managers to identify the state of the communication network efficiently and effectively. This doctoral thesis proposes the use of biplot methods in general, and of the HJ-Biplot proposed by Galindo in 1986 in particular, in the design and management of communication networks, presenting applications to the data networks most commonly used today. The proposals focus on three general scenarios that cover a broad spectrum of possible applications: anomaly detection, analysis of traffic time series, and analysis of network topology. Anomaly detection is first applied to data from a real Ethernet network, showing that the HJ-Biplot representation can be used with two goals: modelling the network with a suitably robust representation and detecting incidents with sufficient sensitivity. A second case applies it to the detection of a denial-of-service attack, as a special case of anomaly, using a dataset published for validating this type of system; this part includes the application of the STATIS method for detecting the anomaly and, finally, the HJ-Biplot for the specific diagnosis of the incident that occurred in the network. Time-series analysis using the HJ-Biplot improves on the proposal by Lakhina et al. in 2004 and later work, which applied Principal Component Analysis (PCA) to an origin-destination traffic matrix; the HJ-Biplot takes into account the simultaneous existence of temporal and spatial correlations in the traffic matrix and also makes it possible to locate the point at which the incident occurred. Finally, combining spectral graph theory, applied to communication networks, with the biplot methodology in general and the HJ-Biplot in particular yields graphical representations of communication networks that convey their topology, even incorporating information about the traffic carried, symmetric or asymmetric, between nodes. The thesis presents contributions of biplot methods to the analysis and management of the communication networks most widely used today; the proposed approach improves network design and management procedures and constitutes a powerful tool for visualizing the state of the communication network.
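    The HJ-Biplot itself is defined in Galindo (1986); the sketch below shows only the kind of computation involved when projecting a traffic matrix, under the usual formulation in which both row and column markers are the singular vectors scaled by the singular values. The synthetic traffic matrix and the distance-based flagging rule are assumptions of this sketch, not the thesis code.

```python
# Sketch of an HJ-Biplot-style projection of a (time x OD-flow) traffic matrix.
import numpy as np

rng = np.random.default_rng(1)
T, F = 288, 20                               # 288 five-minute bins, 20 OD flows
X = rng.normal(100, 10, size=(T, F))
X[150, 5] += 300                             # inject a volume anomaly

Xc = X - X.mean(axis=0)                      # column-center the traffic matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                                        # retain the first two dimensions
row_markers = U[:, :k] * s[:k]               # one marker per time bin
col_markers = Vt[:k, :].T * s[:k]            # one marker per OD flow

# A time bin far from the cloud of row markers is a candidate anomaly.
dist = np.linalg.norm(row_markers - row_markers.mean(axis=0), axis=1)
print("most atypical time bin:", int(dist.argmax()))
```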

    Distributed Spatial Anomaly Detection

    Abstract—Detection of traffic anomalies is an important problem that has been the focus of considerable research. Recent work has shown the utility of spatial detection of anomalies via crosslink traffic comparisons. In this paper we identify three advances that are needed to make such methods more useful and practical for network operators. First, anomaly detection methods should avoid global communication and centralized decision making. Second, nonparametric anomaly detection methods are needed to augment current parametric approaches. And finally, such methods should not just identify possible anomalies, but should also annotate each detection with some probabilistic qualifier of its importance. We propose a framework that simultaneously advances the current state of the art on all three fronts. We show that routers can effectively identify volume anomalies through crosslink comparison of traffic observed only on the router's own links. Second, we show that generalized quantile estimators are an effective way to identify high-dimensional sets of local traffic patterns that are potentially anomalous; such methods can be either parametric or nonparametric, and we evaluate both. Third, through the use of false discovery rate as a detection metric, we show that candidate anomalous patterns can be equipped with an estimate of the probability that they truly are anomalous. Overall, our framework provides network operators with an anomaly detection methodology that is distributed, effective, and easily interpretable. Part of the underlying statistical framework, which merges aspects of nonparametric set estimation and multiple hypothesis testing, is novel in itself, although the derivation of that framework is necessarily given elsewhere.
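    The framework itself is developed in the paper; as a hedged illustration of its last ingredient, the sketch below applies the Benjamini-Hochberg false discovery rate procedure to p-values that a router might compute locally from its own links. The Gaussian null, the synthetic history and the FDR level are assumptions of this sketch, not the paper's estimators.

```python
# Illustration of FDR-controlled flagging of locally scored traffic patterns.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Per-link traffic volumes observed by one router over recent intervals.
history = rng.normal(1000, 50, size=(500, 6))          # 500 intervals, 6 links
current = rng.normal(1000, 50, size=6)
current[2] = 1400                                       # one anomalous link

# Local scoring: z-score of each link against its own history (assumed null).
z = (current - history.mean(axis=0)) / history.std(axis=0)
pvals = 2 * stats.norm.sf(np.abs(z))                    # two-sided p-values

def benjamini_hochberg(p, alpha=0.05):
    """Return a boolean mask of discoveries at FDR level alpha."""
    p = np.asarray(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, len(p) + 1) / len(p)
    passed = p[order] <= thresh
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    mask = np.zeros_like(p, dtype=bool)
    mask[order[:k]] = True
    return mask

print("links flagged as anomalous:", np.nonzero(benjamini_hochberg(pvals))[0])
```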