
    Outlier Mining Methods Based on Graph Structure Analysis

    Outlier detection in high-dimensional datasets is a fundamental and challenging problem across disciplines that also has practical implications, as removing outliers from the training set improves the performance of machine learning algorithms. While many outlier mining algorithms have been proposed in the literature, they tend to be valid or efficient for specific types of datasets (time series, images, videos, etc.). Here we propose two methods that can be applied to generic datasets, as long as there is a meaningful measure of distance between pairs of elements of the dataset. Both methods start by defining a graph, where the nodes are the elements of the dataset, and the links have associated weights that are the distances between the nodes. Then, the first method assigns an outlier score based on the percolation (i.e., the fragmentation) of the graph. The second method uses the popular IsoMap non-linear dimensionality reduction algorithm, and assigns an outlier score by comparing the geodesic distances with the distances in the reduced space. We test these algorithms on real and synthetic datasets and show that they either outperform or perform on par with other popular outlier detection methods. A main advantage of the percolation method is that it is parameter free and therefore does not require any training; the IsoMap method, on the other hand, has two integer parameters, and when they are appropriately selected, the method performs similarly to or better than all the other methods tested. Peer reviewed. Postprint (published version).
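
    The percolation idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact scoring rule, so the rule used here is an assumption: a point's score is the link weight at which it first joins a component holding more than half of the points, as links are added from shortest to longest (the reverse of fragmenting the graph by removing the longest links first). The function name `percolation_scores` is hypothetical.

```python
from itertools import combinations
import math


def percolation_scores(points, dist=math.dist):
    """Score each point by the link weight at which it joins the majority component.

    Assumed scoring rule (not taken from the paper): higher score = more outlying.
    """
    n = len(points)
    # Complete weighted graph: one link per pair, weight = pairwise distance.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))
    members = {i: [i] for i in range(n)}  # component root -> nodes it contains
    scores = [None] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Adding links in increasing weight order mirrors removing them in
    # decreasing order, which is how the graph fragments.
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        if len(members[ri]) < len(members[rj]):
            ri, rj = rj, ri  # always merge the smaller component into the larger
        parent[rj] = ri
        members[ri].extend(members.pop(rj))
        if 2 * len(members[ri]) > n:
            # This component now holds a majority: score its unscored nodes.
            for node in members[ri]:
                if scores[node] is None:
                    scores[node] = w
    return scores


square_plus_outlier = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = percolation_scores(square_plus_outlier)
```

    In this toy example the four corners of the unit square receive the same low score, while the distant point receives a much larger one. The sketch needs no tuning parameters, but it builds the full pairwise graph and is only practical for small datasets.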

    Hijacking Wireless Communications using WiFi Pineapple NANO as a Rogue Access Point

    Wireless access points are an effective solution for building scalable, flexible, mobile networks. The problem with these access points is often their lack of security. Users regularly connect to wireless access points without considering whether they are genuine or malicious. Moreover, users are not aware of the types of attacks that can come from "rogue" access points set up by attackers, or of what information those access points can capture. Attackers exploit this to gain access to users' confidential information. The objective of this study is to examine the effectiveness of the WiFi Pineapple NANO, used as a rogue access point (RAP), in tricking users into connecting to it. As part of the preliminary study, a brief survey was given to users who connected to the Pineapple to evaluate why users connect to RAPs. The results of this cybersecurity pilot study indicate that lack of awareness played an important role. Specifically, users unknowingly connect to rogue wireless access points, putting at risk not only their devices but the whole network. The information collected in this research could be used to better educate users on identifying possible RAPs and the dangers of connecting to them.

    Deployment of Next Generation Intrusion Detection Systems against Internal Threats in a Medium-sized Enterprise

    In this increasingly digital age, companies struggle to understand the origin of cyberattacks. Malicious actions can come from both outside and inside the business, so it is necessary to adopt tools that reduce cyber risk by identifying anomalies when the first symptoms appear. This thesis deals with internal attacks and explains how to use innovative Intrusion Detection Systems (IDSs) to protect the IT infrastructure of medium-sized enterprises. These technologies try to solve issues such as poor visibility of network traffic, long response times to security breaches, and inefficient access control mechanisms. The research details multiple types of internal threats, the different categories of IDSs, and an in-depth analysis of the state-of-the-art IDSs developed during the last few years. It then briefly explains the effectiveness of IDSs in both testing and production environments. All the reported phases took place within a company network, starting from the positioning of the IDS, moving on to its configuration, and ending with the production environment. The company's expectations are analyzed, together with an explanation of the characteristics of the different IDSs. The research presents data about potential attacks, mitigated and resolved threats, and network changes made thanks to the information gathered while using a cutting-edge IDS. Moreover, it generalizes the characteristics that a medium-sized company must have in order to be adequately protected by a new-generation IDS and, likewise, reports the functionalities that an IDS must possess in order to achieve the set objectives. IDSs are highly adaptable to different environments, such as companies of different sectors and sizes, and can be tuned to achieve better results. The document closes with the potential future developments that should be addressed to improve IDS technologies further.

    Anomaly Detection In Blockchain

    Anomaly detection has been a well-studied area for a long time. Its applications in the financial sector have aided in identifying suspicious activities of hackers. However, with advancements in the financial domain such as blockchain and artificial intelligence, it is more challenging to deceive financial systems. Despite these technological advancements, many fraudulent cases have still emerged. Many artificial intelligence techniques have been proposed to deal with the anomaly detection problem; some results appear promising, but there is no clearly superior solution. This thesis aims to bridge the gap between artificial intelligence and blockchain by applying various anomaly detection techniques to transactional network data of a public financial blockchain, Bitcoin. It also presents an overview of blockchain technology and its application in the financial sector in light of anomaly detection. Furthermore, it extracts the transactional data of the Bitcoin blockchain and analyses it for malicious transactions using unsupervised machine learning techniques. A range of algorithms, including isolation forest, histogram-based outlier detection (HBOS), cluster-based local outlier factor (CBLOF), principal component analysis (PCA), K-means, deep autoencoder networks, and an ensemble method, are evaluated and compared.

    Deteção de atividades ilícitas de software Bots através do DNS (Detection of Illicit Software Bot Activities through DNS)

    DNS is a critical component of the Internet on which almost all Internet applications and organizations rely. Its shutdown can cut them off from the Internet, and hence DNS is usually the only protocol allowed when Internet access is firewalled. The constant exposure of this protocol to external entities forces corporations to watch for external rogue software that may misuse DNS to establish covert channels and perform illicit activities such as command and control and data exfiltration. Most current solutions for bot malware and botnet detection are based on Deep Packet Inspection techniques, such as analyzing DNS query payloads, which may reveal private and sensitive information. In addition, the majority of existing solutions do not consider licit, encrypted DNS traffic, for which Deep Packet Inspection techniques cannot be used. This dissertation proposes mechanisms to detect malware bots and botnet behaviors in DNS traffic that are robust to encrypted DNS traffic and that preserve the privacy of the involved entities. Instead of inspecting payloads, they analyze the behavioral patterns of DNS communications using descriptive statistics over collected network metrics such as packet rates, packet lengths, and silence and activity periods. After DNS traffic behaviors are characterized, the processed data is studied and then used to train Novelty Detection algorithms. Models are trained with licit data gathered from multiple licit activities, such as reading the news, studying, and using social networks, across multiple operating systems, browsers, and configurations. The models are then tested with similar data that also contains bot malware traffic. Our tests show that our best performing models achieve detection rates in the order of 99%, and 92% for malware bots using low throughput rates.
This work ends with some ideas for a more realistic generation of bot malware traffic, as the current DNS tunneling tools are limited when mimicking licit DNS usage, and for better detection of malware bots that use low throughput rates. Master's degree in Computer and Telematics Engineering.
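
    The pipeline described in this abstract (payload-free descriptive statistics per observation window, then a model trained only on licit traffic) can be sketched as follows. This is an illustrative sketch, not the dissertation's method: the feature set, the 10-second window, and the simple z-score detector `ZScoreNovelty` are all assumptions standing in for the Novelty Detection algorithms the abstract mentions but does not name.

```python
from statistics import mean, pstdev


def window_features(packets, window=10.0):
    """Describe one observation window without looking at payloads.

    packets: list of (timestamp_seconds, length_bytes) tuples.
    window:  assumed window duration in seconds (hypothetical value).
    """
    times = sorted(t for t, _ in packets)
    lengths = [size for _, size in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return [
        len(packets) / window,                # packet rate
        mean(lengths) if lengths else 0.0,    # mean packet length
        pstdev(lengths) if lengths else 0.0,  # spread of packet lengths
        max(gaps) if gaps else window,        # longest silence period
    ]


class ZScoreNovelty:
    """Toy novelty detector: flags rows far from the licit training data."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.mu = [mean(c) for c in cols]
        # Guard against zero spread so the division below is always defined.
        self.sigma = [pstdev(c) or 1.0 for c in cols]
        return self

    def score(self, row):
        return max(abs(x - m) / s for x, m, s in zip(row, self.mu, self.sigma))

    def is_novel(self, row, threshold=3.0):
        return self.score(row) > threshold


# Synthetic licit windows: a few small packets, two seconds apart.
licit = [[(i * 2.0, 70 + 5 * (i % 3)) for i in range(k)] for k in (4, 5, 5, 4, 5)]
# Synthetic tunnel-like window: many large packets in quick succession.
tunnel = [(i * 0.2, 250) for i in range(50)]

model = ZScoreNovelty().fit([window_features(w) for w in licit])
```

    With this synthetic data, `model.is_novel(window_features(tunnel))` is true while the licit windows are not flagged. Because only timing and size are used, the same features remain available when DNS traffic is encrypted.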

    Machine learning methods for the characterization and classification of complex data

    This thesis presents novel methods for the analysis and classification of medical images and, more generally, complex data. First, an unsupervised machine learning method is proposed to order anterior chamber OCT (Optical Coherence Tomography) images according to a patient's risk of developing angle-closure glaucoma. In a second study, two outlier-finding techniques are proposed to improve the results of the above-mentioned machine learning algorithm; we also show that they are applicable to a wide variety of data, including fraud detection in credit card transactions. In a third study, the topology of the retinal vascular network is analyzed by treating it as a complex tree-like network, and we show that structural differences reveal the presence of glaucoma and diabetic retinopathy. In a fourth study, we use a model of a laser with optical injection, which presents extreme events in its intensity time series, to evaluate machine learning methods for forecasting such extreme events. Postprint (published version).

    Multi-Source Data Fusion for Cyberattack Detection in Power Systems

    Cyberattacks can have a severe impact on power systems unless detected early. However, accurate and timely detection in critical infrastructure systems presents challenges, e.g., due to zero-day vulnerability exploitations and the cyber-physical nature of the system, coupled with the need for high reliability and resilience of the physical system. Conventional rule-based and anomaly-based intrusion detection system (IDS) tools are insufficient for detecting zero-day cyber intrusions in industrial control system (ICS) networks. Hence, in this work, we show that fusing information from multiple data sources can help identify cyber-induced incidents and reduce false positives. Specifically, we present how to recognize and address the barriers that can prevent the accurate use of multiple data sources for fusion-based detection. We perform multi-source data fusion for training an IDS in a cyber-physical power system testbed, where we collect cyber and physical side data from multiple sensors emulating real-world data sources that would be found in a utility and synthesize these into features for algorithms to detect intrusions. Results are presented using the proposed data fusion application to infer False Data and Command injection-based Man-in-the-Middle (MiTM) attacks. After collection, the data fusion application performs a time-synchronized merge and extracts features, followed by pre-processing such as imputation and encoding, before training supervised, semi-supervised, and unsupervised learning models to evaluate the performance of the IDS. A major finding is that fusing features from the cyber, security, and physical domains improves detection accuracy. Additionally, we observed that the co-training technique performs on par with supervised learning methods when fed with our features.
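
    The time-synchronized merge and imputation steps mentioned in this abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' application: the backward "as-of" matching rule, the one-second tolerance, the fixed-default imputation, and the field names `voltage` and `pkt_rate` are all hypothetical.

```python
from bisect import bisect_right


def merge_asof(physical, cyber, tolerance=1.0):
    """Attach to each physical sample the latest cyber record at or before it.

    Both inputs are lists of (timestamp, feature_dict) sorted by timestamp.
    A cyber record older than `tolerance` seconds is treated as missing.
    """
    cyber_times = [t for t, _ in cyber]
    fused = []
    for t, phys in physical:
        k = bisect_right(cyber_times, t) - 1  # last cyber record with time <= t
        row = {"t": t, **phys}
        if k >= 0 and t - cyber_times[k] <= tolerance:
            row.update(cyber[k][1])
        fused.append(row)
    return fused


def impute(rows, defaults):
    """Fill features that are missing after the merge with fixed defaults."""
    return [{**defaults, **row} for row in rows]


physical = [(0.0, {"voltage": 1.01}), (1.0, {"voltage": 0.99}), (5.0, {"voltage": 0.80})]
cyber = [(0.9, {"pkt_rate": 12}), (2.0, {"pkt_rate": 300})]

fused = impute(merge_asof(physical, cyber), {"pkt_rate": 0})
```

    Each fused row then carries both physical and cyber features on a common timeline, which is the shape a downstream classifier would expect. Only the sample at t = 1.0 has a cyber record within tolerance; the other two fall back to the default.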