83 research outputs found

    Zero-day Network Intrusion Detection using Machine Learning Approach

    Zero-day network attacks are a growing global cybersecurity concern. Hackers exploit vulnerabilities in network systems, making network traffic analysis crucial for detecting and mitigating unauthorized attacks. However, inadequate and ineffective network traffic analysis can lead to prolonged network compromises. To address this, machine learning-based zero-day network intrusion detection systems (ZDNIDS) rely on monitoring and collecting relevant information from network traffic data. Given the voluminous, high-dimensional nature of network traffic data, the selection of pertinent features is essential for optimal ZDNIDS performance. Unfortunately, current machine learning models in this field are inefficient at detecting zero-day network attacks, resulting in high false alarm rates and overall performance degradation. To overcome these limitations, this paper introduces a novel approach combining the anomaly-based extended isolation forest algorithm with the BAT algorithm and Nevergrad. The proposed model was evaluated on 5G network traffic and efficiently detects both known and unknown attacks while producing fewer false alarms than existing systems, thereby contributing to improved internet security.
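    The abstract's actual pipeline (extended isolation forest tuned with BAT and Nevergrad) is not spelled out here, but the anomaly-based core of such a ZDNIDS can be illustrated with scikit-learn's standard IsolationForest as a stand-in; the traffic features below are invented for the sketch:

```python
# Hypothetical sketch of anomaly-based intrusion detection: train an
# isolation forest on (assumed benign-dominated) network-flow features,
# then flag flows it scores as outliers. Standard IsolationForest is used
# here as a stand-in for the paper's extended isolation forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=(1000, 4))  # synthetic normal traffic features
attack = rng.normal(6.0, 1.0, size=(20, 4))    # synthetic unseen "zero-day" flows

model = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
model.fit(benign)

# predict(): +1 = inlier (benign), -1 = outlier (possible intrusion)
flags = model.predict(np.vstack([benign[:5], attack]))
print(flags)  # the attack rows should mostly be flagged -1
```

    Since the forest is fit only on benign-dominated data, previously unseen attack patterns can still be flagged, which is the property the abstract relies on for zero-day detection.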

    Randomized outlier detection with trees

    Isolation forest (IF) is a popular outlier detection algorithm that isolates outlier observations from regular observations by building multiple random isolation trees. The average number of comparisons required to isolate a given observation can then be used as a measure of its outlierness. Multiple extensions of this approach have been proposed in the literature, including the extended isolation forest (EIF) as well as the SCiForest. However, we find a lack of theoretical explanation of why IF, EIF, and SCiForest offer such good practical performance. In this paper, we present a theoretical framework that views these approaches from a distributional viewpoint. Using this viewpoint, we show that isolation-based approaches first accurately approximate the data distribution and then approximate the coefficients of its mixture components via the average path length. Using this framework, we derive the generalized isolation forest (GIF), which also trains random isolation trees but moves beyond the average path length when combining them. That is, GIF splits the data into multiple sub-spaces by sampling random splits, as the original IF variants do, and directly estimates the mixture coefficients of a mixture distribution to score the outlierness of entire regions of data. In an extensive evaluation, we compare GIF with 18 state-of-the-art outlier detection methods on 14 different datasets. We show that GIF outperforms three competing tree-based methods and performs competitively with nearest-neighbor approaches while having a lower runtime. Last, we highlight a use-case study that uses GIF to detect transaction fraud in financial data.
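    The path-length scoring that GIF generalizes can be shown in a from-scratch miniature (illustrative only; the function names and the depth cap are invented for this sketch): a point that random splits isolate in fewer steps is more anomalous.

```python
# Minimal sketch of the isolation idea: repeatedly pick a random dimension
# and a random split, keep only the points on the same side as x, and
# record the depth at which x ends up alone. Outliers isolate earlier,
# so their average path length is shorter.
import random

def path_length(x, data, depth=0, max_depth=10):
    """Grow one random isolation path for point `x` over `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = random.randrange(len(x))
    lo = min(min(p[dim] for p in data), x[dim])
    hi = max(max(p[dim] for p in data), x[dim])
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    same_side = [p for p in data if (p[dim] < split) == (x[dim] < split)]
    return path_length(x, same_side, depth + 1, max_depth)

random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(256)]
outlier, inlier = (8.0, 8.0), data[0]

avg = lambda x: sum(path_length(x, data) for _ in range(100)) / 100
print(avg(outlier), avg(inlier))  # the outlier's average path is much shorter
```

    GIF keeps this random-splitting structure but, per the abstract, replaces the path-length score with directly estimated mixture coefficients over the resulting regions.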

    itsdm: Isolation forest-based presence-only species distribution modelling and explanation in r

    Multiple statistical algorithms have been used for species distribution modelling (SDM). Due to shortcomings in species occurrence datasets, presence-only methods (such as MaxEnt) have become increasingly widely used. However, sampling bias remains a challenging issue, particularly for density-based approaches. The Isolation Forest (iForest) algorithm is a presence-only method less sensitive to sampling patterns and over-fitting because it fits the model by describing the unsuitable instead of the suitable conditions. Here, we present the itsdm package for species distribution modelling with iForest, which provides a workflow wrapper for the algorithms in the iForest family and convenient tools for model diagnostics and post-modelling analysis. itsdm allows users to fit and evaluate an iForest SDM using presence-only occurrence data. It also helps users to understand relationships between species and the living environment using Shapley values, a suggested technique in explainable artificial intelligence (xAI). Additionally, itsdm can make spatial response maps that indicate how species respond to environmental variables across space and detect areas potentially affected by a changing environment. We demonstrate the usage of the itsdm package and compare iForest with other mainstream SDMs using virtual species. The results indicate that iForest is an advantageous presence-only SDM when the actual distribution range is unclear. © 2023 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society.
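    itsdm itself is an R package; as a rough Python analogue of the presence-only idea it wraps (fit an isolation forest on environmental conditions at presence sites, treat "normal" scores as suitable), with invented covariates:

```python
# Hypothetical sketch of an iForest-style presence-only SDM: the forest
# learns what presence-site conditions look like, and score_samples gives
# higher (less negative) values for environmentally similar sites.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Assumed covariates at presence sites, e.g. (temperature, precipitation)
presences = rng.normal(loc=[20.0, 1200.0], scale=[2.0, 100.0], size=(300, 2))

sdm = IsolationForest(n_estimators=200, random_state=0).fit(presences)

candidate_sites = np.array([[21.0, 1150.0],   # close to known conditions
                            [35.0, 200.0]])   # far outside them
suitability = sdm.score_samples(candidate_sites)  # higher = more suitable
print(suitability)
```

    Note how only presence records are needed: unlike density-based methods, the forest characterizes the suitable envelope by how easily a site's conditions are isolated from it, which is the property the abstract credits for robustness to sampling bias.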

    Comparison of anomaly detection techniques applied to different problems in the telecom industry

    Nowadays, with the growth of digital transformation in companies, a huge amount of data is generated every second as a result of various processes. Often this data contains important information which, when properly analyzed, can help a company gain a competitive advantage. One data processing task common to many different applications is the detection of anomalies, that is, data points or groups of data points that stand out from most of the others. Since the volumes of data involved make it infeasible for an operator to analyze the data constantly in search of anomalous values, the focus of this dissertation is the exploration of a Data Mining area called anomaly detection. In this dissertation we first develop anomaly detection software in Python that applies 10 different anomaly detection algorithms, after automatically optimizing their parameters, to an arbitrary dataset. Before applying these algorithms, the software also performs data scaling and imputation of missing values. It outputs the performance metrics of each algorithm, the values of the optimized parameters, and plots for visualizing the results, generated using t-SNE. This software was then applied to three case studies to compare the performance of different anomaly detection approaches on real-world datasets. These datasets have an increasing level of difficulty, in terms of the amount of missing data and the uncertainty of the ground truth regarding the anomalies. In the first case study, we detected fraudulent bank transactions using a public dataset. In the second, we identified clients of a telecommunications company who were likely to miss their payment, leading to contract termination, using a dataset from that company.
In the third case, we detected low quality of internet service, again using a large dataset with real measurements from a telecommunications company. Finally, we implemented a state-of-the-art neural network model specially suited to identifying anomalies in time-series data, optimized its parameters, and applied it to the problem of low quality of service.
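    The pipeline shape this dissertation describes (imputation, scaling, several detectors, a common metric) can be sketched minimally with scikit-learn; the synthetic data and the two detectors below are illustrative stand-ins, not the dissertation's ten algorithms:

```python
# Hypothetical mini-pipeline: impute missing values, scale, run two anomaly
# detectors, and compare them with ROC AUC against a known ground truth.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (500, 3)), rng.normal(5, 1, (25, 3))])
y = np.array([0] * 500 + [1] * 25)        # 1 = anomaly (ground truth)
X[rng.random(X.shape) < 0.05] = np.nan    # simulate missing values

# Preprocessing: mean imputation, then standard scaling
X = StandardScaler().fit_transform(SimpleImputer().fit_transform(X))

# Each detector yields a "normality" score; negate so larger = more anomalous
detectors = {
    "iforest": IsolationForest(random_state=0).fit(X).score_samples(X),
    "lof": LocalOutlierFactor().fit(X).negative_outlier_factor_,
}
for name, normality in detectors.items():
    print(name, round(roc_auc_score(y, -normality), 3))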

    SQ-SLAM: Monocular Semantic SLAM Based on Superquadric Object Representation

    Object SLAM uses additional semantic information to detect and map objects in the scene, in order to improve the system's perception and map representation capabilities. Quadrics and cubes are often used to represent objects, but their single fixed shape limits the accuracy of the object map and thus hinders downstream tasks. In this paper, we introduce superquadrics (SQ) with shape parameters into SLAM for representing objects, and propose a separate parameter estimation method that can accurately estimate object pose and adapt to different shapes. Furthermore, we present a lightweight data association strategy for correctly associating semantic observations in multiple views with object landmarks. We implement a monocular semantic SLAM system with real-time performance and conduct comprehensive experiments on public datasets. The results show that our method builds accurate object maps and has advantages in object representation. Code will be released upon acceptance. Comment: Submitted to ICRA 202
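    The shape parameters mentioned above enter through the standard superquadric inside-outside function (Barr's formulation); this snippet is a generic illustration of that representation, not the paper's estimation code:

```python
# Superquadric implicit function: F < 1 inside, F = 1 on the surface,
# F > 1 outside. Axis lengths a1..a3 set the extent; exponents eps1, eps2
# control shape (eps1 = eps2 = 1 gives an ellipsoid; values near 0 give
# box-like shapes), which is why one primitive can fit many objects.
def superquadric_F(x, y, z, a1, a2, a3, eps1, eps2):
    term_xy = abs(x / a1) ** (2 / eps2) + abs(y / a2) ** (2 / eps2)
    return term_xy ** (eps2 / eps1) + abs(z / a3) ** (2 / eps1)

# Unit sphere (a = 1, eps = 1): a surface point vs a point outside
print(superquadric_F(1, 0, 0, 1, 1, 1, 1, 1))  # 1.0 (on the surface)
print(superquadric_F(2, 0, 0, 1, 1, 1, 1, 1))  # 4.0 (outside)
```

    Estimating the five shape parameters alongside pose is the part the paper's separate parameter estimation method addresses.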

    T2D2: A Time Series Tester, Transformer, and Decomposer Framework for Outlier Detection

    The automatic detection of outliers in time series datasets has attracted much attention in the data science community. It is not a simple task, as the data may exhibit several behaviours, such as seasonal, trend-driven, or a combination of the two. Furthermore, to obtain reliable and trustworthy knowledge from the data, the data itself should be understandable. To cope with these challenges, in this paper we introduce a new framework that first tests the stationarity and seasonality of a dataset, then applies a set of Fourier transforms to obtain the Fourier sample frequencies, which support a decomposer component. The proposed framework, namely TTDD (Test, Transform, Decompose, and Detection), implements a decomposer component that splits the dataset into three parts: trend, seasonal, and residual. Finally, the frequency difference detector compares the frequency of the test set to that of the training set, determining the periods where the frequencies diverge and marking them as outlier periods.