    Spectral Ranking and Unsupervised Feature Selection for Point, Collective and Contextual Anomaly Detection

    Anomaly detection problems can be classified into three categories: point anomaly detection, collective anomaly detection and contextual anomaly detection. Many algorithms have been devised to detect a specific type of anomaly in various application domains. In an unsupervised setting, however, the type of anomaly to be detected is generally unknown in practice, and most existing methods favor one kind of anomaly over the others. Applying an algorithm built on an incorrect assumption is unlikely to produce reasonable results. This thesis therefore investigates a uniform approach that can automatically discover different kinds of anomalies. Specifically, we focus on Spectral Ranking for Anomalies (SRA) for its potential to detect point anomalies and collective anomalies simultaneously. We show that, under some assumptions, the spectral optimization in SRA can be viewed as a relaxation of an unsupervised SVM problem. SRA therefore yields a bi-class classification strength measure that can be used to rank point anomalies, along with a normal vs. abnormal classification for identifying collective anomalies. However, for contextual anomaly problems, in which different contexts are defined by different feature subsets, SRA and other popular methods are not sufficient on their own. Accordingly, we propose BAHSIC-AD, an unsupervised backward elimination feature selection algorithm that uses the Hilbert-Schmidt Independence Criterion (HSIC) to identify data instances that appear anomalous within subsets of features that are strongly dependent on each other. Finally, we demonstrate the effectiveness of SRA combined with BAHSIC-AD by comparing its performance with other popular anomaly detection methods on several benchmarks, including both synthetic and real-world datasets. Our computational results indicate that, in practice, SRA combined with BAHSIC-AD can serve as a generally applicable method for detecting different kinds of anomalies.
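
HSIC measures statistical dependence between two sets of variables through their kernel matrices. As a minimal sketch, assuming Gaussian kernels with a shared, hypothetical bandwidth `sigma`, the standard biased empirical estimator HSIC_b = tr(K H L H) / (n - 1)^2, with centering matrix H = I - (1/n) 1 1^T, can be computed as below. This is only the dependence measure, not BAHSIC-AD itself; the function names are illustrative.

```python
import numpy as np


def gaussian_gram(x, sigma):
    """Gaussian (RBF) kernel matrix for samples stored along axis 0."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2 (assumed Gaussian kernels)."""
    n = len(x)
    K = gaussian_gram(x, sigma)
    L = gaussian_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


# Usage: compare a variable with itself and with independent noise.
x = np.linspace(-2.0, 2.0, 50)
noise = np.random.default_rng(0).normal(size=50)
print(hsic(x, x), hsic(x, noise))
```

A backward elimination scheme in the spirit of BAHSIC would, roughly, drop at each step the feature whose removal reduces such a dependence score the least; how BAHSIC-AD defines its objective for anomaly detection is specified in the thesis, not here.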

    Machine Learning for Identifying Group Trajectory Outliers

    Prior work on the trajectory outlier detection problem considers only individual outliers. In real-world scenarios, however, trajectory outliers often appear in groups, e.g., a group of bikes that deviates from the usual trajectory because of street maintenance, in the context of intelligent transportation. This paper considers the Group Trajectory Outlier (GTO) problem and proposes three algorithms. The first and second algorithms are extensions of the well-known DBSCAN and kNN algorithms, while the third models the GTO problem as a feature selection problem. Furthermore, two enhancements to the proposed algorithms are introduced. The first, based on ensemble learning and computational intelligence, merges the algorithms' outputs to potentially improve the final result. The second is a general high-performance computing framework for handling large trajectory databases, which we used for a GPU-based implementation. Experimental results on several real trajectory databases show the scalability of the proposed approaches.
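
One way to picture a kNN-based detector for group outliers is sketched below. It assumes trajectories are already resampled to the same number of time-aligned points and compared by mean pointwise Euclidean distance; the function names, the parameters `k`, `threshold` and `eps`, and the connected-components grouping step are illustrative assumptions, not the paper's algorithms.

```python
import math


def trajectory_distance(a, b):
    """Mean pointwise Euclidean distance; assumes equal-length, time-aligned trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def knn_scores(trajectories, k):
    """Distance from each trajectory to its k-th nearest other trajectory."""
    scores = []
    for i, t in enumerate(trajectories):
        dists = sorted(trajectory_distance(t, u) for j, u in enumerate(trajectories) if j != i)
        scores.append(dists[k - 1])
    return scores


def group_outliers(trajectories, k, threshold, eps):
    """Flag trajectories with a high kNN score, then group flagged ones lying within eps of each other."""
    scores = knn_scores(trajectories, k)
    flagged = [i for i, s in enumerate(scores) if s > threshold]
    groups, seen = [], set()
    for start in flagged:
        if start in seen:
            continue
        seen.add(start)
        group, stack = [], [start]
        while stack:  # depth-first search over flagged trajectories
            i = stack.pop()
            group.append(i)
            for j in flagged:
                if j not in seen and trajectory_distance(trajectories[i], trajectories[j]) <= eps:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


# Usage: six trajectories along the usual route and three that detour together.
usual = [[(x, 0.1 * i) for x in range(5)] for i in range(6)]
detour = [[(x, 5.0 + 0.1 * i) for x in range(5)] for i in range(3)]
print(group_outliers(usual + detour, k=4, threshold=1.0, eps=0.5))
```

Choosing `k` larger than the expected group size matters here: otherwise each member of a tight deviating group finds its k nearest neighbors inside its own group, its kNN distance stays small, and the whole group is masked.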

    Network anomaly detection research: a survey

    Data analysis to identify attacks and anomalies is a crucial task in anomaly detection, and network anomaly detection is itself an important issue in network security. Researchers have developed methods and algorithms to improve anomaly detection systems, and survey papers on anomaly detection research are already available. Nevertheless, this paper attempts to analyze the field further and to provide an alternative taxonomy of anomaly detection research, focusing on methods, types of anomalies, data repositories, outlier identity and the most commonly used data types. In addition, this paper summarizes the network application categories covered by existing studies.

    Unsupervised Spectral Ranking For Anomaly Detection

    Anomaly detection is the problem of finding deviations from expected normal patterns. A wide variety of applications, such as fraud detection for credit cards and insurance, medical image monitoring, network intrusion detection and military surveillance, can be viewed as anomaly detection. Obtaining accurate labels for anomaly detection, especially labels for anomalous cases, is costly and time-consuming, if not practically infeasible, which makes supervised approaches less desirable in this domain. In this thesis, we propose a novel unsupervised spectral ranking method for anomaly detection (SRA). Based on the first non-principal eigenvector of a Laplacian matrix, SRA can generate an anomaly ranking either with respect to a single majority class or with respect to multiple majority classes. The ranking type depends on whether the percentage of instances in the smaller class (positive or negative) is larger than the expected upper bound of the anomaly ratio. We justify the proposed spectral ranking by establishing a connection between the unsupervised support vector machine optimization problem and the spectral Laplacian optimization problem. Using both synthetic and real datasets, we show that SRA is a meaningful and effective alternative to state-of-the-art unsupervised anomaly ranking methods. In addition, we show that, in certain scenarios, SRA surpasses state-of-the-art unsupervised anomaly ranking methods in terms of performance and robustness to parameter tuning. Finally, we demonstrate that choosing an appropriate similarity measure remains crucial when applying SRA.
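
The ranking rule described above can be sketched as follows. This is a simplified illustration under stated assumptions: a Gaussian similarity with a hypothetical bandwidth `sigma`, the symmetric normalized Laplacian, the eigenvector rescaled by D^(-1/2), and a hypothetical parameter `max_anomaly_ratio` standing in for the expected upper bound of the anomaly ratio. The exact scaling and scoring used in the thesis may differ.

```python
import numpy as np


def spectral_anomaly_ranking(X, sigma=1.0, max_anomaly_ratio=0.1):
    """Return sample indices ordered from most to least anomalous (simplified SRA-style sketch)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Gaussian similarity (assumed); assumes every point has non-negligible similarity to another point.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^(-1/2) W D^(-1/2)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # First non-principal eigenvector (second-smallest eigenvalue), rescaled by D^(-1/2)
    z = d_inv_sqrt * vecs[:, 1]
    n_pos = int(np.sum(z > 0))
    smaller_frac = min(n_pos, n - n_pos) / n
    if smaller_frac <= max_anomaly_ratio:
        # Single majority class: the smaller sign group is treated as anomalous,
        # and larger values on that side rank higher.
        score = z if n_pos < n - n_pos else -z
    else:
        # Multiple majority classes: weak membership (|z| near 0) ranks higher.
        score = -np.abs(z)
    return np.argsort(-score, kind="stable")


# Usage: a 4 x 5 grid cluster plus a small, distant pair of points (indices 20 and 21).
grid = [(0.3 * i, 0.3 * j) for i in range(4) for j in range(5)]
X = np.array(grid + [(4.0, 4.0), (4.2, 4.0)])
print(spectral_anomaly_ranking(X)[:2])
```

The similarity matrix `W` is the part the abstract singles out as crucial: replacing the Gaussian kernel with another similarity measure changes the eigenvector, and therefore the ranking.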

    A Flexible Outlier Detector Based on a Topology Given by Graph Communities

    Outlier detection is essential for the optimal performance of machine learning methods and statistical predictive models. Detecting outliers is especially important in small-sample, unbalanced problems, since in such settings outliers become highly influential and significantly bias models. These experimental settings are common in medical applications, such as the diagnosis of rare pathologies, the outcome of experimental personalized treatments or pandemic emergencies. In contrast to population-based methods, neighborhood-based local approaches compute an outlier score from the neighbors of each sample; they are simple, flexible methods with the potential to perform well in small-sample, unbalanced problems. A main concern with local approaches is the impact that the computation of each sample's neighborhood has on the method's performance. Most approaches use a distance in the feature space to define a single neighborhood, which requires careful selection of several parameters, such as the number of neighbors. This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space, considered as a topological manifold. The topology is computed using the communities of a weighted graph encoding mutual nearest neighbors in the feature space. In this way, we provide a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine-tuning. Extensive experiments on real-world and synthetic datasets show that our approach outperforms both local and global strategies in multi-view and single-view settings.
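
A rough sketch of scoring each sample by label heterogeneity within its graph neighborhood follows. It assumes an unweighted mutual k-nearest-neighbor graph and, as a deliberate simplification, uses connected components in place of the weighted community detection the paper describes; `k` and all function names are hypothetical.

```python
import math
from collections import defaultdict


def mutual_knn_edges(points, k):
    """Edges (i, j) where each point is among the other's k nearest neighbors."""
    n = len(points)
    neighbors = []
    for i in range(n):
        ranked = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbors.append({j for _, j in ranked[:k]})
    return [(i, j) for i in range(n) for j in neighbors[i] if i < j and i in neighbors[j]]


def connected_components(n, edges):
    """Union-find grouping; a stand-in for proper community detection."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())


def label_heterogeneity_scores(points, labels, k=3):
    """Fraction of a sample's community whose label differs from its own."""
    n = len(points)
    scores = [0.0] * n
    for community in connected_components(n, mutual_knn_edges(points, k)):
        for i in community:
            others = [j for j in community if j != i]
            if others:
                scores[i] = sum(labels[j] != labels[i] for j in others) / len(others)
    return scores


# Usage: the center of the first square carries the "wrong" label.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(label_heterogeneity_scores(points, labels))
```

A score near 1 means a sample's label disagrees with most of its community. A real community detection algorithm on the weighted graph would split large components into finer neighborhoods, which is what lets the paper's approach describe complex spaces.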

    Anomaly Detection Algorithms and Techniques for Network Intrusion Detection Systems

    In recent years, many deep learning-based models have been proposed for anomaly detection. This thesis compares selected deep autoencoding models and classical anomaly detection methods on three modern network intrusion detection datasets. We experiment with different configurations and architectures of the selected models, as well as aggregation techniques for input preprocessing and output postprocessing. We propose a methodology for creating benchmark datasets to evaluate the methods in different settings, and we provide a statistical comparison of the performance of the selected techniques. We conclude that the deep autoencoding models, in particular the autoencoder (AE) and the variational autoencoder (VAE), systematically outperform the classical methods. Furthermore, we show that aggregating input network flow data improves overall performance. In general, the tested techniques are promising for application in network intrusion detection systems. However, secondary techniques must be employed to reduce the high number of false alarms they generate.
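
Autoencoder-based detectors flag inputs whose reconstruction error is high. As a minimal stand-in that avoids a deep learning framework, the sketch below uses a linear autoencoder with tied weights. Under squared error its optimum spans the top principal subspace, so it is fitted here in closed form by SVD. This is a swapped-in simplification, not the AE or VAE models evaluated in the thesis, and `n_components` is a hypothetical parameter.

```python
import numpy as np


def fit_linear_autoencoder(X_train, n_components):
    """Tied-weight linear autoencoder fitted in closed form via SVD (PCA subspace)."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]  # rows: orthonormal encoder weights


def reconstruction_error(X, mean, components):
    """Squared reconstruction error per sample; higher means more anomalous."""
    X = np.asarray(X, dtype=float)
    codes = (X - mean) @ components.T  # encode
    recon = codes @ components + mean  # decode with the same (tied) weights
    return np.sum((X - recon) ** 2, axis=1)


# Usage: toy "normal" feature vectors lie on a line; the second test vector does not.
t = np.linspace(-1.0, 1.0, 11)
normal = np.outer(t, [1.0, 2.0, 3.0])
mean, comps = fit_linear_autoencoder(normal, n_components=1)
test_err = reconstruction_error([[0.5, 1.0, 1.5], [1.0, 0.0, 0.0]], mean, comps)
print(test_err)
```

In practice, an input would be flagged when its error exceeds a threshold, for example a high percentile of the errors on normal training traffic. Where that threshold sits trades detections directly against the false alarms the thesis warns about.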

    Novel Gumbel-Softmax trick enabled concrete autoencoder with entropy constraints for unsupervised hyperspectral band selection

    Band selection, an important topic in hyperspectral image (HSI) analysis, has attracted increasing attention over the last two decades as a means of dimensionality reduction for HSI. Given the recent success of deep learning (DL)-based models, a robust unsupervised band selection (UBS) neural network is highly desirable, particularly because sufficient ground truth information to train DL networks is often lacking. Existing DL models for band selection either depend on class label information or produce unstable results by ranking the learned weights. To address these issues, this article proposes a Gumbel-Softmax (GS) trick enabled concrete autoencoder-based UBS framework (CAE-UBS) for HSI, whose learning process is characterized by the introduced concrete random variables and the reconstruction loss. By searching among the candidate band subsets generated by the concrete encoder, the optimal band subset can be selected based on an information entropy (IE) criterion. The CAE-UBS idea is straightforward and does not rely on complicated strategies or metrics. Its robust performance on four publicly available datasets validates the superiority of the CAE-UBS framework for HSI classification.
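
A concrete selector draws a relaxed one-hot vector over bands using the Gumbel-Softmax trick. The sketch below shows three pieces: that sampling step, the hard band indices read off by argmax, and a histogram-based Shannon entropy per band as one simple information-entropy score. The temperature, the `bins` value and all function names are assumptions. The article's exact IE criterion and training loop are not reproduced.

```python
import numpy as np


def gumbel_softmax_sample(logits, temperature, rng):
    """Relaxed one-hot samples, one row per band slot: softmax((logits + Gumbel noise) / T)."""
    u = rng.uniform(1e-12, 1.0, size=logits.shape)  # avoid log(0)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / temperature
    y -= y.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


def concrete_select(pixels, logits, temperature, rng):
    """pixels: (n_pixels, n_bands); logits: (k, n_bands), one row per band to select."""
    weights = gumbel_softmax_sample(logits, temperature, rng)  # (k, n_bands)
    selected = pixels @ weights.T  # (n_pixels, k) soft mixtures of bands
    hard_bands = np.argmax(weights, axis=1)  # band each slot currently points at
    return selected, hard_bands


def band_entropy(values, bins=32):
    """Shannon entropy (bits) of a band's value histogram; one possible entropy score."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Usage: two band slots over five bands, with logits already peaked on bands 1 and 3.
rng = np.random.default_rng(0)
logits = np.full((2, 5), -10.0)
logits[0, 1] = logits[1, 3] = 10.0
pixels = rng.normal(size=(100, 5))
selected, bands = concrete_select(pixels, logits, temperature=0.1, rng=rng)
print(bands, [band_entropy(pixels[:, b]) for b in bands])
```

Lowering `temperature` pushes each row of the weights toward a one-hot vector. This is how a concrete layer turns a continuous relaxation into a discrete choice of bands while remaining trainable by gradient descent.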