    FRIOD: a deeply integrated feature-rich interactive system for effective and efficient outlier detection

    In this paper, we propose a novel interactive outlier detection system called feature-rich interactive outlier detection (FRIOD), which deeply integrates human interaction to improve detection performance and greatly streamline the detection process. A user-friendly interactive mechanism allows easy and intuitive user interaction in all the major stages of the underlying outlier detection algorithm, which include dense cell selection, location-aware distance thresholding, and final top-outlier validation. In this way, we mitigate the major difficulty of competing outlier detection methods: specifying key parameter values such as the density and distance thresholds. An innovative optimization approach is also proposed for the grid-based space partitioning, a critical step of FRIOD. This optimization fully takes into account the high-quality outliers detected with the aid of human interaction. The experimental evaluation demonstrates that FRIOD can improve the quality of the detected outliers and make the detection process more intuitive, effective, and efficient.
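
    The stages named above (grid partitioning, dense-cell selection, distance-based scoring, top-outlier ranking) can be illustrated with a generic, non-interactive sketch. This is not FRIOD's algorithm: the cell width `cell_size`, the density threshold `min_pts`, and nearest-neighbour distance as the outlier score are assumptions standing in for the values a FRIOD user would set interactively.

```python
import math
from collections import defaultdict


def grid_outliers(points, cell_size, min_pts, top_n):
    """Generic grid-pruned, distance-based outlier ranking (not FRIOD itself).

    points: list of (x, y) tuples. cell_size, min_pts and top_n are
    hypothetical parameters that FRIOD would instead elicit from the user.
    """
    # Step 1: partition the plane into square cells and bucket point indices.
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append(i)

    # Step 2: treat points in dense cells (>= min_pts points) as inliers and
    # keep only points from sparse cells as outlier candidates.
    candidates = [i for idx in cells.values() if len(idx) < min_pts for i in idx]

    # Step 3: score each candidate by the distance to its nearest neighbour.
    def nn_dist(i):
        return min(math.dist(points[i], q) for j, q in enumerate(points) if j != i)

    # Step 4: rank candidates, largest nearest-neighbour distance first.
    ranked = sorted(candidates, key=nn_dist, reverse=True)
    return [points[i] for i in ranked[:top_n]]


cluster = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
data = cluster + [(5.0, 5.0), (3.0, 0.2)]
print(grid_outliers(data, cell_size=1.0, min_pts=5, top_n=1))  # prints [(5.0, 5.0)]
```

    Pruning dense cells first means the quadratic distance computation only runs for the few points in sparse cells, which is the usual motivation for grid-based partitioning in distance-based detectors.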

    Research on Outlier Detection Algorithm in Data Mining

    Outlier detection is a branch of data mining whose task is to identify observations whose characteristics differ significantly from the rest of the data. In everyday social life, in nature, and in data sets, most events and objects are ordinary or usual, but this does not mean we can ignore the possibility that many unusual or extraordinary objects exist among them. The events behind such objects may hold greater research value and have broad application prospects, which makes outlier detection a highly meaningful research direction. Researchers have already proposed many outlier detection methods, including statistics-based, frequency-based, depth-based, distance-based, and density-based methods. This thesis analyzes the research background and significance of outlier detection and the state of research in China and abroad, and studies distance-based outlier detec... Degree: Master of Engineering. School and major: School of Software, Computer Software and Theory. Student ID: 2432011115227
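
    The abstract is cut off before it describes the thesis's own distance-based method, so the following only sketches the classic distance-based outlier definition of Knorr and Ng, DB(p, D), and is not the algorithm studied in the thesis. The parameter names `p` and `d` follow that definition; the nested-loop search is the simplest, quadratic-time way to evaluate it.

```python
import math


def db_outliers(points, p, d):
    """Naive nested-loop detection of DB(p, D)-outliers (Knorr and Ng).

    A point is a DB(p, D)-outlier if at least a fraction p of all N points
    in the data set lie farther than distance d from it. This costs O(N^2)
    distance computations; faster variants exist.
    """
    n = len(points)
    outliers = []
    for i, a in enumerate(points):
        # Count the points strictly farther than d from point i.
        far = sum(1 for j, b in enumerate(points) if j != i and math.dist(a, b) > d)
        if far >= p * n:
            outliers.append(a)
    return outliers


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(db_outliers(pts, p=0.75, d=3))  # prints [(10, 10)]
```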

    Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma

    A novel algorithm and implementation for real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example, ignition kernels in combustion and tumor cells in medical images. This work extracts these features by dividing the overall task into three steps: local identification of feature cells, grouping of feature cells into extended features, and tracking of feature movement through spatial overlap. Through extensive parallelization work, we demonstrate that this approach can effectively use a large number of compute nodes to detect and track blob-filaments in real time in fusion plasma. On a 30 GB fusion simulation dataset, we observed linear speedup on 1024 processes and completed blob detection in less than three milliseconds using Edison, a Cray XC30 system at NERSC. Comment: 14 pages, 40 figures
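
    The three steps can be illustrated with a serial sketch on 2-D frames. Assumptions: a feature cell is simply a grid cell whose value exceeds a fixed `threshold` (the abstract does not give the paper's identification criterion), cells are grouped by 4-connectivity, and two blobs in consecutive frames are linked when they share at least one cell. The parallel decomposition behind the reported real-time performance is not shown.

```python
from collections import deque


def find_blobs(frame, threshold):
    """Steps 1-2: mark cells above threshold, then group 4-connected cells."""
    rows, cols = len(frame), len(frame[0])
    seen, blobs = set(), []
    for r in range(rows):
        for c in range(cols):
            if frame[r][c] <= threshold or (r, c) in seen:
                continue
            # Breadth-first flood fill from this seed feature cell.
            blob, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                blob.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and frame[ny][nx] > threshold):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def track(prev_blobs, next_blobs):
    """Step 3: link blob i in one frame to blob j in the next if they overlap."""
    return [(i, j) for i, a in enumerate(prev_blobs)
            for j, b in enumerate(next_blobs) if a & b]


f0 = [[0, 5, 5, 0], [0, 5, 0, 0], [0, 0, 0, 9]]
f1 = [[0, 0, 5, 5], [0, 0, 0, 0], [0, 0, 0, 0]]
print(track(find_blobs(f0, 1), find_blobs(f1, 1)))  # prints [(0, 0)]
```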

    Recent Advances in Anomaly Detection Methods Applied to Aviation

    Anomaly detection is an active area of research with numerous methods and applications. This survey reviews state-of-the-art data-driven anomaly detection techniques and their application to the aviation domain. After a brief introduction to the main traditional data-driven methods for anomaly detection, we review recent advances in neural networks, deep learning, and temporal-logic-based learning. In particular, we cover unsupervised techniques applicable to time series data because of their relevance to the aviation domain, where labeled data are usually lacking and flight trajectories and sensor data are sequential, or temporal, in nature. The advantages and disadvantages of each method are presented in terms of computational efficiency and detection efficacy. The second part of the survey explores the application of anomaly detection techniques to aviation and their contributions to improving the safety and performance of flight operations and aviation systems. To the best of our knowledge, some of the presented methods have not yet been applied in the aviation domain. We review applications ranging from the identification of significant operational events in air traffic operations to the prediction of potential aviation system failures for predictive maintenance.
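
    As a point of reference for the traditional, unsupervised data-driven methods the survey starts from, a minimal detector for a univariate sensor series is sketched here. It is a generic rolling z-score baseline chosen for illustration, not a method taken from the survey; `window` and `z_thresh` are hypothetical parameters.

```python
import statistics


def rolling_zscore_anomalies(series, window, z_thresh=3.0):
    """Flag index t when series[t] deviates from the mean of the previous
    `window` values by more than z_thresh population standard deviations.
    No labels are needed, matching the unsupervised setting."""
    flagged = []
    for t in range(window, len(series)):
        past = series[t - window:t]
        mu = statistics.fmean(past)
        sigma = statistics.pstdev(past)
        # Skip flat windows, where the z-score is undefined.
        if sigma > 0 and abs(series[t] - mu) / sigma > z_thresh:
            flagged.append(t)
    return flagged


readings = [10, 11, 10, 11, 10, 11, 10, 11, 30, 10]
print(rolling_zscore_anomalies(readings, window=4))  # prints [8]
```

    A fixed threshold on a sliding window captures only point anomalies; the sequential methods reviewed in the survey aim to model longer temporal context.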

    Eigengalaxies: Galaxy Morphology as a Linear Image Space and its Applications

    In this thesis I contextualise the history of morphology as underpinned by Hubble's scheme: discrete in nature and deeply connected to theories of galaxy formation history. In contrast, I set out to describe a purely empirical morphology, continuous in nature, in which surveys become image spaces and galaxies become points whose meaning is sought through the quantifiable differences in their relative spatial positions. I show how an image space can be robustly constructed and then build on it to illustrate important applications such as approximating surveys with small samples, detecting outliers, clustering, similarity search, and missing-data prediction. The thesis proceeds as follows. Section 1 briefly surveys the importance, genesis, and recent history of galaxy morphology. It also lays out the objectives of the thesis and describes the survey data I have used. Section 2 describes how galaxy images can be processed and projected into a defensible low-dimensional space in a morphology-preserving way. Several analyses are then performed to test the fidelity of the projection, and it is shown how the image space can be given a probabilistic interpretation. Section 3 discusses methods for approximating surveys by reducing the number of objects under consideration. The section starts by describing simple random sampling and its limitations. It then shows how means and covariances can be used to summarise image spaces and how differences between image spaces can be quantified using the Kullback-Leibler divergence. This concept is then used to apply “leverage scores” sampling, which uses information from the galaxy population to create a weighted sampling scheme that preserves the mean and covariance better than random sampling and therefore enables much smaller representative samples. I also motivate and describe a cutting-edge “coresets” methodology, which I intend to explore more fully in future work.
Section 4 demonstrates parsimonious applications of the image space framework to common use cases such as clustering, similarity search, and outlier detection. It is a modified and abridged version of a paper to be published in MNRAS. Finally, Section 5 draws summary conclusions and highlights important directions for the future.
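
    If each image space is summarised by a multivariate Gaussian with mean and covariance (an assumption about how the thesis turns means and covariances into a Kullback-Leibler comparison), the standard closed form for two k-dimensional Gaussians is the first line below. The second line gives the usual definition of statistical leverage for the i-th row of an n×k data matrix X of full column rank, with sampling probabilities proportional to leverage; the thesis's exact leverage-score variant is not stated in the abstract.

```latex
D_{\mathrm{KL}}\big(\mathcal{N}(\mu_0,\Sigma_0)\,\|\,\mathcal{N}(\mu_1,\Sigma_1)\big)
  = \tfrac{1}{2}\Big[\operatorname{tr}\big(\Sigma_1^{-1}\Sigma_0\big)
  + (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0) - k
  + \ln\frac{\det\Sigma_1}{\det\Sigma_0}\Big]

\ell_i = \mathbf{x}_i^{\top}\big(X^{\top}X\big)^{-1}\mathbf{x}_i,
  \qquad \sum_{i=1}^{n}\ell_i = k,
  \qquad \pi_i = \frac{\ell_i}{k}
```

    Because the leverages sum to k, the probabilities π_i form a valid sampling distribution; in the standard importance-sampling setup each of m draws is reweighted by 1/(m π_i) so that weighted sample moments remain unbiased.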

    Classification and Anomaly Detection for Astronomical Datasets

    This work develops two new statistical techniques for astronomical problems: a star/galaxy separator for the UKIRT Infrared Deep Sky Survey (UKIDSS) and a novel anomaly detection method for cross-matched astronomical datasets. The star/galaxy separator is a statistical classification method that outputs class membership probabilities rather than class labels and allows the use of prior knowledge about the source populations. Deep Sloan Digital Sky Survey (SDSS) data from the multiply imaged Stripe 82 region are used to check the results of our classifier, which compares favourably with the UKIDSS pipeline classification algorithm. The anomaly detection method addresses the problem posed by objects having different sets of recorded variables in cross-matched datasets. This prevents the use of methods that cannot handle missing values and makes direct comparison between objects difficult. For each source, our method computes anomaly scores in subspaces of the observed feature space and combines them into an overall anomaly score. The proposed technique is very general and can easily be used in applications other than astronomy. The properties and performance of our method are investigated using both real and simulated datasets.
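
    The subspace idea can be sketched as follows, under clearly stated assumptions: missing values are marked with `None`, the candidate subspaces are supplied by the caller, each subspace score is a mean squared z-score (a placeholder that ignores feature correlations), and subspace scores are combined by a plain average. The thesis's actual per-subspace scoring and combination rule are not given in the abstract; `subspace_anomaly_scores` is a hypothetical name.

```python
import statistics


def subspace_anomaly_scores(data, subspaces):
    """Score each row only in the feature subspaces it fully observes, then
    average those subspace scores into one overall anomaly score.

    data: list of rows; None marks a missing value.
    subspaces: list of tuples of feature indices.
    """
    n_features = len(data[0])
    # Reference mean and std per feature, using only observed entries.
    stats = []
    for f in range(n_features):
        vals = [row[f] for row in data if row[f] is not None]
        stats.append((statistics.fmean(vals), statistics.pstdev(vals) or 1.0))

    scores = []
    for row in data:
        sub_scores = []
        for sub in subspaces:
            if all(row[f] is not None for f in sub):  # subspace fully observed
                z2 = [((row[f] - stats[f][0]) / stats[f][1]) ** 2 for f in sub]
                sub_scores.append(statistics.fmean(z2))
        # Combine: average over observed subspaces (None if none is observed).
        scores.append(statistics.fmean(sub_scores) if sub_scores else None)
    return scores


rows = [[1, 1, None], [2, 2, 2], [1, None, 1], [2, 1, 2], [20, 2, 1], [1, 2, 1]]
scores = subspace_anomaly_scores(rows, [(0, 1), (1, 2), (0, 2)])
print(scores.index(max(scores)))  # prints 4
```

    Every row receives a score even though no row is required to have all features, which is the property the abstract highlights for cross-matched catalogues.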

    Fault detection and diagnosis method for three-phase induction motor

    Induction motors (IMs) are critical components in many industrial processes, and interest in IM fault diagnosis continues to grow. The scope of this thesis covers condition monitoring and fault detection of three-phase IMs. Various monitoring techniques have been used for IM fault detection; vibration and stator current monitoring have gained prominence for fault diagnosis in the literature and in industry. The performance of the vibration and stator current setups was compared and evaluated. To that end, data were captured from several faulty and healthy IMs using vibration and current sensors. Principal Component Analysis (PCA) was used for feature extraction to monitor and classify the collected data and find faults in the IMs. A new fault detection method combining the vibration and current setups was proposed. It consists of two steps: first, a training step that gives the current features the acceleration property (the nature of vibration data); and second, a testing step that excludes the vibration setup from the fault detection algorithm while the output data retain the properties of vibration features. The 0-1 loss function was applied to measure the accuracy of the vibration, current, and proposed fault detection methods. The PCA classification results showed mixed, unseparated features for the current setup, whereas the vibration setup and the proposed method produced well-separated classified features. The 0-1 loss results showed that the vibration setup and the developed method can provide a good level of accuracy. The vibration setup attained the highest accuracy: 98.2% in training and 92% in testing. The proposed method performed well, with accuracies of 96.5% in training and 84% in testing. The current setup, however, attained the lowest accuracy (66.7% in training and 52% in testing).
To assess the performance of the proposed method, the confusion matrix of a neural network (NN) classifier was used. The confusion matrix showed an accuracy of 95.1% with a small share of incorrect responses (4.9%), indicating that the proposed fault detection method is reliable with a low error rate. The vibration, current, and proposed fault detection methods were also evaluated in terms of cost. The proposed method provided an affordable, highly accurate fault detection technique applicable in various industrial fields.
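
    The 0-1 loss and confusion-matrix accuracy used in this evaluation are directly related: accuracy is one minus the average 0-1 loss, and it equals the sum of the confusion matrix's diagonal divided by the total count. A minimal sketch, with hypothetical fault-class labels that are not taken from the thesis:

```python
def zero_one_loss(y_true, y_pred):
    """Average 0-1 loss: 1 for each wrong prediction, 0 for each correct one."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def accuracy_from_confusion(m):
    """Correct predictions lie on the diagonal."""
    return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))


labels = ["healthy", "bearing", "rotor"]  # hypothetical classes
y_true = ["healthy", "healthy", "bearing", "rotor", "rotor"]
y_pred = ["healthy", "bearing", "bearing", "rotor", "healthy"]
cm = confusion_matrix(y_true, y_pred, labels)
print(zero_one_loss(y_true, y_pred))  # prints 0.4
print(accuracy_from_confusion(cm))    # prints 0.6
```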