100,231 research outputs found

    Knowledge Reused Outlier Detection

    Tremendous effort has been invested in unsupervised outlier detection, which operates on unlabeled data sets under abnormality assumptions. With abundant related labeled data available as auxiliary information, we consider transferring knowledge from the labeled source data to facilitate unsupervised outlier detection on the target data set. To make full use of the source knowledge, the source and target data are combined for joint clustering and outlier detection, using the cluster structure of the source data as a constraint. To achieve this, the categorical utility function is employed to regularize the partitions of the target data so that they are consistent with the source data labels. With an augmented matrix, the problem is solved by a K-means-based method with a rigorous mathematical formulation and a theoretical convergence guarantee. We used four real-world data sets and eight outlier detection methods of different kinds for extensive experiments and comparison. The results demonstrate the effectiveness of the proposed method and significant improvements in outlier detection and cluster validity metrics. Moreover, a parameter analysis is provided as a practical guide, and an analysis with noisy source labels shows that the proposed method can handle real applications where source labels are noisy.

    Outlier edge detection using random graph generation models and applications

    Outliers are samples generated by mechanisms different from those that produce normal data samples. Graphs, in particular social network graphs, may contain nodes and edges created by scammers, by malicious programs, or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has focused mainly on detecting outlier nodes. In this article, we study the properties of edges and propose effective outlier edge detection algorithms. The proposed algorithms are inspired by the community structures that are very common in social networks. We found that the graph structure around an edge holds critical information for determining the authenticity of the edge. We evaluated the proposed algorithms by injecting outlier edges into real-world graph data. Experimental results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently performs well regardless of the test graph data. More importantly, by analyzing the authenticity of the edges in a graph, we are able to reveal the underlying structure and properties of the graph. Thus, the proposed algorithms are not limited to outlier edge detection. We demonstrate three different applications that benefit from the proposed algorithms: (1) a preprocessing tool that improves the performance of graph clustering algorithms; (2) an outlier node detection algorithm; and (3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques. They also highlight the importance of analyzing edges in graph mining, a topic that has been mostly neglected by researchers. The Academy of Finland supported this research.
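To illustrate the idea that the structure around an edge hints at its authenticity, here is a minimal sketch that scores each edge by the Jaccard overlap of its endpoints' neighbourhoods. This is a simpler stand-in chosen for illustration, not the random-graph-model algorithm from the article; the function name `edge_overlap_scores` and the toy graph are assumptions.

```python
def edge_overlap_scores(edges):
    """Jaccard overlap of the endpoints' neighbourhoods for every edge."""
    # Build an undirected adjacency map from the edge list.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    scores = {}
    for u, v in edges:
        # Exclude the endpoints themselves so only shared context counts.
        nu = neighbours[u] - {v}
        nv = neighbours[v] - {u}
        union = nu | nv
        scores[(u, v)] = len(nu & nv) / len(union) if union else 0.0
    return scores


# Hypothetical toy graph: two triangles joined by one cross-community edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"),
         ("c", "x")]
scores = edge_overlap_scores(edges)
suspect = min(scores, key=scores.get)  # lowest overlap = most suspicious
print(suspect, scores[suspect])
```

In this toy graph the edge between the two triangles has no shared neighbours, so it receives the lowest score. A score like this ignores node degree, which is one reason a generation model such as preferential attachment may behave differently on real graphs.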

    Anomaly Detection Methods to Improve Supply Chain Data Quality and Operations

    Supply chain operations drive the planning, manufacture, and distribution of billions of semiconductors a year, spanning thousands of products across many supply chain configurations. The customizations range from wafer technology to die stacking and chip feature enablement. Data quality drives efficiency in these processes, and anomalies in data can be very disruptive and, at times, consequential. Developing preventative measures that automate the detection of anomalies before they reach downstream execution systems would yield a significant efficiency gain for the organization. The purpose of this research is to identify an effective, actionable, and computationally efficient approach to highlighting anomalies in a sparse and highly variable supply chain data structure. This research highlights the application of ensemble unsupervised learning algorithms for anomaly detection on supply chain demand data. The outlier detection algorithms explored include Angle-Based Outlier Detection, Isolation Forest, Local Outlier Factor, and K-Nearest Neighbors. Applying an ensemble technique to the unconstrained forecast signal, which is traditionally a consistent demand line, produced a dramatic decrease in false positives. When the ensemble technique is applied to the sales-order netted demand forecast, a signal that is irregular in structure, the algorithm identifies true anomalous observations relative to historical observations across time. The research team concluded that assessing an outlier is not limited to the most recent forecast’s observations but must be considered in the context of historical demand patterns across time.
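A minimal sketch of why an ensemble can cut false positives: two simple detectors each nominate their top-scoring points, and only points nominated by both are flagged. The detectors used here (k-nearest-neighbour distance and a median-absolute-deviation score), the agreement rule, and the `demand` series are all assumptions for illustration; the study itself used Angle-Based Outlier Detection, Isolation Forest, Local Outlier Factor, and K-Nearest Neighbors, and its combination rule is not stated in the abstract.

```python
from statistics import median


def knn_scores(values, k=2):
    """Distance to the k-th nearest neighbour; large means isolated."""
    scores = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(dists[k - 1])
    return scores


def mad_scores(values):
    """Absolute deviation from the median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0  # avoid division by zero
    return [abs(v - med) / mad for v in values]


def top_fraction(scores, fraction):
    """Indices of the highest-scoring fraction of points."""
    n = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:n])


def ensemble_flags(values, fraction=0.2):
    """Flag a point only when both detectors place it in their top fraction."""
    knn_top = top_fraction(knn_scores(values), fraction)
    mad_top = top_fraction(mad_scores(values), fraction)
    return sorted(knn_top & mad_top)


# Hypothetical weekly demand signal with one spike at index 6.
demand = [100, 102, 98, 101, 99, 103, 250, 100, 97, 101]
print(ensemble_flags(demand))
```

On this series each detector alone nominates the spike plus one ordinary point, but they disagree on the ordinary point, so only the spike survives the agreement rule.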

    Point Cloud Denoising and Outlier Detection with Local Geometric Structure by Dynamic Graph CNN

    The digitalization of society is rapidly developing toward the realization of the digital twin and the metaverse. In particular, point clouds are attracting attention as a media format for 3D space. Point cloud data is contaminated with noise and outliers due to measurement errors, so denoising and outlier detection are necessary for point cloud processing. Among existing methods, PointCleanNet is effective for point cloud denoising and outlier detection. However, it does not consider the local geometric structure of the patch. We solve this problem by applying two types of graph convolutional layers designed on the basis of the Dynamic Graph CNN. Experimental results show that the proposed methods outperform the conventional method in AUPR, which indicates outlier detection accuracy, and in Chamfer Distance, which indicates denoising accuracy. Comment: 2023 IEEE 12th Global Conference on Consumer Electronics (GCCE 2023)
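For readers unfamiliar with the Chamfer Distance mentioned above, the following sketch computes one common symmetric form: the mean nearest-neighbour distance from each set to the other, summed over both directions. Variants exist (squared distances, different normalisation), and the abstract does not say which one the paper uses, so treat this definition and the toy point sets as assumptions.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (lists of tuples)."""
    def one_way(src, dst):
        # Mean distance from each source point to its nearest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Hypothetical example: one point of the cloud is displaced by 0.1 along z.
clean = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
noisy = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(chamfer_distance(clean, noisy))
```

A lower value means the denoised cloud lies closer to the reference cloud. This brute-force version is quadratic in the number of points, so real pipelines typically use a spatial index or a GPU implementation.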

    Spatially smoothed robust covariance estimation for local outlier detection

    Most multivariate outlier detection procedures ignore the spatial dependency of observations, which is present in many real data sets from various application areas. This paper introduces a new outlier detection method that accounts for a (continuously) varying covariance structure, depending on the spatial neighborhood of the observations. The underlying estimator thus constitutes a compromise between a single global covariance estimate and local covariances estimated for individual neighborhoods. Theoretical properties of the estimator are presented, in particular its robustness properties, and an efficient algorithm for its computation is introduced. The performance of the method is evaluated and compared on simulated data and on a data set recorded at Austrian weather stations.

    Outlier Detection and a Method of Adjustment for the Iranian Manufacturing Establishment Survey Data

    The role and importance of the industrial sector in economic development make it necessary to collect and analyze accurate and timely data for precise planning. Because outliers are common in establishment surveys due to the structure of the economy, survey data must be evaluated by identifying and investigating outliers before the data are released. In this paper, different robust multivariate outlier detection methods based on the Mahalanobis distance, using the blocked adaptive computationally efficient outlier nominators algorithm, the minimum volume ellipsoid estimator, the minimum covariance determinant estimator, and the Stahel-Donoho estimator, are applied to a real data set. Some univariate outlier detection methods, such as Hadi and Simonoff’s method and Hidiroglou-Berthelot’s method for periodic manufacturing surveys, are also applied. The real data set is extracted from the Iranian Manufacturing Establishment Survey. These data are collected each year by the Statistical Center of Iran using sampling weights. In addition to comparing different multivariate and univariate robust outlier detection methods, the paper introduces a new empirical method for reducing the effect of outliers, based on the value modification method, and applies it to some important variables such as input and output. A new four-step algorithm is introduced to adjust the input and output values of manufacturing establishments that are under-reported or over-reported. A simulation study investigating the performance of the method is also presented.
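The Mahalanobis distance underlying the multivariate methods above can be sketched for two variables as follows. This version uses the classical sample mean and covariance, which outliers themselves can distort; the robust estimators named in the abstract replace exactly those two quantities. The function name `mahalanobis_2d` and the input/output figures are hypothetical.

```python
import math


def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) pair from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2

    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form d^T S^{-1} d, using the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        dists.append(math.sqrt(d2))
    return dists


# Hypothetical (input, output) values; the last establishment reports an
# output far below what its input would suggest.
records = [(10, 20), (12, 24), (11, 22.5), (13, 25.5), (9, 18.5), (12, 8)]
distances = mahalanobis_2d(records)
most_extreme = max(range(len(records)), key=lambda i: distances[i])
print(most_extreme, round(distances[most_extreme], 2))
```

With several outliers in the data, the classical mean and covariance can be pulled toward them and hide them (masking), which is the usual motivation for robust estimators such as the minimum covariance determinant.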

    Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data

    Diagnosing and cleaning data is a crucial step in building robust machine learning systems. However, identifying problems within large-scale datasets with real-world distributions is challenging due to complex issues such as label errors, under-representation, and outliers. In this paper, we propose a unified approach for identifying problematic data by utilizing a largely ignored source of information: the relational structure of data in the feature-embedded space. To this end, we present scalable and effective algorithms for detecting label errors and outlier data based on the relational graph structure of the data. We further introduce a visualization tool that provides contextual information about a data point in the feature-embedded space, serving as an effective tool for interactively diagnosing data. We evaluate the label error and outlier/out-of-distribution (OOD) detection performance of our approach on large-scale image, speech, and language domain tasks, including ImageNet, ESC-50, and MNLI. Our approach achieves state-of-the-art detection performance on all tasks considered and demonstrates its effectiveness in debugging large-scale real-world datasets across various domains. Comment: preprint