1,161 research outputs found

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    Get PDF
    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. The outliers may be instances of error or indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous applications domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework

    Visual Knowledge Tracing

    Get PDF
    Each year, thousands of people learn new visual categorization tasks -- radiologists learn to recognize tumors, birdwatchers learn to distinguish similar species, and crowd workers learn how to annotate valuable data for applications like autonomous driving. As humans learn, their brain updates the visual features it extracts and attend to, which ultimately informs their final classification decisions. In this work, we propose a novel task of tracing the evolving classification behavior of human learners as they engage in challenging visual classification tasks. We propose models that jointly extract the visual features used by learners as well as predicting the classification functions they utilize. We collect three challenging new datasets from real human learners in order to evaluate the performance of different visual knowledge tracing methods. Our results show that our recurrent models are able to predict the classification behavior of human learners on three challenging medical image and species identification tasks.Comment: 14 pages, 4 figures, 14 supplemental pages, 11 supplemental figures, accepted to European Conference on Computer Vision (ECCV) 202

    Visual Knowledge Tracing

    Get PDF

    Generalized Gibbs ensembles for time dependent processes

    Get PDF
    An information theory description of finite systems explicitly evolving in time is presented for classical as well as quantum mechanics. We impose a variational principle on the Shannon entropy at a given time while the constraints are set at a former time. The resulting density matrix deviates from the Boltzmann kernel and contains explicit time odd components which can be interpreted as collective flows. Applications include quantum brownian motion, linear response theory, out of equilibrium situations for which the relevant information is collected within different time scales before entropy saturation, and the dynamics of the expansion

    Graph based Anomaly Detection and Description: A Survey

    Get PDF
    Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have been of focus recently. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field

    Stream-dashboard : a big data stream clustering framework with applications to social media streams.

    Get PDF
    Data mining is concerned with detecting patterns of data in raw datasets, which are then used to unearth knowledge that might not have been discovered using conventional querying or statistical methods. This discovered knowledge has been used to empower decision makers in countless applications spanning across many multi-disciplinary areas including business, education, astronomy, security and Information Retrieval to name a few. Many applications generate massive amounts of data continuously and at an increasing rate. This is the case for user activity over social networks such as Facebook and Twitter. This flow of data has been termed, appropriately, a Data Stream, and it introduced a set of new challenges to discover its evolving patterns using data mining techniques. Data stream clustering is concerned with detecting evolving patterns in a data stream using only the similarities between the data points as they arrive without the use of any external information (i.e. unsupervised learning). In this dissertation, we propose a complete and generic framework to simultaneously mine, track and validate clusters in a big data stream (Stream-Dashboard). The proposed framework consists of three main components: an online data stream clustering algorithm, a component for tracking and validation of pattern behavior using regression analysis, and a component that uses the behavioral information about the detected patterns to improve the quality of the clustering algorithm. As a first component, we propose RINO-Streams, an online clustering algorithm that incrementally updates the clustering model using robust statistics and incremental optimization. The second component is a methodology that we call TRACER, which continuously performs a set of statistical tests using regression analysis to track the evolution of the detected clusters, their characteristics and quality metrics. For the last component, we propose a method to build some behavioral profiles for the clustering model over time, that can be used to improve the performance of the online clustering algorithm, such as adapting the initial values of the input parameters. The performance and effectiveness of the proposed framework were validated using extensive experiments, and its use was demonstrated on a challenging real word application, specifically unsupervised mining of evolving cluster stories in one pass from the Twitter social media streams
    corecore