A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

Author: Havinga P.J.M.
Meratnia N.
Zhang Yang
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007

The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and to discover interesting and useful knowledge about unusual events within numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique based on the data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.
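
As an illustration of the definition above (an observation that differs significantly from the other values), here is a minimal sketch of one simple unsupervised statistical test, the modified z-score based on the median absolute deviation. It is not taken from the paper's taxonomy; the function name `mad_outliers`, the conventional 3.5 threshold, and the sample readings are assumptions chosen for the example.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    Hypothetical helper for illustration; no labels are needed, so the
    test is unsupervised.
    """
    med = median(values)
    # Median absolute deviation: a spread estimate that is itself
    # robust to the outliers we are trying to find.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no measurable spread, so nothing is flagged
    # 0.6745 rescales the MAD so the score is comparable to a
    # standard z-score when the data are roughly normal.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2]
print(mad_outliers(readings))  # prints [42.0]
```

A median-based score is used here rather than a mean-based one because a single extreme value inflates the mean and standard deviation and can mask itself.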

University of Twente Research Information

Visual Knowledge Tracing

Author: Kondapaneni Neehar
Mac Aodha Oisin
Perona Pietro
Publication date: 21/07/2022

Each year, thousands of people learn new visual categorization tasks -- radiologists learn to recognize tumors, birdwatchers learn to distinguish similar species, and crowd workers learn how to annotate valuable data for applications like autonomous driving. As humans learn, their brain updates the visual features it extracts and attends to, which ultimately informs their final classification decisions. In this work, we propose a novel task of tracing the evolving classification behavior of human learners as they engage in challenging visual classification tasks. We propose models that jointly extract the visual features used by learners and predict the classification functions they utilize. We collect three challenging new datasets from real human learners in order to evaluate the performance of different visual knowledge tracing methods. Our results show that our recurrent models are able to predict the classification behavior of human learners on three challenging medical image and species identification tasks. Comment: 14 pages, 4 figures, 14 supplemental pages, 11 supplemental figures, accepted to European Conference on Computer Vision (ECCV) 202

arXiv.org e-Print Archive

Edinburgh Research Explorer

Visual Knowledge Tracing

Author: Kondapaneni Neehar
Mac Aodha Oisin
Perona Pietro
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/10/2022

Edinburgh Research Explorer

Generalized Gibbs ensembles for time dependent processes

Author: Balian
Bondorf
Braun-Munzinger
Brechignac
Caldeira
Challa
Chomaz
Das Gupta
Ellis
F. Gulminelli
Fisher
Frascaria
Gulminelli
Hill
Johal
Menotti
Minguzzi
O. Juillet
Ph. Chomaz
Rasetti
Reimann
Richert
Ring
Schmidt
Shuryak
Thirring
Tolman
Publication venue: 'Elsevier BV'
Publication date: 17/12/2004

An information theory description of finite systems explicitly evolving in time is presented for classical as well as quantum mechanics. We impose a variational principle on the Shannon entropy at a given time while the constraints are set at a former time. The resulting density matrix deviates from the Boltzmann kernel and contains explicit time-odd components which can be interpreted as collective flows. Applications include quantum Brownian motion, linear response theory, out-of-equilibrium situations for which the relevant information is collected within different time scales before entropy saturation, and the dynamics of the expansion.
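
The variational principle described in this abstract can be sketched as a standard constrained entropy maximization. The notation below (observables $A_\ell$, multipliers $\lambda_\ell$, delay $\tau$, and the assumption of unitary evolution under a time-independent Hamiltonian $H$) is chosen for illustration and may differ from the paper's own.

```latex
% Entropy is maximised at time t; the constraints are known at t_0 = t - \tau.
\begin{align}
  S[\rho(t)] &= -\mathrm{Tr}\,\rho(t)\ln\rho(t), \\
  \mathrm{Tr}\,\rho(t) &= 1, \qquad
  \mathrm{Tr}\,\rho(t_0)\,A_\ell = \langle A_\ell\rangle_{t_0}.
\end{align}
% Assuming \rho(t) = U\rho(t_0)U^\dagger with U = e^{-iH\tau/\hbar},
% each constraint can be rewritten at time t:
\begin{align}
  \mathrm{Tr}\,\rho(t)\,\tilde A_\ell = \langle A_\ell\rangle_{t_0},
  \qquad \tilde A_\ell = U A_\ell U^\dagger .
\end{align}
% Stationarity of S with Lagrange multipliers \lambda_\ell then gives
\begin{align}
  \rho(t) = \frac{1}{Z}\exp\Big(-\sum_\ell \lambda_\ell \tilde A_\ell\Big),
  \qquad
  Z = \mathrm{Tr}\,\exp\Big(-\sum_\ell \lambda_\ell \tilde A_\ell\Big).
\end{align}
% Short-time expansion: the commutator term is odd under time reversal
% whenever H and A_\ell are even.
\begin{align}
  \tilde A_\ell = A_\ell - \frac{i\tau}{\hbar}\,[H, A_\ell] + O(\tau^2).
\end{align}
% Example (free particle, H = p^2/2m, A = x^2):
\begin{align}
  \tilde A = \Big(x - \frac{p\,\tau}{m}\Big)^2
           = x^2 - \frac{\tau}{m}\,(xp + px) + \frac{\tau^2}{m^2}\,p^2 .
\end{align}
```

In the free-particle example the term proportional to $xp + px$ is odd under time reversal, which is the kind of contribution the abstract interprets as a collective flow.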

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref

HAL-IN2P3

HAL-CEA

Graph based Anomaly Detection and Description: A Survey

Author: Danai Koutra
Hanghang Tong
Leman Akoglu
Publication date: 28/04/2014

Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. Numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points; with graph data becoming ubiquitous, techniques for structured graph data have recently come into focus. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.

arXiv.org e-Print Archive

CiteSeerX

Stream-dashboard: a big data stream clustering framework with applications to social media streams.

Author: Hawwash Basheer, 1984-
Publication venue: ThinkIR: The University of Louisville's Institutional Repository
Publication date: 01/05/2013

Data mining is concerned with detecting patterns of data in raw datasets, which are then used to unearth knowledge that might not have been discovered using conventional querying or statistical methods. This discovered knowledge has been used to empower decision makers in countless applications spanning many multi-disciplinary areas including business, education, astronomy, security and information retrieval, to name a few. Many applications generate massive amounts of data continuously and at an increasing rate. This is the case for user activity over social networks such as Facebook and Twitter. This flow of data has been termed, appropriately, a data stream, and it introduces a set of new challenges for discovering its evolving patterns using data mining techniques. Data stream clustering is concerned with detecting evolving patterns in a data stream using only the similarities between the data points as they arrive, without the use of any external information (i.e., unsupervised learning). In this dissertation, we propose a complete and generic framework to simultaneously mine, track and validate clusters in a big data stream (Stream-Dashboard). The proposed framework consists of three main components: an online data stream clustering algorithm, a component for tracking and validation of pattern behavior using regression analysis, and a component that uses the behavioral information about the detected patterns to improve the quality of the clustering algorithm. As a first component, we propose RINO-Streams, an online clustering algorithm that incrementally updates the clustering model using robust statistics and incremental optimization. The second component is a methodology that we call TRACER, which continuously performs a set of statistical tests using regression analysis to track the evolution of the detected clusters, their characteristics and quality metrics.
For the last component, we propose a method to build behavioral profiles for the clustering model over time, which can be used to improve the performance of the online clustering algorithm, for example by adapting the initial values of the input parameters. The performance and effectiveness of the proposed framework were validated using extensive experiments, and its use was demonstrated on a challenging real-world application, specifically unsupervised mining of evolving cluster stories in one pass from Twitter social media streams.
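
To make the idea of one-pass clustering with an incrementally updated model concrete, here is a minimal sketch. It is not RINO-Streams, which the abstract says relies on robust statistics and incremental optimization; it is a plain nearest-centroid scheme, and the function name `online_cluster`, the `radius` parameter, and the sample stream are assumptions made for the example.

```python
import math


def online_cluster(stream, radius):
    """Cluster points in a single pass over `stream`.

    Each point joins the nearest existing centroid if it lies within
    `radius`; otherwise it starts a new cluster. Hypothetical helper,
    not the algorithm proposed in the dissertation.
    """
    centroids, counts = [], []
    for point in stream:
        # Find the closest centroid seen so far.
        best, best_dist = None, math.inf
        for i, centroid in enumerate(centroids):
            d = math.dist(point, centroid)
            if d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist <= radius:
            counts[best] += 1
            n = counts[best]
            # Incremental mean: the centroid moves toward the new point
            # by 1/n of the gap, so no past points need to be stored.
            centroids[best] = [
                c + (x - c) / n for c, x in zip(centroids[best], point)
            ]
        else:
            centroids.append(list(point))
            counts.append(1)
    return centroids, counts


stream = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (0.1, 0.3), (5.2, 4.8)]
centroids, counts = online_cluster(stream, radius=1.0)
print(counts)  # prints [3, 2]
```

Because each point is processed once and then discarded, memory grows with the number of clusters rather than with the length of the stream, which is the property that makes this family of methods usable on continuous data.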

University of Louisville