Novelty Detection in Sequential Data by Informed Clustering and Modeling
Novelty detection in discrete sequences is a challenging task, since
deviations from the process generating the normal data are often small or
intentionally hidden. Novelties can be detected by modeling normal sequences
and measuring the deviations of a new sequence from the model predictions.
However, in many applications data is generated by several distinct processes
so that models trained on all the data tend to over-generalize and novelties
remain undetected. We propose to approach this challenge through decomposition:
by clustering the data, we break down the problem, obtaining a simpler modeling
task in each cluster, which can be modeled more accurately. However, this comes
at a trade-off, since the amount of training data per cluster is reduced. This
is a particular problem for discrete sequences where state-of-the-art models
are data-hungry. The success of this approach thus depends on the quality of
the clustering, i.e., whether the individual learning problems are sufficiently
simpler than the joint problem. While clustering discrete sequences
automatically is a challenging and domain-specific task, it is often easy for
human domain experts, given the right tools. In this paper, we adapt a
state-of-the-art visual analytics tool for discrete sequence clustering to
obtain informed clusters from domain experts and use LSTMs to model each
cluster individually. Our extensive empirical evaluation indicates that this
informed clustering outperforms automatic clustering and that our approach
outperforms state-of-the-art novelty detection methods for discrete sequences
in three real-world application scenarios. In particular, decomposition
outperforms a global model despite less training data on each individual
cluster.
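The decomposition idea in this abstract (cluster the sequences, fit one model per cluster, score a new sequence by its deviation from the model's predictions) can be illustrated with a minimal sketch. It is not the authors' implementation: a first-order Markov chain with add-one smoothing stands in for the LSTMs, the cluster labels are assumed to be supplied (for example by a domain expert), and scoring by the best-fitting cluster model is an assumption made here for illustration. The names `MarkovModel`, `fit_per_cluster`, and `novelty_score` are hypothetical.

```python
import math
from collections import defaultdict


class MarkovModel:
    """First-order Markov chain with add-one smoothing (a stand-in for an LSTM)."""

    def __init__(self, alphabet):
        self.alphabet = sorted(alphabet)
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        # Count symbol-to-symbol transitions over all training sequences.
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1
        return self

    def nll(self, seq):
        """Mean negative log-likelihood per transition (higher = more surprising)."""
        total, n = 0.0, 0
        for prev, nxt in zip(seq, seq[1:]):
            row = self.counts.get(prev, {})
            denom = sum(row.values()) + len(self.alphabet)
            total -= math.log((row.get(nxt, 0) + 1) / denom)
            n += 1
        return total / max(n, 1)


def fit_per_cluster(sequences, labels, alphabet):
    """Train one model per cluster; labels are assumed to be given."""
    by_cluster = defaultdict(list)
    for seq, label in zip(sequences, labels):
        by_cluster[label].append(seq)
    return {label: MarkovModel(alphabet).fit(seqs)
            for label, seqs in by_cluster.items()}


def novelty_score(models, seq):
    """Assumption: a sequence is scored by its best-fitting cluster model."""
    return min(model.nll(seq) for model in models.values())


# Toy data: two distinct generating processes, one cluster each.
train = ["ababab"] * 3 + ["cdcdcd"] * 3
labels = ["A"] * 3 + ["B"] * 3
models = fit_per_cluster(train, labels, alphabet="abcd")
```

A sequence such as `"abab"` follows the process of cluster A and receives a lower score than `"acbd"`, which mixes symbols from both processes and fits neither cluster model well.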
BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction
A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called Boolean analysis, or BOOL-AN.
Anomaly Detection on Time Series Data
Anomaly detection is an important problem that has been researched within diverse application domains. Detection of anomalies in the time series domain finds extensive application in monitoring system status, malware and spam detection, credit card fraud detection, etc. In this work we explore methods to detect anomalies in multivariate as well as univariate time series and propose a novel method using Dictionary Learning, Sparse Representation, Singular Value Decomposition and Topological Anomaly Detection (TAD). We have tested the proposed method on real as well as synthetic data sets. Our novel method lowers false positive rates compared with existing methods.
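As a rough illustration of one ingredient named in this abstract, the sketch below scores windows of a univariate time series by their reconstruction error under a low-rank SVD basis. It is not the authors' method, which also combines dictionary learning, sparse representation, and TAD; the function name `svd_anomaly_scores` and the `window` and `rank` parameters are assumptions chosen for the example.

```python
import numpy as np


def svd_anomaly_scores(series, window=8, rank=2):
    """Score each sliding window by its residual outside a rank-`rank` subspace."""
    x = np.asarray(series, dtype=float)
    # Rows are overlapping windows of the series.
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    centered = windows - windows.mean(axis=0)
    # The leading right singular vectors span the dominant "normal" patterns.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]
    reconstruction = centered @ basis.T @ basis
    # A large residual means the window is poorly explained by normal patterns.
    return np.linalg.norm(centered - reconstruction, axis=1)


# Synthetic example: a sine wave with a single injected spike at t = 100.
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 16)
signal[100] += 5.0
scores = svd_anomaly_scores(signal)
most_anomalous_window = int(np.argmax(scores))
```

Windows of a pure sinusoid lie in a two-dimensional subspace, so `rank=2` reconstructs the normal windows almost exactly, and the windows that cover the spike stand out with a large residual.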
Modelling of content-aware indicators for effective determination of shot boundaries in compressed MPEG videos
In this paper, a content-aware approach is proposed to design multiple test conditions for shot cut detection, which are organized into a multiple-phase decision tree for abrupt cut detection and a finite state machine for dissolve detection. In comparison with existing approaches, our algorithm is characterized by two categories of content difference indicators and testing. While the first category indicates the content changes that are directly used for shot cut detection, the second category indicates the contexts under which the content change occurs. As a result, indications of frame differences are tested with context awareness to make the detection of shot cuts adaptive to both content and context changes. Evaluations announced by TRECVID 2007 indicate that our proposed algorithm achieved performance comparable to that of machine learning approaches, while using a simpler feature set and straightforward design strategies. This validates the effectiveness of modelling content-aware indicators for decision making, which also provides a good alternative to conventional approaches in this topic.
A survey of outlier detection methodologies
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but now principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
Intrinsic Motivation Systems for Autonomous Mental Development
Exploratory activities seem to be intrinsically rewarding for children and crucial for their cognitive development. Can a machine be endowed with such an intrinsic motivation system? This is the question we study in this paper, presenting a number of computational systems that try to capture this drive towards novel or curious situations. After discussing related research coming from developmental psychology, neuroscience, developmental robotics, and active learning, this paper presents the mechanism of Intelligent Adaptive Curiosity, an intrinsic motivation system which pushes a robot towards situations in which it maximizes its learning progress. This drive makes the robot focus on situations which are neither too predictable nor too unpredictable, thus permitting autonomous mental development. The complexity of the robot’s activities autonomously increases and complex developmental sequences self-organize without being constructed in a supervised manner. Two experiments are presented illustrating the stage-like organization emerging with this mechanism. In one of them, a physical robot is placed on a baby play mat with objects that it can learn to manipulate. Experimental results show that the robot first spends time in situations which are easy to learn, then shifts its attention progressively to situations of increasing difficulty, avoiding situations in which nothing can be learned. Finally, these various results are discussed in relation to more complex forms of behavioral organization and data coming from developmental psychology.
Key words: Active learning, autonomy, behavior, complexity, curiosity, development, developmental trajectory, epigenetic robotics, intrinsic motivation, learning, reinforcement learning, values
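The notion of learning progress in the abstract above can be sketched in a few lines. This is a simplified illustration rather than the Intelligent Adaptive Curiosity algorithm itself: learning progress is approximated here as the drop in mean prediction error between an older and a more recent window, and the names `learning_progress` and `choose_situation`, the `window` parameter, and the example error histories are all assumptions.

```python
def learning_progress(errors, window=5):
    """Drop in mean prediction error between the previous and the latest window."""
    if len(errors) < 2 * window:
        return 0.0  # not enough history to estimate progress
    older = errors[-2 * window:-window]
    recent = errors[-window:]
    return sum(older) / window - sum(recent) / window


def choose_situation(error_histories, window=5):
    """Pick the situation whose prediction error is currently falling fastest."""
    return max(error_histories,
               key=lambda name: learning_progress(error_histories[name], window))


# Hypothetical prediction-error histories for three kinds of situations.
histories = {
    "too_predictable": [0.01] * 10,    # already mastered: error is flat and low
    "too_unpredictable": [1.0] * 10,   # nothing can be learned: error is flat and high
    "learnable": [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1],
}
selected = choose_situation(histories)
```

Both the mastered and the unlearnable situation show no progress, so the selection favors the situation where error is still decreasing, which matches the abstract's description of situations that are neither too predictable nor too unpredictable.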
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on the data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.