1,180 research outputs found
A survey of outlier detection methodologies
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review
Semi-supervised and Active-learning Scenarios: Efficient Acoustic Model Refinement for a Low Resource Indian Language
We address the problem of efficient acoustic-model refinement (continuous
retraining) using semi-supervised and active learning for a low resource Indian
language, wherein the low resource constraints are having i) a small labeled
corpus from which to train a baseline `seed' acoustic model and ii) a large
training corpus without orthographic labeling or from which to perform a data
selection for manual labeling at low costs. The proposed semi-supervised
learning decodes the unlabeled large training corpus using the seed model and
through various protocols, selects the decoded utterances with high reliability
using confidence levels (that correlate to the WER of the decoded utterances)
and iterative bootstrapping. The proposed active learning protocol uses
confidence level based metric to select the decoded utterances from the large
unlabeled corpus for further labeling. The semi-supervised learning protocols
can offer a WER reduction, from a poorly trained seed model, by as much as 50%
of the best WER-reduction realizable from the seed model's WER, if the large
corpus were labeled and used for acoustic-model training. The active learning
protocols allow that only 60% of the entire training corpus be manually
labeled, to reach the same performance as the entire data
Learning Articulated Motions From Visual Demonstration
Many functional elements of human homes and workplaces consist of rigid
components which are connected through one or more sliding or rotating
linkages. Examples include doors and drawers of cabinets and appliances;
laptops; and swivel office chairs. A robotic mobile manipulator would benefit
from the ability to acquire kinematic models of such objects from observation.
This paper describes a method by which a robot can acquire an object model by
capturing depth imagery of the object as a human moves it through its range of
motion. We envision that in future, a machine newly introduced to an
environment could be shown by its human user the articulated objects particular
to that environment, inferring from these "visual demonstrations" enough
information to actuate each object independently of the user.
Our method employs sparse (markerless) feature tracking, motion segmentation,
component pose estimation, and articulation learning; it does not require prior
object models. Using the method, a robot can observe an object being exercised,
infer a kinematic model incorporating rigid, prismatic and revolute joints,
then use the model to predict the object's motion from a novel vantage point.
We evaluate the method's performance, and compare it to that of a previously
published technique, for a variety of household objects.Comment: Published in Robotics: Science and Systems X, Berkeley, CA. ISBN:
978-0-9923747-0-
A computational framework for unsupervised analysis of everyday human activities
In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner.
A key step towards this end is finding appropriate representations for human activities. We posit that if we chose to describe activities as finite sequences of an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event sub-sequences. With this perspective at hand, we particularly investigate representations that characterize activities in terms of their fixed and variable length event subsequences. We comparatively analyze these representations in terms of their representational scope, feature cardinality and noise sensitivity.
Exploiting such representations, we propose a computational framework to discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity-classes, both from a holistic as well as a by-parts perspective. Using such characterizations, we present an incremental method to classify
a new activity instance to one of the discovered activity-classes, and to automatically detect if it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our framework in a variety of everyday environments.Ph.D.Committee Chair: Aaron Bobick; Committee Member: Charles Isbell; Committee Member: David Hogg; Committee Member: Irfan Essa; Committee Member: James Reh
Multivariate Approaches to Classification in Extragalactic Astronomy
Clustering objects into synthetic groups is a natural activity of any
science. Astrophysics is not an exception and is now facing a deluge of data.
For galaxies, the one-century old Hubble classification and the Hubble tuning
fork are still largely in use, together with numerous mono-or bivariate
classifications most often made by eye. However, a classification must be
driven by the data, and sophisticated multivariate statistical tools are used
more and more often. In this paper we review these different approaches in
order to situate them in the general context of unsupervised and supervised
learning. We insist on the astrophysical outcomes of these studies to show that
multivariate analyses provide an obvious path toward a renewal of our
classification of galaxies and are invaluable tools to investigate the physics
and evolution of galaxies.Comment: Open Access paper.
http://www.frontiersin.org/milky\_way\_and\_galaxies/10.3389/fspas.2015.00003/abstract\>.
\<10.3389/fspas.2015.00003 \&g
- …