A survey of outlier detection methodologies
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
Furniture models learned from the WWW: using web catalogs to locate and categorize unknown furniture pieces in 3D laser scans
In this article, we investigate how autonomous robots can exploit the high-quality information already available from the WWW concerning 3-D models of office furniture. Apart from the hobbyist effort in Google 3-D Warehouse, many companies providing office furnishings already have the models for considerable portions of the objects found in our workplaces and homes. In particular, we present an approach that allows a robot to learn generic models of typical office furniture using examples found in the Web. These generic models are then used by the robot to locate and categorize unknown furniture in real indoor environments.
Scaling associative classification for very large datasets
Supervised learning algorithms are nowadays successfully scaling up to
datasets that are very large in volume, leveraging the potential of in-memory
cluster-computing Big Data frameworks. Still, massive datasets with a number of
large-domain categorical features are a difficult challenge for any classifier.
Most off-the-shelf solutions cannot cope with this problem. In this work we
introduce DAC, a Distributed Associative Classifier. DAC exploits ensemble
learning to distribute the training of an associative classifier among parallel
workers and improve the final quality of the model. Furthermore, it adopts
several novel techniques to reach high scalability without sacrificing quality,
including preventive pruning of classification rules in the extraction
phase based on Gini impurity. We ran experiments on Apache Spark, on a real
large-scale dataset with more than 4 billion records and 800 million distinct
categories. The results showed that DAC improves on a state-of-the-art solution
in both prediction quality and execution time. Since the generated model is
human-readable, it not only classifies new records but also exposes both the
logic behind each prediction and the properties of the model, making it a
useful aid for decision makers.
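The Gini-based rule pruning mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give DAC's exact criterion, so the threshold `max_impurity` and the helper `keep_rule` below are hypothetical: the sketch assumes a candidate rule is kept only when the class labels of the records it matches are sufficiently pure.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c^2) over the class labels in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def keep_rule(matched_labels, max_impurity=0.2):
    """Hypothetical pruning test: keep a rule only if the records matching
    its antecedent are dominated by one class (low Gini impurity).
    The threshold value is an assumption, not taken from the paper."""
    return gini_impurity(matched_labels) <= max_impurity
```

For example, a rule matching nine records of one class and one of another has impurity 1 - (0.81 + 0.01) = 0.18 and would be kept under the assumed threshold, while a rule matching an even two-class split (impurity 0.5) would be pruned.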
Recurrent Pixel Embedding for Instance Grouping
We introduce a differentiable, end-to-end trainable framework for solving
pixel-level grouping problems such as instance segmentation. It consists of two
novel components. First, we regress pixels into a hyper-spherical embedding
space so that pixels from the same group have high cosine similarity while
those from different groups have similarity below a specified margin. We
analyze the choice of embedding dimension and margin, relating them to
theoretical results on the problem of distributing points uniformly on the
sphere. Second, to group instances, we utilize a variant of mean-shift
clustering, implemented as a recurrent neural network parameterized by kernel
bandwidth. This recurrent grouping module is differentiable, enjoys convergent
dynamics and probabilistic interpretability. Backpropagating the group-weighted
loss through this module allows learning to focus on only correcting embedding
errors that won't be resolved during subsequent clustering. Our framework,
while conceptually simple and theoretically abundant, is also practically
effective and computationally efficient. We demonstrate substantial
improvements over state-of-the-art instance segmentation for object proposal
generation, and show the benefits of grouping loss for
classification tasks such as boundary detection and semantic segmentation.
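The embedding objective described in this abstract (high cosine similarity within a group, similarity below a margin across groups) can be sketched for a single pair of pixel embeddings. The abstract does not state the exact loss, so the hinge form and the default `margin` below are assumptions made for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_loss(u, v, same_group, margin=0.5):
    """Assumed pairwise loss: pull same-group embeddings towards
    similarity 1, and penalise different-group embeddings only when
    their similarity exceeds `margin`."""
    s = cosine_similarity(u, v)
    if same_group:
        return 1.0 - s
    return max(0.0, s - margin)
```

Under this sketch, two orthogonal embeddings from different groups incur no loss, while identical embeddings from different groups incur a loss of 1 - margin.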
Predicting Mental Health Crisis in Veterans: Early Warning Signs, Precursors and Protective Factors
Mental Health (MH) conditions have recently increased to a large extent due to socio-demographic changes. Posttraumatic Stress Disorder (PTSD) is one of the most common mental health disorders in the US. PTSD is even more troubling among combat veterans leaving their service, who experience it at double the rate of the general population. Severity of PTSD is associated with risk-taking behaviors such as substance abuse, non-suicidal self-injury, and sexual risk behaviors. Psychological disorders are often preceded by early warning signs, and recognizing the early warning signs of PTSD will help in preventing the return or worsening of PTSD symptoms. Ecological momentary assessment (EMA) studies are more sophisticated in tracking fluctuations of symptoms in real time, and they are effective in monitoring for crisis events in veterans. Mobile applications are a commonly used means of gathering such EMA information from participants. Our research focuses on developing interpretable machine learning (ML) models using socio-demographic data and EMA data from natural settings to predict high PTSD risk in veterans and those who engage in risky behaviors. Findings from these models can be integrated with existing m-health frameworks to generate text alerts to mentors when crisis patterns are observed in their mentees. Such an integrated crisis prediction and alerting system would help peer mentors plan interventions.
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because issues of poor run-time
performance are not such a problem these days with the computational power that
is available. This paper presents an overview of techniques for Nearest
Neighbour classification focusing on: mechanisms for assessing similarity
(distance), computational issues in identifying nearest neighbours and
mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.
Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
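The basic idea in this abstract, finding the nearest neighbours of a query and letting them vote on its class, can be sketched in a few lines. This is not the code from the paper's appendix; it is a minimal illustration that assumes Euclidean distance, a brute-force search and a simple majority vote, with `knn_predict` as a hypothetical function name. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    examples, using Euclidean distance and a brute-force scan."""
    # Pair every training example's distance to the query with its label.
    distances = [(math.dist(x, query), label)
                 for x, label in zip(train_X, train_y)]
    # Sort by distance only, so labels never need to be comparable.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
```

With this toy data, `knn_predict(train_X, train_y, (0.2, 0.1))` selects the three points near the origin and returns `"a"`. The brute-force scan costs one distance computation per training example for each query, which is the run-time issue that retrieval speed-up techniques aim to reduce.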
An EMG Gesture Recognition System with Flexible High-Density Sensors and Brain-Inspired High-Dimensional Classifier
EMG-based gesture recognition shows promise for human-machine interaction.
Systems are often afflicted by signal and electrode variability, which degrades
performance over time. We present an end-to-end system combating this
variability using a large-area, high-density sensor array and a robust
classification algorithm. EMG electrodes are fabricated on a flexible substrate
and interfaced to a custom wireless device for 64-channel signal acquisition
and streaming. We use brain-inspired high-dimensional (HD) computing for
processing EMG features in one-shot learning. The HD algorithm is tolerant to
noise and electrode misplacement and can quickly learn from few gestures
without gradient descent or back-propagation. We achieve an average
classification accuracy of 96.64% for five gestures, with only 7% degradation
when training and testing across different days. Our system maintains this
accuracy when trained with only three trials of gestures; it also demonstrates
accuracy comparable to the state of the art when trained with one trial.
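To illustrate how a high-dimensional classifier can learn from a single example without gradient descent, here is a generic sketch of hyperdimensional computing. It is not the encoding used in the paper: the dimensionality, the four channels, the four quantized amplitude levels, the independent random level vectors and the gesture names `fist` and `open` are all assumptions chosen to keep the example small.

```python
import math
import random

DIM = 2000                # hypervector dimensionality (assumed)
_rng = random.Random(0)   # fixed seed so the sketch is reproducible


def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [_rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Bind two hypervectors by elementwise multiplication."""
    return [x * y for x, y in zip(a, b)]


def bundle(hvs):
    """Bundle hypervectors by elementwise addition."""
    return [sum(col) for col in zip(*hvs)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


N_CHANNELS, N_LEVELS = 4, 4  # toy sizes, not the paper's 64 channels
channel_hvs = [random_hv() for _ in range(N_CHANNELS)]
level_hvs = [random_hv() for _ in range(N_LEVELS)]


def encode(levels):
    """Encode one sample: bind each channel to its quantized level,
    then bundle the bound pairs into a single hypervector."""
    return bundle([bind(channel_hvs[c], level_hvs[l])
                   for c, l in enumerate(levels)])


def classify(prototypes, levels):
    """Return the class whose prototype is most similar to the query."""
    query = encode(levels)
    return max(prototypes, key=lambda name: cosine(prototypes[name], query))


# One-shot "training": each prototype is the encoding of a single example.
prototypes = {"fist": encode([3, 3, 0, 0]), "open": encode([0, 0, 3, 3])}
```

A query such as `classify(prototypes, [3, 3, 0, 1])`, which differs from the `fist` example on one channel, still shares three of its four bound channel-level pairs with that prototype and none with `open`, so it is assigned to `fist`. This is the sense in which such representations tolerate noise on individual channels.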