A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees for selecting the most suitable technique for a given data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma
A novel algorithm and implementation of real-time identification and tracking
of blob-filaments in fusion reactor data is presented. Similar spatio-temporal
features are important in many other applications, for example, ignition
kernels in combustion and tumor cells in a medical image. This work presents an
approach for extracting these features by dividing the overall task into three
steps: local identification of feature cells, grouping feature cells into
extended features, and tracking the movement of features through overlap in
space. Through our extensive work in parallelization, we demonstrate that this
approach can effectively make use of a large number of compute nodes to detect
and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion
simulation data, we observed linear speedup on 1024 processes and completed
blob detection in less than three milliseconds using Edison, a Cray XC30 system
at NERSC.
Comment: 14 pages, 40 figures
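The three steps named in this abstract can be illustrated with a minimal, serial sketch. This is not the authors' parallel implementation: the threshold rule, the 4-connectivity, and the function names `identify_cells`, `group_cells`, and `track` are all assumptions made for illustration.

```python
from collections import deque


def identify_cells(frame, threshold):
    """Step 1 (assumed rule): a cell is a feature cell if its value exceeds the threshold."""
    return {
        (r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value > threshold
    }


def group_cells(cells):
    """Step 2: group 4-connected feature cells into extended features (blobs)."""
    remaining = set(cells)
    blobs = []
    while remaining:
        seed = remaining.pop()
        blob = {seed}
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for neighbor in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    blob.add(neighbor)
                    queue.append(neighbor)
        blobs.append(blob)
    # Sort by each blob's smallest cell so the output order is deterministic.
    return sorted(blobs, key=min)


def track(prev_blobs, curr_blobs):
    """Step 3: link blobs in consecutive frames that overlap in space."""
    return [
        (i, j)
        for i, prev in enumerate(prev_blobs)
        for j, curr in enumerate(curr_blobs)
        if prev & curr
    ]


# Two tiny hypothetical frames: one blob drifts right, the other disappears.
frame0 = [[0, 5, 5, 0], [0, 0, 0, 0], [0, 0, 0, 6]]
frame1 = [[0, 0, 5, 5], [0, 0, 0, 0], [0, 0, 0, 0]]

blobs0 = group_cells(identify_cells(frame0, threshold=1))
blobs1 = group_cells(identify_cells(frame1, threshold=1))
links = track(blobs0, blobs1)
```

Here `links` pairs blob 0 of the first frame with blob 0 of the second, because the two share the cell `(0, 2)`. The isolated cell at `(2, 3)` has no successor.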
Revisiting Multi-Subject Random Effects in fMRI: Advocating Prevalence Estimation
Random Effects analysis has been introduced into fMRI research in order to
generalize findings from the study group to the whole population. Generalizing
findings is obviously harder than detecting activation in the study group since
in order to be significant, an activation has to be larger than the
inter-subject variability. Indeed, detected regions are smaller when using
random effects analysis than when using fixed effects analysis. The statistical assumptions behind
the classic random effects model are that the effect in each location is
normally distributed over subjects, and "activation" refers to a non-null mean
effect. We argue this model is unrealistic compared to the true population
variability, where, due to functional plasticity and registration anomalies, at
each brain location some of the subjects are active and some are not. We
propose a finite Gaussian mixture random effects model that amortizes
between-subject spatial disagreement and quantifies it using the "prevalence"
of activation at each location. This measure has several desirable properties:
(a) It is more informative than the typical active/inactive paradigm. (b) In
contrast to the hypothesis-testing approach (and thus t-maps), in which null
hypotheses are trivially rejected for large sample sizes, the prevalence
statistic becomes more informative as the sample size grows.
In this work we present a formal definition and an estimation procedure of
this prevalence. The end result of the proposed analysis is a map of the
prevalence at locations with significant activation, highlighting activation
regions that are common across many brains.
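One possible form of such a two-component mixture is sketched below. The notation is assumed here for illustration and is not taken from the paper: $\beta_i(v)$ denotes subject $i$'s effect at location $v$, $p(v)$ the prevalence (the proportion of active subjects), $\mu(v)$ the mean effect among active subjects, and $\sigma^2(v)$ a between-subject variance that is assumed, for simplicity, to be shared by both components.

```latex
\beta_i(v) \sim \bigl(1 - p(v)\bigr)\,\mathcal{N}\bigl(0,\ \sigma^2(v)\bigr)
            + p(v)\,\mathcal{N}\bigl(\mu(v),\ \sigma^2(v)\bigr),
\qquad 0 \le p(v) \le 1 .
```

Under this sketch, setting $p(v) = 1$ recovers the classic random effects model, in which the effect at each location is normally distributed over all subjects with mean $\mu(v)$.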