The discriminative functional mixture model for a comparative analysis of bike sharing systems
Bike sharing systems (BSSs) have become a means of sustainable intermodal
transport and are now proposed in many cities worldwide. Most BSSs also provide
open access to their data, particularly to real-time status reports on their
bike stations. The analysis of the mass of data generated by such systems is of
particular interest to BSS providers to update system structures and policies.
This work was motivated by interest in analyzing and comparing several European
BSSs to identify common operating patterns in BSSs and to propose practical
solutions to avoid potential issues. Our approach relies on the identification
of common patterns between and within systems. To this end, a model-based
clustering method, called FunFEM, for time series (or more generally functional
data) is developed. It is based on a functional mixture model that allows the
clustering of the data in a discriminative functional subspace. In this context,
the model has the advantage of being parsimonious and of allowing the
visualization of the clustered systems. Numerical experiments confirm the good
behavior of FunFEM, particularly compared to state-of-the-art methods. The
application of FunFEM to BSS data from JCDecaux and the Transport for London
Initiative allows us to identify 10 general patterns, including pathological
ones, and to propose practical improvement strategies based on the system
comparison. The visualization of the clustered data within the discriminative
subspace turns out to be particularly informative regarding the system
efficiency. The proposed methodology is implemented in a package for the R
software, named funFEM, which is available on CRAN. The package also
provides a subset of the data analyzed in this work.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS861 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events across numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees for selecting the most suitable technique for a given data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
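As a concrete instance of the definition above (an observation that differs markedly from the other values), the following minimal sketch flags outliers in a one-dimensional sample using the modified z-score based on the median absolute deviation. It is an illustration only and is not taken from the surveyed paper or its taxonomy; the function name `mad_outliers` and the default cutoff of 3.5 are assumptions chosen for this example.

```python
from statistics import median


def mad_outliers(values, cutoff=3.5):
    """Return the values whose modified z-score exceeds `cutoff`.

    Hypothetical helper for illustration: it uses the median and the
    median absolute deviation (MAD), which are less sensitive to the
    outliers themselves than the mean and standard deviation.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the values are identical; the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # under a normal distribution.
    return [v for v in values if 0.6745 * abs(v - med) / mad > cutoff]


readings = [10, 11, 9, 10, 12, 10, 11, 50]
flagged = mad_outliers(readings)
```

No labels are required, which is what makes the approach unsupervised; whether a flagged value is an error or an event of interest still has to be decided from context.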
Joint Modeling and Registration of Cell Populations in Cohorts of High-Dimensional Flow Cytometric Data
In systems biomedicine, an experimenter encounters different potential
sources of variation in data such as individual samples, multiple experimental
conditions, and multi-variable network-level responses. In multiparametric
cytometry, which is often used for analyzing patient samples, such issues are
critical. While computational methods can identify cell populations in
individual samples, without the ability to automatically match them across
samples, it is difficult to compare and characterize the populations in typical
experiments, such as those responding to various stimulations or distinctive of
particular patients or time-points, especially when there are many samples.
Joint Clustering and Matching (JCM) is a multi-level framework for simultaneous
modeling and registration of populations across a cohort. JCM models every
population with a robust multivariate probability distribution. Simultaneously,
JCM fits a random-effects model to construct an overall batch template -- used
for registering populations across samples, and classifying new samples. By
tackling systems-level variation, JCM supports practical biomedical
applications involving large cohorts
Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma
A novel algorithm and implementation of real-time identification and tracking
of blob-filaments in fusion reactor data is presented. Similar spatio-temporal
features are important in many other applications, for example, ignition
kernels in combustion and tumor cells in a medical image. This work presents an
approach for extracting these features by dividing the overall task into three
steps: local identification of feature cells, grouping of feature cells into
extended features, and tracking of feature movement through overlap in
space. Through our extensive work in parallelization, we demonstrate that this
approach can effectively make use of a large number of compute nodes to detect
and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion
simulation data, we observed linear speedup on 1024 processes and completed
blob detection in less than three milliseconds using Edison, a Cray XC30 system
at NERSC.
Comment: 14 pages, 40 figure
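The three steps named in the abstract can be sketched serially as follows. This is a minimal illustration under stated assumptions, not the authors' parallel implementation: the simple threshold test used to identify feature cells, the 4-connected grouping, and all function names are hypothetical choices made for this example.

```python
from collections import deque


def find_feature_cells(frame, threshold):
    """Step 1: local identification. Assumes a cell is a feature cell
    when its value exceeds `threshold` (an illustrative criterion)."""
    return {
        (r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value > threshold
    }


def group_cells(cells):
    """Step 2: group 4-connected feature cells into extended features."""
    remaining = set(cells)
    features = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for neighbor in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        features.append(frozenset(component))
    # Sort by each feature's smallest cell so the output order is deterministic.
    features.sort(key=min)
    return features


def track(prev_features, curr_features):
    """Step 3: link features in consecutive frames that overlap in space."""
    return [
        (i, j)
        for i, prev in enumerate(prev_features)
        for j, curr in enumerate(curr_features)
        if prev & curr
    ]


frame0 = [[0, 5, 5, 0], [0, 0, 0, 0], [0, 0, 0, 7]]
frame1 = [[0, 0, 5, 5], [0, 0, 0, 0], [7, 0, 0, 0]]
features0 = group_cells(find_feature_cells(frame0, threshold=1))
features1 = group_cells(find_feature_cells(frame1, threshold=1))
links = track(features0, features1)
```

In this toy input the upper feature shifts one column to the right and still shares a cell with its previous position, so it is linked across frames, while the lower feature jumps too far to overlap and is treated as a new feature.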