2,328 research outputs found
Detecting Outliers in Data with Correlated Measures
Advances in sensor technology have enabled the collection of large-scale
datasets. Such datasets can be extremely noisy and often contain a significant
amount of outliers that result from sensor malfunction or human operation
faults. In order to utilize such data for real-world applications, it is
critical to detect outliers so that models built from these datasets will not
be skewed by outliers.
In this paper, we propose a new outlier detection method that utilizes the
correlations in the data (e.g., taxi trip distance vs. trip time). Different
from existing outlier detection methods, we build a robust regression model
that explicitly models the outliers and detects outliers simultaneously with
the model fitting.
We validate our approach on real-world datasets against methods specifically
designed for each dataset as well as the state of the art outlier detectors.
Our outlier detection method achieves better performances, demonstrating the
robustness and generality of our method. Last, we report interesting case
studies on some outliers that result from atypical events.Comment: 10 page
Bayesian detection of embryonic gene expression onset in C. elegans
To study how a zygote develops into an embryo with different tissues,
large-scale 4D confocal movies of C. elegans embryos have been produced
recently by experimental biologists. However, the lack of principled
statistical methods for the highly noisy data has hindered the comprehensive
analysis of these data sets. We introduced a probabilistic change point model
on the cell lineage tree to estimate the embryonic gene expression onset time.
A Bayesian approach is used to fit the 4D confocal movies data to the model.
Subsequent classification methods are used to decide a model selection
threshold and further refine the expression onset time from the branch level to
the specific cell time level. Extensive simulations have shown the high
accuracy of our method. Its application on real data yields both previously
known results and new findings.Comment: Published at http://dx.doi.org/10.1214/15-AOAS820 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications
Wireless sensor networks monitor dynamic environments that change rapidly
over time. This dynamic behavior is either caused by external factors or
initiated by the system designers themselves. To adapt to such conditions,
sensor networks often adopt machine learning techniques to eliminate the need
for unnecessary redesign. Machine learning also inspires many practical
solutions that maximize resource utilization and prolong the lifespan of the
network. In this paper, we present an extensive literature review over the
period 2002-2013 of machine learning methods that were used to address common
issues in wireless sensor networks (WSNs). The advantages and disadvantages of
each proposed algorithm are evaluated against the corresponding problem. We
also provide a comparative guide to aid WSN designers in developing suitable
machine learning solutions for their specific application challenges.Comment: Accepted for publication in IEEE Communications Surveys and Tutorial
An Integrative Remote Sensing Application of Stacked Autoencoder for Atmospheric Correction and Cyanobacteria Estimation Using Hyperspectral Imagery
Hyperspectral image sensing can be used to effectively detect the distribution of harmful cyanobacteria. To accomplish this, physical- and/or model-based simulations have been conducted to perform an atmospheric correction (AC) and an estimation of pigments, including phycocyanin (PC) and chlorophyll-a (Chl-a), in cyanobacteria. However, such simulations were undesirable in certain cases, due to the difficulty of representing dynamically changing aerosol and water vapor in the atmosphere and the optical complexity of inland water. Thus, this study was focused on the development of a deep neural network model for AC and cyanobacteria estimation, without considering the physical formulation. The stacked autoencoder (SAE) network was adopted for the feature extraction and dimensionality reduction of hyperspectral imagery. The artificial neural network (ANN) and support vector regression (SVR) were sequentially applied to achieve AC and estimate cyanobacteria concentrations (i.e., SAE-ANN and SAE-SVR). Further, the ANN and SVR models without SAE were compared with SAE-ANN and SAE-SVR models for the performance evaluations. In terms of AC performance, both SAE-ANN and SAE-SVR displayed reasonable accuracy with the Nash???Sutcliffe efficiency (NSE) > 0.7. For PC and Chl-a estimation, the SAE-ANN model showed the best performance, by yielding NSE values > 0.79 and > 0.77, respectively. SAE, with fine tuning operators, improved the accuracy of the original ANN and SVR estimations, in terms of both AC and cyanobacteria estimation. This is primarily attributed to the high-level feature extraction of SAE, which can represent the spatial features of cyanobacteria. Therefore, this study demonstrated that the deep neural network has a strong potential to realize an integrative remote sensing application
- …