2,328 research outputs found

    Detecting Outliers in Data with Correlated Measures

    Full text link
    Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant amount of outliers that result from sensor malfunction or human operation faults. In order to utilize such data for real-world applications, it is critical to detect outliers so that models built from these datasets will not be skewed by outliers. In this paper, we propose a new outlier detection method that utilizes the correlations in the data (e.g., taxi trip distance vs. trip time). Different from existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects outliers simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as the state of the art outlier detectors. Our outlier detection method achieves better performances, demonstrating the robustness and generality of our method. Last, we report interesting case studies on some outliers that result from atypical events.Comment: 10 page

    Bayesian detection of embryonic gene expression onset in C. elegans

    Get PDF
    To study how a zygote develops into an embryo with different tissues, large-scale 4D confocal movies of C. elegans embryos have been produced recently by experimental biologists. However, the lack of principled statistical methods for the highly noisy data has hindered the comprehensive analysis of these data sets. We introduced a probabilistic change point model on the cell lineage tree to estimate the embryonic gene expression onset time. A Bayesian approach is used to fit the 4D confocal movies data to the model. Subsequent classification methods are used to decide a model selection threshold and further refine the expression onset time from the branch level to the specific cell time level. Extensive simulations have shown the high accuracy of our method. Its application on real data yields both previously known results and new findings.Comment: Published at http://dx.doi.org/10.1214/15-AOAS820 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications

    Get PDF
    Wireless sensor networks monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive literature review over the period 2002-2013 of machine learning methods that were used to address common issues in wireless sensor networks (WSNs). The advantages and disadvantages of each proposed algorithm are evaluated against the corresponding problem. We also provide a comparative guide to aid WSN designers in developing suitable machine learning solutions for their specific application challenges.Comment: Accepted for publication in IEEE Communications Surveys and Tutorial

    An Integrative Remote Sensing Application of Stacked Autoencoder for Atmospheric Correction and Cyanobacteria Estimation Using Hyperspectral Imagery

    Get PDF
    Hyperspectral image sensing can be used to effectively detect the distribution of harmful cyanobacteria. To accomplish this, physical- and/or model-based simulations have been conducted to perform an atmospheric correction (AC) and an estimation of pigments, including phycocyanin (PC) and chlorophyll-a (Chl-a), in cyanobacteria. However, such simulations were undesirable in certain cases, due to the difficulty of representing dynamically changing aerosol and water vapor in the atmosphere and the optical complexity of inland water. Thus, this study was focused on the development of a deep neural network model for AC and cyanobacteria estimation, without considering the physical formulation. The stacked autoencoder (SAE) network was adopted for the feature extraction and dimensionality reduction of hyperspectral imagery. The artificial neural network (ANN) and support vector regression (SVR) were sequentially applied to achieve AC and estimate cyanobacteria concentrations (i.e., SAE-ANN and SAE-SVR). Further, the ANN and SVR models without SAE were compared with SAE-ANN and SAE-SVR models for the performance evaluations. In terms of AC performance, both SAE-ANN and SAE-SVR displayed reasonable accuracy with the Nash???Sutcliffe efficiency (NSE) > 0.7. For PC and Chl-a estimation, the SAE-ANN model showed the best performance, by yielding NSE values > 0.79 and > 0.77, respectively. SAE, with fine tuning operators, improved the accuracy of the original ANN and SVR estimations, in terms of both AC and cyanobacteria estimation. This is primarily attributed to the high-level feature extraction of SAE, which can represent the spatial features of cyanobacteria. Therefore, this study demonstrated that the deep neural network has a strong potential to realize an integrative remote sensing application
    corecore