Detecting Outliers in Data with Correlated Measures

Author: Kifer Daniel
Kuo Yu-Hsuan
Li Zhenhui
Publication venue: Association for Computing Machinery (ACM)
Publication date: 26/08/2018

Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunction or human operational errors. To use such data in real-world applications, it is critical to detect outliers so that models built from these datasets are not skewed by them. In this paper, we propose a new outlier detection method that exploits correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as against state-of-the-art outlier detectors. Our outlier detection method achieves better performance, demonstrating its robustness and generality. Last, we report interesting case studies on some outliers that result from atypical events. Comment: 10 pages
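The abstract describes a robust regression that explicitly models outliers and flags them while the model is being fitted. The paper's own model is not given in this listing, so what follows is only a minimal sketch of one generic technique in that spirit: a two-component mixture (Gaussian inliers scattered around a line, uniformly distributed outliers) fitted by expectation-maximization (EM). The function name `fit_line_with_outlier_component`, the uniform outlier density, the 0.5 decision threshold and the toy data are all illustrative assumptions, not the authors' method.

```python
import math


def fit_line_with_outlier_component(xs, ys, n_iter=50):
    """Fit y = a + b*x while estimating, per point, the chance it is an inlier.

    Illustrative mixture model (an assumption, not the paper's model):
      inliers:  y ~ Normal(a + b*x, sigma^2)
      outliers: y ~ Uniform over the observed range of y
    Returns (a, b, inlier_prob), where inlier_prob[i] is the EM posterior
    probability that point i belongs to the inlier component.
    """
    n = len(xs)
    y_span = (max(ys) - min(ys)) or 1.0
    outlier_density = 1.0 / y_span     # constant density of the outlier component
    w = [1.0] * n                       # first pass is ordinary least squares
    a = b = 0.0
    for _ in range(n_iter):
        # M-step: weighted least squares with the inlier probabilities as weights.
        sw = max(sum(w), 1e-12)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        b = sxy / sxx if sxx > 0 else 0.0
        a = my - b * mx
        resid = [y - (a + b * x) for x, y in zip(xs, ys)]
        sigma2 = max(sum(wi * r * r for wi, r in zip(w, resid)) / sw, 1e-9)
        # Inlier mixing weight, clamped so neither component vanishes.
        pi_in = min(max(sw / n, 0.5), 0.99)
        # E-step: posterior probability that each point is an inlier.
        norm = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
        p_out = (1.0 - pi_in) * outlier_density
        w = []
        for r in resid:
            p_in = pi_in * norm * math.exp(-r * r / (2.0 * sigma2))
            w.append(p_in / (p_in + p_out))
    return a, b, w


# Toy data: y is roughly 2x + 1, except point 5, which is corrupted.
xs = list(range(10))
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 40.0, 13.1, 14.9, 17.2, 18.8]
a, b, inlier_prob = fit_line_with_outlier_component(xs, ys)
outliers = [i for i, p in enumerate(inlier_prob) if p < 0.5]
print(f"a={a:.2f}, b={b:.2f}, outliers={outliers}")
```

Because the inlier probabilities are re-estimated on every pass, a point that drags the initial least-squares line (index 5 in the toy data) progressively loses weight, so fitting and detection happen in the same loop rather than as separate stages.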

arXiv.org e-Print Archive

Crossref

Unsupervised Learning for Online Abnormality Detection in Smart Meter Data

Author: Aligholian Armin
Farajollahi Mohammad
Mohsenian-Rad Hamed
Publication venue: eScholarship, University of California
Publication date: 01/08/2019

Crossref

eScholarship - University of California

Bayesian detection of embryonic gene expression onset in C. elegans

Author: Fan Xiaodan
Hu Jie
Wang Junwen
Yalamanchili Hari Krishna
Ye Kenny
Zhao Zhongying
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2015

To study how a zygote develops into an embryo with different tissues, experimental biologists have recently produced large-scale 4D confocal movies of C. elegans embryos. However, the lack of principled statistical methods for these highly noisy data has hindered their comprehensive analysis. We introduce a probabilistic change point model on the cell lineage tree to estimate the embryonic gene expression onset time. A Bayesian approach is used to fit the model to the 4D confocal movie data. Subsequent classification methods are used to choose a model selection threshold and to further refine the expression onset time from the branch level to the specific cell-time level. Extensive simulations show the high accuracy of our method. Its application to real data yields both previously known results and new findings. Comment: Published at http://dx.doi.org/10.1214/15-AOAS820 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
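As a much-simplified illustration of Bayesian change point inference (not the authors' tree-structured model), the following sketch computes the posterior over a single expression onset index in one time series. It assumes known "off" and "on" mean levels, a known Gaussian noise level and a uniform prior over the onset index; the function name `onset_posterior` and all parameter values are hypothetical.

```python
import math


def onset_posterior(xs, mu_off, mu_on, sigma):
    """Posterior P(tau | xs) for a single off -> on change point.

    Assumed model: x_t ~ Normal(mu_off, sigma^2) for t < tau and
    x_t ~ Normal(mu_on, sigma^2) for t >= tau, with a uniform prior over
    tau in {0, ..., n}; tau = n means no onset within the series.
    """
    n = len(xs)

    def log_lik(x, mu):
        # Gaussian log-density up to a constant that is identical for every tau.
        return -0.5 * ((x - mu) / sigma) ** 2

    log_post = []
    for tau in range(n + 1):
        lp = sum(log_lik(x, mu_off) for x in xs[:tau])
        lp += sum(log_lik(x, mu_on) for x in xs[tau:])
        log_post.append(lp)  # the uniform prior adds the same constant to every tau
    # Normalise in log space (subtract the maximum) for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Toy series: expression switches on at index 4.
xs = [0.1, -0.2, 0.0, 0.2, 1.1, 0.9, 1.0, 1.2]
post = onset_posterior(xs, mu_off=0.0, mu_on=1.0, sigma=0.3)
map_tau = max(range(len(post)), key=post.__getitem__)
print(map_tau, round(post[map_tau], 3))  # MAP onset index is 4 for this series
```

Returning the full posterior rather than only the most likely index keeps the uncertainty about the onset time available for any later thresholding or model selection step.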

arXiv.org e-Print Archive

HKU Scholars Hub

Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications

Author: Alsheikh Mohammad Abu
Lin Shaowei
Niyato Dusit
Tan Hwee-Pink
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/2014

Wireless sensor networks monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques, which eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive review of the literature from 2002 to 2013 on machine learning methods used to address common issues in wireless sensor networks (WSNs). The advantages and disadvantages of each proposed algorithm are evaluated against the corresponding problem. We also provide a comparative guide to help WSN designers develop suitable machine learning solutions for their specific application challenges. Comment: Accepted for publication in IEEE Communications Surveys and Tutorials

arXiv.org e-Print Archive

Institutional Knowledge at Singapore Management University

University of Canberra Research Repository

An Integrative Remote Sensing Application of Stacked Autoencoder for Atmospheric Correction and Cyanobacteria Estimation Using Hyperspectral Imagery

Author: Baek Sangsoo
Cha YoonKyung
Cho Kyung Hwa
Duan Hongtao
Kang Taegu
Kim Kyunghyun
Kim Minjeong
Kwon Yong Sung
Lee Hyuk
Ligaray Mayzonee
Pyo JongCheol
Publication venue: MDPI AG
Publication date: 01/03/2020

Hyperspectral image sensing can be used to effectively detect the distribution of harmful cyanobacteria. To accomplish this, physical and/or model-based simulations have been conducted to perform atmospheric correction (AC) and to estimate cyanobacterial pigments, including phycocyanin (PC) and chlorophyll-a (Chl-a). However, such simulations are unsuitable in certain cases because of the difficulty of representing dynamically changing aerosol and water vapor in the atmosphere and the optical complexity of inland water. Thus, this study focused on developing a deep neural network model for AC and cyanobacteria estimation that does not rely on the physical formulation. A stacked autoencoder (SAE) network was adopted for feature extraction and dimensionality reduction of the hyperspectral imagery. An artificial neural network (ANN) and support vector regression (SVR) were then applied sequentially to perform AC and estimate cyanobacteria concentrations (i.e., SAE-ANN and SAE-SVR). For performance evaluation, ANN and SVR models without SAE were also compared with the SAE-ANN and SAE-SVR models. In terms of AC performance, both SAE-ANN and SAE-SVR displayed reasonable accuracy, with a Nash–Sutcliffe efficiency (NSE) > 0.7. For PC and Chl-a estimation, the SAE-ANN model performed best, yielding NSE values > 0.79 and > 0.77, respectively. SAE with fine-tuning operators improved the accuracy of the original ANN and SVR estimations for both AC and cyanobacteria estimation. This is primarily attributed to the high-level feature extraction of SAE, which can represent the spatial features of cyanobacteria. Therefore, this study demonstrated that the deep neural network has strong potential for an integrative remote sensing application.
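The abstract reports accuracy as Nash–Sutcliffe efficiency (NSE). For reference, NSE compares the squared error of the predictions with the variance of the observations, NSE = 1 - Σ(o_t - s_t)² / Σ(o_t - ō)², where o_t are observed values, s_t simulated values and ō the observed mean. A value of 1 is a perfect match, and 0 means the model does no better than always predicting the observed mean. A minimal Python sketch (the function name and toy values are hypothetical):

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("observed and simulated must be non-empty and equally long")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the model versus squared deviation from the observed mean.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed values are constant; NSE is undefined")
    return 1.0 - sse / sst


obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.1, 1.9, 3.2, 3.8]
print(round(nash_sutcliffe_efficiency(obs, sim), 3))  # 0.98
```

Because the denominator is the spread of the observations themselves, NSE values such as the > 0.7 threshold quoted above are only comparable across datasets with similar variability.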

Multidisciplinary Digital Publishing Institute

ScholarWorks@UNIST