RIDGE LEAST ABSOLUTE DEVIATION PERFORMANCE IN ADDRESSING MULTICOLLINEARITY AND DIFFERENT LEVELS OF OUTLIER SIMULTANEOUSLY

Author: Azis Dorrah
Herawati Netti
Saidi Subian
Publication venue: 'Universitas Pattimura'
Publication date: 01/09/2022

When data contain both multicollinearity and outliers, inference about parameter estimates from the least squares (LS) method becomes unreliable because LS is inefficient under these conditions. Both problems can be addressed simultaneously with robust regression, one form of which is the ridge least absolute deviation (RLAD) method. This study evaluates the performance of RLAD in handling multicollinearity across various sample sizes and percentages of outliers using simulated data. The Monte Carlo study used a multiple regression model with multicollinearity (ρ = 0.99) between two of the independent variables and with 10%, 20%, and 30% outliers in the response variable, at different sample sizes (n = 25, 50, 75, 100, 200; =0, and β = 1 otherwise). Multicollinearity was checked by calculating the correlation between the independent variables and the VIF values. Outliers were detected using boxplots. Parameters were estimated with both the RLAD and LS methods, and the MSE values of the two methods were compared to see which handles multicollinearity and outliers better. RLAD had a lower MSE than LS, indicating that it estimates the regression coefficients more accurately at every sample size and outlier level studied.
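The two diagnostics named in this abstract (VIF for multicollinearity, the boxplot rule for outliers) can be illustrated with a short NumPy sketch. It is not the authors' simulation code. The three-predictor design, the all-ones coefficient vector without an intercept, the outlier shift of 50, and the helper names `vif`, `boxplot_outliers` and `simulate` are all assumptions made for illustration. RLAD itself is not implemented here.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def boxplot_outliers(y, k=1.5):
    """Boolean mask of points outside the boxplot whiskers (k times the IQR)."""
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    return (y < q1 - k * iqr) | (y > q3 + k * iqr)

def simulate(n=100, rho=0.99, outlier_frac=0.10, shift=50.0, seed=0):
    """Hypothetical design: x1 and x2 correlated at rho, x3 independent."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho, 0.0],
                    [rho, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
    X = rng.multivariate_normal(np.zeros(3), cov, size=n)
    # Assumed coefficients: all equal to 1, no intercept.
    y = X @ np.ones(3) + rng.normal(size=n)
    # Contaminate a fraction of the responses with a large upward shift.
    idx = rng.choice(n, size=int(round(outlier_frac * n)), replace=False)
    y[idx] += shift
    return X, y, idx

X, y, idx = simulate()
print("VIF per predictor:", np.round(vif(X), 1))
print("boxplot flags:", int(boxplot_outliers(y).sum()), "injected:", len(idx))
```

With ρ = 0.99 the first two VIF values should be far above the common rule-of-thumb cutoff of 10, while the independent third predictor stays near 1.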

OJS UNPATTI Publication Center (Universitas Pattimura)

Outlier Detection Using Nonconvex Penalized Regression

Author: Art B. Owen
Yiyuan She
Publication date: 01/01/2010

This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the $n$ data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual $L_1$ penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The $L_1$ penalty corresponds to soft thresholding. We introduce a thresholding (denoted by $\Theta$) based iterative procedure for outlier detection ($\Theta$-IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We find that $\Theta$-IPOD is much faster than iteratively reweighted least squares for large data because each iteration costs at most $O(np)$ (and sometimes much less) avoiding an $O(np^2)$ least squares estimate. We describe the connection between $\Theta$-IPOD and $M$-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression coefficients. A data-dependent choice can be made based on BIC. The tuned $\Theta$-IPOD shows outstanding performance in identifying outliers in various situations in comparison to other existing approaches. This methodology extends to high-dimensional modeling with $p\gg n$, if both the coefficient vector and the outlier pattern are sparse.
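The mean-shift idea in this abstract can be sketched in a few lines of NumPy. This is one reading of the abstract, not the authors' reference implementation. It assumes the model y = Xβ + γ + noise, alternates a least squares fit on y − γ with hard thresholding of the residuals, and uses a QR factorization computed once so that each iteration costs O(np). The function names, the fixed threshold `lam` (the paper tunes it by BIC) and the simulated data are hypothetical.

```python
import numpy as np

def hard_threshold(r, lam):
    """Keep entries whose magnitude exceeds lam, zero out the rest."""
    return np.where(np.abs(r) > lam, r, 0.0)

def theta_ipod(X, y, lam, max_iter=200, tol=1e-8):
    """Alternate between fitting beta on y - gamma and thresholding residuals."""
    Q, _ = np.linalg.qr(X)               # reduced QR, computed once; Q is n x p
    gamma = np.zeros(len(y))
    for _ in range(max_iter):
        fitted = Q @ (Q.T @ (y - gamma))  # hat matrix applied to y - gamma, O(np)
        gamma_new = hard_threshold(y - fitted, lam)
        converged = np.max(np.abs(gamma_new - gamma)) < tol
        gamma = gamma_new
        if converged:
            break
    # Final coefficients from a least squares fit on the shift-corrected response.
    beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
    return beta, gamma

# Hypothetical data: 200 points, 3 predictors, first 5 responses shifted by +10.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + 0.3 * rng.normal(size=200)
y[:5] += 10.0

beta_hat, gamma_hat = theta_ipod(X, y, lam=3.0)
print("flagged indices:", np.flatnonzero(gamma_hat))
print("estimated beta:", np.round(beta_hat, 2))
```

Nonzero entries of `gamma_hat` mark the points treated as outliers. Replacing `hard_threshold` with a soft-thresholding rule would correspond to the $L_1$ penalty, which the abstract reports as not robust.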

arXiv.org e-Print Archive

CiteSeerX

Crossref

Research Papers in Economics

Detecting Outliers in Data with Correlated Measures

Author: Kifer Daniel
Kuo Yu-Hsuan
Li Zhenhui
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/08/2018

Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunction or human operation faults. In order to utilize such data for real-world applications, it is critical to detect outliers so that models built from these datasets will not be skewed by them. In this paper, we propose a new outlier detection method that utilizes the correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our outlier detection method achieves better performance, demonstrating the robustness and generality of our method. Last, we report interesting case studies on some outliers that result from atypical events.

Comment: 10 pages

arXiv.org e-Print Archive

Crossref

A Parametric Framework for the Comparison of Methods of Very Robust Regression

Author: Atkinson Anthony C.
Perrotta Domenico
Riani Marco
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2014

There are several methods for obtaining very robust estimates of regression parameters that asymptotically resist 50% of outliers in the data. Differences in the behaviour of these algorithms depend on the distance between the regression data and the outliers. We introduce a parameter $\lambda$ that defines a parametric path in the space of models and enables us to study, in a systematic way, the properties of estimators as the groups of data move from being far apart to close together. We examine, as a function of $\lambda$, the variance and squared bias of five estimators and we also consider their power when used in the detection of outliers. This systematic approach provides tools for gaining knowledge and better understanding of the properties of robust estimators.

Comment: Published at http://dx.doi.org/10.1214/13-STS437 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Archivio istituzionale della Ricerca - Università degli Studi di Parma

Crossref

Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications

Author: Alsheikh Mohammad Abu
Lin Shaowei
Niyato Dusit
Tan Hwee-Pink
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

Wireless sensor networks monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive literature review, covering the period 2002-2013, of machine learning methods that were used to address common issues in wireless sensor networks (WSNs). The advantages and disadvantages of each proposed algorithm are evaluated against the corresponding problem. We also provide a comparative guide to aid WSN designers in developing suitable machine learning solutions for their specific application challenges.

Comment: Accepted for publication in IEEE Communications Surveys and Tutorials

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

University of Canberra Research Repository

Data-driven Soft Sensors in the Process Industry

Author: Bogdan Gabrys
Petr Kadlec
Sibylle Strandt
Publication venue: 'Elsevier BV'
Publication date: 01/04/2009

In the last two decades Soft Sensors have established themselves as a valuable alternative to the traditional means for the acquisition of critical process variables, process monitoring and other tasks related to process control. This paper discusses characteristics of process industry data that are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, such as the chemical industry, bioprocess industry and steel industry. The focus of this work is on data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though not yet fully realised, potential. The main contributions of this work are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance together with their possible solutions.

Crossref

Bournemouth University Research Online