RIDGE LEAST ABSOLUTE DEVIATION PERFORMANCE IN ADDRESSING MULTICOLLINEARITY AND DIFFERENT LEVELS OF OUTLIER SIMULTANEOUSLY
When data contain both multicollinearity and outliers, inference based on least squares (LS) parameter estimates becomes unreliable because the LS estimator is inefficient under these conditions. Both problems can be addressed simultaneously with robust regression, one form of which is the ridge least absolute deviation (RLAD) method. This study evaluates the performance of RLAD in handling multicollinearity across diverse sample sizes and percentages of outliers, using simulated data. The Monte Carlo study was designed as a multiple regression model with multicollinearity (ρ=0.99) between the predictor variables and with 10%, 20%, and 30% outliers in the response variable, for different sample sizes (n = 25, 50, 75, 100, 200; =0, and β=1 otherwise). Multicollinearity in the data is checked by calculating the correlation between the independent variables and the VIF values. Outliers are detected using boxplots. Parameters are estimated with both the RLAD and LS methods, and the MSE values of the two methods are then compared to see which handles multicollinearity and outliers better. The results show that RLAD has a lower MSE than LS, which indicates that RLAD estimates the regression coefficients more precisely for every sample size and outlier level studied.
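The multicollinearity check mentioned in this abstract (pairwise correlation plus VIF) can be sketched for the two-predictor case, where the VIF reduces to 1/(1 - r^2). This is only an illustrative sketch, not the study's code: the function names, the seed, and the Gaussian data-generating scheme are assumptions introduced here.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


def simulate_collinear(n, rho, seed=0):
    """Draw two standard-normal predictors with population correlation rho.

    Hypothetical generator: the abstract does not say how its data were drawn.
    """
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for a in x1]
    return x1, x2


x1, x2 = simulate_collinear(n=200, rho=0.99)
r = pearson(x1, x2)
vif = vif_two_predictors(x1, x2)
print(f"sample correlation = {r:.4f}, VIF = {vif:.1f}")
```

At the population level, ρ = 0.99 gives VIF = 1/(1 - 0.9801) ≈ 50.25, far above the common rule-of-thumb cutoff of 10; the sample value printed above will vary around that figure.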
Outlier Detection Using Nonconvex Penalized Regression
This paper studies the outlier detection problem from the point of view of
penalized regressions. Our regression model adds one mean shift parameter for
each of the data points. We then apply a regularization favoring a sparse
vector of mean shift parameters. The usual L1 penalty yields a convex
criterion, but we find that it fails to deliver a robust estimator. The L1
penalty corresponds to soft thresholding. We introduce a thresholding (denoted
by Θ) based iterative procedure for outlier detection (Θ-IPOD). A
version based on hard thresholding correctly identifies outliers on some hard
test problems. We find that Θ-IPOD is much faster than iteratively
reweighted least squares for large data because each iteration costs at most
O(np) (and sometimes much less), avoiding an O(np^2) least squares estimate.
We describe the connection between Θ-IPOD and M-estimators. Our
proposed method has one tuning parameter with which to both identify outliers
and estimate regression coefficients. A data-dependent choice can be made based
on BIC. The tuned Θ-IPOD shows outstanding performance in identifying
outliers in various situations in comparison to other existing approaches. This
methodology extends to high-dimensional modeling with p >> n, if both the
coefficient vector and the outlier pattern are sparse.
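The mean-shift idea in this abstract can be illustrated with a small alternating scheme: fit least squares to the response with the current shift estimates subtracted, then hard-threshold the residuals to update the shifts. The sketch below is a simplified illustration for a single predictor, not the authors' implementation. The helper names, the toy data, the fixed threshold `lam` (the abstract tunes it via BIC), and the zero initialization are all assumptions.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def hard_threshold(r, lam):
    """Keep r unchanged if it exceeds lam in magnitude, else set it to zero."""
    return r if abs(r) > lam else 0.0


def ipod_sketch(x, y, lam, n_iter=50):
    """Alternate an LS fit on (y - gamma) with hard thresholding of residuals.

    gamma[i] is the estimated mean shift of observation i; a nonzero
    entry flags that observation as an outlier.
    """
    gamma = [0.0] * len(y)
    for _ in range(n_iter):
        adjusted = [yi - gi for yi, gi in zip(y, gamma)]
        intercept, slope = fit_line(x, adjusted)
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        gamma = [hard_threshold(r, lam) for r in resid]
    return intercept, slope, gamma


# Toy data (hypothetical): y = 2 + 3x + noise, with two shifted observations.
rng = random.Random(1)
x = [float(i) for i in range(20)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 0.5) for xi in x]
y[3] += 40.0
y[15] += 40.0

intercept, slope, gamma = ipod_sketch(x, y, lam=10.0)
outliers = [i for i, g in enumerate(gamma) if g != 0.0]
print("flagged observations:", outliers)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Starting from zero shifts works here because the two outliers are large and do not mask each other; harder configurations would generally need a robust starting fit.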
Detecting Outliers in Data with Correlated Measures
Advances in sensor technology have enabled the collection of large-scale
datasets. Such datasets can be extremely noisy and often contain a significant
amount of outliers that result from sensor malfunction or human operation
faults. In order to utilize such data for real-world applications, it is
critical to detect outliers so that models built from these datasets will not
be skewed by outliers.
In this paper, we propose a new outlier detection method that utilizes the
correlations in the data (e.g., taxi trip distance vs. trip time). Different
from existing outlier detection methods, we build a robust regression model
that explicitly models the outliers and detects outliers simultaneously with
the model fitting.
We validate our approach on real-world datasets against methods specifically
designed for each dataset as well as the state of the art outlier detectors.
Our outlier detection method achieves better performances, demonstrating the
robustness and generality of our method. Last, we report interesting case
studies on some outliers that result from atypical events. Comment: 10 pages
A Parametric Framework for the Comparison of Methods of Very Robust Regression
There are several methods for obtaining very robust estimates of regression
parameters that asymptotically resist 50% of outliers in the data. Differences
in the behaviour of these algorithms depend on the distance between the
regression data and the outliers. We introduce a parameter λ that
defines a parametric path in the space of models and enables us to study, in a
systematic way, the properties of estimators as the groups of data move from
being far apart to close together. We examine, as a function of λ, the
variance and squared bias of five estimators and we also consider their power
when used in the detection of outliers. This systematic approach provides tools
for gaining knowledge and better understanding of the properties of robust
estimators. Comment: Published at http://dx.doi.org/10.1214/13-STS437 in
Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications
Wireless sensor networks monitor dynamic environments that change rapidly
over time. This dynamic behavior is either caused by external factors or
initiated by the system designers themselves. To adapt to such conditions,
sensor networks often adopt machine learning techniques to eliminate the need
for unnecessary redesign. Machine learning also inspires many practical
solutions that maximize resource utilization and prolong the lifespan of the
network. In this paper, we present an extensive literature review over the
period 2002-2013 of machine learning methods that were used to address common
issues in wireless sensor networks (WSNs). The advantages and disadvantages of
each proposed algorithm are evaluated against the corresponding problem. We
also provide a comparative guide to aid WSN designers in developing suitable
machine learning solutions for their specific application challenges. Comment: Accepted for publication in IEEE Communications Surveys and Tutorials
Data-driven Soft Sensors in the Process Industry
In the last two decades, Soft Sensors have established themselves as a valuable alternative to the traditional means of acquiring critical process variables, of process monitoring, and of other tasks related to process control. This paper discusses the characteristics of process industry data that are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, such as the chemical, bioprocess, and steel industries. The work focuses on data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness, and huge, though not yet completely realised, potential. Its main contributions are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance together with their possible solutions.