A survey of outlier detection methodologies
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise from mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error, or simply natural deviations in populations. Detecting them can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of Computer Science and Statistics. In this paper, we present a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
Clustering of discretely observed diffusion processes
In this paper a new dissimilarity measure for identifying groups of asset
dynamics is proposed. The underlying generating process is assumed to be a
diffusion process that solves a stochastic differential equation and is observed at
discrete times. The mesh of observations is not required to shrink to zero. As
the distance between two observed paths, the quadratic distance between the
corresponding estimated Markov operators is considered. Analysis of both
synthetic data and real financial data from NYSE/NASDAQ stocks gives evidence
that this distance seems capable of capturing differences in both the drift and
diffusion coefficients, unlike other commonly used metrics.
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection is to identify such outliers in order to improve the analysis of
data and to discover interesting and useful knowledge about unusual events within numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees for selecting the most suitable technique based on the data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
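To make the definition of an outlier in this abstract concrete, here is a minimal sketch of one simple unsupervised rule, Tukey's interquartile-range fences. It is an illustrative assumption and is not taken from the paper: the function name `iqr_outliers`, the multiplier `k=1.5`, and the sample readings are all hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1  # interquartile range
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings with one value far from the rest.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(iqr_outliers(readings))
```

The rule needs no labels, which is what makes it unsupervised, but it only applies to a single numeric attribute; the techniques surveyed in the paper cover other data types that this sketch does not handle.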
Detection of Thin Boundaries between Different Types of Anomalies in Outlier Detection using Enhanced Neural Networks
Outlier detection has received special attention in various fields, mainly
those dealing with machine learning and artificial intelligence. As strong
outliers, anomalies are divided into point, contextual and collective
outliers. The most important challenges in outlier detection include the thin
boundary between remote points and the natural area, the tendency of new data
and noise to mimic real data, unlabelled datasets, and differing definitions
of outliers across applications. Considering these challenges, we
defined new types of anomalies, called Collective Normal Anomaly and Collective
Point Anomaly, in order to improve detection of the thin boundary
between different types of anomalies. Basic domain-independent methods are
introduced to detect these anomalies in both unsupervised and
supervised datasets. The Multi-Layer Perceptron Neural Network is enhanced
using the Genetic Algorithm to detect the newly defined anomalies with higher
precision, so as to ensure a test error lower than that of the
conventional Multi-Layer Perceptron Neural Network. Experimental results on
benchmark datasets indicated a reduced anomaly detection error in
comparison to baselines.
A Relational Hyperlink Analysis of an Online Social Movement
In this paper we propose relational hyperlink analysis (RHA) as a distinct approach for empirical social science research into hyperlink networks on the World Wide Web. We demonstrate this approach, which employs the ideas and techniques of social network analysis (in particular, exponential random graph modeling), in a study of the hyperlinking behaviors of Australian asylum advocacy groups. We show that, compared with the commonly used hyperlink counts regression approach, relational hyperlink analysis can lead to fundamentally different conclusions about the social processes underpinning hyperlinking behavior. In particular, when trying to explain why social ties are formed, counts regressions may overestimate the role of actor attributes in the formation of hyperlinks if endogenous, purely structural network effects are not taken into account. Our analysis involves an innovative joint use of two software programs: VOSON, for the automated retrieval and processing of considerable quantities of hyperlink data, and LPNet, for the statistical modeling of social network data. Together, VOSON and LPNet enable new and unique research into social networks in the online world, and our paper highlights the importance of complementary research tools for social science research into the web.