Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection
In recent years, there have been many practical applications of anomaly
detection such as in predictive maintenance, detection of credit fraud, network
intrusion, and system failure. The goal of anomaly detection is to identify in
the test data anomalous behaviors that are either rare or unseen in the
training data. This is a common goal in predictive maintenance, which aims to
forecast the imminent faults of an appliance given abundant samples of normal
behaviors. Local outlier factor (LOF) is one of the state-of-the-art models
used for anomaly detection, but the predictive performance of LOF depends
greatly on the selection of hyperparameters. In this paper, we propose a novel,
heuristic methodology to tune the hyperparameters in LOF. A tuned LOF model
that uses the proposed method shows good predictive performance in both
simulations and real data sets. Comment: 15 pages, 5 figures
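To show why LOF's performance hinges on its hyperparameters, here is a minimal Python sketch that computes LOF scores from scratch so the role of the neighborhood size k (the hyperparameter most often tuned) is visible. It is not the authors' tuning heuristic, which the abstract does not describe. `lof_scores` is a hypothetical helper. It assumes distinct points and simplifies the textbook definition by using exactly k neighbors rather than every point tied at the k-distance.

```python
import math


def lof_scores(points, k):
    """Return the Local Outlier Factor of every point for neighborhood size k.

    Simplification (assumption): exactly k nearest neighbors are used, ties at
    the k-distance are not expanded, and points are assumed to be distinct.
    """
    n = len(points)
    # Pairwise Euclidean distances; O(n^2) is fine for a small illustration.
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    neighbors, k_distance = [], []
    for i in range(n):
        # The k nearest neighbors of point i, excluding i itself.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        knn = order[:k]
        neighbors.append(knn)
        k_distance.append(dist[i][knn[-1]])  # distance to the k-th neighbor

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbors' density to the point's own density.
    # Values near 1 mean "as dense as its neighbors"; much larger means outlying.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


if __name__ == "__main__":
    cluster = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
    data = cluster + [(3.0, 3.0)]  # one far-away point
    for k in (2, 5, 10):
        scores = lof_scores(data, k)
        print(k, round(scores[-1], 2), round(max(scores[:-1]), 2))
```

Rerunning the sweep changes every score, not only the outlier's. A practical detector also needs a cut-off on the score (libraries such as scikit-learn expose it as a contamination rate), which is a second hyperparameter to choose.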
Randomizing Ensemble-based approaches for Outlier
The volume of data is increasing dramatically every day, creating a need to detect abnormal behaviors that can seriously harm our systems. Outlier detection refers to the process of identifying outlying activities that diverge from the rest of the data. This process, an integral part of the data mining field, has recently attracted substantial interest from the data mining community. An outlying activity, or outlier, is a data point that deviates significantly from, and appears inconsistent with, the other data members. Ensemble-based outlier detection is a line of research that aims to reduce a model's dependence on particular datasets or data localities by increasing the robustness of the data mining procedure. The key principle of an ensemble approach is to combine individual detection results, which do not contain the same list of outliers, in order to reach a consensus. In this paper, we propose a novel strategy for constructing a randomized outlier detection ensemble. This approach extends the heuristic greedy ensemble construction previously developed by the research community. We focus on the core components of constructing an ensemble-based algorithm for outlier detection. Randomization is introduced by modifying the pseudocode of the greedy ensemble and implementing the change in the corresponding Java code on the ELKI data-mining platform. The key purpose of our approach is to improve the greedy ensemble and to overcome its local-maxima problem. To induce diversity, the search is initialized with a random outlier detector drawn from the pool of detectors. Finally, the paper provides insights into the ongoing work on our randomized ensemble-based approach for outlier detection.
Empirical results indicate that, because it induces diversity by employing various outlier detection algorithms, the randomized ensemble approach performs better than a single outlier detector.
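The randomized start can be illustrated with a simplified Python sketch (plain Python, not the ELKI/Java code the abstract mentions). It is loosely modeled on greedy ensemble selection. A pseudo-target is built from the combined scores, and detectors are then added one at a time, each kept only if the ensemble's correlation with the target improves. The only change for the randomized variant is that the search starts from a random detector. The min-max normalization, the top-`n_outliers` pseudo-target, and the "least correlated candidate first" ordering are assumptions, and all function names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def normalize(scores):
    """Min-max scale one detector's scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def average(vectors):
    """Element-wise mean of equally long score vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def randomized_greedy_ensemble(detector_scores, n_outliers, seed=None):
    """Return (selected detector indices, correlation of ensemble with pseudo-target)."""
    rng = random.Random(seed)
    scores = [normalize(s) for s in detector_scores]

    # Pseudo-target (assumption): mark the n_outliers points ranked highest
    # by the average of all detectors as outliers, everything else as inliers.
    full = average(scores)
    ranked = sorted(range(len(full)), key=lambda i: full[i], reverse=True)
    top = set(ranked[:n_outliers])
    target = [1.0 if i in top else 0.0 for i in range(len(full))]

    # Randomized initialization: start from a random detector in the pool.
    selected = [rng.randrange(len(scores))]
    best = pearson(scores[selected[0]], target)
    remaining = [i for i in range(len(scores)) if i != selected[0]]

    while remaining:
        current = average([scores[i] for i in selected])
        # Diversity first: try the candidate least correlated with the ensemble.
        remaining.sort(key=lambda i: pearson(scores[i], current))
        candidate = remaining.pop(0)
        trial = pearson(average([scores[i] for i in selected + [candidate]]), target)
        # Keep the candidate only if the ensemble moves closer to the target;
        # a rejected candidate is discarded for good.
        if trial > best:
            selected.append(candidate)
            best = trial
    return sorted(selected), best
```

Different seeds can produce different ensembles. Running several seeds and keeping the best result is one way such randomization can help a greedy search avoid a poor local maximum.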
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed. Comment: 13 figures, 35 references
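The heat-diffusion idea that the abstract cites as the origin of fuzzy spectral methods can be shown in a few lines. The Python sketch below is not the Laplacian mixture model itself. It builds the graph Laplacian $L = D - A$, diffuses heat from each of a few seed nodes with the graph heat equation $du/dt = -Lu$ (explicit Euler), and normalizes the results into soft, overlapping memberships. The seed choice, step size `dt`, and number of steps are assumptions, and the helper names are hypothetical.

```python
def laplacian(n, edges):
    """Combinatorial graph Laplacian L = D - A as a dense list of lists."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def diffuse(L, u, dt, steps):
    """Explicit Euler for du/dt = -L u.

    With dt * max_degree <= 1 the update matrix I - dt*L has no negative
    entries, so heat stays nonnegative and its total is conserved.
    """
    n = len(u)
    for _ in range(steps):
        Lu = [sum(L[i][j] * u[j] for j in range(n)) for i in range(n)]
        u = [u[i] - dt * Lu[i] for i in range(n)]
    return u


def fuzzy_memberships(n, edges, seeds, dt=0.1, steps=20):
    """Soft assignment of each node to the region of influence of each seed."""
    L = laplacian(n, edges)
    heat = []
    for s in seeds:
        u0 = [1.0 if i == s else 0.0 for i in range(n)]  # unit heat at the seed
        heat.append(diffuse(L, u0, dt, steps))
    # Normalize per node so memberships across seeds sum to 1.
    memberships = []
    for i in range(n):
        total = sum(h[i] for h in heat)
        memberships.append([h[i] / total for h in heat])
    return memberships


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    for node, row in enumerate(fuzzy_memberships(6, edges, seeds=[0, 5])):
        print(node, [round(m, 3) for m in row])
```

Nodes on each side of the bridge lean toward their own seed, but none gets a hard 0/1 label. Running the diffusion much longer drives every membership toward uniform, so the diffusion time acts as a resolution parameter.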
In-Network Outlier Detection in Wireless Sensor Networks
To address the problem of unsupervised outlier detection in wireless sensor
networks, we develop an approach that (1) is flexible with respect to the
outlier definition, (2) computes the result in-network to reduce both bandwidth
and energy usage, (3) uses only single-hop communication, thus permitting very
simple node failure detection and message reliability assurance mechanisms
(e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to data.
We examine performance using simulation with real sensor data streams. Our
results demonstrate that our approach is accurate and imposes a reasonable
communication load and level of power consumption. Comment: Extended version of a paper appearing in the Int'l Conference on
Distributed Computing Systems 200
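Two of the listed properties, a pluggable outlier definition and single-hop communication, can be illustrated with a small Python simulation. This is a hypothetical sketch, not the paper's protocol (which the abstract does not detail). Each node hears only its direct neighbors' readings and applies whatever outlier test it is given. The median-deviation rule and its threshold are assumptions chosen only for illustration.

```python
import statistics
from typing import Callable, Dict, List

# An outlier definition takes a node's own reading and the readings it heard
# from its one-hop neighbors, and returns True if the node should be flagged.
OutlierTest = Callable[[float, List[float]], bool]


def median_deviation_test(threshold: float) -> OutlierTest:
    """Hypothetical definition: flag a reading far from the neighbors' median."""
    def test(own: float, heard: List[float]) -> bool:
        if not heard:
            return False  # an isolated node has nothing to compare against
        return abs(own - statistics.median(heard)) > threshold
    return test


def one_hop_round(readings: Dict[str, float],
                  neighbors: Dict[str, List[str]],
                  is_outlier: OutlierTest) -> Dict[str, bool]:
    """Simulate one broadcast round in which every node hears only its direct neighbors."""
    inbox = {node: [readings[nb] for nb in neighbors.get(node, [])] for node in readings}
    return {node: is_outlier(readings[node], inbox[node]) for node in readings}


if __name__ == "__main__":
    readings = {"a": 20.1, "b": 20.4, "c": 19.9, "d": 35.0}
    nbrs = {"a": ["b", "c", "d"], "b": ["a", "c", "d"],
            "c": ["a", "b", "d"], "d": ["a", "b", "c"]}
    print(one_hop_round(readings, nbrs, median_deviation_test(5.0)))
    # A dynamic update is just a new reading followed by another round.
    readings["d"] = 20.2
    print(one_hop_round(readings, nbrs, median_deviation_test(5.0)))
```

Passing the test in as a function keeps the communication pattern fixed while the outlier definition changes. Because each round needs only one hop, no multi-hop routing is involved.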
Automatic Bayesian Density Analysis
Making sense of a dataset in an automatic and unsupervised fashion is a
challenging problem in statistics and AI. Classical approaches for exploratory
data analysis are usually not flexible enough to deal with the uncertainty
inherent to real-world data: they are often restricted to fixed latent
interaction models and homogeneous likelihoods; they are sensitive to missing,
corrupt and anomalous data; moreover, their expressiveness generally comes at
the price of intractable inference. As a result, supervision from statisticians
is usually needed to find the right model for the data. However, since domain
experts are not necessarily also experts in statistics, we propose Automatic
Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible
at large. Specifically, ABDA allows for automatic and efficient missing value
estimation, statistical data type and likelihood discovery, anomaly detection
and dependency structure mining, on top of providing accurate density
estimation. Extensive empirical evidence shows that ABDA is a suitable tool for
automatic exploratory analysis of mixed continuous and discrete tabular data. Comment: In proceedings of the Thirty-Third AAAI Conference on Artificial
Intelligence (AAAI-19)