A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees for selecting the most suitable technique for a given data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
Traj-ARIMA: A Spatial-Time Series Model for Network-Constrained Trajectory
Trajectory data play an important role in analyzing real-world applications that involve movement, e.g. natural and social phenomena such as bird migration, transportation management, urban planning and tourism analysis. Such trajectory data are a special kind of time series that adds a spatial dimension to the temporal one. Traditional time series models, especially the ARIMA (AutoRegressive Integrated Moving Average) model, provide a sound theoretical background and have supported many successful applications for managing and forecasting time-dependent sequential data. This paper aims to extend the ARIMA model with a spatial dimension and to apply it to network-constrained trajectory data. We implement and evaluate the model on a trajectory database, in the context of a traffic application scenario in which vehicle movement is constrained by a given network infrastructure. The proposed Traj-ARIMA model has many potential applications, such as trajectory data regression and compression, outlier detection, and traffic flow and vehicle speed prediction. In this paper, the major focus is on vehicle speed forecasting.
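For orientation, the plain temporal ARIMA baseline that this abstract builds on can be sketched as follows. This is a minimal illustration, not the Traj-ARIMA model: the abstract does not describe the spatial extension, so none is attempted here. The sketch assumes the simplest non-trivial order, ARIMA(1,1,0), fitted by ordinary least squares; the function names and the sample speed values are hypothetical.

```python
import numpy as np

def fit_ar1_on_differences(series):
    """Fit ARIMA(1,1,0): an AR(1) model on the first differences of the series."""
    y = np.asarray(series, dtype=float)
    d = np.diff(y)                      # the "I" step: first differencing (d = 1)
    x, target = d[:-1], d[1:]           # predict each difference from the previous one
    A = np.column_stack([np.ones_like(x), x])
    # Least-squares estimate of target ≈ c + phi * x
    (c, phi), *_ = np.linalg.lstsq(A, target, rcond=None)
    return c, phi

def forecast_next(series, c, phi):
    """One-step-ahead forecast: predict the next difference, then undo the differencing."""
    y = np.asarray(series, dtype=float)
    last_diff = y[-1] - y[-2]
    return y[-1] + c + phi * last_diff

# Made-up speed readings (km/h) on one road segment, for illustration only.
speeds = [42.0, 44.5, 46.0, 46.8, 47.1, 47.3, 47.2]
c, phi = fit_ar1_on_differences(speeds)
next_speed = forecast_next(speeds, c, phi)
```

A full ARIMA fit would also select the orders (p, d, q) and estimate moving-average terms; a statistics library is the usual choice for that.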
System Development for Detecting Outlier Transactions
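Initiatives for Attractive Education in Graduate Schools: A Program for Developing Advanced Informatics Professionals with Practical IT Skills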
Finding Anomalous Periodic Time Series: An Application to Catalogs of Periodic Variable Stars
Catalogs of periodic variable stars contain large numbers of periodic
light-curves (photometric time series data from the astrophysics domain).
Separating anomalous objects from well-known classes is an important step
towards the discovery of new classes of astronomical objects. Most anomaly
detection methods for time series data assume either a single continuous time
series or a set of time series whose periods are aligned. Light-curve data
precludes the use of these methods as the periods of any given pair of
light-curves may be out of sync. One may use an existing anomaly detection
method if, prior to similarity calculation, one performs the costly act of
aligning two light-curves, an operation that scales poorly to massive data
sets. This paper presents PCAD, an unsupervised anomaly detection method for
large sets of unsynchronized periodic time-series data, that outputs a ranked
list of both global and local anomalies. It calculates its anomaly score for
each light-curve in relation to a set of centroids produced by a modified
k-means clustering algorithm. Our method is able to scale to large data sets
through the use of sampling. We validate our method on both light-curve data
and other time series data sets. We demonstrate its effectiveness at finding
known anomalies, and discuss the effect of sample size and number of centroids
on our results. We compare our method to naive solutions and existing time
series anomaly detection methods for unphased data, and show that PCAD's
reported anomalies are comparable to or better than all other methods. Finally,
astrophysicists on our team have verified that PCAD finds true anomalies that
might be indicative of novel astrophysical phenomena.
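The scoring idea in this abstract, rating each light-curve against a set of centroids without first aligning phases, can be sketched roughly as below. This is an illustration under stated assumptions rather than PCAD itself: it assumes every curve has been folded and resampled to the same number of points, takes the centroids as given (the modified k-means step and the sampling are omitted), and uses the distance to the nearest centroid as the score. The function names are hypothetical.

```python
import numpy as np

def min_shift_distance(a, b):
    """Smallest Euclidean distance between a and any cyclic shift of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Circular cross-correlation for all shifts at once, via the FFT.
    corr = np.fft.irfft(np.fft.rfft(a) * np.conj(np.fft.rfft(b)), n=len(a))
    # ||a - shift(b)||^2 = ||a||^2 + ||b||^2 - 2 * <a, shift(b)>
    sq = a @ a + b @ b - 2.0 * corr.max()
    return float(np.sqrt(max(sq, 0.0)))  # clamp tiny negative rounding errors

def anomaly_scores(curves, centroids):
    """Score each curve by its phase-invariant distance to the nearest centroid."""
    return np.array([
        min(min_shift_distance(curve, centroid) for centroid in centroids)
        for curve in curves
    ])

def rank_anomalies(curves, centroids):
    """Indices of curves, most anomalous first."""
    return np.argsort(-anomaly_scores(curves, centroids))
```

Computing the best shift through the FFT costs O(n log n) per pair instead of O(n^2) for trying every shift explicitly, which matters when the catalog is large.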