2,062 research outputs found

Density-based projected clustering of data streams

Authors: Gaber M.; Hassani M.; Seidl T.; Spaus P.
Publication date: 01/01/2012

Portsmouth University Research Portal (Pure)

Publikationsserver der RWTH Aachen University

A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

Authors: Havinga P.J.M.; Meratnia N.; Zhang Yang
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007

The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. Outlier detection aims to identify such observations in order to improve data analysis and to discover interesting and useful knowledge about unusual events in numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique based on the data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.

University of Twente Research Information

Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

Authors: Munteanu Alexander; Schwiegelshohn Chris
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

We present a technical survey of state-of-the-art approaches to data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.
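
As a small illustration of random sampling as a data reduction step for streams, the sketch below keeps a uniform sample of fixed size from a stream and attaches equal weights so that weighted sums over the sample estimate sums over the full stream. It is not taken from the survey: it uses classic reservoir sampling (Algorithm R), which is only the simplest sampling scheme and gives weaker guarantees than the coreset constructions the survey covers. The names `reservoir_sample`, `coreset` and `n_seen` are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng):
    """Algorithm R: keep a uniform random sample of k items from a stream.

    Returns (weighted_sample, n), where every kept item carries the weight
    n / (sample size), so the weights add up to the stream length n.
    """
    sample = []
    n = 0
    for item in stream:
        n += 1
        if len(sample) < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random slot with probability k / n.
            j = rng.randrange(n)  # uniform integer in [0, n)
            if j < k:
                sample[j] = item
    weight = n / len(sample) if sample else 0.0
    return [(item, weight) for item in sample], n


rng = random.Random(0)
coreset, n_seen = reservoir_sample(range(10000), 100, rng)

# Weighted sum over the sample as an estimate of the sum over the stream.
estimate = sum(w * x for x, w in coreset)
exact = sum(range(10000))
print("estimated sum:", estimate)
print("exact sum:", exact)
```

The sample is maintained in a single pass with memory proportional to `k`, which is the property that makes this kind of reduction usable in a streaming setting.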

Archivio della ricerca- Università di Roma La Sapienza

Random projections for Bayesian regression

Authors: Geppert Leo N.; Ickstadt Katja; Munteanu Alexander; Quedenfeld Jens; Sohler Christian
Publication venue: Springer Science and Business Media LLC
Publication date: 30/11/2015

This article deals with random projections applied as a data reduction technique for Bayesian regression analysis. We show sufficient conditions under which the entire $d$-dimensional distribution is approximately preserved under random projections by reducing the number of data points from $n$ to $k\in O(\operatorname{poly}(d/\varepsilon))$ in the case $n\gg d$. Under mild assumptions, we prove that evaluating a Gaussian likelihood function based on the projected data instead of the original data yields a $(1+O(\varepsilon))$-approximation in terms of the $\ell_2$ Wasserstein distance. Our main result shows that the posterior distribution of Bayesian linear regression is approximated up to a small error depending on only an $\varepsilon$-fraction of its defining parameters. This holds when using arbitrary Gaussian priors or the degenerate case of uniform distributions over $\mathbb{R}^d$ for $\beta$. Our empirical evaluations involve different simulated settings of Bayesian linear regression. Our experiments underline that the proposed method is able to recover the regression model up to small error while considerably reducing the total running time.
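
The reduction from n data points to k projected points can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a dense Gaussian projection matrix with i.i.d. N(0, 1/k) entries (the article may use other embeddings), and it compares only least-squares solutions, which coincide with the posterior mean under a Gaussian likelihood and a flat prior, rather than full posterior distributions. The function names `sketch` and `least_squares`, the sizes n = 1000, d = 2, k = 100 and the simulated data are all hypothetical.

```python
import random


def sketch(rows, k, rng):
    """Return S @ rows for a k x n matrix S with i.i.d. N(0, 1/k) entries.

    S is generated on the fly, one row at a time, and never stored.
    """
    m = len(rows[0])
    scale = 1.0 / k ** 0.5
    out = []
    for _ in range(k):
        acc = [0.0] * m
        for row in rows:
            g = rng.gauss(0.0, 1.0) * scale
            for j in range(m):
                acc[j] += g * row[j]
        out.append(acc)
    return out


def least_squares(rows):
    """Solve min ||X b - y||_2; the last column of each row is y."""
    d = len(rows[0]) - 1
    # Augmented normal equations [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(d + 1)]
         for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for c in range(d):
        p = max(range(c, d), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(c + 1, d):
            f = a[i][c] / a[c][c]
            for j in range(c, d + 1):
                a[i][j] -= f * a[c][j]
    # Back substitution.
    b = [0.0] * d
    for i in reversed(range(d)):
        s = sum(a[i][j] * b[j] for j in range(i + 1, d))
        b[i] = (a[i][d] - s) / a[i][i]
    return b


rng = random.Random(0)
n, d, k = 1000, 2, 100
true_beta = [2.0, -1.0]  # hypothetical ground truth for the simulation

# Simulated rows [x_1, ..., x_d, y] with y = x . beta + Gaussian noise.
data = []
for _ in range(n):
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    y = sum(b * xi for b, xi in zip(true_beta, x)) + rng.gauss(0.0, 0.1)
    data.append(x + [y])

beta_full = least_squares(data)                    # uses all n rows
beta_sketch = least_squares(sketch(data, k, rng))  # uses only k rows
print("full data:", beta_full)
print("projected:", beta_sketch)
```

The projection is applied to the features and the response together, so the downstream solver sees a regression problem with k rows instead of n.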

arXiv.org e-Print Archive

Springer - Publisher Connector

A Fuzzy Clustering Algorithm for High Dimensional Streaming Data

Authors: Jain Anurag; Jain Susheel; Upadhyay Diksha
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 01/10/2013

In this paper we propose a dimension-reduced weighted fuzzy clustering algorithm (sWFCM-HD). The algorithm can be used for high-dimensional data sets with streaming behavior. Such data sets arise in sensor networks, web click streams and Internet traffic flows. These data have two special properties that separate them from other data sets: (a) they have streaming behavior and (b) they are high-dimensional. Optimized fuzzy clustering algorithms have already been proposed for data sets having either streaming behavior or high dimensionality. To the best of our knowledge, however, no optimized fuzzy clustering algorithm has been proposed for data sets having both properties, i.e., high-dimensional data that also arrive continuously as a stream. Experimental analysis shows that the proposed algorithm (sWFCM-HD) improves performance in terms of memory consumption as well as execution time.

Keywords: K-Means, Fuzzy C-Means, Weighted Fuzzy C-Means, Dimension Reduction, Clustering
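
For orientation, a generic weighted fuzzy c-means iteration can be sketched as below. This is not sWFCM-HD: the abstract does not describe its dimension reduction or stream handling, so the code shows only the standard membership and weighted center updates on a fixed batch. The function names, the fuzzifier m = 2 and the toy data are hypothetical. In streaming variants, per-point weights are typically what allows a centroid that summarizes an earlier chunk to count for many points; that use is an assumption here, not something the abstract states.

```python
import random


def memberships(points, centers, m=2.0):
    """Fuzzy membership of each point in each cluster; every row sums to 1."""
    result = []
    for p in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        if min(d2) == 0.0:
            # The point coincides with one or more centers: split its
            # membership evenly among those centers.
            zeros = [1.0 if v == 0.0 else 0.0 for v in d2]
            total = sum(zeros)
            result.append([z / total for z in zeros])
            continue
        # u_j is proportional to dist_j^(-2/(m-1)) = d2_j^(-1/(m-1)).
        inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
        total = sum(inv)
        result.append([v / total for v in inv])
    return result


def update_centers(points, weights, u, m=2.0):
    """Each center is the mean of the points weighted by w_i * u_ij^m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        coef = [w * row[j] ** m for w, row in zip(weights, u)]
        total = sum(coef)
        centers.append([sum(c * p[t] for c, p in zip(coef, points)) / total
                        for t in range(dim)])
    return centers


def weighted_fcm(points, weights, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of steps."""
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = update_centers(points, weights, u, m)
    return centers, memberships(points, centers, m)


rng = random.Random(1)
blob_a = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(30)]
blob_b = [[rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)] for _ in range(30)]
points = blob_a + blob_b
weights = [1.0] * len(points)  # unit weights; a summary point would get more

# Initialise with one point from each blob.
centers, u = weighted_fcm(points, weights, [points[0], points[-1]])
print("centers:", centers)
```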

International Institute for Science, Technology and Education (IISTE): E-Journals