Density-based projected clustering of data streams

Authors: Gaber M.; Hassani M.; Seidl T.; Spaus P.
Publication date: 01/01/2012

Portsmouth University Research Portal (Pure)

Publikationsserver der RWTH Aachen University

'HALITE IND.DS': agrupamento de dados em subespaços de séries temporais multidimensionais ('HALITE IND.DS': subspace clustering of data in multidimensional time series)

Authors: Cordeiro Robson Leonardo Ferreira; Silva Afonso Expedito da
Publication venue: Curitiba

Given a data stream with many attributes, how can similar events be clustered? For example, how can measurements of tens of climatic attributes be clustered to aid in forecasting the climate and extreme events? The task of clustering data with many attributes is known as subspace clustering. There is currently a need for algorithms of this type that are well suited to processing data streams. This paper proposes a new algorithm, 'HALITE IND.DS', for subspace clustering in data streams. It improves upon an existing technique, the Halite method, which was originally designed to process static datasets. Compared to using the base algorithm on data streams, the new algorithm uses the knowledge obtained from clustering past data to ease the clustering of present data, thus reducing the runtime. Experiments on a synthetic stream and on a real climatic stream indicate that the new algorithm is on average 4.2 times faster than the base algorithm while obtaining similar accuracy.

FAPESP, CAPES, CNP

Approximation and Streaming Algorithms for Projective Clustering via Random Projections

Authors: Kerber Michael; Raghvendra Sharath
Publication date: 08/07/2014

Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective clustering problem, given $k, q$ and norm $\rho \in [1,\infty]$, we have to compute a set $\mathcal{F}$ of $k$ $q$-dimensional flats such that $(\sum_{p\in P}d(p, \mathcal{F})^\rho)^{1/\rho}$ is minimized; here $d(p, \mathcal{F})$ represents the (Euclidean) distance of $p$ to the closest flat in $\mathcal{F}$. We let $f_k^q(P,\rho)$ denote the minimal value and interpret $f_k^q(P,\infty)$ to be $\max_{r\in P}d(r, \mathcal{F})$. When $\rho=1,2$ and $\infty$ and $q=0$, the problem corresponds to the $k$-median, $k$-mean and the $k$-center clustering problems respectively. For every $0 < \epsilon < 1$, $S\subset P$ and $\rho \ge 1$, we show that the orthogonal projection of $P$ onto a randomly chosen flat of dimension $O(((q+1)^2\log(1/\epsilon)/\epsilon^3) \log n)$ will $\epsilon$-approximate $f_1^q(S,\rho)$. This result combines the concepts of geometric coresets and subspace embeddings based on the Johnson-Lindenstrauss Lemma. As a consequence, an orthogonal projection of $P$ to an $O(((q+1)^2 \log ((q+1)/\epsilon)/\epsilon^3) \log n)$ dimensional randomly chosen subspace $\epsilon$-approximates projective clusterings for every $k$ and $\rho$ simultaneously. Note that the dimension of this subspace is independent of the number of clusters $k$. Using this dimension reduction result, we obtain new approximation and streaming algorithms for projective clustering problems. For example, given a stream of $n$ points, we show how to compute an $\epsilon$-approximate projective clustering for every $k$ and $\rho$ simultaneously using only $O((n+d)((q+1)^2\log ((q+1)/\epsilon))/\epsilon^3 \log n)$ space. Compared to standard streaming algorithms with $\Omega(kd)$ space requirement, our approach is a significant improvement when the number of input points and their dimensions are of the same order of magnitude.

Comment: Canadian Conference on Computational Geometry (CCCG 2015)

arXiv.org e-Print Archive

CiteSeerX

MPG.PuRe

Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

Authors: Munteanu Alexander; Schwiegelshohn Chris
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

We present a technical survey of state-of-the-art approaches to data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.
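One common way that data summaries are turned into streaming algorithms is the merge-and-reduce scheme, sketched below under explicit assumptions. The abstract does not name this scheme; it is swapped in here as an illustration. The reduce step uses plain uniform sampling as a placeholder, which does not carry the error guarantees of a real coreset construction. The function names `reduce_uniform` and `merge_and_reduce` are hypothetical, and NumPy is an assumed dependency.

```python
import numpy as np


def reduce_uniform(points, weights, size, rng):
    """Shrink a weighted point set to `size` points by uniform sampling.

    Placeholder reduce step (an assumption of this sketch): the sampled
    weights are rescaled so that the total weight is preserved.
    """
    if len(points) <= size:
        return points, weights
    idx = rng.choice(len(points), size=size, replace=False)
    scale = weights.sum() / weights[idx].sum()
    return points[idx], weights[idx] * scale


def merge_and_reduce(stream_blocks, size, rng):
    """Summarize a non-empty stream of point blocks with few stored summaries.

    At most one summary is kept per level, like the digits of a binary
    counter: two summaries on the same level are merged, reduced, and
    promoted to the next level.
    """
    levels = {}  # level -> (points, weights)
    for block in stream_blocks:
        summary = reduce_uniform(block, np.ones(len(block)), size, rng)
        level = 0
        while level in levels:
            other_points, other_weights = levels.pop(level)
            merged_points = np.vstack([summary[0], other_points])
            merged_weights = np.concatenate([summary[1], other_weights])
            summary = reduce_uniform(merged_points, merged_weights, size, rng)
            level += 1
        levels[level] = summary
    points = np.vstack([p for p, _ in levels.values()])
    weights = np.concatenate([w for _, w in levels.values()])
    return points, weights
```

With b blocks in the stream, at most about log2(b) + 1 summaries of `size` points are stored at any time, which is the source of the space savings; replacing `reduce_uniform` with an actual coreset construction is what would give approximation guarantees.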

Archivio della ricerca- Università di Roma La Sapienza