6,953 research outputs found

    'HALITE IND.DS': agrupamento de dados em subespaços de séries temporais multidimensionais ['HALITE IND.DS': subspace clustering of multidimensional time series data]

    Given a data stream with many attributes, how can similar events be clustered? For example, how can measurements of tens of climatic attributes be clustered to aid in forecasting the climate and extreme events? The task of clustering data with many attributes is known as subspace clustering. Today, there is a need for algorithms of this type that are well suited to processing data streams. This paper proposes the new algorithm 'HALITE IND.DS' for subspace clustering in data streams. The new algorithm improves upon an existing technique, the Halite method, which was originally designed to process static datasets. Compared to using the base algorithm on data streams, the new algorithm takes advantage of the knowledge obtained from clustering past data to ease the clustering of present data, thus shrinking the runtime. Experiments using a synthetic stream as well as a real climatic stream indicate that the new algorithm is on average 4.2 times faster than the base algorithm, while obtaining similar accuracy.
    Funding: FAPESP, CAPES, CNP

    Approximation and Streaming Algorithms for Projective Clustering via Random Projections

    Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective clustering problem, given $k, q$ and norm $\rho \in [1,\infty]$, we have to compute a set $\mathcal{F}$ of $k$ $q$-dimensional flats such that $(\sum_{p\in P}d(p, \mathcal{F})^\rho)^{1/\rho}$ is minimized; here $d(p, \mathcal{F})$ represents the (Euclidean) distance of $p$ to the closest flat in $\mathcal{F}$. We let $f_k^q(P,\rho)$ denote the minimal value and interpret $f_k^q(P,\infty)$ to be $\max_{r\in P}d(r, \mathcal{F})$. When $\rho=1,2$ and $\infty$ and $q=0$, the problem corresponds to the $k$-median, $k$-mean and the $k$-center clustering problems respectively. For every $0 < \epsilon < 1$, $S\subset P$ and $\rho \ge 1$, we show that the orthogonal projection of $P$ onto a randomly chosen flat of dimension $O(((q+1)^2\log(1/\epsilon)/\epsilon^3) \log n)$ will $\epsilon$-approximate $f_1^q(S,\rho)$. This result combines the concepts of geometric coresets and subspace embeddings based on the Johnson-Lindenstrauss Lemma. As a consequence, an orthogonal projection of $P$ to an $O(((q+1)^2 \log ((q+1)/\epsilon)/\epsilon^3) \log n)$ dimensional randomly chosen subspace $\epsilon$-approximates projective clusterings for every $k$ and $\rho$ simultaneously. Note that the dimension of this subspace is independent of the number of clusters $k$. Using this dimension reduction result, we obtain new approximation and streaming algorithms for projective clustering problems. For example, given a stream of $n$ points, we show how to compute an $\epsilon$-approximate projective clustering for every $k$ and $\rho$ simultaneously using only $O((n+d)((q+1)^2\log ((q+1)/\epsilon))/\epsilon^3 \log n)$ space. Compared to standard streaming algorithms with $\Omega(kd)$ space requirement, our approach is a significant improvement when the number of input points and their dimensions are of the same order of magnitude.
    Comment: Canadian Conference on Computational Geometry (CCCG 2015)
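
The cost function defined in this abstract can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the paper's algorithm: the names `dist_to_flat`, `projective_cost`, and `gaussian_projection` are hypothetical, each flat is assumed to be given as an origin plus an orthonormal basis, and a scaled Gaussian random map is swapped in as a Johnson-Lindenstrauss style stand-in for the orthogonal projection onto a random flat that the abstract analyzes.

```python
import math
import random


def dist_to_flat(p, origin, basis):
    """Euclidean distance from p to the flat origin + span(basis).

    Assumption: `basis` is a list of orthonormal vectors (empty for q = 0,
    in which case the flat is the single point `origin`).
    """
    v = [pi - oi for pi, oi in zip(p, origin)]
    for b in basis:
        # Remove the component of v that lies along basis vector b.
        c = sum(vi * bi for vi, bi in zip(v, b))
        v = [vi - c * bi for vi, bi in zip(v, b)]
    return math.sqrt(sum(vi * vi for vi in v))


def projective_cost(points, flats, rho):
    """(sum_p d(p, F)^rho)^(1/rho), read as max_p d(p, F) when rho is infinite."""
    dists = [min(dist_to_flat(p, o, basis) for o, basis in flats) for p in points]
    if math.isinf(rho):
        return max(dists)
    return sum(d ** rho for d in dists) ** (1.0 / rho)


def gaussian_projection(points, m, seed=0):
    """Map d-dimensional points to m dimensions with a scaled Gaussian matrix.

    Assumption: this is a generic Johnson-Lindenstrauss style map, not the
    exact random orthogonal projection used in the paper.
    """
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1.0 / math.sqrt(m)
    g = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]
    return [[sum(gij * xj for gij, xj in zip(row, p)) for row in g] for p in points]
```

With `q = 0` (empty bases) and `rho` set to 1, 2, or `math.inf`, `projective_cost` evaluates the k-median, k-means, and k-center objectives for a fixed set of centers. Comparing the cost before and after `gaussian_projection` for growing `m` gives an informal feel for the dimension-reduction claim, though it does not verify the stated bounds.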

    Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

    We present a technical survey of state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower bounding techniques.