
Improved Algorithms for Time Decay Streams

Author: Braverman Vladimir
Lang Harry
Ullah Enayat
Zhou Samson
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

In the time-decay model for data streams, elements of an underlying data set arrive sequentially, and more recently arrived elements are considered more important. A common approach for handling large data sets is to maintain a coreset, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline coreset and gives a time-decay coreset for polynomial time-decay functions. We also consider the exponential time-decay model for k-median clustering, where we provide a constant-factor approximation algorithm that uses the online facility location algorithm. Our algorithm stores O(k log(hΔ) + h) points, where h is the half-life of the decay function and Δ is the aspect ratio of the dataset. Our techniques also extend to k-means clustering and M-estimators.
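The k-median result above builds on online facility location. As a rough, hedged illustration of that building block only (not the paper's time-decay algorithm, which adds decay weights and k-median guarantees on top), the Python sketch below follows Meyerson's randomized rule: each arriving point opens a new facility with probability min(1, d/f), where d is its distance to the nearest open facility and f is the facility cost. The function name `online_facility_location` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def online_facility_location(
    stream: Sequence[Point], facility_cost: float, seed: int = 0
) -> List[Point]:
    """Hypothetical sketch of Meyerson-style online facility location.

    Each arriving point becomes a new facility with probability
    min(1, d / facility_cost), where d is its distance to the nearest
    facility opened so far; otherwise it is (implicitly) assigned to
    that nearest facility. No time-decay weighting is applied here.
    """
    if facility_cost <= 0:
        raise ValueError("facility_cost must be positive")
    rng = random.Random(seed)  # seeded so runs are reproducible
    facilities: List[Point] = []
    for p in stream:
        if not facilities:
            facilities.append(p)  # the first point always opens a facility
            continue
        d = min(math.dist(p, q) for q in facilities)
        # random() lies in [0, 1), so a probability of 1 always opens one
        if rng.random() < min(1.0, d / facility_cost):
            facilities.append(p)
    return facilities


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    print(online_facility_location(pts, facility_cost=1.0))
```

A larger facility cost opens fewer facilities, trading a smaller summary for a higher assignment cost; the seed parameter exists only to make the sketch reproducible.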

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Object Localization by Generative Graph Configuration

Author: Gong Shaogang
Qiu Huaijun
Publication date: 30/12/2013

Queen Mary Research Online

Deterministic Sampling and Range Counting in Geometric Data Streams

Author: Amitabh Chaudhary
Amitabha Bagchi
Cormode G.
Datar M.
David Eppstein
Demaine E. D.
Fang M.
Feigenbaum J.
Gupta A.
Har-Peled S.
Hershberger J.
Indyk P.
Korn F.
Langerman S.
Manku G. S.
Matoušek J.
Michael T. Goodrich
Rousseeuw P. J.
Theil H.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 10/07/2003

We present memory-efficient deterministic algorithms for constructing epsilon-nets and epsilon-approximations of streams of geometric data. Unlike probabilistic approaches, these deterministic samples provide guaranteed bounds on their approximation factors. We show how our deterministic samples can be used to answer approximate online iceberg geometric queries on data streams. We use these techniques to approximate several robust statistics of geometric data streams, including Tukey depth, simplicial depth, regression depth, the Theil-Sen estimator, and the least median of squares. Our algorithms use only a polylogarithmic amount of memory, provided the desired approximation factors are inverse-polylogarithmic. We also include a lower bound for non-iceberg geometric queries. Comment: 12 pages, 1 figure
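For reference, the Theil-Sen estimator listed above takes the median of the slopes of all pairs of points with distinct x-coordinates. The hedged Python sketch below computes it exactly and offline in O(n²) time; it shows the quantity a streaming algorithm would approximate, not the paper's deterministic-sampling method. The function name `theil_sen` and the median-of-residuals intercept are illustrative choices, not taken from the paper.

```python
from itertools import combinations
from statistics import median
from typing import Sequence, Tuple


def theil_sen(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Exact offline Theil-Sen fit y = slope * x + intercept (illustrative).

    The slope is the median of pairwise slopes over all pairs with distinct
    x values; the intercept is the median of the residuals y - slope * x.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # vertical pairs have no finite slope
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


if __name__ == "__main__":
    # The last point is an outlier; the fit still recovers y = 2x + 1.
    print(theil_sen([0, 1, 2, 3, 4], [1, 3, 5, 7, 100]))
```

The median makes the fit resistant to a minority of outliers, which is why it counts as a robust statistic; the pairwise enumeration is what makes the exact version too expensive for a memory-limited stream.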

arXiv.org e-Print Archive

CiteSeerX

Crossref

Computing the likelihood of sequence segmentation under Markov modelling

Author: Guéguen Laurent
Publication date: 16/11/2009

I tackle the problem of partitioning a sequence into homogeneous segments, where homogeneity is defined by a set of Markov models. The goal is to study the likelihood that a sequence is divided into a given number of segments. Here, the moments of this likelihood are computed with an efficient algorithm. Unlike methods based on Hidden Markov Models, this algorithm does not require transition probabilities between the models. Among the many possible uses of the likelihood, I present a maximum a posteriori probability criterion to predict the number of homogeneous segments into which a sequence can be divided, and an application of this method to finding CpG islands.
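To make the quantity concrete, the hedged Python sketch below uses dynamic programming to sum the likelihood of a sequence over every way of cutting it into exactly k contiguous segments, where each segment's likelihood is averaged uniformly over a set of first-order Markov models. It is not the author's algorithm, which computes moments of this likelihood efficiently. The model format, the uniform 1/M weighting over models, and the convention that a segment's first symbol is still conditioned on the preceding symbol are all assumptions made for illustration, and `segmentation_likelihood` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical model format: (initial probabilities, transition probabilities).
MarkovModel = Tuple[Dict[str, float], Dict[Tuple[str, str], float]]


def segmentation_likelihood(
    seq: str, models: Sequence[MarkovModel], k: int
) -> float:
    """Sum of likelihoods over all segmentations of seq into exactly k segments.

    Each segment's likelihood is the average, over the models, of the product
    of that model's per-position probabilities inside the segment. Position 0
    uses the model's initial distribution; every later position uses
    P(seq[u] | seq[u - 1]), even at a segment boundary (an assumption).
    """
    n, m = len(seq), len(models)
    if m == 0:
        raise ValueError("need at least one model")
    if k < 0:
        raise ValueError("k must be non-negative")
    # probs[j][u]: probability of the symbol at position u under model j
    probs: List[List[float]] = [
        [init[seq[0]] if u == 0 else trans[(seq[u - 1], seq[u])] for u in range(n)]
        for init, trans in models
    ]
    # seg[t][i]: model-averaged likelihood of the segment seq[t:i]
    seg = [[0.0] * (n + 1) for _ in range(n + 1)]
    for t in range(n):
        running = [1.0] * m  # per-model product over seq[t:i]
        for i in range(t + 1, n + 1):
            for j in range(m):
                running[j] *= probs[j][i - 1]
            seg[t][i] = sum(running) / m
    # f[s][i]: summed likelihood of the prefix seq[:i] split into s segments
    f = [[0.0] * (n + 1) for _ in range(k + 1)]
    f[0][0] = 1.0
    for s in range(1, k + 1):
        for i in range(s, n + 1):
            f[s][i] = sum(f[s - 1][t] * seg[t][i] for t in range(s - 1, i))
    return f[k][n]
```

With a single model and k = 1, the value reduces to the ordinary Markov-chain likelihood of the whole sequence. This naive version takes roughly O(M n² + k n²) time, so it only illustrates the definition rather than an efficient method.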

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes