Search CORE

24 research outputs found

Optimal Gossip Algorithms for Exact and Approximate Quantile Computations

Authors: Haeupler Bernhard, Mohapatra Jeet, Su Hsin-Hao
Publication date: 25/11/2017

This paper gives drastically faster gossip algorithms to compute exact and approximate quantiles. Gossip algorithms, which allow each node to contact a uniformly random other node in each round, have been intensely studied and adopted in many applications due to their fast convergence and their robustness to failures. Kempe et al. [FOCS'03] gave gossip algorithms to compute important aggregate statistics if every node is given a value. In particular, they gave a beautiful $O(\log n + \log \frac{1}{\epsilon})$ round algorithm to $\epsilon$-approximate the sum of all values and an $O(\log^2 n)$ round algorithm to compute the exact $\phi$-quantile, i.e., the $\lceil \phi n \rceil$-th smallest value. We give a quadratically faster and in fact optimal gossip algorithm for the exact $\phi$-quantile problem which runs in $O(\log n)$ rounds. We furthermore show that one can achieve an exponential speedup if one allows for an $\epsilon$-approximation. We give an $O(\log \log n + \log \frac{1}{\epsilon})$ round gossip algorithm which computes a value of rank between $\phi n$ and $(\phi+\epsilon)n$ at every node, for any $0 \leq \phi \leq 1$ and $0 < \epsilon < 1$. Our algorithms are extremely simple and very robust: they can be operated with the same running times even if every transmission fails with a, potentially different, constant probability. We also give a matching $\Omega(\log \log n + \log \frac{1}{\epsilon})$ lower bound which shows that our algorithm is optimal for all values of $\epsilon$.
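
For orientation, the gossip model described in this abstract can be illustrated with a minimal simulation of the push-sum protocol that Kempe et al. proposed for aggregate statistics. This is a sketch under stated assumptions, not the quantile algorithm of the paper: the function name `push_sum_average`, the synchronous-round simulation, and the seeded random generator are all illustrative choices. In each round every node keeps half of its (sum, weight) pair and pushes the other half to a uniformly random other node; the ratio sum/weight at each node then approaches the global average.

```python
import random


def push_sum_average(values, rounds, seed=0):
    """Simulate push-sum gossip; return every node's estimate of the average.

    Hypothetical helper for illustration only. Requires at least two nodes.
    """
    rng = random.Random(seed)
    n = len(values)
    s = [float(v) for v in values]  # "sum" share currently held by each node
    w = [1.0] * n                   # "weight" share currently held by each node

    for _ in range(rounds):
        new_s = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            # Choose a uniformly random node other than i.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            half_s, half_w = s[i] / 2.0, w[i] / 2.0
            # Keep one half locally and push the other half to node j.
            new_s[i] += half_s
            new_w[i] += half_w
            new_s[j] += half_s
            new_w[j] += half_w
        s, w = new_s, new_w

    # Total sum and total weight are conserved, so s/w tends to the average.
    return [si / wi for si, wi in zip(s, w)]


if __name__ == "__main__":
    estimates = push_sum_average(list(range(1, 51)), rounds=200)
    print(min(estimates), max(estimates))
```

The round count of 200 is deliberately generous for 50 nodes; the protocol needs only $O(\log n + \log \frac{1}{\epsilon})$ rounds, so far fewer would suffice for a coarse estimate.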

arXiv.org e-Print Archive

Crossref

A PROCRUSTEAN APPROACH TO STREAM PROCESSING

Author: KATSIPOULAKIS NIKOLAOS ROMANOS
Publication date: 22/05/2019

The increasing demand for real-time data processing and the constantly growing data volume have contributed to the rapid evolution of Stream Processing Engines (SPEs), which are designed to continuously process data as it arrives. Low operational cost and timely delivery of results are both objectives of paramount importance for SPEs. Given the volatile and uncharted nature of data streams, achieving these goals under fixed resources is a challenge. This calls for adaptable SPEs, which can react to fluctuations in processing demands. In the past, three techniques have been developed for improving an SPE’s ability to adapt. These techniques are classified by whether applications require exact or approximate results: stream partitioning and re-partitioning target exact processing, while load shedding targets approximate processing. Stream partitioning strives to balance load among processors, but previous techniques neglected hidden costs of distributed execution. Load shedding lowers the accuracy of results by dropping part of the input, but previous techniques did not cope with evolving streams. Stream re-partitioning is used to reconfigure execution while processing takes place, but previous techniques did not fully utilize window semantics. In this dissertation, we put stream processing in a procrustean bed, in terms of the manner and the degree that processing takes place. To this end, we present new approaches for window-based aggregate operators, which are applicable to both exact and approximate stream processing in modern SPEs. Our stream partitioning, re-partitioning, and load shedding solutions offer improvements in performance and accuracy on real-world data by exploiting the semantics of both data and operations. In addition, we present SPEAr, the design of an SPE that accelerates processing by delivering approximate results with accuracy guarantees and avoiding unnecessary load. Finally, we contribute a hybrid technique, ShedPart, which can further improve load balance and performance of an SPE.

D-Scholarship@Pitt

Acta Universitatis Sapientiae - Economics and Business

Author: Dávid László
Publication venue: Sapientia Hungarian University of Transylvania
Publication date: 01/01/2022

REAL-J

Proceedings of the Fourth International Workshop on Knowledge Discovery from Sensor Data (SensorKDD’10)

Publication venue: Oak Ridge National Laboratory
Publication date: 25/07/2010

Portsmouth University Research Portal (Pure)

Essentials of Business Analytics

Authors: Bhimasankaram Pochiraju, Sridhar Seshadri
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 22/04/2020

Open Library

Geospatial Computing: Architectures and Algorithms for Mapping Applications

Author: Milton Richard William
Publication venue: UCL (University College London)
Publication date: 28/04/2019

Beginning with the MapTube website (1), which was launched in 2007 for crowd-sourcing maps, this project investigates approaches to exploratory Geographic Information Systems (GIS) using web-based mapping, or ‘web GIS’. Users can log in to upload their own maps and overlay different layers of GIS data sets. This work looks into the theory behind how web-based mapping systems function and whether their performance can be modelled and predicted. One of the important questions when dealing with different geospatial data sets is how they relate to one another. Internet data stores provide another source of information, which can be exploited if more generic geospatial data mining techniques are developed. The identification of similarities between thousands of maps is a GIS technique that can give structure to the overall fabric of the data, once the problems of scalability and comparisons between different geographies are solved. After running MapTube for nine years to crowd-source data, this would mark a natural progression from visualisation of individual maps to wider questions about what additional knowledge can be discovered from the data collected. In the new ‘data science’ age, the arrival of real-time data sets poses a new challenge for web-based mapping applications. The mapping of real-time geospatial systems is technically challenging, but has the potential to show inter-dependencies as they emerge in the time series. Combined geospatial and temporal data mining of real-time sources can provide archives of transport and environmental data from which to accurately model the systems under investigation. By using techniques from machine learning, the models can be built directly from the real-time data stream. These models can then be used for analysis and experimentation, being derived directly from city data. This then leads to an analysis of the behaviours of the interacting systems. (1) The MapTube website: http://www.maptube.org

UCL Discovery