
Evaluation of optimization techniques for aggregation

Author: Li Chenxi
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2017

Aggregations are almost always performed at the top of the operator tree, after all selections and joins in a SQL query. However, they can also be performed before joins and, when used properly, make later joins much cheaper. Although several enumeration algorithms that consider eager aggregation have been proposed, there are not enough evaluations to guide the adoption of this technique in practice. Moreover, no evaluations have been done on real data sets and real queries with estimated cardinalities, so it is not known how eager aggregation performs in the real world. In this thesis, a new estimation method for group-by and join, combining traditional estimation methods with index-based join sampling, is proposed and evaluated. Two enumeration algorithms that consider eager aggregation are implemented and compared in the context of estimated cardinalities. We find that the new estimation method works well with little overhead and that, under certain conditions, eager aggregation can dramatically accelerate queries.
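A minimal sketch of the basic idea behind eager aggregation, assuming two hypothetical toy tables, orders(customer_id, amount) and customers(customer_id, region), and the query SUM(amount) GROUP BY region. The lazy plan joins first and aggregates at the top of the plan; the eager plan pre-aggregates orders on the join key, so the join sees one row per customer instead of one per order. This illustrates the rewrite only; it is not the enumeration algorithms or the cardinality estimator evaluated in the thesis.

```python
from collections import defaultdict

# Hypothetical toy tables: orders(customer_id, amount), customers(customer_id, region)
ORDERS = [(1, 10.0), (1, 5.0), (2, 7.0), (3, 2.0), (3, 8.0), (3, 1.0)]
CUSTOMERS = [(1, "EU"), (2, "US"), (3, "EU")]


def lazy_plan(orders, customers):
    """Join first, then GROUP BY region with SUM(amount) at the top of the plan.

    Returns (totals per region, number of rows produced by the join).
    """
    region_of = dict(customers)  # build side of a hash join on customer_id
    joined = [(region_of[cid], amt) for cid, amt in orders if cid in region_of]
    totals = defaultdict(float)
    for region, amt in joined:
        totals[region] += amt
    return dict(totals), len(joined)


def eager_plan(orders, customers):
    """Pre-aggregate orders by the join key, join the smaller input, then finish."""
    partial = defaultdict(float)
    for cid, amt in orders:
        partial[cid] += amt  # SUM is decomposable: partial sums can be summed again
    region_of = dict(customers)
    joined = [(region_of[cid], s) for cid, s in partial.items() if cid in region_of]
    totals = defaultdict(float)
    for region, s in joined:
        totals[region] += s
    return dict(totals), len(joined)


if __name__ == "__main__":
    print(lazy_plan(ORDERS, CUSTOMERS))
    print(eager_plan(ORDERS, CUSTOMERS))
```

Both plans return the same totals, but the eager join produces 3 rows instead of 6. The rewrite is valid here because the pre-aggregation groups by the join key and SUM is decomposable; an AVG would have to be carried as (sum, count) pairs, and a COUNT(*) would have to be summed rather than recounted after the join.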

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Moment-based parameter estimation in binomial random intersection graph models

Author: DF Gleich
E Godehardt
ED Kolaczyk
FG Ball
M Bloznelis
M Deijfen
M Karoński
MEJ Newman
PJ Bickel
R Hofstad van der
RL Graham
S Nikoletseas
S Wasserman
T Britton
Publication date: 24/06/2018

Binomial random intersection graphs can be used as parsimonious statistical models of large and sparse networks, with one parameter for the average degree and another for transitivity, the tendency of neighbours of a node to be connected. This paper discusses the estimation of these parameters from a single observed instance of the graph, using moment estimators based on observed degrees and frequencies of 2-stars and triangles. The observed data set is assumed to be a subgraph induced by a set of $n_0$ nodes sampled from the full set of $n$ nodes. We prove the consistency of the proposed estimators by showing that the relative estimation error is small with high probability for $n_0 \gg n^{2/3} \gg 1$. As a byproduct, our analysis confirms that the empirical transitivity coefficient of the graph is with high probability close to the theoretical clustering coefficient of the model.

Comment: 15 pages, 6 figures
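The observed statistics the abstract refers to can be computed directly. The sketch below assumes the standard $(n, m, p)$ parametrization of a binomial random intersection graph (each of $n$ nodes holds each of $m$ attributes independently with probability $p$, and two nodes are adjacent when they share an attribute), which may differ from the parametrization used in the paper. It samples such a graph, takes the subgraph induced by $n_0$ uniformly chosen nodes, and counts edges, 2-stars and triangles; the empirical transitivity is 3 × triangles / 2-stars. The function names are hypothetical, and the paper's actual moment estimators are not reproduced here.

```python
import itertools
import random
from collections import defaultdict


def binomial_rig(n, m, p, seed=None):
    """Sample a binomial random intersection graph on nodes 0..n-1.

    Each node holds each of m attributes independently with probability p;
    two nodes are adjacent iff they share at least one attribute.
    Edges are returned as (u, w) pairs with u < w.
    """
    rng = random.Random(seed)
    holders = defaultdict(list)  # attribute -> nodes holding it, in ascending order
    for v in range(n):
        for a in range(m):
            if rng.random() < p:
                holders[a].append(v)
    edges = set()
    for nodes in holders.values():
        edges.update(itertools.combinations(nodes, 2))  # nodes are already sorted
    return edges


def induced_sample(edges, n, n0, seed=None):
    """Subgraph induced by n0 nodes drawn uniformly without replacement from n."""
    kept = set(random.Random(seed).sample(range(n), n0))
    return {(u, w) for u, w in edges if u in kept and w in kept}


def subgraph_counts(edges):
    """Return (#edges, #2-stars, #triangles) of a simple graph given as u < w pairs."""
    adj = defaultdict(set)
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    # a node of degree d is the centre of d*(d-1)/2 two-stars
    two_stars = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    # each triangle is seen once from each of its three edges
    triangles = sum(len(adj[u] & adj[w]) for u, w in edges) // 3
    return len(edges), two_stars, triangles


def transitivity(edges):
    """Empirical transitivity coefficient: 3 * triangles / 2-stars."""
    _, two_stars, triangles = subgraph_counts(edges)
    return 3 * triangles / two_stars if two_stars else 0.0


if __name__ == "__main__":
    g = binomial_rig(n=500, m=1000, p=0.002, seed=42)
    sample = induced_sample(g, n=500, n0=200, seed=7)
    print(subgraph_counts(sample), transitivity(sample))
```

For example, a triangle with one pendant edge attached has 4 edges, 5 two-stars and 1 triangle, giving a transitivity of 3/5.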

arXiv.org e-Print Archive

Crossref

Statistical structures for internet-scale data management

Author: Ntarmos N.
Triantafillou P.
Weikum G.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. In this paper, we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental evaluation of our contributions in terms of efficiency, accuracy, and scalability.
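For reference, the two classic histogram types named above can be built as follows in the centralized setting. This is a minimal single-node sketch, assuming the data fits in memory; it does not attempt the paper's decentralized construction over a structured overlay, does not cover the Average-Shifted variant, and the function names are hypothetical.

```python
def equi_width(values, k):
    """k buckets of equal value range over [min, max]; returns (boundaries, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid zero width when all values are equal
    counts = [0] * k
    for v in values:
        i = min(int((v - lo) / width), k - 1)  # the maximum falls into the last bucket
        counts[i] += 1
    bounds = [lo + i * width for i in range(k + 1)]
    return bounds, counts


def equi_depth(values, k):
    """Upper bounds of k buckets that each hold about len(values) / k values."""
    s = sorted(values)
    n = len(s)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")
    # bucket i (1-based) ends at rank floor(i * n / k)
    return [s[(i * n) // k - 1] for i in range(1, k + 1)]


if __name__ == "__main__":
    skewed = [1, 1, 1, 1, 1, 1, 2, 3, 50, 100]
    print(equi_width(skewed, 2))
    print(equi_depth(skewed, 2))
```

On the skewed sample, the two equi-width buckets hold 9 and 1 values, while the equi-depth bounds (1 and 100) adapt to where the data actually lies, which is why equi-depth histograms usually give better selectivity estimates on skewed columns.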

CiteSeerX

Springer - Publisher Connector

Enlighten

MPG.PuRe

Fully decentralized computation of aggregates over data streams

Author: Adi Rosen
Becchetti Luca
Bordino Ilaria
Leonardi Stefano
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2010

In several emerging applications, data is collected in massive streams at several distributed points of observation. A basic and challenging task is to allow every node to monitor a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all data at a few central nodes of the network is infeasible in networks of low-capability devices or in the presence of massive data sets. The main difficulty in designing diffusive algorithms is coping with duplicate detections. These arise both from the observation of the same event at several nodes of the network and from the receipt of the same aggregated information along multiple paths of diffusion. In this paper, we consider fully decentralized algorithms that locally answer continuous aggregate queries on the number of distinct events, the total number of events and the second frequency moment in the scenario outlined above. The proposed algorithms use sublinear space at every node, either in the worst case or on realistic distributions. We also propose strategies that minimize the communication needed to update the aggregates when new events are observed. We experimentally evaluate the efficiency and accuracy of our algorithms on realistic simulated scenarios.
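Duplicate insensitivity is the crux of the distinct-count query. One standard way to obtain it, not necessarily the technique used in the paper, is a Flajolet-Martin style bitmap sketch: inserting an event sets bits that depend only on the event's identity, and merging two sketches is a bitwise OR, so observing the same event at several nodes or receiving the same sketch along several diffusion paths leaves the result unchanged. The class name, the use of SHA-256 as the hash family and the default of `k = 32` bitmaps are assumptions made for this minimal sketch.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin correction constant


class FMSketch:
    """Duplicate-insensitive distinct-count sketch (Flajolet-Martin, averaged over k hashes)."""

    def __init__(self, k=32):
        self.k = k
        self.bitmaps = [0] * k

    def _hash(self, i, item):
        # i-th hash function: SHA-256 of "i:item", truncated to 64 bits
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        for i in range(self.k):
            h = self._hash(i, item)
            # index of the lowest set bit: bit r is chosen with probability 2^-(r+1)
            r = (h & -h).bit_length() - 1 if h else 63
            self.bitmaps[i] |= 1 << r

    def merge(self, other):
        """Union of two sketches: a bitwise OR, so re-merging the same data changes nothing."""
        if self.k != other.k:
            raise ValueError("sketches must use the same number of bitmaps")
        out = FMSketch(self.k)
        out.bitmaps = [a | b for a, b in zip(self.bitmaps, other.bitmaps)]
        return out

    def estimate(self):
        """Estimate the number of distinct items from the mean lowest-zero-bit position."""
        total = 0
        for b in self.bitmaps:
            r = 0
            while b & (1 << r):  # position of the lowest zero bit
                r += 1
            total += r
        return 2 ** (total / self.k) / PHI


if __name__ == "__main__":
    a, b = FMSketch(), FMSketch()
    for event in range(500):
        a.add(event)
    for event in range(250, 1000):  # events 250..499 are observed at both nodes
        b.add(event)
    print(round(a.merge(b).estimate()))
```

Averaging over more bitmaps narrows the estimation error at the cost of more space per node; the sketch covers only distinct counts, not the total number of events or the second frequency moment.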

Archivio della ricerca- Università di Roma La Sapienza