GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams
We investigate adaptive buffer management techniques for the approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or must cope with very high-speed streams. In either case, computing the exact results of joins between these streams may be infeasible, mainly because the buffers used to compute the joins can hold far fewer tuples than the sliding windows contain; a stream buffer management policy is therefore needed. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ), an adaptive and locality-aware technique for managing these buffers. GDJ exploits temporal correlations (at both long and short time scales), which we found to be prevalent in many real data streams. Our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report the results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted with other recently proposed techniques.
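The flavor of a GreedyDual-style, locality-aware replacement policy can be sketched as follows. This is a generic GreedyDual variant, not the paper's exact GDJ algorithm; the class name, the unit-cost assumption, and the use of a "benefit" score (e.g. a tuple's recent match activity) are our illustrative choices:

```python
# Sketch of a GreedyDual-style join buffer (assumptions: unit tuple cost,
# "benefit" stands in for a tuple's recent join-match activity).
class GreedyDualBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.inflation = 0.0   # rises to the priority of each evicted victim
        self.priority = {}     # tuple id -> current priority
        self.tuples = {}       # tuple id -> payload

    def _credit(self, benefit):
        # New/refreshed priorities are offset by the inflation value, so
        # long-idle tuples age out without per-tuple decay bookkeeping.
        return self.inflation + benefit

    def insert(self, tid, payload, benefit=1.0):
        if tid in self.tuples:
            self.priority[tid] = self._credit(benefit)
            return
        while len(self.tuples) >= self.capacity:
            victim = min(self.priority, key=self.priority.get)
            self.inflation = self.priority.pop(victim)
            del self.tuples[victim]
        self.tuples[tid] = payload
        self.priority[tid] = self._credit(benefit)

    def touch(self, tid, benefit=1.0):
        # A tuple that just produced matches regains priority (locality).
        if tid in self.priority:
            self.priority[tid] = self._credit(benefit)
```

Under this scheme a tuple that keeps producing join matches stays resident, while tuples whose priority falls below the rising inflation value become eviction candidates.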
Structure in the 3D Galaxy Distribution: I. Methods and Example Results
Three methods for detecting and characterizing structure in point data, such
as that generated by redshift surveys, are described: classification using
self-organizing maps, segmentation using Bayesian blocks, and density
estimation using adaptive kernels. The first two methods are new, and allow
detection and characterization of structures of arbitrary shape and at a wide
range of spatial scales. These methods should elucidate not only clusters, but
also the more distributed, wide-ranging filaments and sheets, and further allow
the possibility of detecting and characterizing an even broader class of
shapes. The methods are demonstrated and compared in application to three data
sets: a carefully selected volume-limited sample from the Sloan Digital Sky
Survey redshift data, a similarly selected sample from the Millennium
Simulation, and a set of points independently drawn from a uniform probability
distribution -- a so-called Poisson distribution. We demonstrate a few of the
many ways in which these methods elucidate large scale structure in the
distribution of galaxies in the nearby Universe.
Comment: Re-posted after referee corrections along with a partially re-written
introduction. 80 pages, 31 figures, ApJ in press. For full-sized figures
please download from: http://astrophysics.arc.nasa.gov/~mway/lss1.pd
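The adaptive-kernel idea behind the third method can be sketched as a generic density estimator whose bandwidth shrinks where a pilot estimate is high, keeping dense clusters sharp while smoothing sparse regions. This is a standard adaptive-bandwidth scheme for illustration, not the paper's exact estimator; the 1-D restriction and parameter names are ours:

```python
import numpy as np

def adaptive_kde(points, query, pilot_bw=1.0, alpha=0.5):
    """Adaptive Gaussian KDE (1-D sketch): per-point bandwidths scale as
    pilot_density**(-alpha), so clustered points get narrow kernels and
    isolated points get wide ones."""
    pts = np.asarray(points, dtype=float)
    # Pilot (fixed-bandwidth) density at each data point.
    d = pts[:, None] - pts[None, :]
    pilot = np.exp(-0.5 * (d / pilot_bw) ** 2).mean(axis=1)
    pilot /= pilot_bw * np.sqrt(2 * np.pi)
    # Per-point bandwidths, normalized by the geometric-mean density.
    g = np.exp(np.mean(np.log(pilot)))
    h = pilot_bw * (pilot / g) ** (-alpha)
    # Evaluate the adaptive estimate at the query points.
    q = np.asarray(query, dtype=float)
    u = (q[:, None] - pts[None, :]) / h[None, :]
    k = np.exp(-0.5 * u ** 2) / (h[None, :] * np.sqrt(2 * np.pi))
    return k.mean(axis=1)
```

In a galaxy-survey setting the same idea extends to 3-D positions; the point is that a single global bandwidth cannot resolve both rich clusters and diffuse filaments at once.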
Estimating and Sampling Graphs with Multidimensional Random Walks
Estimating characteristics of large graphs via sampling is a vital part of
the study of complex networks. Current sampling methods such as (independent)
random vertex and random walks are useful but have drawbacks. Random vertex
sampling may require too many resources (time, bandwidth, or money). Random
walks, which normally require fewer resources per sample, can suffer from large
estimation errors in the presence of disconnected or loosely connected graphs.
In this work we propose a new m-dimensional random walk that uses m
dependent random walkers. We show that the proposed sampling method, which we
call Frontier sampling, exhibits all of the nice sampling properties of a
regular random walk. At the same time, our simulations over large real world
graphs show that, in the presence of disconnected or loosely connected
components, Frontier sampling exhibits lower estimation errors than regular
random walks. We also show that Frontier sampling is more suitable than random
vertex sampling to sample the tail of the degree distribution of the graph.
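Frontier sampling as described can be sketched roughly as follows: m dependent walkers share one budget, and at each step the walker to move is chosen with probability proportional to the degree of the vertex it occupies. The adjacency-list representation, parameter names, and initialization are our assumptions, not details from the abstract:

```python
import random

def frontier_sampling(adj, m=10, steps=1000, seed=0):
    """Sketch of Frontier sampling: m dependent random walkers; at each
    step one walker is chosen with probability proportional to the degree
    of its current vertex, then moved to a uniformly random neighbor.
    Returns the sequence of sampled vertices."""
    rng = random.Random(seed)
    nodes = [v for v in adj if adj[v]]  # skip isolated vertices
    walkers = [rng.choice(nodes) for _ in range(m)]
    samples = []
    for _ in range(steps):
        degs = [len(adj[v]) for v in walkers]
        # Degree-proportional choice of which walker advances.
        i = rng.choices(range(m), weights=degs, k=1)[0]
        nxt = rng.choice(adj[walkers[i]])
        samples.append(nxt)
        walkers[i] = nxt
    return samples
```

Because the m walkers are scattered across the graph, a walker trapped in a loosely connected component wastes only part of the budget, which is the intuition behind the lower estimation errors reported above.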
Unbiased sampling of network ensembles
Sampling random graphs with given properties is a key step in the analysis of
networks, as random ensembles represent basic null models required to identify
patterns such as communities and motifs. An important requirement is that the
sampling process is unbiased and efficient. The main approaches are
microcanonical, i.e. they sample graphs that match the enforced constraints
exactly. Unfortunately, when applied to strongly heterogeneous networks (like
most real-world examples), the majority of these approaches become biased
and/or time-consuming. Moreover, the algorithms defined in the simplest cases,
such as binary graphs with given degrees, are not easily generalizable to more
complicated ensembles. Here we propose a solution to the problem via the
introduction of a "Maximize and Sample" ("Max & Sam" for short) method to
correctly sample ensembles of networks where the constraints are `soft', i.e.
realized as ensemble averages. Our method is based on exact maximum-entropy
distributions and is therefore unbiased by construction, even for strongly
heterogeneous networks. It is also more computationally efficient than most
microcanonical alternatives. Finally, it works for both binary and weighted
networks with a variety of constraints, including combined degree-strength
sequences and full reciprocity structure, for which no alternative method
exists. Our canonical approach can in principle be turned into an unbiased
microcanonical one, via a restriction to the relevant subset. Importantly, the
analysis of the fluctuations of the constraints suggests that the
microcanonical and canonical versions of all the ensembles considered here are
not equivalent. We show various real-world applications and provide a code
implementing all our algorithms.
Comment: MATLAB code available at
http://www.mathworks.it/matlabcentral/fileexchange/46912-max-sam-package-zi
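For the simplest ensemble mentioned above (binary graphs with soft degree constraints), the canonical sampling step reduces to independent Bernoulli edge draws with p_ij = x_i x_j / (1 + x_i x_j), where the fitness values x_i come from the maximum-entropy/maximum-likelihood fit. A sketch that assumes those fitnesses are already computed (the fitting step itself is omitted):

```python
import random

def sample_soft_configuration_model(x, seed=0):
    """Draw one binary graph from the canonical (soft-constraint)
    ensemble: edge (i, j) is present independently with probability
    p_ij = x_i * x_j / (1 + x_i * x_j).  The fitnesses x_i are assumed
    precomputed by the maximum-likelihood step."""
    rng = random.Random(seed)
    n = len(x)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = x[i] * x[j] / (1.0 + x[i] * x[j])
            if rng.random() < p:
                edges.append((i, j))
    return edges
```

Each node's degree is then matched only on average (its expected degree is the sum of p_ij over j), which is precisely what "soft" constraints mean, in contrast to microcanonical methods that enforce the degree sequence exactly.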