238 research outputs found

GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams

Author: Chang Ching
Li Feifei
Bestavros Azer
Kollios
Publication venue: Boston University Computer Science Department
Publication date: 01/01/1997

We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins contain a much smaller number of tuples than the sliding windows do. A stream buffer management policy is therefore needed. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits the temporal correlations (at both long and short time scales) that we found to be prevalent in many real data streams. We note that our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted with other recently proposed techniques.

Boston University Institutional Repository (OpenBU)

Emergence of good conduct, scaling and Zipf laws in human behavioral sequences in an online world

Author: Sinatra Roberta
Szell Michel
Thurner Stefan
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

We study behavioral action sequences of players in a massive multiplayer online game. In their virtual life players use eight basic actions which allow them to interact with each other. These actions are communication, trade, establishing or breaking friendships and enmities, attack, and punishment. We measure the probabilities for these actions conditional on previously taken and received actions and find a dramatic increase of negative behavior immediately after receiving negative actions. Similarly, positive behavior is intensified by receiving positive actions. We observe a tendency towards anti-persistence in communication sequences. Classifying actions as positive (good) and negative (bad) allows us to define binary 'world lines' of lives of individuals. Positive and negative actions are persistent and occur in clusters, indicated by large scaling exponents alpha ~ 0.87 of the mean square displacement of the world lines. For all eight action types we find strong signs of high levels of repetitiveness, especially for negative actions. We partition behavioral sequences into segments of length n (behavioral 'words' and 'motifs') and study their statistical properties. We find two approximate power laws in the word ranking distribution, one with an exponent of kappa ~ -1 for the ranks up to 100, and another with a lower exponent for higher ranks. The Shannon n-tuple redundancy yields large values and increases in terms of word length, further underscoring the non-trivial statistical properties of behavioral sequences. On the collective, societal level the time series of particular actions per day can be understood by a simple mean-reverting log-normal model. Comment: 6 pages, 5 figures
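
The word-ranking step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `word_rank_frequencies`, the single-letter action codes, and the `step` parameter (1 for a sliding window, `n` for a strict non-overlapping partition) are assumptions made for illustration.

```python
from collections import Counter


def word_rank_frequencies(sequence, n, step=1):
    """Cut a sequence of action codes into length-n 'words' and rank them.

    step=1 uses a sliding window; step=n gives a non-overlapping partition.
    Which of the two the paper uses is not stated in the abstract.
    Returns (word, count) pairs sorted by decreasing count.
    """
    words = [
        tuple(sequence[i:i + n])
        for i in range(0, len(sequence) - n + 1, step)
    ]
    return Counter(words).most_common()


# Hypothetical action codes: C = communication, T = trade, A = attack, F = friendship
actions = list("CCTCCACCTCCF")
ranked = word_rank_frequencies(actions, n=2)
for rank, (word, count) in enumerate(ranked, start=1):
    print(rank, "".join(word), count)
```

Plotting the counts against rank on log-log axes is then the usual way to look for approximate power laws of the kind the abstract reports.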

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

International Institute for Applied Systems Analysis (IIASA)

Generalized (m,k)-Zipf law for fractional Brownian motion-like time series with or without effect of an additional linear trend

Author: Ellinger A. G.
Gibbons J. D.
Gutenberg B.
Hurst H. E.
Kantz H.
Pareto V.
Peters E. E.
Peters E. E.
Zipf G. K.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 13/09/2002

We have translated fractional Brownian motion (FBM) signals into a text based on two "letters", as if the signal fluctuations correspond to a constant stepsize random walk. We have applied the Zipf method to extract the $\zeta'$ exponent relating the word frequency and its rank on a log-log plot. We have studied the variation of the Zipf exponent(s) giving the relationship between the frequency of occurrence of words of length $m<8$ made of such two letters: $\zeta'$ is varying as a power law in terms of $m$. We have also searched how the $\zeta'$ exponent of the Zipf law is influenced by a linear trend and the resulting effect of its slope. We can distinguish finite size effects, and results depending whether the starting FBM is persistent or not, i.e. depending on the FBM Hurst exponent $H$. It seems then numerically proven that the Zipf exponent of a persistent signal is more influenced by the trend than that of an antipersistent signal. It appears that the conjectured law $\zeta' = |2H-1|$ only holds near $H=0.5$. We have also introduced considerations based on the notion of a time dependent Zipf law along the signal. Comment: 24 pages, 12 figures; to appear in Int. J. Modern Phys
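
As a rough illustration of the procedure in this abstract, the sketch below turns a signal into a two-letter text and estimates a Zipf exponent for words of length m. Several things are assumptions rather than the authors' method: the letters `u`/`d`, the treatment of ties as `u`, the use of a plain random walk (H = 0.5) instead of a true FBM generator, and taking the exponent as the negated least-squares slope of log frequency against log rank.

```python
import math
import random
from collections import Counter


def to_letters(signal):
    """Encode each increment as 'u' (up or flat) or 'd' (down)."""
    return "".join("u" if b >= a else "d" for a, b in zip(signal, signal[1:]))


def zipf_exponent(text, m):
    """Negated least-squares slope of log(frequency) vs log(rank)
    for overlapping words of length m."""
    counts = Counter(text[i:i + m] for i in range(len(text) - m + 1))
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # fewer than two distinct words: slope undefined
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Stand-in signal: an ordinary random walk, not a true FBM series
random.seed(0)
signal = [0.0]
for _ in range(5000):
    signal.append(signal[-1] + random.choice((-1.0, 1.0)))

text = to_letters(signal)
for m in range(2, 8):
    print(m, round(zipf_exponent(text, m), 3))
```

For an uncorrelated walk all words of a given length are nearly equally frequent, so the fitted exponent should be small, in line with the conjectured law vanishing at $H=0.5$.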

arXiv.org e-Print Archive

Crossref

Open Repository and Bibliography - Liège

Maximum likelihood estimation for constrained parameters of multinomial distributions - Application to Zipf-Mandelbrot models

Author: Izsák F.
Publication venue: Elsevier
Publication date: 01/01/2006

A numerical maximum likelihood (ML) estimation procedure is developed for the constrained parameters of multinomial distributions. The main difficulty involved in computing the likelihood function is the precise and fast determination of the multinomial coefficients. For this the coefficients are rewritten into a telescopic product. The presented method is applied to the ML estimation of the Zipf–Mandelbrot (ZM) distribution, which provides a true model in many real-life cases. The examples discussed arise from ecological and medical observations. Based on the estimates, the hypothesis that the data is ZM distributed is tested using a chi-square test. The computer code of the presented procedure is available on request by the author.
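
The abstract does not spell out its telescopic product, so the following is only one standard way to telescope a multinomial coefficient: as a running product of binomial coefficients. The function names are hypothetical, and the log-gamma variant is added as a common alternative for large counts, not as something the paper states.

```python
from math import comb, exp, lgamma


def multinomial(counts):
    """n! / (k1! k2! ... km!) as a telescoping product of binomials:
    C(k1, k1) * C(k1 + k2, k2) * C(k1 + k2 + k3, k3) * ...
    Exact integer arithmetic, no large intermediate factorials."""
    result, running_total = 1, 0
    for k in counts:
        running_total += k
        result *= comb(running_total, k)
    return result


def log_multinomial(counts):
    """Natural log of the same coefficient via log-gamma,
    useful when the exact value would overflow a float."""
    return lgamma(sum(counts) + 1) - sum(lgamma(k + 1) for k in counts)


counts = [2, 1, 1]
print(multinomial(counts))
print(exp(log_multinomial(counts)))
```

In a likelihood computation the logarithmic form is typically added to the sum of `k_i * log(p_i)` terms.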

University of Twente Research Information

Network traffic data analysis

Author: Kapri Harish
Publication venue: LSU Digital Commons
Publication date: 01/01/2011

The desire to characterize traffic in an existing communication network motivates many types of network research. In this research, real traffic traces collected over trans-Pacific backbone links (the MAWI repository, providing publicly available anonymized traces) are analyzed to study the underlying traffic patterns. All data analysis and visualization is carried out using Matlab (Matlab is a trademark of The Mathworks, Inc.). At packet level, we first measure parameters such as the distribution of packet lengths and the distribution of protocol types, and then fit analytical models to them. Next, the concept of flow is introduced and flow-based analysis is studied. We consider flow-related parameters such as top ports seen, duration of the flow, distribution of flow lengths, and number of flows with different timeout values, and provide analytical models to fit the flow lengths. Further, we study the amount of data flowing between source-destination pairs. Finally, we focus on TCP-specific aspects of captured traces such as retransmissions and packet round-trip times. From the results obtained, we infer the Zipf-type nature of the distribution for the number of flows, the heavy-tailedness of flow sizes, and the contribution of well-known ports at packet and flow level. Our study helps network analysts further their knowledge and optimize network resources while performing efficient traffic engineering.
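
How the number of flows depends on the timeout value can be sketched as follows. The thesis itself uses Matlab and does not give its flow definition in the abstract, so this Python sketch assumes a common convention: packets sharing a key belong to the same flow until the gap between consecutive packets exceeds the timeout. The function name and the toy packet list are hypothetical.

```python
def count_flows(packets, timeout):
    """Count flows in a trace.

    packets: iterable of (timestamp_seconds, flow_key), sorted by timestamp.
    A new flow starts when a key is seen for the first time or when the
    gap since its previous packet exceeds `timeout` seconds.
    """
    last_seen = {}
    flows = 0
    for timestamp, key in packets:
        if key not in last_seen or timestamp - last_seen[key] > timeout:
            flows += 1
        last_seen[key] = timestamp
    return flows


# Toy trace; a real key would be e.g. (src, dst, sport, dport, protocol)
trace = [(0.0, "a"), (1.0, "a"), (2.0, "b"), (10.0, "a")]
for timeout in (5, 60):
    print(timeout, count_flows(trace, timeout))
```

Shorter timeouts split long-lived conversations into more flows, which is why the flow count is reported per timeout value.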

Louisiana State University

The efficiency of individual optimization in the conditions of competitive growth

Author: Aoyama
Ayati
B. Brutovský
Bak
Bechhoefer
Beveridge
Bonabeau
Bouchaud
Caplat
Chang
Chatterjee
Coelho
Cook
D. Horváth
Dasci
Decker
Fujiwara
Horváth
J. Kočišová
Judd
Langton
Mansury
Murray
Newman
Olami
Painter
Pareto
Scarfone
Stanley
Sunitiyoso
Tabuchi
Webb
Wilke
Wit
Xiang
Yaari
Publication venue: 'Elsevier BV'
Publication date: 06/03/2009

The paper discusses statistical properties of a multi-agent model of competitive growth. Each agent is described by a growth (or decay) rule for its virtual "mass", with a rate affected by interaction with other agents. The interaction depends on the strategy vector and the mutual distance between agents, and both are subject to the agent's individual optimization process. Steady-state simulations yield phase diagrams with high and low competition phases (HCP and LCP, respectively) separated by a critical point. Particular focus is placed on indicators of power-law behavior of the mass distributions with respect to the critical regime. In this regime the study reveals a remarkable anomaly in the optimization efficiency.

arXiv.org e-Print Archive

Crossref

Two halves of a meaningful text are statistically different

Author: Allahverdyan Armen E.
Deng S.
Deng Weibing
Xie R.
Publication venue
Publication date: 09/04/2020

Which statistical features distinguish a meaningful text (possibly written in an unknown system) from a meaningless set of symbols? Here we answer this question by comparing features of the first half of a text to its second half. This comparison can uncover hidden effects, because the halves have the same values of many parameters (style, genre, etc.). We found that the first half has more different words and more rare words than the second half. Also, words in the first half are distributed less homogeneously over the text in the sense of the difference between the frequency and the inverse spatial period. These differences hold for the significant majority of several hundred relatively short texts we studied. The statistical significance is confirmed via the Wilcoxon test. Differences disappear after random permutation of words that destroys the linear structure of the text. The differences reveal a temporal asymmetry in meaningful texts, which is confirmed by showing that texts are much better compressible in their natural way (i.e. along the narrative) than in the word-inverted form. We conjecture that these results connect the semantic organization of a text (defined by the flow of its narrative) to its statistical features. Comment: 15 pages and 14 tables
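
The half-versus-half comparison can be sketched in a few lines. This is an illustration only: whitespace tokenization, lowercasing, and defining a "rare" word as one that occurs exactly once within its half are assumptions, and the function name `half_stats` is hypothetical.

```python
from collections import Counter


def half_stats(text):
    """Split a text into two halves by word count and report, for each half,
    the number of distinct words and the number of words occurring once."""
    words = text.lower().split()
    mid = len(words) // 2
    stats = []
    for half in (words[:mid], words[mid:]):
        counts = Counter(half)
        stats.append({
            "distinct": len(counts),
            "rare": sum(1 for c in counts.values() if c == 1),
        })
    return stats


first, second = half_stats("a b c d a a b b")
print(first, second)
```

Applied to a collection of texts, the paired per-text differences are what a Wilcoxon signed-rank test would then be run on.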

arXiv.org e-Print Archive

Maximum likelihood estimation for constrained parameters of multinomial distributions—Application to Zipf–Mandelbrot models

Author: Berkson
Chao
Colwell
Egghe
F. Izsák
Frontier
Huberman
Jamshidian
Lehmann
Li
Marsili
McLachlan
Meadow
Palmer
Palmer
Papp
Papp
Piqueira
Rousseau
Sabatier
Wilson
Zipf
Zornig
Publication venue: 'Elsevier BV'
Publication date

Crossref