Parallel Streaming Frequency-Based Aggregates
We present efficient parallel streaming algorithms for fundamental frequency-based aggregates in both the sliding window and the infinite window settings. In the sliding window setting, we give a parallel algorithm for maintaining a space-bounded block counter (SBBC). Using SBBC, we derive algorithms for basic counting, frequency estimation, and heavy hitters that perform no more work than their best sequential counterparts. In the infinite window setting, we present algorithms for frequency estimation, heavy hitters, and count-min sketch. For both the infinite window and sliding window settings, our parallel algorithms process a minibatch of items using linear work and polylog parallel depth. We also prove a lower bound showing that the work of the parallel algorithm is optimal in the case of heavy hitters and frequency estimation. To our knowledge, these are the first parallel algorithms for these problems that are provably work efficient and have low depth.
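As a rough illustration of one structure named in this abstract, the following is a minimal sequential count-min sketch that ingests a minibatch at a time. It is not the parallel algorithm from the paper: the class name `CountMinSketch`, the `width` and `depth` defaults, and the BLAKE2b-based hashing are assumptions made for this sketch only.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Sequential count-min sketch (illustrative; not the paper's parallel version)."""

    def __init__(self, width=256, depth=4):
        # width and depth are hypothetical defaults chosen for the example.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash function per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            repr(item).encode(), salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add_minibatch(self, items):
        # Pre-aggregate the minibatch so each distinct item updates
        # every row once, with its total count in the batch.
        for item, count in Counter(items).items():
            for row in range(self.depth):
                self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # The minimum over rows never underestimates the true frequency.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


sketch = CountMinSketch()
sketch.add_minibatch(["a"] * 5 + ["b"] * 2)
```

Pre-aggregating the batch with `Counter` mirrors the minibatch view of the stream; a work-efficient parallel version would need to replace both loops, which this sketch does not attempt.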
Phase transitions during fruiting body formation in Myxococcus xanthus
The formation of a collectively moving group benefits individuals within a
population in a variety of ways such as ultra-sensitivity to perturbation,
collective modes of feeding, and protection from environmental stress. While
some collective groups use a single organizing principle, others can
dynamically shift the behavior of the group by modifying the interaction rules
at the individual level. The surface-dwelling bacterium Myxococcus xanthus
forms dynamic collective groups both to feed on prey and to aggregate during
times of starvation. The latter behavior, termed fruiting-body formation,
involves a complex, coordinated series of density changes that ultimately lead
to three-dimensional aggregates comprising hundreds of thousands of cells and
spores. This multi-step developmental process most likely involves several
different single-celled behaviors as the population condenses from a loose,
two-dimensional sheet to a three-dimensional mound. Here, we use
high-resolution microscopy and computer vision software to spatiotemporally
track the motion of thousands of individuals during the initial stages of
fruiting body formation. We find that a combination of cell-contact-mediated
alignment and internal timing mechanisms drives a phase transition from
exploratory flocking, in which cell groups move rapidly and coherently over
long distances, to a reversal-mediated localization into streams, which act as
slow-spreading, quasi-one-dimensional nematic fluids. These observations lead
us to an active liquid crystal description of the myxobacterial development
cycle.
Comment: 16 pages, 5 figures
Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)
Real-time analytics that requires integration and aggregation of
heterogeneous and distributed streaming and static data is a typical task in
many industrial scenarios, such as turbine diagnostics at Siemens. The
ontology-based data access (OBDA) approach has great potential to facilitate such tasks; however, it has a
number of limitations in dealing with analytics that restrict its use in
important industrial applications. Based on our experience with Siemens, we
argue that in order to overcome those limitations OBDA should be extended and
become analytics, source, and cost aware. In this work we propose such an
extension. In particular, we propose an ontology, mapping, and query language
for OBDA, where aggregate and other analytical functions are first class
citizens. Moreover, we develop query optimisation techniques that allow
analytical tasks over static and streaming data to be processed efficiently. We
implement our approach in a system and evaluate it on Siemens turbine
data.
Lattice gas cellular automata model for rippling and aggregation in myxobacteria
A lattice-gas cellular automaton (LGCA) model is used to simulate rippling
and aggregation in myxobacteria. An efficient way of representing cells of
different cell size, shape and orientation is presented that may be easily
extended to model later stages of fruiting body formation. This LGCA model is
designed to investigate whether a refractory period, a minimum response time, a
maximum oscillation period and non-linear dependence of reversals of cells on
C-factor are necessary assumptions for rippling. It is shown that a refractory
period of 2-3 minutes, a minimum response time of up to 1 minute and no maximum
oscillation period best reproduce the rippling observed in experiments on
Myxococcus xanthus. Non-linear dependence of reversals on C-factor is
critical at high cell density. Quantitative simulations demonstrate that the
increase in wavelength of ripples when a culture is diluted with non-signaling
cells can be explained entirely by the decreased density of C-signaling cells.
This result further supports the hypothesis that levels of C-signaling
quantitatively depend on and modulate cell density. Analysis of the
interpenetrating high density waves shows the presence of a phase shift
analogous to the phase shift of interpenetrating solitons. Finally, a model for
swarming, aggregation and early fruiting body formation is presented.
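The role of the refractory period can be illustrated with a toy one-dimensional update rule. This is a simplified sketch, not the LGCA model of the paper: the ring lattice, the rule that a cell reverses when it shares a site with an oppositely moving cell, and the names `step`, `length`, and `refractory_steps` are all assumptions made for illustration.

```python
def step(cells, length, refractory_steps):
    """Advance all cells by one time step on a ring of `length` sites.

    Each cell is a dict with 'pos' (site index), 'dir' (+1 or -1) and
    'clock' (time steps elapsed since the cell last reversed).
    """
    # Record which (site, direction) pairs are occupied before anyone moves.
    occupied = {(c["pos"], c["dir"]) for c in cells}

    updated = []
    for c in cells:
        direction = c["dir"]
        clock = c["clock"] + 1
        meets_opposite = (c["pos"], -direction) in occupied
        # A cell reverses only if it meets a counter-moving cell
        # and its refractory period has already elapsed.
        if meets_opposite and clock > refractory_steps:
            direction = -direction
            clock = 0
        updated.append(
            {"pos": (c["pos"] + direction) % length, "dir": direction, "clock": clock}
        )
    return updated


# Two counter-moving cells on the same site, both past their refractory period.
ready = [{"pos": 5, "dir": 1, "clock": 10}, {"pos": 5, "dir": -1, "clock": 10}]
after_ready = step(ready, length=20, refractory_steps=2)

# The same configuration, but both cells have just reversed (clock = 0).
resting = [{"pos": 5, "dir": 1, "clock": 0}, {"pos": 5, "dir": -1, "clock": 0}]
after_resting = step(resting, length=20, refractory_steps=2)
```

In the first case both cells reverse before moving; in the second they pass through each other because they are still refractory, which is the ingredient that lets counter-propagating waves interpenetrate.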
Fully decentralized computation of aggregates over data streams
In several emerging applications, data is collected in massive streams at several distributed points of observation. A basic and challenging task is to allow every node to monitor a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all data at a few central nodes of the network is infeasible in networks of low-capability devices or in the presence of massive data sets. The main difficulty in designing diffusive algorithms is coping with duplicate detections. These arise both from the observation of the same event at several nodes of the network and from the receipt of the same aggregated information along multiple paths of diffusion. In this paper, we consider fully decentralized algorithms that locally answer continuous aggregate queries on the number of distinct events, the total number of events, and the second frequency moment in the scenario outlined above. The proposed algorithms use sublinear space at every node, either in the worst case or on realistic distributions. We also propose strategies that minimize the communication needed to update the aggregates when new events are observed. We experimentally evaluate the efficiency and accuracy of our algorithms on realistic simulated scenarios.
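One standard way to make distinct counting insensitive to duplicates is a mergeable sketch whose merge operation is idempotent. The abstract does not say which sketch the authors use; the following is a minimal k-minimum-values (KMV) sketch offered only as an assumed example, with `KMVSketch`, `hash01`, and the default `k` being hypothetical names and values.

```python
import hashlib


def hash01(event):
    """Map an event to a pseudo-uniform value in [0, 1)."""
    digest = hashlib.sha256(repr(event).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Keeps the k smallest hash values seen; merging is a set union."""

    def __init__(self, k=64):
        self.k = k
        self.values = set()

    def add(self, event):
        self.values.add(hash01(event))
        self._trim()

    def merge(self, other):
        # Union followed by trimming: receiving the same sketch twice,
        # or the same event at two nodes, does not change the result.
        self.values |= other.values
        self._trim()

    def _trim(self):
        if len(self.values) > self.k:
            self.values = set(sorted(self.values)[: self.k])

    def estimate(self):
        if len(self.values) < self.k:
            # Fewer than k distinct hashes seen: the count is exact.
            return len(self.values)
        return (self.k - 1) / max(self.values)


node_a, node_b = KMVSketch(), KMVSketch()
for event in range(10):
    node_a.add(event)
for event in range(5, 15):  # events 5..9 are observed at both nodes
    node_b.add(event)

node_a.merge(node_b)
snapshot = set(node_a.values)
node_a.merge(node_b)  # same information arriving along a second path
```

Because the merge is a set union, the overlap between the two nodes and the repeated delivery of `node_b` leave the estimate unaffected.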
Fast and Accurate Mining of Correlated Heavy Hitters
The problem of mining Correlated Heavy Hitters (CHH) from a two-dimensional
data stream has been introduced recently, and a deterministic algorithm based
on the use of the Misra--Gries algorithm has been proposed by Lahiri et al. to
solve it. In this paper we present a new counter-based algorithm for tracking
CHHs, formally prove its error bounds and correctness and show, through
extensive experimental results, that our algorithm outperforms the Misra--Gries
based algorithm with regard to accuracy and speed whilst requiring
asymptotically much less space.
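For context, the classical one-dimensional Misra-Gries summary that the baseline builds on can be sketched as follows. This is the textbook single-stream algorithm, not the correlated heavy hitters algorithm of either paper; the function name `misra_gries` and the parameter `k` are chosen for this example.

```python
def misra_gries(stream, k):
    """Return a summary with at most k - 1 counters.

    Each reported count underestimates the true frequency
    by at most n / k, where n is the stream length.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop
            # those that reach zero. The new item is discarded.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = ["x"] * 6 + ["y"] * 3 + ["z"]
summary = misra_gries(stream, k=3)
```

Any item occurring more than n / k times is guaranteed to survive in the summary, which is why the structure is used for heavy hitter detection.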
Data Provenance and Management in Radio Astronomy: A Stream Computing Approach
New approaches for data provenance and data management (DPDM) are required
for mega science projects like the Square Kilometer Array, characterized by
extremely large data volume and intense data rates, therefore demanding
innovative and highly efficient computational paradigms. In this context, we
explore a stream-computing approach with the emphasis on the use of
accelerators. In particular, we make use of a new generation of high
performance stream-based parallelization middleware known as InfoSphere
Streams. Its viability for managing and ensuring interoperability and integrity
of signal processing data pipelines is demonstrated in radio astronomy. IBM
InfoSphere Streams embraces the stream-computing paradigm. It is a shift from
conventional data mining techniques (involving analysis of existing data from
databases) towards real-time analytic processing. We discuss using InfoSphere
Streams for effective DPDM in radio astronomy and propose a way in which
InfoSphere Streams can be utilized for large antenna arrays. We present a
case study: the InfoSphere Streams implementation of an autocorrelating
spectrometer, and using this example we discuss the advantages of the
stream-computing approach and the utilization of hardware accelerators.
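An autocorrelating spectrometer estimates a power spectrum by Fourier-transforming the autocorrelation of the signal (the Wiener-Khinchin relation). The following is a minimal pure-Python sketch of that idea under stated assumptions; it does not reproduce the InfoSphere Streams implementation, and the function names, the biased autocorrelation estimator, and the test signal are all chosen for illustration.

```python
import math


def autocorrelation(samples, max_lag):
    """Biased autocorrelation estimate for lags 0 .. max_lag - 1."""
    n = len(samples)
    return [
        sum(samples[t] * samples[t + lag] for t in range(n - lag)) / n
        for lag in range(max_lag)
    ]


def power_spectrum(acf):
    """Discrete Fourier transform of the symmetrically extended autocorrelation.

    Using r[-lag] = r[lag], the transform over M = 2 * len(acf) - 1 points
    reduces to a real cosine sum; bin k corresponds to frequency k / M
    cycles per sample.
    """
    m = 2 * len(acf) - 1
    return [
        acf[0]
        + 2 * sum(
            acf[lag] * math.cos(2 * math.pi * k * lag / m)
            for lag in range(1, len(acf))
        )
        for k in range(len(acf))
    ]


# Hypothetical test signal: a pure tone at 8/65 cycles per sample.
samples = [math.cos(2 * math.pi * 8 * t / 65) for t in range(650)]
acf = autocorrelation(samples, max_lag=33)
spectrum = power_spectrum(acf)
peak_bin = spectrum.index(max(spectrum))
```

In a streaming setting the lag products can be accumulated incrementally as samples arrive, with the transform applied only when a spectrum is requested; the quadratic-time loops here are kept for clarity rather than speed.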