
    Parallel Streaming Frequency-Based Aggregates

    We present efficient parallel streaming algorithms for fundamental frequency-based aggregates in both the sliding window and the infinite window settings. In the sliding window setting, we give a parallel algorithm for maintaining a space-bounded block counter (SBBC). Using SBBC, we derive algorithms for basic counting, frequency estimation, and heavy hitters that perform no more work than their best sequential counterparts. In the infinite window setting, we present algorithms for frequency estimation, heavy hitters, and count-min sketch. In both settings, our parallel algorithms process a minibatch of items using linear work and polylogarithmic parallel depth. We also prove a lower bound showing that the work of the parallel algorithm is optimal for heavy hitters and frequency estimation. To our knowledge, these are the first parallel algorithms for these problems that are provably work-efficient and have low depth.
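
    The count-min sketch named in the abstract above can be illustrated with a minimal sequential sketch. This is not the paper's parallel algorithm: the class name, the `width` and `depth` defaults, and the BLAKE2b-based row hashing are assumptions chosen for illustration. It shows only the minibatch idea in its simplest form, where a batch is pre-aggregated so that each distinct item touches each row once.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Illustrative count-min sketch; parameters are assumptions, not the paper's."""

    def __init__(self, width=272, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One salted hash per row stands in for a family of independent hashes.
        digest = hashlib.blake2b(
            repr(item).encode(),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update_batch(self, items):
        # Pre-aggregate the minibatch so each distinct item is hashed once per row.
        for item, count in Counter(items).items():
            for row in range(self.depth):
                self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions only inflate cells, so the minimum over rows never underestimates.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


sketch = CountMinSketch()
sketch.update_batch(["a", "b", "a", "c", "a"])
sketch.update_batch(["b", "a"])
print(sketch.estimate("a"), sketch.estimate("b"))
```

    The estimate for an item is always at least its true count and at most the total number of items inserted; how close it gets depends on the chosen width and depth.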

    Phase transitions during fruiting body formation in Myxococcus xanthus

    The formation of a collectively moving group benefits individuals within a population in a variety of ways, such as ultra-sensitivity to perturbation, collective modes of feeding, and protection from environmental stress. While some collective groups use a single organizing principle, others can dynamically shift the behavior of the group by modifying the interaction rules at the individual level. The surface-dwelling bacterium Myxococcus xanthus forms dynamic collective groups both to feed on prey and to aggregate during times of starvation. The latter behavior, termed fruiting-body formation, involves a complex, coordinated series of density changes that ultimately lead to three-dimensional aggregates comprising hundreds of thousands of cells and spores. This multi-step developmental process most likely involves several different single-cell behaviors as the population condenses from a loose, two-dimensional sheet to a three-dimensional mound. Here, we use high-resolution microscopy and computer vision software to spatiotemporally track the motion of thousands of individuals during the initial stages of fruiting body formation. We find that a combination of cell-contact-mediated alignment and internal timing mechanisms drives a phase transition from exploratory flocking, in which cell groups move rapidly and coherently over long distances, to a reversal-mediated localization into streams, which act as slow-spreading, quasi-one-dimensional nematic fluids. These observations lead us to an active liquid crystal description of the myxobacterial development cycle. Comment: 16 pages, 5 figures.

    Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)

    Real-time analytics that requires integration and aggregation of heterogeneous and distributed streaming and static data is a typical task in many industrial scenarios, such as diagnostics of turbines at Siemens. The ontology-based data access (OBDA) approach has great potential to facilitate such tasks; however, it has a number of limitations in dealing with analytics that restrict its use in important industrial applications. Based on our experience with Siemens, we argue that in order to overcome those limitations OBDA should be extended to become analytics, source, and cost aware. In this work we propose such an extension. In particular, we propose an ontology, mapping, and query language for OBDA in which aggregate and other analytical functions are first-class citizens. Moreover, we develop query optimisation techniques that allow analytical tasks over static and streaming data to be processed efficiently. We implement our approach in a system and evaluate it on Siemens turbine data.

    Lattice gas cellular automata model for rippling and aggregation in myxobacteria

    A lattice-gas cellular automaton (LGCA) model is used to simulate rippling and aggregation in myxobacteria. An efficient way of representing cells of different size, shape and orientation is presented that may be easily extended to model later stages of fruiting body formation. This LGCA model is designed to investigate whether a refractory period, a minimum response time, a maximum oscillation period and non-linear dependence of cell reversals on C-factor are necessary assumptions for rippling. It is shown that a refractory period of 2-3 minutes, a minimum response time of up to 1 minute and no maximum oscillation period best reproduce the rippling observed in experiments on Myxococcus xanthus. Non-linear dependence of reversals on C-factor is critical at high cell density. Quantitative simulations demonstrate that the increase in wavelength of ripples when a culture is diluted with non-signaling cells can be explained entirely by the decreased density of C-signaling cells. This result further supports the hypothesis that levels of C-signaling quantitatively depend on and modulate cell density. Analysis of the interpenetrating high-density waves shows the presence of a phase shift analogous to the phase shift of interpenetrating solitons. Finally, a model for swarming, aggregation and early fruiting body formation is presented.

    Fully decentralized computation of aggregates over data streams

    In several emerging applications, data is collected in massive streams at several distributed points of observation. A basic and challenging task is to allow every node to monitor a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all data at a few central nodes of the network is infeasible in networks of low-capability devices or in the presence of massive data sets. The main difficulty in designing diffusive algorithms is coping with duplicate detections. These arise both from the observation of the same event at several nodes of the network and from receipt of the same aggregated information along multiple paths of diffusion. In this paper, we consider fully decentralized algorithms that locally answer continuous aggregate queries on the number of distinct events, the total number of events and the second frequency moment in the scenario outlined above. The proposed algorithms use sublinear space at every node, either in the worst case or on realistic distributions. We also propose strategies that minimize the communication needed to update the aggregates when new events are observed. We experimentally evaluate the efficiency and accuracy of our algorithms on realistic simulated scenarios.
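
    The duplicate problem described in the abstract above can be made concrete with a duplicate-insensitive distinct counter. The sketch below uses a k-minimum-values (KMV) summary, a standard technique swapped in for illustration; it is not claimed to be the algorithm proposed in the paper. The names `KMVSketch` and `event_hash` and the default `k` are hypothetical. Because merging is a set union, the same event seen at two nodes, or the same summary received along two paths, changes nothing.

```python
import hashlib

HASH_SPACE = 2 ** 64  # size of the 64-bit hash range used below


def event_hash(event):
    # Deterministic 64-bit hash, so every node maps an event to the same value.
    digest = hashlib.blake2b(repr(event).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class KMVSketch:
    """Keeps the k smallest hash values seen (k >= 2 assumed)."""

    def __init__(self, k=64):
        self.k = k
        self.values = set()

    def _trim(self):
        if len(self.values) > self.k:
            self.values = set(sorted(self.values)[: self.k])

    def observe(self, event):
        self.values.add(event_hash(event))
        self._trim()

    def merge(self, other):
        # Union is idempotent: re-merging the same summary has no effect.
        self.values |= other.values
        self._trim()

    def estimate(self):
        if len(self.values) < self.k:
            # Fewer than k distinct hashes seen: the count is exact.
            return float(len(self.values))
        kth_smallest = max(self.values)
        return (self.k - 1) * HASH_SPACE / kth_smallest


node_a, node_b = KMVSketch(k=16), KMVSketch(k=16)
for event in ["e1", "e2", "e3"]:
    node_a.observe(event)
for event in ["e2", "e3", "e4"]:  # e2 and e3 are also seen at node_a
    node_b.observe(event)
node_a.merge(node_b)
node_a.merge(node_b)  # same summary arriving along a second path
print(node_a.estimate())
```

    Each node stores at most `k` hash values regardless of stream length, which is the kind of sublinear per-node space the abstract refers to, here traded against estimation error once more than `k` distinct events have been seen.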

    Fast and Accurate Mining of Correlated Heavy Hitters

    The problem of mining Correlated Heavy Hitters (CHH) from a two-dimensional data stream was introduced recently, and a deterministic algorithm based on the Misra-Gries algorithm was proposed by Lahiri et al. to solve it. In this paper we present a new counter-based algorithm for tracking CHHs, formally prove its error bounds and correctness, and show, through extensive experimental results, that our algorithm outperforms the Misra-Gries-based algorithm with regard to accuracy and speed whilst requiring asymptotically much less space.
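
    For readers unfamiliar with the Misra-Gries summary mentioned in the abstract above, here is a minimal sketch of the classic one-dimensional version. It is neither the CHH algorithm of Lahiri et al. nor the new counter-based algorithm of this paper; the function name and the toy stream are assumptions for illustration.

```python
def misra_gries(stream, k):
    """Classic Misra-Gries summary with at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n is
    guaranteed to survive, and each surviving count undershoots the true
    frequency by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = list("aabacadaae")  # "a" occurs 6 times out of 10
summary = misra_gries(stream, k=3)
print(summary)
```

    The summary gives candidates with underestimated counts; confirming which candidates are true heavy hitters generally needs a second pass or an error bound check.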

    Data Provenance and Management in Radio Astronomy: A Stream Computing Approach

    New approaches for data provenance and data management (DPDM) are required for mega-science projects like the Square Kilometer Array, which are characterized by extremely large data volumes and intense data rates and therefore demand innovative and highly efficient computational paradigms. In this context, we explore a stream-computing approach with an emphasis on the use of accelerators. In particular, we make use of a new generation of high-performance stream-based parallelization middleware known as InfoSphere Streams. Its viability for managing and ensuring interoperability and integrity of signal processing data pipelines is demonstrated in radio astronomy. IBM InfoSphere Streams embraces the stream-computing paradigm. It is a shift from conventional data mining techniques (involving analysis of existing data from databases) towards real-time analytic processing. We discuss using InfoSphere Streams for effective DPDM in radio astronomy and propose a way in which InfoSphere Streams can be utilized for large antenna arrays. We present a case study: the InfoSphere Streams implementation of an autocorrelating spectrometer. Using this example, we discuss the advantages of the stream-computing approach and the utilization of hardware accelerators.