Search CORE

103,404 research outputs found

Probabilistic Graphical Models on Multi-Core CPUs using Java 8

Author: Borchani Hanen
Martinez Ana M.
Masegosa Andres R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelisation of a collection of algorithms that deal with inference and learning of PGMs from data. Namely, maximum likelihood estimation, importance sampling, and greedy search for solving combinatorial optimisation problems. Through these concrete examples, we tackle the problem of defining efficient data structures for PGMs and parallel processing of same-size batches of data sets using Java 8 features. We also provide straightforward techniques to code parallel algorithms that seamlessly exploit multi-core processors. The experimental analysis, carried out using our open source AMIDST (Analysis of MassIve Data STreams) Java toolbox, shows the merits of the proposed solutions.Comment: Pre-print version of the paper presented in the special issue on Computational Intelligence Software at IEEE Computational Intelligence Magazine journa

arXiv.org e-Print Archive

Crossref

VBN

Data Provenance and Management in Radio Astronomy: A Stream Computing Approach

Author: Biem Alain
Elmegreen Bruce
Ensor Andrew
Gulyaev Sergei
Mahmoud Mahmoud S.
Publication venue
Publication date: 12/12/2011
Field of study

New approaches for data provenance and data management (DPDM) are required for mega science projects like the Square Kilometer Array, characterized by extremely large data volume and intense data rates, therefore demanding innovative and highly efficient computational paradigms. In this context, we explore a stream-computing approach with the emphasis on the use of accelerators. In particular, we make use of a new generation of high performance stream-based parallelization middleware known as InfoSphere Streams. Its viability for managing and ensuring interoperability and integrity of signal processing data pipelines is demonstrated in radio astronomy. IBM InfoSphere Streams embraces the stream-computing paradigm. It is a shift from conventional data mining techniques (involving analysis of existing data from databases) towards real-time analytic processing. We discuss using InfoSphere Streams for effective DPDM in radio astronomy and propose a way in which InfoSphere Streams can be utilized for large antennae arrays. We present a case-study: the InfoSphere Streams implementation of an autocorrelating spectrometer, and using this example we discuss the advantages of the stream-computing approach and the utilization of hardware accelerators

arXiv.org e-Print Archive

AUT Scholarly Commons

ElfStore: A Resilient Data Storage Service for Federated Edge and Fog Resources

Author: Monga Sumit Kumar
R Sheshadri K
Simmhan Yogesh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/05/2019
Field of study

Edge and fog computing have grown popular as IoT deployments become wide-spread. While application composition and scheduling on such resources are being explored, there exists a gap in a distributed data storage service on the edge and fog layer, instead depending solely on the cloud for data persistence. Such a service should reliably store and manage data on fog and edge devices, even in the presence of failures, and offer transparent discovery and access to data for use by edge computing applications. Here, we present Elfstore, a first-of-its-kind edge-local federated store for streams of data blocks. It uses reliable fog devices as a super-peer overlay to monitor the edge resources, offers federated metadata indexing using Bloom filters, locates data within 2-hops, and maintains approximate global statistics about the reliability and storage capacity of edges. Edges host the actual data blocks, and we use a unique differential replication scheme to select edges on which to replicate blocks, to guarantee a minimum reliability and to balance storage utilization. Our experiments on two IoT virtual deployments with 20 and 272 devices show that ElfStore has low overheads, is bound only by the network bandwidth, has scalable performance, and offers tunable resilience.Comment: 24 pages, 14 figures, To appear in IEEE International Conference on Web Services (ICWS), Milan, Italy, 201

arXiv.org e-Print Archive

Crossref

Deterministic Sampling and Range Counting in Geometric Data Streams

Author: Amitabh Chaudhary
Amitabha Bagchi
Cormode G.
Datar M.
David Eppstein
Demaine E. D.
Fang M.
Feigenbaum J.
Gupta A.
Har-Peled S.
Hershberger J.
Indyk P.
Indyk P.
Korn F.
Langerman S.
Manku G. S.
Matoušek J.
Matoušek J.
Matoušek J.
Michael T. Goodrich
Rousseeuw P. J.
Thiel H.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/07/2003
Field of study

We present memory-efficient deterministic algorithms for constructing epsilon-nets and epsilon-approximations of streams of geometric data. Unlike probabilistic approaches, these deterministic samples provide guaranteed bounds on their approximation factors. We show how our deterministic samples can be used to answer approximate online iceberg geometric queries on data streams. We use these techniques to approximate several robust statistics of geometric data streams, including Tukey depth, simplicial depth, regression depth, the Thiel-Sen estimator, and the least median of squares. Our algorithms use only a polylogarithmic amount of memory, provided the desired approximation factors are inverse-polylogarithmic. We also include a lower bound for non-iceberg geometric queries.Comment: 12 pages, 1 figur

arXiv.org e-Print Archive

CiteSeerX

Crossref

Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window

Author: Chan Ho-Leung
Lam Tak-Wah
Lee Lap-Kei
Ting Hing-Fung
Publication venue
Publication date: 01/01/2010
Field of study

The past decade has witnessed many interesting algorithms for maintaining statistics over a data stream. This paper initiates a theoretical study of algorithms for monitoring distributed data streams over a time-based sliding window (which contains a variable number of items and possibly out-of-order items). The concern is how to minimize the communication between individual streams and the root, while allowing the root, at any time, to be able to report the global statistics of all streams within a given error bound. This paper presents communication-efficient algorithms for three classical statistics, namely, basic counting, frequent items and quantiles. The worst-case communication cost over a window is

O(\frac{k} {\epsilon} \log \frac{\epsilon N}{k})

bits for basic counting and

O(\frac{k}{\epsilon} \log \frac{N}{k})

words for the remainings, where

k

is the number of distributed data streams,

N

is the total number of items in the streams that arrive or expire in the window, and

\epsilon < 1

is the desired error bound. Matching and nearly matching lower bounds are also obtained.Comment: 12 pages, to appear in the 27th International Symposium on Theoretical Aspects of Computer Science (STACS), 201

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

HKU Scholars Hub