Towards an Efficient, Scalable Stream Query Operator Framework for Representing and Analyzing Continuous Fields
Advancements in sensor technology have made it less expensive to deploy massive numbers of sensors that observe continuous geographic phenomena at high sample rates and stream live observations. This has raised new challenges, since sensor streams push the limits of traditional geo-sensor data management technology. Data Stream Engines (DSEs) provide facilities for near real-time stream processing; however, algorithms for representing and analyzing Spatio-Temporal (ST) phenomena are limited.
This dissertation investigates near real-time representation and analysis of continuous ST phenomena observed by large numbers of mobile, asynchronously sampling sensors using a DSE, and proposes two novel stream query operator frameworks. First, the ST Interpolation Stream Query Operator Framework (STI-SQO framework) continuously transforms sensor streams into rasters using a novel set of stream query operators that perform ST Inverse Distance Weighting (ST-IDW) interpolation. A key component of the STI-SQO framework is the 3D, main-memory-based ST Grid Index, which enables high-performance ST insertion and deletion of massive numbers of sensor observations through Isotropic Time Cell and Time Block-based partitioning. The ST Grid Index supports fast ST search for samples using ST shell-based neighborhood search templates, namely the Cylindrical Shell Template and the Nested Shell Template. The framework also contains the stream-based ST-IDW algorithms ST Shell and ST ak-Shell for high-performance, parallel grid cell interpolation. Second, the ST Predicate Stream Query Operator Framework (STP-SQO framework) efficiently evaluates value predicates over streams of continuous ST phenomena. It contains several stream-based predicate evaluation algorithms, including Region-Growing, Tile-based, and Phenomenon-Aware algorithms, that target predicate evaluation to regions with seed points and minimize the number of raster cells interpolated when evaluating value predicates.
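The abstract names ST-IDW (spatio-temporal inverse distance weighting) interpolation but gives no formula. Below is a minimal brute-force sketch that assumes one common formulation. Space and time are merged into a single Euclidean distance, using a hypothetical `time_scale` factor to convert time differences into distance units, and each observation is weighted by 1/d^p. The sketch scans every observation rather than using the ST Grid Index or shell templates. It therefore illustrates only the weighting step, not the dissertation's ST Shell algorithm. The names `Observation` and `st_idw` are illustrative.

```python
import math
from typing import Iterable, NamedTuple


class Observation(NamedTuple):
    """A single sensor reading at position (x, y) and time t."""
    x: float
    y: float
    t: float
    value: float


def st_idw(cell_x: float, cell_y: float, cell_t: float,
           observations: Iterable[Observation],
           time_scale: float = 1.0, power: float = 2.0) -> float:
    """Estimate the field at (cell_x, cell_y, cell_t) by spatio-temporal IDW.

    `time_scale` (hypothetical) converts time differences into distance units
    so that space and time can be combined in one Euclidean distance.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for obs in observations:
        dx = obs.x - cell_x
        dy = obs.y - cell_y
        dt = (obs.t - cell_t) * time_scale
        dist = math.sqrt(dx * dx + dy * dy + dt * dt)
        if dist == 0.0:
            # An observation exactly at the cell location is returned as-is.
            return obs.value
        weight = 1.0 / dist ** power
        weighted_sum += weight * obs.value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no observations to interpolate from")
    return weighted_sum / weight_total
```

For example, a cell midway between readings of 10.0 and 20.0 gets 15.0. Moving the cell closer to the first reading pulls the estimate toward 10.0. Raising `time_scale` makes older or newer readings count less than readings taken at the cell's own time.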
The performance of the proposed frameworks was assessed with regard to the prediction accuracy of output results and runtime. The STI-SQO framework processed 250,000 observations in 2.5 s with a Normalized Root Mean Square Error (NRMSE) under 0.19 using a 500×500 grid. The STP-SQO framework processed over 250,000 observations in under 0.25 s for predicate results covering less than 40% of the observation area, and the Scan Line Region Growing algorithm was consistently the fastest algorithm tested.
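The abstract reports a Normalized Root Mean Square Error under 0.19 but does not say how the RMSE is normalized. Common conventions divide by the range, the mean, or the standard deviation of the reference values. The minimal sketch below assumes range normalization, which may differ from the dissertation's choice. The function name `nrmse` is illustrative.

```python
import math
from typing import Sequence


def nrmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square error divided by the range of the reference values.

    Range normalization is an assumption; other conventions divide by the
    mean or standard deviation of the reference data instead.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    value_range = max(reference) - min(reference)
    if value_range == 0:
        raise ValueError("reference values have zero range")
    return math.sqrt(mse) / value_range
```

With predictions `[1, 2, 3, 4]` against reference values `[1, 2, 3, 6]`, the RMSE is 1 and the range is 5, so the NRMSE is 0.2.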
Driving Markov chain Monte Carlo with a dependent random stream
Markov chain Monte Carlo is a widely-used technique for generating a
dependent sequence of samples from complex distributions. Conventionally, these
methods require a source of independent random variates. Most implementations
use pseudo-random numbers instead because generating true independent variates
with a physical system is not straightforward. In this paper we show how to
modify some commonly used Markov chains to use a dependent stream of random
numbers in place of independent uniform variates. The resulting Markov chains
have the correct invariant distribution without requiring detailed knowledge of
the stream's dependencies or even its marginal distribution. As a side effect,
sometimes far fewer random numbers are required to obtain accurate results.
Comment: 16 pages, 4 figures
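The abstract does not spell out how the chains are modified. As a baseline, here is a minimal sketch of a conventional random-walk Metropolis sampler, a standard technique and not the paper's method. Every random number is pulled from an explicit stream, which makes it visible where the uniform variates enter. The function names and the uniform proposal are illustrative assumptions.

```python
import math
import random
from typing import Callable, Iterator, List


def uniform_stream(seed: int) -> Iterator[float]:
    """Conventional source: independent pseudo-random uniforms on [0, 1)."""
    rng = random.Random(seed)
    while True:
        yield rng.random()


def random_walk_metropolis(log_density: Callable[[float], float],
                           x0: float, n_steps: int, step_size: float,
                           stream: Iterator[float]) -> List[float]:
    """Random-walk Metropolis that draws every random number from `stream`."""
    x = x0
    samples = []
    for _ in range(n_steps):
        # One uniform drives the symmetric proposal x + step * (2u - 1).
        proposal = x + step_size * (2.0 * next(stream) - 1.0)
        # A second uniform accepts the move with probability min(1, p(x')/p(x)).
        accept_prob = math.exp(min(0.0, log_density(proposal) - log_density(x)))
        if next(stream) < accept_prob:
            x = proposal
        samples.append(x)
    return samples
```

With `uniform_stream(seed)` and a standard normal target (`lambda x: -0.5 * x * x`), the samples have mean near 0 and variance near 1. Feeding the same unmodified chain a maximally dependent stream, such as the constant 0.75, makes it stall. Starting from 0 with step size 1, it accepts a move to 0.5 and then rejects every later proposal. This shows why the conventional chain relies on independent variates.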
Probabilistic Graphical Models on Multi-Core CPUs using Java 8
In this paper, we discuss software design issues related to the development
of parallel computational intelligence algorithms on multi-core CPUs, using the
new Java 8 functional programming features. In particular, we focus on
probabilistic graphical models (PGMs) and present the parallelisation of a
collection of algorithms that deal with inference and learning of PGMs from
data, namely maximum likelihood estimation, importance sampling, and greedy
search for solving combinatorial optimisation problems. Through these concrete
examples, we tackle the problem of defining efficient data structures for PGMs
and parallel processing of same-size batches of data sets using Java 8
features. We also provide straightforward techniques to code parallel
algorithms that seamlessly exploit multi-core processors. The experimental
analysis, carried out using our open-source AMIDST (Analysis of MassIve Data
STreams) Java toolbox, shows the merits of the proposed solutions.
Comment: Pre-print version of the paper presented in the special issue on
Computational Intelligence Software at IEEE Computational Intelligence
Magazine journal
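The abstract describes parallel processing of same-size batches but gives no code. Here is a minimal Python sketch of the idea for the simplest case: maximum likelihood estimation of a single categorical variable. Each batch is reduced to value counts (its sufficient statistics) in parallel, and the partial counts are then merged and normalized. This is not the AMIDST Java API, and the function names are hypothetical. Threads are used for brevity; in CPython, CPU-bound counting would typically need a process pool to get a real multi-core speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Hashable, List, Sequence


def split_batches(data: Sequence[Hashable], batch_size: int) -> List[Sequence[Hashable]]:
    """Cut the data into consecutive same-size batches (the last may be shorter)."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


def parallel_categorical_mle(data: Sequence[Hashable], batch_size: int,
                             workers: int = 4) -> Dict[Hashable, float]:
    """Maximum likelihood estimate of a categorical distribution.

    Each batch is reduced to value counts independently; the partial counts
    are then merged and divided by the total number of observations.
    """
    if not data:
        raise ValueError("data must be non-empty")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(Counter, split_batches(data, batch_size)))
    total: Counter = Counter()
    for counts in partial_counts:
        total.update(counts)
    n = sum(total.values())
    return {value: count / n for value, count in total.items()}
```

Because counts add up across batches, the result does not depend on the batch size. Splitting `['a', 'a', 'b', 'b', 'b', 'c']` into batches of 2 or 4 gives the same estimate: 1/3, 1/2, and 1/6.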
Sampling and Recovery of Pulse Streams
Compressive Sensing (CS) is a new technique for the efficient acquisition of
signals, images, and other data that have a sparse representation in some
basis, frame, or dictionary. By sparse we mean that the N-dimensional basis
representation has just K ≪ N significant coefficients; in this case, the CS
theory maintains that just M = K log N random linear signal measurements will
both preserve all of the signal information and enable robust signal
reconstruction in polynomial time. In this paper, we extend the CS theory to
pulse stream data, which correspond to S-sparse signals/images that are
convolved with an unknown F-sparse pulse shape. Ignoring their convolutional
structure, a pulse stream signal is K=SF sparse. Such signals figure
prominently in a number of applications, from neuroscience to astronomy. Our
specific contributions are threefold. First, we propose a pulse stream signal
model and show that it is equivalent to an infinite union of subspaces. Second,
we derive a lower bound on the number of measurements M required to preserve
the essential information present in pulse streams. The bound is linear in the
total number of degrees of freedom S + F, which is significantly smaller than
the naive bound based on the total signal sparsity K=SF. Third, we develop an
efficient signal recovery algorithm that infers both the shape of the impulse
response and the locations and amplitudes of the pulses. The algorithm
alternately estimates the pulse locations and the pulse shape in a manner
reminiscent of classical deconvolution algorithms. Numerical experiments on
synthetic and real data demonstrate the advantages of our approach over
standard CS.
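Here is a minimal sketch of the pulse stream signal model described above. An S-sparse spike train is convolved with an F-sparse pulse shape, giving a signal with at most K = SF nonzeros. The signal itself is described by only S spike amplitudes and F pulse coefficients. Circular convolution is an assumption made for simplicity, and the function names are illustrative. The sketch covers only the model, not the paper's recovery algorithm.

```python
from typing import List


def circular_convolve(spikes: List[float], pulse: List[float]) -> List[float]:
    """Circular convolution of a length-N spike train with a short pulse shape."""
    n = len(spikes)
    out = [0.0] * n
    for i, amplitude in enumerate(spikes):
        if amplitude == 0.0:
            continue
        # Each spike places a scaled copy of the pulse starting at position i.
        for j, h in enumerate(pulse):
            if h != 0.0:
                out[(i + j) % n] += amplitude * h
    return out


def support_size(signal: List[float]) -> int:
    """Number of nonzero entries, i.e. the sparsity of the signal."""
    return sum(1 for v in signal if v != 0.0)
```

Take N = 32, spikes at positions 3, 12, and 25 (S = 3), and the pulse `[1.0, 0.0, 0.5, 0.25]` (F = 3). The pulse copies do not overlap, so the output has exactly 9 = SF nonzeros, against S + F = 6 nonzero parameters in the model. When copies overlap, entries can cancel, and the support can fall below SF.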