23,354 research outputs found
Probabilistic Graphical Models on Multi-Core CPUs using Java 8
In this paper, we discuss software design issues related to the development
of parallel computational intelligence algorithms on multi-core CPUs, using the
new Java 8 functional programming features. In particular, we focus on
probabilistic graphical models (PGMs) and present the parallelisation of a
collection of algorithms that deal with inference and learning of PGMs from
data. Namely, maximum likelihood estimation, importance sampling, and greedy
search for solving combinatorial optimisation problems. Through these concrete
examples, we tackle the problem of defining efficient data structures for PGMs
and parallel processing of same-size batches of data sets using Java 8
features. We also provide straightforward techniques to code parallel
algorithms that seamlessly exploit multi-core processors. The experimental
analysis, carried out using our open source AMIDST (Analysis of MassIve Data
STreams) Java toolbox, shows the merits of the proposed solutions.Comment: Pre-print version of the paper presented in the special issue on
Computational Intelligence Software at IEEE Computational Intelligence
Magazine journa
Parallel Weighted Random Sampling
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near linear speedups both for construction and queries
ASMs and Operational Algorithmic Completeness of Lambda Calculus
We show that lambda calculus is a computation model which can step by step
simulate any sequential deterministic algorithm for any computable function
over integers or words or any datatype. More formally, given an algorithm above
a family of computable functions (taken as primitive tools, i.e., kind of
oracle functions for the algorithm), for every constant K big enough, each
computation step of the algorithm can be simulated by exactly K successive
reductions in a natural extension of lambda calculus with constants for
functions in the above considered family. The proof is based on a fixed point
technique in lambda calculus and on Gurevich sequential Thesis which allows to
identify sequential deterministic algorithms with Abstract State Machines. This
extends to algorithms for partial computable functions in such a way that
finite computations ending with exceptions are associated to finite reductions
leading to terms with a particular very simple feature.Comment: 37 page
VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing
The hardware implementation of deep neural networks (DNNs) has recently
received tremendous attention: many applications in fact require high-speed
operations that suit a hardware implementation. However, numerous elements and
complex interconnections are usually required, leading to a large area
occupation and copious power consumption. Stochastic computing has shown
promising results for low-power area-efficient hardware implementations, even
though existing stochastic algorithms require long streams that cause long
latencies. In this paper, we propose an integer form of stochastic computation
and introduce some elementary circuits. We then propose an efficient
implementation of a DNN based on integral stochastic computing. The proposed
architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62%
average reductions in area and latency compared to the best reported
architecture in literature. We also synthesize the circuits in a 65 nm CMOS
technology and we show that the proposed integral stochastic architecture
results in up to 21% reduction in energy consumption compared to the binary
radix implementation at the same misclassification rate. Due to fault-tolerant
nature of stochastic architectures, we also consider a quasi-synchronous
implementation which yields 33% reduction in energy consumption w.r.t. the
binary radix implementation without any compromise on performance.Comment: 11 pages, 12 figure
Asymptotic Analysis of Plausible Tree Hash Modes for SHA-3
Discussions about the choice of a tree hash mode of operation for a
standardization have recently been undertaken. It appears that a single tree
mode cannot address adequately all possible uses and specifications of a
system. In this paper, we review the tree modes which have been proposed, we
discuss their problems and propose remedies. We make the reasonable assumption
that communicating systems have different specifications and that software
applications are of different types (securing stored content or live-streamed
content). Finally, we propose new modes of operation that address the resource
usage problem for the three most representative categories of devices and we
analyse their asymptotic behavior
- …