Multilevel Monte Carlo methods for highly heterogeneous media
We discuss the application of multilevel Monte Carlo methods to elliptic
partial differential equations with random coefficients. Such problems arise,
for example, in uncertainty quantification in subsurface flow modeling. We give
a brief review of recent advances in the numerical analysis of the multilevel
algorithm under minimal assumptions on the random coefficient, and extend the
analysis to also cover tensor-valued coefficients, as well as point
evaluations. Our analysis includes, as an example, log-normal random
coefficients, which are frequently used in applications.
Comment: 14 pages, no figures
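The telescoping-sum idea behind the multilevel estimator can be sketched in a few lines. This is a toy illustration, not the paper's analysis: `toy_qoi` is a hypothetical quantity of interest whose discretisation bias halves per level.

```python
import random

def mlmc_estimate(sample_qoi, num_levels, samples_per_level, rng=random):
    """Multilevel Monte Carlo estimator based on the telescoping sum
    E[P_{L-1}] = E[P_0] + sum_{l>=1} E[P_l - P_{l-1}].

    sample_qoi(level, omega) evaluates the level-`level` approximation of
    the quantity of interest for random input omega; the coarse and fine
    evaluations in each correction share the same omega, which is what
    keeps the correction variance (and hence the cost) small.
    """
    estimate = 0.0
    for level in range(num_levels):
        n = samples_per_level[level]
        total = 0.0
        for _ in range(n):
            omega = rng.random()
            if level == 0:
                total += sample_qoi(0, omega)
            else:
                total += sample_qoi(level, omega) - sample_qoi(level - 1, omega)
        estimate += total / n
    return estimate

# Hypothetical toy QoI: omega plus a bias 2**-(level+1) that halves per
# level, mimicking a discretisation error; the exact E[P_3] is
# 0.5 + 2**-4 = 0.5625.
def toy_qoi(level, omega):
    return omega + 2.0 ** -(level + 1)
```

Note how the corrections at levels l >= 1 need far fewer samples than level 0, which is the source of the multilevel cost saving.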
Rethinking Arithmetic for Deep Neural Networks
We consider efficiency in the implementation of deep neural networks.
Hardware accelerators are gaining interest as machine learning becomes one of
the drivers of high-performance computing. In these accelerators, the directed
graph describing a neural network can be implemented as a directed graph
describing a Boolean circuit. We make this observation precise, leading
naturally to an understanding of practical neural networks as discrete
functions, and show that so-called binarised neural networks are functionally
complete. In general, our results suggest that it is valuable to consider
Boolean circuits as neural networks, leading to the question of which circuit
topologies are promising. We argue that continuity is central to generalisation
in learning, explore the interaction between data coding, network topology, and
node functionality for continuity, and pose some open questions for future
research. As a first step to bridging the gap between continuous and Boolean
views of neural network accelerators, we present some recent results from our
work on LUTNet, a novel Field-Programmable Gate Array inference approach.
Finally, we conclude with further possible fruitful avenues for research
bridging the continuous and discrete views of neural networks.
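The functional-completeness claim can be illustrated with a minimal sketch of our own (not the paper's code): with False/True encoded as -1/+1, a single binarised neuron realises NAND, and NAND alone generates all Boolean functions.

```python
def bnn_neuron(inputs, weights, threshold):
    """A binarised neuron: inputs and weights live in {-1, +1} and the
    output is the thresholded sign of the weighted sum, i.e. a Boolean
    threshold gate."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s >= threshold else -1

# With False encoded as -1 and True as +1, one neuron realises NAND:
# weights (-1, -1) and threshold 0 make it fire unless both inputs are
# True. NAND is functionally complete, so networks of such neurons can
# implement any Boolean circuit.
def nand(a, b):
    return bnn_neuron([a, b], [-1, -1], 0)
```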
Thermodynamic-RAM Technology Stack
We introduce a technology stack or specification describing the multiple
levels of abstraction and specialization needed to implement a neuromorphic
processor (NPU) based on the previously-described concept of AHaH Computing and
integrate it into today's digital computing systems. The general purpose NPU
implementation described here is called Thermodynamic-RAM (kT-RAM) and is just
one of many possible architectures, each with varying advantages and
trade-offs. Bringing us closer to brain-like neural computation, kT-RAM will
provide a general-purpose adaptive hardware resource to existing computing
platforms, enabling fast and low-power machine learning capabilities that are
currently hampered by the separation of memory and processing, a.k.a. the von
Neumann bottleneck. Because a processor based on such non-traditional
principles can be difficult to understand, we present the various levels of
the stack from the bottom up, layer by layer, which makes explaining kT-RAM a
much easier task. The levels of the Thermodynamic-RAM technology stack include
the memristor, synapse, AHaH node, kT-RAM, instruction set, sparse spike
encoding, kT-RAM emulator, and SENSE server.
Data Protection: Combining Fragmentation, Encryption, and Dispersion, a final report
Hardening data protection using multiple methods rather than 'just'
encryption is of paramount importance when considering continuous and powerful
attacks in order to observe, steal, alter, or even destroy private and
confidential information. Our purpose is to look at cost-effective data
protection by way of combining fragmentation, encryption, and dispersion over
several physical machines. This involves deriving general schemes to protect
data everywhere throughout a network of machines where they are being
processed, transmitted, and stored during their entire life cycle. This is
being enabled by a number of parallel and distributed architectures using
various sets of cores or machines ranging from General Purpose GPUs to multiple
clouds. In this report, we first present a general and conceptual description
of what should be a fragmentation, encryption, and dispersion system (FEDS)
including a number of high-level requirements such systems ought to meet. Then,
we focus on two kinds of fragmentation. First, a selective separation of
information into two fragments: a public one and a private one. We describe a
family of processes and address not only the question of performance but also
the questions of memory occupation, integrity or quality of the restitution of
the information, and of course we conclude with an analysis of the level of
security provided by our algorithms. Then, we analyze works, first on general
dispersion systems operating bit-wise without data-structure considerations,
and second on fragmentation of information where data are defined along an
object-oriented data structure or along a record structure to be stored in a
relational database.
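As one concrete, deliberately simple instance of fragmentation plus dispersion (an illustration of the idea, not the report's actual algorithms), an all-or-nothing XOR split produces fragments that individually reveal nothing about the data:

```python
import secrets

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def fragment(data, n):
    """Split `data` into n same-length fragments: n-1 uniformly random
    pads plus the XOR of the data with all pads. Any n-1 fragments are
    statistically independent of the data; all n are needed to rebuild,
    so dispersing them over n machines protects against n-1 breaches."""
    pads = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    last = data
    for p in pads:
        last = _xor(last, p)
    return pads + [last]

def reassemble(fragments):
    """XOR of all fragments recovers the original data."""
    out = fragments[0]
    for f in fragments[1:]:
        out = _xor(out, f)
    return out
```

Real schemes of this kind typically add threshold reconstruction (k-of-n rather than n-of-n) and integrity checks per fragment.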
High-performance computing selection of models of DNA substitution for multicore clusters
This paper presents the high-performance computing (HPC) support of jModelTest2, the most popular bioinformatic tool for the statistical selection of models of DNA substitution. As this can demand vast computational resources, especially in terms of processing power, jModelTest2 implements three parallel algorithms for model selection: (1) a multithreaded implementation for shared-memory architectures; (2) a message-passing implementation for distributed-memory architectures, such as clusters; and (3) a hybrid shared/distributed-memory implementation for clusters of multicore nodes, combining workload distribution across cluster nodes with multithreaded model optimization within each node. The main limitation of the shared- and distributed-memory versions is the workload imbalance that generally appears when using more than 32 cores, a direct consequence of the heterogeneity in the computational cost of the evaluated models. The hybrid shared/distributed-memory version overcomes this issue by reducing the workload imbalance through a thread-based decomposition of the most costly model optimization tasks. The performance evaluation of this HPC application on a 40-core shared-memory system and on a 528-core cluster has shown high scalability, with speedups of up to 32 for the multithreaded version and up to 257 for the hybrid shared/distributed-memory implementation. This can represent a reduction in the execution time of some analyses from 4 days down to barely 20 minutes. The implementations of the three parallel execution strategies of jModelTest2 presented in this paper are available under a GPL license at http://code.google.com/jmodeltest2.
Funding: European Research Council ERC-2007-Stg 203161-PHYGENOM to D.P.; Ministerio de Ciencia y Educación BFU2009-08611 to D.P.; Ministerio de Ciencia y Educación TIN2010-16735 to R.D.
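The load-balancing idea behind the thread-based decomposition, schedule the costliest model optimisations first so that cheap tasks fill the tail of the run, can be sketched as follows. This is an illustrative Python sketch, not jModelTest2's actual implementation; `run_balanced` and its arguments are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def run_balanced(tasks, costs, max_workers):
    """Run heterogeneous tasks on a thread pool, submitting them in
    decreasing order of estimated cost (longest-processing-time-first).
    Expensive model optimisations start early and cheap ones fill the
    idle tail, reducing the workload imbalance seen beyond ~32 cores."""
    order = sorted(range(len(tasks)), key=lambda i: costs[i], reverse=True)
    results = [None] * len(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(tasks[i]): i for i in order}
        for fut, i in futures.items():
            results[i] = fut.result()
    return results
```

In the hybrid version the same decomposition applies across MPI ranks, with threads inside each rank splitting the costliest individual optimisations further.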
Beyond Powers of Two: Hexagonal Modulation and Non-Binary Coding for Wireless Communication Systems
Adaptive modulation and coding (AMC) is widely employed in modern wireless
communication systems to improve the transmission efficiency by adjusting the
transmission rate according to the channel conditions. Thus, AMC can provide
very efficient use of channel resources especially over fading channels.
Quadrature Amplitude Modulation (QAM) is an efficient and widely employed
digital modulation technique. It typically employs a rectangular signal
constellation. Therefore the decision regions of the constellation are square
partitions of the two-dimensional signal space. However, it is well known that
hexagons rather than squares provide the most compact regular tiling in two
dimensions. A compact tiling means a dense packing of the constellation points
and thus more energy efficient data transmission. Hexagonal modulation can be
difficult to implement because it does not fit well with the usual
power-of-two symbol sizes employed with binary data. To overcome this problem,
non-binary coding is combined with hexagonal modulation in this paper to
provide a system which is compatible with binary data. The feasibility and
efficiency are evaluated using a software-defined radio (SDR) based prototype.
Extensive simulation results are presented which show that this approach can
provide improved energy efficiency and spectrum utilization in wireless
communication systems.
Comment: 9 pages
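A hexagonal constellation can be built as concentric rings of a hexagonal lattice; the resulting sizes are the centred hexagonal numbers 1, 7, 19, 37, ..., which is exactly why hexagonal packings clash with power-of-two symbol sizes and motivate non-binary coding. A small sketch of our own (not the paper's SDR prototype):

```python
import math

def hex_constellation(rings):
    """Constellation points on a hexagonal lattice: the origin plus
    `rings` concentric hexagonal rings, where ring r carries 6*r points
    with unit spacing between nearest neighbours. Point counts follow
    the centred hexagonal numbers 1, 7, 19, 37, ..."""
    points = [complex(0.0, 0.0)]
    for r in range(1, rings + 1):
        corners = [r * complex(math.cos(k * math.pi / 3),
                               math.sin(k * math.pi / 3)) for k in range(6)]
        for k in range(6):
            a, b = corners[k], corners[(k + 1) % 6]
            for step in range(r):  # r points per edge, each corner counted once
                points.append(a + (b - a) * (step / r))
    return points
```

For the same minimum distance, the hexagonal tiling packs points at density pi/(2*sqrt(3)) ~ 0.907 versus pi/4 ~ 0.785 for the square tiling, which is the energy-efficiency gain the paper exploits.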
Secure Payment System Utilizing MANET for Disaster Areas
Mobile payment systems in a disaster area have the potential to provide
electronic transactions for people purchasing recovery goods like foodstuffs,
clothes, and medicine. Conversely, to enable transactions in a disaster area,
current payment systems need communication infrastructures (such as wired
networks and cellular networks) which may be ruined during such disasters as
large-scale earthquakes and flooding and thus cannot be depended on in a
disaster area. In this paper, we introduce a new mobile payment system
utilizing infrastructureless MANETs to enable transactions that permit users to
shop in disaster areas. Specifically, we introduce an endorsement-based
mechanism to provide payment guarantees for a customer-to-merchant transaction
and a multilevel endorsement mechanism with a lightweight scheme based on Bloom
filter and Merkle tree to reduce communication overheads. Our mobile payment
system achieves secure transactions by adopting various schemes, such as a
location-based mutual monitoring scheme and blind signatures, while our newly
introduced event-chain mechanism prevents double-spending attacks. As validated
by simulations, the proposed mobile payment system is useful in a disaster
area, achieving a high transaction completion ratio of 65% to 90% for all
scenarios tested, and is storage-efficient for mobile devices, with an overall
average merchant message size of 7 MB.
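A minimal Bloom filter, the kind of compact set summary the multilevel endorsement mechanism relies on to cut communication overhead, can be sketched as follows (an illustration with assumed sizes, not the paper's exact parameters):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash probes into an m-bit array.
    Membership tests may yield false positives but never false
    negatives, so a node can compactly advertise which endorsements it
    already holds without listing them."""

    def __init__(self, m_bits=1024, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _probes(self, item):
        # Derive k independent probe positions by salting SHA-256.
        for i in range(self.k):
            h = hashlib.sha256(i.to_bytes(2, "big") + item).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._probes(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._probes(item))
```

Pairing such a filter with a Merkle tree lets a node first test membership cheaply, then request a logarithmic-size authentication path only when proof is needed.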
High Performance Evaluation of Helmholtz Potentials using the Multi-Level Fast Multipole Algorithm
Evaluation of pair potentials is critical in a number of areas of physics.
The classical N-body problem has its roots in evaluating the Laplace potential,
and has spawned tree-algorithms, the fast multipole method (FMM), as well as
kernel independent approaches. Over the years, FMM for Laplace potential has
had a profound impact on a number of disciplines as it has been possible to
develop highly scalable parallel algorithms for these potential evaluators. This
is in stark contrast to parallel algorithms for the Helmholtz (oscillatory)
potentials. The principal bottleneck to scalable parallelism is the set of
operations necessary to traverse up, across, and down the tree, affecting both
computation
and communication. In this paper, we describe techniques to overcome these
bottlenecks and achieve high-performance evaluation of the Helmholtz potential
for a wide spectrum of geometries. We demonstrate that the resulting
implementation has a load balancing effect that significantly reduces the
time-to-solution and enhances the scale of problems that can be treated using
full wave physics.
Comment: Submitted to ACM Transactions on Parallel Computing
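For reference, the quantity being accelerated is the Helmholtz pair potential phi(x) = sum_j q_j exp(ik|x - y_j|)/|x - y_j|. A direct O(N*M) evaluation, the baseline cost that fast multipole methods reduce to near-linear, fits in a few lines (a sketch with hypothetical names, not the paper's implementation):

```python
import cmath
import math

def helmholtz_potential(targets, sources, charges, k):
    """Direct O(N*M) summation of the Helmholtz potential
    phi(x) = sum_j q_j * exp(i*k*|x - y_j|) / |x - y_j|.
    For k = 0 this degenerates to the Laplace (1/r) potential; the
    oscillatory factor for k > 0 is what complicates the tree
    traversals that FMM variants must parallelise."""
    result = []
    for t in targets:
        acc = 0j
        for s, q in zip(sources, charges):
            r = math.dist(t, s)
            acc += q * cmath.exp(1j * k * r) / r
        result.append(acc)
    return result
```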