Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search
Neural Architecture Search (NAS) has shown great success in automating the
design of neural networks, but the prohibitive amount of computation behind
current NAS methods calls for further work on sample efficiency and network
evaluation cost so that better results can be obtained in less time. In this
paper, we present a novel, scalable Monte Carlo Tree Search (MCTS)-based NAS
agent, named AlphaX, to tackle these two aspects. AlphaX improves search
efficiency by adaptively balancing exploration and exploitation at the state
level, and by using a Meta-Deep Neural Network (DNN) to predict network
accuracies, biasing the search toward promising regions. To amortize the
network evaluation cost, AlphaX accelerates MCTS rollouts with a distributed
design and reduces the number of epochs in evaluating a network by transfer
learning, which is guided by the tree structure in MCTS. In 12 GPU days and
1000 samples, AlphaX found an architecture that reaches 97.84% top-1 accuracy
on CIFAR-10 and 75.5% top-1 accuracy on ImageNet, exceeding SOTA NAS methods
in both accuracy and sampling efficiency. We also evaluate
AlphaX on NASBench-101, a large scale NAS dataset; AlphaX is 3x and 2.8x more
sample efficient than Random Search and Regularized Evolution in finding the
global optimum. Finally, we show that the searched architecture improves a
variety of vision applications, from Neural Style Transfer to Image Captioning
and Object Detection.
Comment: To appear in the Thirty-Fourth AAAI Conference on Artificial
Intelligence (AAAI-2020)
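The balance between exploration and exploitation mentioned in the abstract above is commonly handled in MCTS with a UCB1-style selection rule. The following is a minimal sketch of that generic rule, not AlphaX's actual implementation: the function names, the action labels, and the exploration constant `c` are all assumptions made for illustration.

```python
import math


def uct_score(value_sum, visits, parent_visits, c=1.4):
    """Generic UCB1 score: mean reward plus an exploration bonus.

    `c` is a hypothetical exploration constant; larger values favour
    rarely visited children.
    """
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean_reward = value_sum / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean_reward + bonus


def select_child(children, parent_visits, c=1.4):
    """Pick the child action with the highest UCT score.

    `children` maps a (hypothetical) action name to a tuple
    (sum of rewards, visit count).
    """
    return max(children, key=lambda a: uct_score(*children[a], parent_visits, c))


if __name__ == "__main__":
    # Illustrative statistics only; not taken from the paper.
    children = {"add_conv": (8.0, 10), "add_pool": (0.5, 1)}
    print(select_child(children, parent_visits=11, c=0.0))  # pure exploitation
    print(select_child(children, parent_visits=11, c=1.4))  # with exploration bonus
```

With `c=0.0` the rule picks the child with the best mean reward, while a positive `c` can shift the choice to a child that has been visited less often.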
Demonstration of a scaling advantage for a quantum annealer over simulated annealing
The observation of an unequivocal quantum speedup remains an elusive
objective for quantum computing. The D-Wave quantum annealing processors have
been at the forefront of experimental attempts to address this goal, given
their relatively large numbers of qubits and programmability. A complete
determination of the optimal time-to-solution (TTS) using these processors has
not been possible to date, preventing definitive conclusions about the presence
of a scaling advantage. The main technical obstacle has been the inability to
verify an optimal annealing time within the available range. Here we overcome
this obstacle and present a class of problem instances for which we observe an
optimal annealing time using a D-Wave 2000Q processor over a range spanning up
to more than qubits. This allows us to perform an optimal TTS
benchmarking analysis and perform a comparison to several classical algorithms,
including simulated annealing, spin-vector Monte Carlo, and discrete-time
simulated quantum annealing. We establish the first example of a scaling
advantage for an experimental quantum annealer over classical simulated
annealing: we find that the D-Wave device exhibits certifiably better scaling
than simulated annealing, with confidence, over the range of problem
sizes that we can test. However, we do not find evidence for a quantum speedup:
simulated quantum annealing exhibits the best scaling by a significant margin.
Our construction of instance classes with verifiably optimal annealing times
opens up the possibility of generating many new such classes, paving the way
for further definitive assessments of scaling advantages using current and
future quantum annealing devices.
Comment: 26 pages, 22 figures. v2: Updated benchmarking results with
additional analysis. v3: Updated to published version
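The time-to-solution (TTS) metric in the abstract above is usually defined in the annealing benchmarking literature as the annealing time multiplied by the number of repetitions needed to see the optimum at least once with a target probability (often 99%). The sketch below assumes that common definition. The function names, the clamp to at least one run, and the sample measurements are illustrative assumptions and are not taken from the paper.

```python
import math


def time_to_solution(anneal_time, success_prob, target=0.99):
    """TTS = anneal_time * ln(1 - target) / ln(1 - success_prob).

    Assumes independent runs. The result is clamped to at least one run,
    which is a convention chosen for this sketch.
    """
    if not 0.0 < success_prob <= 1.0:
        raise ValueError("success_prob must be in (0, 1]")
    if success_prob >= target:
        return anneal_time  # a single run already meets the target
    repeats = math.log(1.0 - target) / math.log(1.0 - success_prob)
    return anneal_time * repeats


def optimal_tts(measurements, target=0.99):
    """Return (best_anneal_time, best_tts) over measured annealing times.

    `measurements` maps annealing time to the observed success probability.
    """
    best_time = min(
        measurements,
        key=lambda t: time_to_solution(t, measurements[t], target),
    )
    return best_time, time_to_solution(best_time, measurements[best_time], target)


if __name__ == "__main__":
    # Made-up numbers, chosen so that the optimum lies inside the range.
    measurements = {1.0: 0.01, 10.0: 0.5, 100.0: 0.9}
    print(optimal_tts(measurements))
```

An optimal annealing time is only verified when the minimum lies strictly inside the range of accessible annealing times, as in the made-up data here. If the shortest available time always wins, the true optimum may lie below the accessible range.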
Towards a Mini-App for Smoothed Particle Hydrodynamics at Exascale
The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian
method, used in numerical simulations of fluids in astrophysics and
computational fluid dynamics, among many other fields. SPH simulations with
detailed physics are computationally demanding. The
parallelization of SPH codes is not trivial due to the absence of a structured
grid. Additionally, the performance of the SPH codes can be, in general,
adversely impacted by several factors, such as multiple time-stepping,
long-range interactions, and/or boundary conditions. This work presents
insights into the current performance and functionalities of three SPH codes:
SPHYNX, ChaNGa, and SPH-flow. These codes are the starting point of an
interdisciplinary co-design project, SPH-EXA, for the development of an
Exascale-ready SPH mini-app. To gain such insights, a rotating square patch
test was implemented as a common test simulation for the three SPH codes and
analyzed on two modern HPC systems. Furthermore, to stress the differences with
the codes stemming from the astrophysics community (SPHYNX and ChaNGa), an
additional test case, the Evrard collapse, has also been carried out. This work
extrapolates the common basic SPH features in the three codes for the purpose
of consolidating them into a pure-SPH, Exascale-ready, optimized, mini-app.
Moreover, the outcome of this work serves as direct feedback to the parent codes, to
improve their performance and overall scalability.
Comment: 18 pages, 4 figures, 5 tables, 2018 IEEE International Conference on
Cluster Computing proceedings for WRAp1
Hardware Acceleration of Electronic Design Automation Algorithms
With the advances in very large scale integration (VLSI) technology, hardware is going
parallel. Software, which was traditionally designed to execute on single core microprocessors,
now faces the tough challenge of taking advantage of this parallelism, made available
by the scaling of hardware. The work presented in this dissertation studies the acceleration
of electronic design automation (EDA) software on several hardware platforms such
as custom integrated circuits (ICs), field programmable gate arrays (FPGAs) and graphics
processors. This dissertation concentrates on a subset of EDA algorithms which are heavily
used in the VLSI design flow, and also have varying degrees of inherent parallelism
in them. In particular, Boolean satisfiability, Monte Carlo based statistical static timing
analysis, circuit simulation, fault simulation and fault table generation are explored. The
architectural and performance tradeoffs of implementing the above applications on these
alternative platforms (in comparison to their implementation on a single core microprocessor)
are studied. In addition, this dissertation also presents an automated approach to
accelerate uniprocessor code using a graphics processing unit (GPU). The key idea is to
partition the software application into kernels in an automated fashion, such that multiple
instances of these kernels, when executed in parallel on the GPU, can maximally benefit
from the GPU's hardware resources.
The work presented in this dissertation demonstrates that several EDA algorithms can
be successfully rearchitected to maximally harness their performance on alternative platforms
such as custom designed ICs, FPGAs and graphics processors, and obtain speedups of up to 800X.
The approaches in this dissertation collectively aim to contribute towards enabling
the computer aided design (CAD) community to accelerate EDA algorithms on arbitrary
hardware platforms.
Fast algorithm for real-time rings reconstruction
The GAP project is dedicated to studying the application of GPUs in several contexts in which
real-time response is important for taking decisions. The definition of real-time depends on
the application under study, ranging from response times of μs up to several hours in the case
of very computing-intensive tasks. During this conference we presented our work on low
level triggers [1] [2] and high level triggers [3] in high energy physics experiments, and
specific applications for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6].
Apart from the study of dedicated solutions to decrease the latency due to data transport
and preparation, the computing algorithms play an essential role in any GPU application.
In this contribution, we show an original algorithm developed for trigger applications, to
accelerate ring reconstruction in RICH detectors when it is not possible to have seeds
for reconstruction from external trackers.
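The abstract above does not describe the original algorithm itself. For orientation only, the sketch below shows a generic algebraic least-squares circle fit, a textbook technique for estimating a ring's centre and radius from hit coordinates once the hits of one ring are known. It is not the authors' seedless method, and all names and sample points are assumptions.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def replace_col(m, k, col):
    """Copy of matrix m with column k replaced by col (for Cramer's rule)."""
    return [[col[r] if c == k else m[r][c] for c in range(3)] for r in range(3)]


def fit_circle(points):
    """Algebraic fit of x^2 + y^2 + D*x + E*y + F = 0 to (x, y) hits.

    Returns (centre_x, centre_y, radius).
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sz = sum(x * x + y * y for x, y in points)
    sxz = sum(x * (x * x + y * y) for x, y in points)
    syz = sum(y * (x * x + y * y) for x, y in points)

    # Normal equations of the linear least-squares problem in (D, E, F).
    a = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [-sxz, -syz, -sz]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("points are (nearly) collinear")
    big_d, big_e, big_f = (det3(replace_col(a, k, rhs)) / d for k in range(3))

    cx, cy = -big_d / 2.0, -big_e / 2.0
    radius = math.sqrt(cx * cx + cy * cy - big_f)
    return cx, cy, radius


if __name__ == "__main__":
    # Four illustrative hits on a circle of radius 3 centred at (2, -1).
    hits = [(5, -1), (2, 2), (-1, -1), (2, -4)]
    print(fit_circle(hits))
```

Because the fit reduces to a small linear system per ring, many candidate rings can be fitted independently of one another, which is one reason this kind of computation maps well onto GPUs.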