31,675 research outputs found
Massively Parallel Video Networks
We introduce a class of causal video understanding models that aims to
improve efficiency of video processing by maximising throughput, minimising
latency, and reducing the number of clock cycles. Leveraging operation
pipelining and multi-rate clocks, these models perform a minimal amount of
computation (e.g. as few as four convolutional layers) for each frame per
timestep to produce an output. The models remain very deep, with dozens of such operations performed in total, but in a pipelined fashion that enables depth-parallel computation. We illustrate the proposed principles by applying
them to existing image architectures and analyse their behaviour on two video
tasks: action recognition and human keypoint localisation. The results show
that a significant degree of parallelism, and implicitly speedup, can be
achieved with little loss in performance.
Comment: Fixed typos in the DenseNet model definition in the appendix.
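The pipelining idea admits a short sketch. The code below is a minimal, framework-free illustration of depth-parallel pipelining, not the authors' implementation: the network depth is split into stages, and at each timestep stage i consumes the features that stage i-1 produced on the previous timestep, so each frame receives only one stage's worth of computation per step while all stages can run concurrently. Stage names and contents are assumptions; only the roughly-four-layers-per-step granularity follows the abstract.

```python
# Minimal sketch of depth-parallel pipelining (illustrative only).

class Stage:
    """One pipeline stage, standing in for a small block of layers
    (e.g. ~4 convolutional layers per timestep, as in the abstract)."""
    def __init__(self, name):
        self.name = name

    def __call__(self, x):
        return f"{self.name}({x})"   # placeholder for real layer math

stages = [Stage(f"s{i}") for i in range(4)]   # deep net split into 4 stages
buffers = [None] * len(stages)                # features held between stages

def step(frame):
    """Advance the pipeline one timestep.

    Stage i consumes what stage i-1 emitted on the *previous* timestep,
    so all stages could execute in parallel on separate devices."""
    new = [None] * len(stages)
    new[0] = stages[0](frame)
    for i in range(1, len(stages)):
        if buffers[i - 1] is not None:
            new[i] = stages[i](buffers[i - 1])
    buffers[:] = new
    return buffers[-1]   # output for the frame that entered 3 steps earlier

for t, frame in enumerate(["f0", "f1", "f2", "f3", "f4"]):
    print(t, step(frame))   # None while the pipe fills: latency = depth - 1 steps
```

The None outputs while the pipeline fills correspond to the latency that the abstract trades against throughput.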
A massively parallel multi-level approach to a domain decomposition method for the optical flow estimation with varying illumination
We consider a variational method to solve the optical flow problem with
varying illumination. We apply adaptive control of the regularization parameter, which allows us to preserve the edges and fine features of the computed flow. To reduce the complexity of the estimation for high-resolution images and the computation time, we implement a multi-level parallel approach based on domain decomposition with the overlapping Schwarz method. The second level of parallelism uses the massively parallel solver MUMPS. We perform numerical simulations to show the efficiency of our approach and validate it on classical and real-world image sequences.
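As a concrete illustration of the decomposition, here is a minimal sketch of overlapping Schwarz iteration on the 1-D model problem -u'' = f. The paper's optical-flow systems are far larger and use MUMPS for the subdomain solves; the sequential (multiplicative) sweep shown here is the simplest variant to write down, and subdomain bounds and sweep count are illustrative assumptions.

```python
# Minimal sketch of overlapping Schwarz on -u'' = f over [0, 1].
import numpy as np

n = 101
h = 1.0 / (n - 1)
f = np.ones(n)          # right-hand side
u = np.zeros(n)         # zero Dirichlet boundary values

def solve_local(lo, hi):
    """Exactly solve the local Dirichlet problem on grid points lo..hi,
    using the current global values at lo and hi as boundary data."""
    m = hi - lo - 1
    A = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
    b = f[lo + 1:hi].copy()
    b[0] += u[lo] / h**2
    b[-1] += u[hi] / h**2
    u[lo + 1:hi] = np.linalg.solve(A, b)

# Two subdomains overlapping on grid points 40..60.
for sweep in range(30):
    solve_local(0, 60)        # in a parallel (additive) variant these
    solve_local(40, n - 1)    # local solves run on separate processes

print(u.max())   # approaches 0.125, the max of the exact solution x(1-x)/2
```

The overlap region is what lets boundary information propagate between subdomains from one sweep to the next.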
Adaptive control in rollforward recovery for extreme scale multigrid
With the increasing number of compute components, failures in future
exa-scale computer systems are expected to become more frequent. This motivates
the study of novel resilience techniques. Here, we extend a recently proposed
algorithm-based recovery method for multigrid iterations by introducing an
adaptive control. After a fault, the healthy part of the system continues the
iterative solution process, while the solution in the faulty domain is
reconstructed by an asynchronous online recovery. The computations in the faulty and healthy subdomains must be carefully coordinated; in particular, both under- and over-solving must be avoided, since either wastes computational resources and increases the overall time-to-solution. To control the local recovery and guarantee an optimal
re-coupling, we introduce a stopping criterion based on a mathematical error
estimator. It involves hierarchical weighted sums of residuals on uniformly refined meshes and is well suited to parallel high-performance computing. The re-coupling process is steered by
local contributions of the error estimator. We propose and compare two criteria
which differ in their weights. Failure scenarios when solving up to unknowns on more than 245,766 parallel processes are reported on a state-of-the-art peta-scale supercomputer, demonstrating the robustness of the method.
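The re-coupling logic also lends itself to a short sketch. In the code below, the local recovery iterates on the faulty subdomain only until its estimated error contribution drops to the level of the healthy part; a plain weighted residual norm stands in for the paper's hierarchical weighted residual sums, and the faulty_solver object is hypothetical.

```python
# Sketch of estimator-controlled re-coupling after a fault (the solver
# object and the simple weighted norm are assumptions).
import numpy as np

def local_estimator(residual, weights):
    """Weighted residual norm over one subdomain's unknowns."""
    return float(np.sqrt(np.sum(weights * residual**2)))

def recover(faulty_solver, eta_healthy, max_sweeps=100):
    """Iterate on the faulty subdomain alone, then re-couple.

    Stopping too early (under-solving) re-injects a poor local solution;
    stopping too late (over-solving) keeps the healthy processes waiting.
    The estimator-based criterion balances the two."""
    for sweep in range(max_sweeps):
        residual, weights = faulty_solver.sweep()  # one local multigrid cycle
        if local_estimator(residual, weights) <= eta_healthy:
            break                                  # matched the healthy error level
    return sweep + 1                               # sweeps spent in recovery
```

The threshold eta_healthy would be supplied by the healthy processes' own local estimator contributions, so recovery stops exactly when both parts of the domain are equally well resolved.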
nbodykit: an open-source, massively parallel toolkit for large-scale structure
We present nbodykit, an open-source, massively parallel Python toolkit for
analyzing large-scale structure (LSS) data. Using Python bindings of the
Message Passing Interface (MPI), we provide parallel implementations of many
commonly used algorithms in LSS. nbodykit is both an interactive and scalable
piece of scientific software, performing well in a supercomputing environment
while still taking advantage of the interactive tools provided by the Python
ecosystem. Existing functionality includes estimators of the power spectrum, two- and three-point correlation functions, a Friends-of-Friends grouping algorithm,
mock catalog creation via the halo occupation distribution technique, and
approximate N-body simulations via the FastPM scheme. The package also provides
a set of distributed data containers, insulated from the algorithms themselves,
that enable nbodykit to provide a unified treatment of both simulation and
observational data sets. nbodykit can be easily deployed in a high performance
computing environment, overcoming some of the traditional difficulties of using
Python on supercomputers. We provide performance benchmarks illustrating the
scalability of the software. The modular, component-based approach of nbodykit
allows researchers to easily build complex applications using its tools. The
package is extensively documented at http://nbodykit.readthedocs.io, which also
includes an interactive set of example recipes for new users to explore. As
open-source software, we hope nbodykit provides a common framework for the
community to use and develop in confronting the analysis challenges of future
LSS surveys.
Comment: 18 pages, 7 figures. Feedback very welcome. Code available at https://github.com/bccp/nbodykit; for documentation, see http://nbodykit.readthedocs.io.
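To give a feel for the interface, the snippet below follows the pattern of nbodykit's documented quickstart (generate a log-normal mock catalog, then measure its power spectrum); parameter values are illustrative, and keyword details may vary between releases.

```python
# Usage sketch modelled on the nbodykit quickstart at
# http://nbodykit.readthedocs.io (illustrative parameter values).
from nbodykit.lab import cosmology, LogNormalCatalog, FFTPower

# Linear power spectrum for a fiducial cosmology at z = 0.55.
cosmo = cosmology.Planck15
Plin = cosmology.LinearPower(cosmo, redshift=0.55)

# Mock catalog drawn from a log-normal realisation of that spectrum.
cat = LogNormalCatalog(Plin=Plin, nbar=3e-4, BoxSize=1380.0,
                       Nmesh=256, bias=2.0, seed=42)

# 1-d power spectrum estimate; run under MPI (e.g. `mpirun -n 16 python
# script.py`) and the particles and FFTs are distributed automatically.
result = FFTPower(cat, mode='1d', Nmesh=256, kmin=0.01)
Pk = result.power
print(Pk['k'], Pk['power'].real - Pk.attrs['shotnoise'])
```

The same script runs unchanged on a laptop or across many MPI ranks, which is the interactive-yet-scalable design the abstract describes.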