31,675 research outputs found

    Massively Parallel Video Networks

    Full text link
    We introduce a class of causal video understanding models that aims to improve efficiency of video processing by maximising throughput, minimising latency, and reducing the number of clock cycles. Leveraging operation pipelining and multi-rate clocks, these models perform a minimal amount of computation (e.g. as few as four convolutional layers) for each frame per timestep to produce an output. The models are still very deep, with dozens of such operations being performed but in a pipelined fashion that enables depth-parallel computation. We illustrate the proposed principles by applying them to existing image architectures and analyse their behaviour on two video tasks: action recognition and human keypoint localisation. The results show that a significant degree of parallelism, and implicitly speedup, can be achieved with little loss in performance.Comment: Fixed typos in densenet model definition in appendi

    A massively parallel multi-level approach to a domain decomposition method for the optical flow estimation with varying illumination

    Get PDF
    We consider a variational method to solve the optical flow problem with varying illumination. We apply an adaptive control of the regularization parameter which allows us to preserve the edges and fine features of the computed flow. To reduce the complexity of the estimation for high resolution images and the time of computations, we implement a multi-level parallel approach based on the domain decomposition with the Schwarz overlapping method. The second level of parallelism uses the massively parallel solver MUMPS. We perform some numerical simulations to show the efficiency of our approach and to validate it on classical and real-world image sequences

    Adaptive control in rollforward recovery for extreme scale multigrid

    Full text link
    With the increasing number of compute components, failures in future exa-scale computer systems are expected to become more frequent. This motivates the study of novel resilience techniques. Here, we extend a recently proposed algorithm-based recovery method for multigrid iterations by introducing an adaptive control. After a fault, the healthy part of the system continues the iterative solution process, while the solution in the faulty domain is re-constructed by an asynchronous on-line recovery. The computations in both the faulty and healthy subdomains must be coordinated in a sensitive way, in particular, both under and over-solving must be avoided. Both of these waste computational resources and will therefore increase the overall time-to-solution. To control the local recovery and guarantee an optimal re-coupling, we introduce a stopping criterion based on a mathematical error estimator. It involves hierarchical weighted sums of residuals within the context of uniformly refined meshes and is well-suited in the context of parallel high-performance computing. The re-coupling process is steered by local contributions of the error estimator. We propose and compare two criteria which differ in their weights. Failure scenarios when solving up to 6.9â‹…10116.9\cdot10^{11} unknowns on more than 245\,766 parallel processes will be reported on a state-of-the-art peta-scale supercomputer demonstrating the robustness of the method

    nbodykit: an open-source, massively parallel toolkit for large-scale structure

    Get PDF
    We present nbodykit, an open-source, massively parallel Python toolkit for analyzing large-scale structure (LSS) data. Using Python bindings of the Message Passing Interface (MPI), we provide parallel implementations of many commonly used algorithms in LSS. nbodykit is both an interactive and scalable piece of scientific software, performing well in a supercomputing environment while still taking advantage of the interactive tools provided by the Python ecosystem. Existing functionality includes estimators of the power spectrum, 2 and 3-point correlation functions, a Friends-of-Friends grouping algorithm, mock catalog creation via the halo occupation distribution technique, and approximate N-body simulations via the FastPM scheme. The package also provides a set of distributed data containers, insulated from the algorithms themselves, that enable nbodykit to provide a unified treatment of both simulation and observational data sets. nbodykit can be easily deployed in a high performance computing environment, overcoming some of the traditional difficulties of using Python on supercomputers. We provide performance benchmarks illustrating the scalability of the software. The modular, component-based approach of nbodykit allows researchers to easily build complex applications using its tools. The package is extensively documented at http://nbodykit.readthedocs.io, which also includes an interactive set of example recipes for new users to explore. As open-source software, we hope nbodykit provides a common framework for the community to use and develop in confronting the analysis challenges of future LSS surveys.Comment: 18 pages, 7 figures. Feedback very welcome. Code available at https://github.com/bccp/nbodykit and for documentation, see http://nbodykit.readthedocs.i
    • …
    corecore