
    Full Flow: Optical Flow Estimation By Global Optimization over Regular Grids

    We present a global optimization approach to optical flow estimation. The approach optimizes a classical optical flow objective over the full space of mappings between discrete grids. No descriptor matching is used. The highly regular structure of the space of mappings enables optimizations that reduce the computational complexity of the algorithm's inner loop from quadratic to linear and support efficient matching of tens of thousands of nodes to tens of thousands of displacements. We show that one-shot global optimization of a classical Horn-Schunck-type objective over regular grids at a single resolution is sufficient to initialize continuous interpolation and achieve state-of-the-art performance on challenging modern benchmarks. Comment: To be presented at CVPR 2016
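
    The quadratic-to-linear reduction in the inner loop is the kind of saving a min-convolution (distance-transform) trick provides when the pairwise cost depends only on the difference between neighboring displacements. A minimal sketch of that idea for a 1D L1 penalty over K candidate displacements (illustrative only, not the paper's actual solver):

        def min_convolution_l1(cost, weight=1.0):
            """Compute out[j] = min_i(cost[i] + weight * |i - j|) in O(K) with the
            classic two-pass distance transform, instead of the naive O(K^2) loop."""
            out = list(cost)
            for j in range(1, len(out)):                 # forward pass
                out[j] = min(out[j], out[j - 1] + weight)
            for j in range(len(out) - 2, -1, -1):        # backward pass
                out[j] = min(out[j], out[j + 1] + weight)
            return out

        cost = [5.0, 1.0, 4.0, 9.0, 2.0]                 # unary costs of K displacements
        naive = [min(c + abs(i - j) for i, c in enumerate(cost)) for j in range(len(cost))]
        print(min_convolution_l1(cost) == naive)         # True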

    Speeding up neighborhood search in local Gaussian process prediction

    Recent implementations of local approximate Gaussian process models have pushed computational boundaries for non-linear, non-parametric prediction problems, particularly when deployed as emulators for computer experiments. Their flavor of spatially independent computation accommodates massive parallelization, meaning that they can handle designs two or more orders of magnitude larger than was previously possible. However, accomplishing that feat can still require massive supercomputing resources. Here we aim to ease that burden. We study how predictive variance is reduced as local designs are built up for prediction. We then observe how the exhaustive and discrete nature of an important search subroutine involved in building such local designs may be overly conservative. Rather, we suggest that searching the space radially, i.e., continuously along rays emanating from the predictive location of interest, is a far thriftier alternative. Our empirical work demonstrates that ray-based search yields predictors with accuracy comparable to exhaustive search, but in a fraction of the time, bringing a supercomputer implementation back onto the desktop. Comment: 24 pages, 5 figures, 4 tables
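
    A minimal sketch of the contrast between exhaustive and ray-based candidate search when greedily growing a local design to shrink predictive variance at a single location. It assumes a squared-exponential kernel with unit signal variance and uses made-up helper names; it is not the authors' laGP implementation.

        import numpy as np

        def sq_exp_kernel(A, B, lengthscale=0.3):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * lengthscale ** 2))

        def pred_var(X, xstar, noise=1e-6):
            """Predictive variance of a zero-mean GP at xstar given design X."""
            K = sq_exp_kernel(X, X) + noise * np.eye(len(X))
            k = sq_exp_kernel(X, xstar[None, :])[:, 0]
            return 1.0 - k @ np.linalg.solve(K, k)

        def next_point_exhaustive(X, xstar, candidates):
            """Greedy choice: candidate whose inclusion most reduces predictive variance."""
            scores = [pred_var(np.vstack([X, c]), xstar) for c in candidates]
            return candidates[int(np.argmin(scores))]

        def next_point_ray(X, xstar, n_rays=8, n_steps=10, rmax=1.0, seed=0):
            """Same greedy choice, but restricted to points on rays emanating from xstar."""
            rng = np.random.default_rng(seed)
            dirs = rng.normal(size=(n_rays, xstar.size))
            dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
            radii = np.linspace(0.05, rmax, n_steps)
            cands = np.array([xstar + r * d for d in dirs for r in radii])
            return next_point_exhaustive(X, xstar, cands)

        rng = np.random.default_rng(1)
        X = rng.uniform(-1, 1, size=(40, 2))        # current local design
        xstar = np.zeros(2)                         # predictive location of interest
        grid = rng.uniform(-1, 1, size=(2000, 2))   # exhaustive candidate set
        print("exhaustive:", next_point_exhaustive(X, xstar, grid))
        print("ray-based :", next_point_ray(X, xstar))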

    Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets

    The scale of functional magnetic resonance imaging data is rapidly increasing as large multi-subject datasets become widely available and high-resolution scanners are adopted. The inherent low dimensionality of the information in this data has led neuroscientists to consider factor analysis methods to extract and analyze the underlying brain activity. In this work, we consider two recent multi-subject factor analysis methods: the Shared Response Model and Hierarchical Topographic Factor Analysis. We perform analytical, algorithmic, and code optimization to enable multi-node parallel implementations to scale. Single-node improvements result in 99x and 1812x speedups on these two methods, respectively, and enable the processing of larger datasets. Our distributed implementations show strong scaling of 3.3x and 5.5x, respectively, with 20 nodes on real datasets. We also demonstrate weak scaling on a synthetic dataset with 1024 subjects, on up to 1024 nodes and 32,768 cores.
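
    For context, a sketch of the baseline deterministic Shared Response Model update (alternating orthogonal-Procrustes steps) that this kind of work accelerates; the optimized, multi-node implementation described above is not shown.

        import numpy as np

        def srm(datasets, k=10, n_iter=20, seed=0):
            """Deterministic SRM sketch: per-subject orthonormal maps W_i (voxels x k)
            and a shared response S (k x time) such that X_i ~= W_i @ S."""
            rng = np.random.default_rng(seed)
            S = rng.normal(size=(k, datasets[0].shape[1]))
            Ws = [None] * len(datasets)
            for _ in range(n_iter):
                for i, X in enumerate(datasets):
                    U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
                    Ws[i] = U @ Vt                                  # orthonormal columns
                S = sum(W.T @ X for W, X in zip(Ws, datasets)) / len(datasets)
            return Ws, S

        # toy data: 3 "subjects", 50 voxels, 200 time points, shared rank-10 signal
        rng = np.random.default_rng(1)
        S_true = rng.normal(size=(10, 200))
        data = [rng.normal(size=(50, 10)) @ S_true + 0.1 * rng.normal(size=(50, 200))
                for _ in range(3)]
        Ws, S = srm(data)
        print([round(float(np.linalg.norm(X - W @ S) / np.linalg.norm(X)), 3)
               for X, W in zip(data, Ws)])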

    On-the-fly tracing for data-centric computing: parallelization, workflow and applications

    As data-centric computing becomes the trend in science and engineering, more and more hardware systems, as well as middleware frameworks, are emerging to handle the intensive computations associated with big data. At the programming level, it is crucial to have corresponding programming paradigms for dealing with big data. Although MapReduce is now a well-known programming model for data-centric computing, where parallelization is replaced entirely by partitioning the computing task through data, not all programs, particularly those using statistical computing and data mining algorithms with interdependence, can be refactored in such a fashion. On the other hand, many traditional automatic parallelization methods put an emphasis on formalism and may not achieve optimal performance with the given limited computing resources. In this work we propose a cross-platform programming paradigm, called on-the-fly data tracing, to provide source-to-source transformation, where the same framework also provides workflow optimization for larger applications. Using a big-data approximation, computations related to large-scale data input are identified in the code and workflow, and a simplified core dependence graph is built based on the computational load, taking big data into account. The code can then be partitioned into sections for efficient parallelization; at the workflow level, optimization can be performed by adjusting the scheduling for big-data considerations, including the I/O performance of the machine. Treating each unit in both source code and workflow as a model, this framework enables model-based parallel programming that matches the available computing resources. The dissertation presents the techniques used in model-based parallel programming, the design of the software framework for both parallelization and workflow optimization, and its implementations in multiple programming languages. The framework is then validated in two ways: i) benchmarking of parallelization speed-up using typical examples in data analysis and machine learning (e.g. naive Bayes, k-means), and ii) three real-world applications in data-centric computing: pattern detection from hurricane and storm surge simulations, road traffic flow prediction, and text mining from social media data. The applications illustrate how to build scalable workflows with the framework, along with the resulting performance enhancements.
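
    A toy sketch of the scheduling side of this idea: workflow units form a dependence graph, each annotated with an estimated data volume, and ready units are dispatched heaviest-first so that big-data (I/O-heavy) stages are prioritized. The node names and volumes are invented and this is not the dissertation's framework (Python 3.9+ for graphlib).

        from graphlib import TopologicalSorter

        # estimated input volume (GB) per workflow unit, and "runs after" dependencies
        volume = {"ingest": 120, "clean": 80, "geocode": 40, "train": 10, "report": 1}
        deps = {"clean": {"ingest"}, "geocode": {"ingest"},
                "train": {"clean", "geocode"}, "report": {"train"}}

        ts = TopologicalSorter(deps)
        ts.prepare()
        schedule = []
        while ts.is_active():
            ready = sorted(ts.get_ready(), key=lambda n: -volume[n])  # heaviest first
            schedule.extend(ready)
            for n in ready:
                ts.done(n)
        print(schedule)   # ['ingest', 'clean', 'geocode', 'train', 'report']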

    An OpenMP based Parallelization Compiler for C Applications

    Directive-driven programming models, such as OpenMP, are one solution for exploiting the potential of multi-core architectures, and enable developers to accelerate software applications by adding annotations on for-type loops and other code regions. However, manual parallelization of applications is known to be a non-trivial and time-consuming process, requiring parallel programming skills. Automatic parallelization approaches can reduce the burden on the application development side. This paper presents an OpenMP-based automatic parallelization compiler, named AutoPar-Clava, for automatic identification and annotation of loops in C code. By using static analysis, parallelizable regions are detected, and compilable OpenMP parallel code is produced from the sequential version. To reduce each thread's accesses to shared memory, each variable is assigned the proper OpenMP scope. AutoPar-Clava also supports reductions on arrays, a feature available since OpenMP 4.5. The effectiveness of AutoPar-Clava is evaluated using the Polyhedral Benchmark suite, targeting an N-core x86-based computing platform. The achieved results are very promising and compare favorably with closely related auto-parallelization compilers such as the Intel C/C++ Compiler (i.e., icc), ROSE, TRACO, and Cetus.
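
    A toy Python sketch of the kind of loop classification such a compiler performs, distinguishing trivially parallel loops from scalar reductions; it is illustrative only and not AutoPar-Clava's actual analysis, which operates on C code and emits OpenMP pragmas (assumes Python 3.9+ AST layout).

        import ast, textwrap

        def classify_loop(loop):
            """'parallel' if every write targets a subscript indexed by the loop variable,
            'reduction' if scalars are only updated via augmented assignment, else 'sequential'."""
            ivar = loop.target.id
            reductions, ok = set(), True
            for node in ast.walk(loop):
                if isinstance(node, ast.Assign):
                    for tgt in node.targets:
                        if not (isinstance(tgt, ast.Subscript)
                                and isinstance(tgt.slice, ast.Name)
                                and tgt.slice.id == ivar):
                            ok = False
                elif isinstance(node, ast.AugAssign):
                    if isinstance(node.target, ast.Name):
                        reductions.add(node.target.id)
                    else:
                        ok = False
            if not ok:
                return "sequential"
            return "reduction on " + ", ".join(sorted(reductions)) if reductions else "parallel"

        src = textwrap.dedent("""
            def saxpy(a, x, y, out, n):
                for i in range(n):
                    out[i] = a * x[i] + y[i]

            def dot(x, y, n):
                s = 0.0
                for i in range(n):
                    s += x[i] * y[i]
                return s
        """)
        for fn in ast.parse(src).body:
            loop = next(n for n in ast.walk(fn) if isinstance(n, ast.For))
            print(fn.name, "->", classify_loop(loop))   # saxpy -> parallel, dot -> reduction on s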

    Scaling and universality in the phase diagram of the 2D Blume-Capel model

    We review the pertinent features of the phase diagram of the zero-field Blume-Capel model, focusing on the aspects of transition order, finite-size scaling and universality. In particular, we employ a range of Monte Carlo simulation methods to study the 2D spin-1 Blume-Capel model on the square lattice, investigating the behavior in the vicinity of the first-order and second-order regimes of the ferromagnet-paramagnet phase boundary, respectively. To achieve high-precision results, we utilize a combination of (i) a parallel version of the multicanonical algorithm and (ii) a hybrid updating scheme combining Metropolis and generalized Wolff cluster moves. These techniques are combined to study, for the first time, the correlation length of the model, using its scaling in the regime of second-order transitions to illustrate universality through the observed identity of the limiting value of ξ/L with the exactly known result for the Ising universality class. Comment: 16 pages, 7 figures, 1 table, submitted to Eur. Phys. J. Special Topics
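
    As a baseline point of reference, a plain single-spin Metropolis sweep for the zero-field 2D spin-1 Blume-Capel Hamiltonian H = -J Σ_<ij> s_i s_j + Δ Σ_i s_i², with s_i in {-1, 0, +1}; the multicanonical and Wolff-cluster machinery used in the paper is not sketched here.

        import numpy as np

        def metropolis_sweep(s, beta, J=1.0, delta=0.0, rng=None):
            """One Metropolis sweep of the spin-1 Blume-Capel model on an L x L
            periodic square lattice, H = -J sum_<ij> s_i s_j + delta sum_i s_i^2."""
            if rng is None:
                rng = np.random.default_rng()
            L = s.shape[0]
            for _ in range(L * L):
                i, j = rng.integers(L, size=2)
                old = s[i, j]
                new = rng.choice([v for v in (-1, 0, 1) if v != old])
                nn = s[(i + 1) % L, j] + s[(i - 1) % L, j] + s[i, (j + 1) % L] + s[i, (j - 1) % L]
                dE = -J * (new - old) * nn + delta * (new ** 2 - old ** 2)
                if dE <= 0 or rng.random() < np.exp(-beta * dE):
                    s[i, j] = new
            return s

        L, beta, rng = 16, 0.6, np.random.default_rng(0)
        spins = rng.integers(-1, 2, size=(L, L))      # random initial spin-1 configuration
        for _ in range(200):
            metropolis_sweep(spins, beta, delta=0.5, rng=rng)
        print("magnetization per site:", spins.mean())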

    Haloes gone MAD: The Halo-Finder Comparison Project

    [abridged] We present a detailed comparison of fundamental dark matter halo properties retrieved by a substantial number of different halo finders. These codes span a wide range of techniques, including friends-of-friends (FOF), spherical-overdensity (SO) and phase-space-based algorithms. We further introduce a robust (and publicly available) suite of test scenarios that allows halo finder developers to compare the performance of their codes against those presented here. This set includes mock haloes containing various levels and distributions of substructure at a range of resolutions, as well as a cosmological simulation of the large-scale structure of the universe. All the halo finding codes tested could successfully recover the spatial location of our mock haloes. They further returned lists of particles (potentially) belonging to the object, which led to coinciding values for the maximum of the circular velocity profile and the radius at which it is reached. All the finders based in configuration space struggled to recover substructure located close to the centre of the host halo, and the radial dependence of the recovered mass varies from finder to finder. Finders based in phase space could resolve central substructure, although they had difficulties in accurately recovering its properties. Via a resolution study we found that most of the finders could not reliably recover substructure containing fewer than 30-40 particles. Here, too, the phase-space finders excelled, resolving substructure down to 10-20 particles. By comparing the halo finders using a high-resolution cosmological volume we found that they agree remarkably well on fundamental properties of astrophysical significance (e.g. mass, position, velocity, and peak of the rotation curve). Comment: 27 interesting pages, 20 beautiful figures, and 4 informative tables accepted for publication in MNRAS. The high-resolution version of the paper as well as all the test cases and analysis can be found at the web site http://popia.ft.uam.es/HaloesGoingMA
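
    A toy friends-of-friends (FOF) finder, one of the classes of codes compared above: particles closer than the linking length are joined via a union-find structure, and connected sets become groups. It uses a brute-force O(N^2) pair search and no periodic boundaries, unlike the production codes.

        import numpy as np

        def friends_of_friends(pos, linking_length):
            """Label particles by FOF group: link every pair closer than linking_length."""
            n = len(pos)
            parent = list(range(n))

            def find(a):                       # union-find with path halving
                while parent[a] != a:
                    parent[a] = parent[parent[a]]
                    a = parent[a]
                return a

            for i in range(n):
                d2 = ((pos[i + 1:] - pos[i]) ** 2).sum(axis=1)
                for j in np.nonzero(d2 < linking_length ** 2)[0] + i + 1:
                    ra, rb = find(i), find(int(j))
                    if ra != rb:
                        parent[rb] = ra
            return np.array([find(i) for i in range(n)])

        rng = np.random.default_rng(0)
        halo = rng.normal(0.5, 0.02, size=(200, 3))    # a dense mock "halo"
        field = rng.uniform(0, 1, size=(800, 3))       # uniform background particles
        labels = friends_of_friends(np.vstack([halo, field]), linking_length=0.02)
        _, counts = np.unique(labels, return_counts=True)
        print("largest recovered group:", counts.max(), "particles")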

    A Similarity Measure for GPU Kernel Subgraph Matching

    Accelerator architectures specialize in executing SIMD (single instruction, multiple data) operations in lockstep. Because the majority of CUDA applications are parallelized loops, control flow information can provide an in-depth characterization of a kernel. CUDAflow is a tool that statically separates CUDA binaries into basic block regions and dynamically measures instruction and basic block frequencies. CUDAflow captures this information in a control flow graph (CFG) and performs subgraph matching across various kernels' CFGs to gain insights into an application's resource requirements, based on the shape and traversal of the graph, the instruction operations executed and the registers allocated, among other information. The utility of CUDAflow is demonstrated with SHOC and Rodinia application case studies on a variety of GPU architectures, revealing novel thread divergence characteristics that help end users, autotuners and compilers generate high-performing code.
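
    A much simpler stand-in for the paper's subgraph matching, just to make the "basic blocks + dynamic frequencies -> kernel similarity" pipeline concrete; the opcode names and block data below are invented, not CUDAflow output.

        from collections import Counter
        from math import sqrt

        def kernel_signature(cfg):
            """Opcode histogram of a kernel, weighted by each basic block's dynamic frequency."""
            sig = Counter()
            for block in cfg["blocks"]:
                for op in block["ops"]:
                    sig[op] += block["freq"]
            return sig

        def cosine(a, b):
            dot = sum(a[k] * b[k] for k in set(a) | set(b))
            na = sqrt(sum(v * v for v in a.values()))
            nb = sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        # two toy kernels: basic blocks with opcodes and measured execution counts
        saxpy = {"blocks": [{"ops": ["LD", "LD", "FMA", "ST"], "freq": 1_000_000},
                            {"ops": ["BRA"], "freq": 1_000_000}]}
        reduce_sum = {"blocks": [{"ops": ["LD", "FADD"], "freq": 1_000_000},
                                 {"ops": ["SHFL", "FADD"], "freq": 31_250},
                                 {"ops": ["ST", "BRA"], "freq": 31_250}]}
        print("similarity:", round(cosine(kernel_signature(saxpy),
                                          kernel_signature(reduce_sum)), 3))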