
18,581 research outputs found

GraphLab: A New Framework for Parallel Machine Learning

Author: Bickson Danny
Gonzalez Joseph
Guestrin Carlos
Hellerstein Joseph M.
Kyrola Aapo
Low Yucheng
Publication date: 01/01/2010

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive, while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large-scale real-world problems.

arXiv.org e-Print Archive

CiteSeerX

A New MHD Code with Adaptive Mesh Refinement and Parallelization for Astrophysics

Author: Almgren
Balsara
Balsara
Berger
Berger
Brackbill
Brio
Butcher
C. Fang
Chen
Colella
Dai
Dedner
DeZeeuw
Evans
Fromang
Fryxell
Gardiner
Gardiner
Gurski
Harten
Hilbert
Jiang
Jiang
Jiang
Jiang
Keppens
MacNeice
Matsumoto
Mignone
Miniati
Miyoshi
P.-F. Chen
Powell
R.-L. Jiang
Roe
Roe
Tóth
Tóth
van der Holst
Yu
Zhang
Ziegler
Ziegler
Publication venue: 'Elsevier BV'
Publication date: 26/04/2012

A new code, named MAP, is written in Fortran for magnetohydrodynamics (MHD) calculations with adaptive mesh refinement (AMR) and Message Passing Interface (MPI) parallelization. Several optional numerical schemes are available for computing the MHD part, namely the modified MacCormack scheme (MMC), the Lax-Friedrichs scheme (LF), and the weighted essentially non-oscillatory (WENO) scheme. All of them are second-order, two-step, component-wise schemes for hyperbolic conservative equations. Total variation diminishing (TVD) limiters and approximate Riemann solvers are also provided. High resolution can be achieved with the hierarchical block-structured AMR mesh. We use the extended generalized Lagrange multiplier (EGLM) MHD equations to reduce the non-divergence-free error produced by the scheme in the magnetic induction equation. Numerical algorithms for the non-ideal terms, e.g., the resistivity and the thermal conduction, are also included in the MAP code. The details of the AMR and MPI algorithms are described in the paper. Comment: 44 pages, 16 figures.
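As a rough illustration of the kind of scheme the abstract names, the sketch below applies the classic single-step Lax-Friedrichs update to the 1D linear advection equation u_t + c u_x = 0 with periodic boundaries. This is a textbook first-order form chosen for brevity, not the second-order two-step variant used in MAP; the function name, the test profile, and the parameter values are all assumptions made for this example.

```python
def lax_friedrichs_step(u, c, dt, dx):
    """One classic Lax-Friedrichs step for u_t + c*u_x = 0 (periodic grid).

    Hypothetical helper for illustration only; not taken from the MAP code.
    """
    n = len(u)
    nu = c * dt / dx  # Courant number; stability requires |nu| <= 1
    new_u = []
    for i in range(n):
        left = u[i - 1]           # index -1 wraps around for i = 0
        right = u[(i + 1) % n]    # wrap around at the right edge
        # Average of the neighbours minus a centred flux difference.
        new_u.append(0.5 * (left + right) - 0.5 * nu * (right - left))
    return new_u


# A square pulse on a 16-cell periodic grid (made-up test data).
u0 = [1.0 if 4 <= i < 8 else 0.0 for i in range(16)]
u1 = lax_friedrichs_step(u0, c=1.0, dt=0.5, dx=1.0)
```

Because the update is written in conservative form on a periodic grid, the sum of the cell values is preserved from one step to the next, and at a Courant number of exactly 1 the step reduces to shifting the profile by one cell.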

arXiv.org e-Print Archive

Crossref

Byzantine Approximate Agreement on Graphs

Author: Nowak Thomas
Rybicki Joel
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Distributed Computing (DISC 2019)
Publication date: 01/01/2019

Consider a distributed system with n processors out of which f can be Byzantine faulty. In the approximate agreement task, each processor i receives an input value x_i and has to decide on an output value y_i such that 1) the output values are in the convex hull of the non-faulty processors' input values, and 2) the output values are within distance d of each other. Classically, the values are assumed to be from an m-dimensional Euclidean space, where m >= 1. In this work, we study the task in a discrete setting, where the input values have some structure expressible as a graph. Namely, the input values are vertices of a finite graph G and the goal is to output vertices that are within distance d of each other in G, but still remain in the graph-induced convex hull of the input values. For d=0, the task reduces to consensus and cannot be solved with a deterministic algorithm in an asynchronous system even with a single crash fault. For any d >= 1, we show that the task is solvable in asynchronous systems when G is chordal and n > (omega+1)f, where omega is the clique number of G. In addition, we give the first Byzantine-tolerant algorithm for a variant of lattice agreement. For synchronous systems, we show tight resilience bounds for the exact variants of these and related tasks over a large class of combinatorial structures.
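The two correctness conditions stated in the abstract can be made concrete for the classical one-dimensional case (m = 1), where the convex hull of the non-faulty inputs is simply the interval between their minimum and maximum. The checker below is a minimal sketch under that assumption; it does not cover the graph setting studied in the paper, and the function name and sample values are hypothetical.

```python
def satisfies_approximate_agreement(honest_inputs, honest_outputs, d):
    """Check the two approximate-agreement conditions for real-valued inputs.

    Illustrative helper for the 1D Euclidean case only (an assumption);
    it verifies outcomes and is not an agreement protocol.
    """
    lo, hi = min(honest_inputs), max(honest_inputs)
    # Condition 1: every output lies in the convex hull [lo, hi].
    validity = all(lo <= y <= hi for y in honest_outputs)
    # Condition 2: all outputs are within distance d of each other.
    agreement = max(honest_outputs) - min(honest_outputs) <= d
    return validity and agreement


# Made-up example values.
ok = satisfies_approximate_agreement([0.0, 4.0, 10.0], [4.0, 4.5, 5.0], d=1.0)
too_spread = satisfies_approximate_agreement([0.0, 4.0, 10.0], [2.0, 4.5, 5.0], d=1.0)
outside_hull = satisfies_approximate_agreement([0.0, 4.0, 10.0], [10.5, 10.5, 10.5], d=1.0)
```

Here the first call satisfies both conditions, the second violates the distance bound, and the third violates the convex-hull condition.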

HAL-CentraleSupelec

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

IST Austria: PubRep (Institute of Science and Technology)

Hal-Diderot

HAL-Rennes 1

Novel methods for real-time 3D facial recognition

Author: Robinson Alan
Rodrigues Marcos
Publication venue: 'Athens Institute for Education and Research ATINER'
Publication date: 01/01/2010

In this paper we discuss our approach to real-time 3D face recognition. We argue the need for real-time operation in a realistic scenario and highlight the required pre- and post-processing operations for effective 3D facial recognition. We focus on operations including face and eye detection, and fast post-processing operations such as hole filling, mesh smoothing and noise removal. We consider strategies for hole filling such as bilinear interpolation, polynomial interpolation and Laplace, and conclude that bilinear interpolation is preferred. Gaussian and moving average smoothing strategies are compared and it is shown that moving average can have the edge over Gaussian smoothing. The regions around the eyes normally carry a considerable amount of noise, and strategies for replacing the eyeball with a spherical surface and the use of an elliptical mask in conjunction with hole filling are compared. Results show that the elliptical mask with hole filling works well on face models and is simpler to implement. Finally, performance issues are considered and the system has been shown to perform real-time 3D face recognition in just over 1s 200ms per face model for a small database.

Sheffield Hallam University Research Archive

Hydrologic Terrain Processing Using Parallel Computing

Author: Schreuders Kimberly A. T.
Tarboton David G.
Tesfa Teklu K.
Wallace R.
Wallis C.
Watson Daniel W.
Publication date: 01/01/2009

Topography in the form of Digital Elevation Models (DEMs) is widely used to derive information for the modeling of hydrologic processes. Hydrologic terrain analysis augments the information content of digital elevation data by removing spurious pits, deriving a structured flow field, and calculating surfaces of hydrologic information derived from the flow field. The increasing availability of large terrain datasets with very small ground sample distance (GSD) poses a challenge for existing algorithms that process terrain data to extract this hydrologic information. This paper describes a parallel algorithm that has been developed to enhance hydrologic terrain pre-processing so that larger datasets can be computed more efficiently, namely a Message Passing Interface (MPI) parallel implementation of Pit Removal. This key functionality is used within the Terrain Analysis Using Digital Elevation Models (TauDEM) package to remove spurious elevation depressions that are an artifact of the raster representation of the terrain. The parallel algorithm works by decomposing the domain into stripes or tiles, where each tile is processed by a separate processor. This method also reduces the memory requirements of each processor so that larger grids can be processed. The parallel pit removal algorithm is adapted from the method of Planchon and Darboux, which starts from a large elevation and then iteratively scans the grid, lowering each grid cell to the maximum of the original elevation or the lowest neighbor. The MPI implementation reconcile
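The iterative scan described in the abstract can be sketched serially as follows. This is a simplified single-process illustration, assuming an 8-neighbour stencil, edge cells fixed at their original elevation, and no minimum slope increment; it is not the TauDEM or MPI implementation, and the function name and the sample grid are hypothetical.

```python
def fill_pits(dem):
    """Serial sketch of iterative pit filling on a small raster.

    Interior cells start at a large elevation and are repeatedly lowered
    to max(original elevation, lowest neighbour) until nothing changes.
    Assumptions: 8 neighbours, edge cells drain freely off the grid.
    """
    rows, cols = len(dem), len(dem[0])
    big = float("inf")
    filled = [
        [dem[r][c] if r in (0, rows - 1) or c in (0, cols - 1) else big
         for c in range(cols)]
        for r in range(rows)
    ]
    changed = True
    while changed:
        changed = False
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                lowest = min(
                    filled[r + dr][c + dc]
                    for dr in (-1, 0, 1)
                    for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0)
                )
                target = max(dem[r][c], lowest)
                if target < filled[r][c]:
                    filled[r][c] = target
                    changed = True
    return filled


# Made-up 3x3 grid: the centre cell (1) is a pit whose lowest outlet is 2.
pit_dem = [[3, 3, 3],
           [3, 1, 3],
           [3, 3, 2]]
pit_filled = fill_pits(pit_dem)
```

In this example the centre cell is raised to 2, the elevation of its lowest neighbour, while a grid that already drains to its edges is returned unchanged.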

CiteSeerX

DigitalCommons@USU

Mapping constrained optimization problems to quantum annealing with application to fault diagnosis

Author: Bian Zhengbing
Chudak Fabian
Israel Robert
Lackey Brad
Macready William G.
Roy Aidan
Publication date: 01/01/2016

Current quantum annealing (QA) hardware suffers from practical limitations such as finite temperature, sparse connectivity, small qubit numbers, and control error. We propose new algorithms for mapping Boolean constraint satisfaction problems (CSPs) onto QA hardware that mitigate these limitations. In particular, we develop a new embedding algorithm for mapping a CSP onto a hardware Ising model with a fixed sparse set of interactions, and propose two new decomposition algorithms for solving problems too large to map directly into hardware. The mapping technique is locally structured, as hardware-compatible Ising models are generated for each problem constraint, and variables appearing in different constraints are chained together using ferromagnetic couplings. In contrast, global embedding techniques generate a hardware-independent Ising model for all the constraints, and then use a minor-embedding algorithm to generate a hardware-compatible Ising model. We give an example of a class of CSPs for which the scaling performance of D-Wave's QA hardware using the local mapping technique is significantly better than global embedding. We validate the approach by applying D-Wave's hardware to circuit-based fault diagnosis. For circuits that embed directly, we find that the hardware is typically able to find all solutions from a min-fault diagnosis set of size N using 1000N samples, using an annealing rate that is 25 times faster than a leading SAT-based sampling method. Further, we apply decomposition algorithms to find min-cardinality faults for circuits that are up to 5 times larger than can be solved directly on current hardware. Comment: 22 pages, 4 figures.
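The idea of chaining qubits with ferromagnetic couplings can be illustrated with a tiny brute-force Ising model. The sketch assumes the energy convention E(s) = sum_i h_i s_i + sum_(i,j) J_ij s_i s_j with spins in {-1, +1}, under which a negative J_ij is ferromagnetic; the function names and coupling values are hypothetical and no D-Wave API is used.

```python
from itertools import product


def ising_energy(spins, h, J):
    """Energy of a spin configuration under the assumed sign convention."""
    energy = sum(h[i] * s for i, s in enumerate(spins))
    energy += sum(coupling * spins[i] * spins[j]
                  for (i, j), coupling in J.items())
    return energy


def ground_states(n, h, J):
    """Enumerate all 2**n configurations and return the minimum-energy ones."""
    states = list(product((-1, 1), repeat=n))
    energies = [ising_energy(s, h, J) for s in states]
    best = min(energies)
    return [s for s, e in zip(states, energies) if e == best]


# Two physical qubits chained to represent one logical variable
# (made-up values: no local fields, one ferromagnetic coupling).
h = [0.0, 0.0]
J = {(0, 1): -1.0}
chain_ground = ground_states(2, h, J)
```

With this coupling the only ground states are the two configurations in which both spins agree, which is what allows a chain of qubits to act as a single variable. Exhaustive enumeration is only practical for a handful of spins.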

arXiv.org e-Print Archive

Directory of Open Access Journals

Frontiers - Publisher Connector