18,581 research outputs found

    GraphLab: A New Framework for Parallel Machine Learning

    Full text link
    Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large scale real-world problems

    A New MHD Code with Adaptive Mesh Refinement and Parallelization for Astrophysics

    Full text link
    A new code, named MAP, is written in Fortran language for magnetohydrodynamics (MHD) calculation with the adaptive mesh refinement (AMR) and Message Passing Interface (MPI) parallelization. There are several optional numerical schemes for computing the MHD part, namely, modified Mac Cormack Scheme (MMC), Lax-Friedrichs scheme (LF) and weighted essentially non-oscillatory (WENO) scheme. All of them are second order, two-step, component-wise schemes for hyperbolic conservative equations. The total variation diminishing (TVD) limiters and approximate Riemann solvers are also equipped. A high resolution can be achieved by the hierarchical block-structured AMR mesh. We use the extended generalized Lagrange multiplier (EGLM) MHD equations to reduce the non-divergence free error produced by the scheme in the magnetic induction equation. The numerical algorithms for the non-ideal terms, e.g., the resistivity and the thermal conduction, are also equipped in the MAP code. The details of the AMR and MPI algorithms are described in the paper.Comment: 44 pages, 16 figure

    Byzantine Approximate Agreement on Graphs

    Get PDF
    Consider a distributed system with n processors out of which f can be Byzantine faulty. In the approximate agreement task, each processor i receives an input value x_i and has to decide on an output value y_i such that 1) the output values are in the convex hull of the non-faulty processors\u27 input values, 2) the output values are within distance d of each other. Classically, the values are assumed to be from an m-dimensional Euclidean space, where m >= 1. In this work, we study the task in a discrete setting, where input values with some structure expressible as a graph. Namely, the input values are vertices of a finite graph G and the goal is to output vertices that are within distance d of each other in G, but still remain in the graph-induced convex hull of the input values. For d=0, the task reduces to consensus and cannot be solved with a deterministic algorithm in an asynchronous system even with a single crash fault. For any d >= 1, we show that the task is solvable in asynchronous systems when G is chordal and n > (omega+1)f, where omega is the clique number of G. In addition, we give the first Byzantine-tolerant algorithm for a variant of lattice agreement. For synchronous systems, we show tight resilience bounds for the exact variants of these and related tasks over a large class of combinatorial structures

    Novel methods for real-time 3D facial recognition

    Get PDF
    In this paper we discuss our approach to real-time 3D face recognition. We argue the need for real time operation in a realistic scenario and highlight the required pre- and post-processing operations for effective 3D facial recognition. We focus attention to some operations including face and eye detection, and fast post-processing operations such as hole filling, mesh smoothing and noise removal. We consider strategies for hole filling such as bilinear and polynomial interpolation and Laplace and conclude that bilinear interpolation is preferred. Gaussian and moving average smoothing strategies are compared and it is shown that moving average can have the edge over Gaussian smoothing. The regions around the eyes normally carry a considerable amount of noise and strategies for replacing the eyeball with a spherical surface and the use of an elliptical mask in conjunction with hole filling are compared. Results show that the elliptical mask with hole filling works well on face models and it is simpler to implement. Finally performance issues are considered and the system has demonstrated to be able to perform real-time 3D face recognition in just over 1s 200ms per face model for a small database

    Hydrologic Terrain Processing Using Parallel Computing

    Get PDF
    Abstract: Topography in the form of Digital Elevation Models (DEMs), is widely used to derive information for the modeling of hydrologic processes. Hydrologic terrain analysis augments the information content of digital elevation data by removing spurious pits, deriving a structured flow field, and calculating surfaces of hydrologic information derived from the flow field. The increasing availability of large terrain datasets with very small ground sample distance (GSD) poses a challenge for existing algorithms that process terrain data to extract this hydrologic information. This paper will describe a parallel algorithm that has been developed to enhance hydrologic terrain pre-processing so that larger datasets can be more efficiently computed. This paper describes a Message Passing Interface (MPI) parallel implementation for Pit Removal. This key functionality is used within the Terrain Analysis Using Digital Elevation Models (TauDEM) package to remove spurious elevation depressions that are an artifact of the raster representation of the terrain. The parallel algorithm works by decomposing the domain into stripes or tiles where each tile is processed by a separate processor. This method also reduces the memory requirements of each processor so that larger size grids can be processed. The parallel pit removal algorithm is adapted from the method of Planchon and Darboux that starts from a large elevation then iteratively scans the grid, lowering each grid cell to the maximum of the original elevation or the lowest neighbor. The MPI implementation reconcile

    Mapping constrained optimization problems to quantum annealing with application to fault diagnosis

    Get PDF
    Current quantum annealing (QA) hardware suffers from practical limitations such as finite temperature, sparse connectivity, small qubit numbers, and control error. We propose new algorithms for mapping boolean constraint satisfaction problems (CSPs) onto QA hardware mitigating these limitations. In particular we develop a new embedding algorithm for mapping a CSP onto a hardware Ising model with a fixed sparse set of interactions, and propose two new decomposition algorithms for solving problems too large to map directly into hardware. The mapping technique is locally-structured, as hardware compatible Ising models are generated for each problem constraint, and variables appearing in different constraints are chained together using ferromagnetic couplings. In contrast, global embedding techniques generate a hardware independent Ising model for all the constraints, and then use a minor-embedding algorithm to generate a hardware compatible Ising model. We give an example of a class of CSPs for which the scaling performance of D-Wave's QA hardware using the local mapping technique is significantly better than global embedding. We validate the approach by applying D-Wave's hardware to circuit-based fault-diagnosis. For circuits that embed directly, we find that the hardware is typically able to find all solutions from a min-fault diagnosis set of size N using 1000N samples, using an annealing rate that is 25 times faster than a leading SAT-based sampling method. Further, we apply decomposition algorithms to find min-cardinality faults for circuits that are up to 5 times larger than can be solved directly on current hardware.Comment: 22 pages, 4 figure
    • …
    corecore