534,555 research outputs found

    GPU Utilization: Predictive SARIMAX Time Series Analysis

    Get PDF
    This work explores collecting performance metrics and leveraging the output for prediction on a memory-intensive parallel image classification algorithm - Inception v3 (or Inception3 ). Experimental results were collected by nvidia-smi on a computational node DGX-1, equipped with eight Tesla V100 Graphic Processing Units (GPUs). Time series analysis was performed on the GPU utilization data taken, for multiple runs, of Inception3’s image classification algorithm (see Figure 1). The time series model applied was Seasonal Autoregressive Integrated Moving Average Exogenous (SARIMAX)

    High-performance computing and communication models for solving the complex interdisciplinary problems on DPCS

    Get PDF
    The paper presents some advanced high performance (HPC) and parallel computing (PC) methodologies for solving a large space complex problem involving the integrated difference research areas. About eight interdisciplinary problems will be accurately solved on multiple computers communicating over the local area network. The mathematical modeling and a large sparse simulation of the interdisciplinary effort involve the area of science, engineering, biomedical, nanotechnology, software engineering, agriculture, image processing and urban planning. The specific methodologies of PC software under consideration include PVM, MPI, LUNA, MDC, OpenMP, CUDA and LINDA integrated with COMSOL and C++/C. There are different communication models of parallel programming, thus some definitions of parallel processing, distributed processing and memory types are explained for understanding the main contribution of this paper. The matching between the methodology of PC and the large sparse application depends on the domain of solution, the dimension of the targeted area, computational and communication pattern, the architecture of distributed parallel computing systems (DPCS), the structure of computational complexity and communication cost. The originality of this paper lies in obtaining the complex numerical model dealing with a large scale partial differential equation (PDE), discretization of finite difference (FDM) or finite element (FEM) methods, numerical simulation, high-performance simulation and performance measurement. The simulation of PDE will perform by sequential and parallel algorithms to visualize the complex model in high-resolution quality. In the context of a mathematical model, various independent and dependent parameters present the complex and real phenomena of the interdisciplinary application. As a model executes, these parameters can be manipulated and changed. As an impact, some chemical or mechanical properties can be predicted based on the observation of parameter changes. The methodologies of parallel programs build on the client-server model, slave-master model and fragmented model. HPC of the communication model for solving the interdisciplinary problems above will be analyzed using a flow of the algorithm, numerical analysis and the comparison of parallel performance evaluations. In conclusion, the integration of HPC, communication model, PC software, performance and numerical analysis happens to be an important approach to fulfill the matching requirement and optimize the solution of complex interdisciplinary problems

    Scalable Data Parallel Algorithms for Texture Synthesis and Compression using Gibbs Random Fields

    Get PDF
    This paper introduces scalable data parallel algorithms for image processing. Focusing on Gibbs and Markov Random Field model representation for textures, we present parallel algorithms for texture synthesis, compression, and maximum likelihood parameter estimation, currently implemented on Thinking Machines CM-2 and CM-5. Use of fine-grained, data parallel processing techniques yields real-time algorithms for texture synthesis and compression that are substantially faster than the previously known sequential implementations. Although current implementations are on Connection Machines, the methodology presented here enables machine independent scalable algorithms for a number of problems in image processing and analysis. (Also cross-referenced as UMIACS-TR-93-80.

    Accelerating Scientific Computing Models Using GPU Processing

    Get PDF
    GPGPUs offer significant computational power for programmers to leverage. This computational power is especially useful when utilized for accelerating scientific models. This thesis analyzes the utilization of GPGPU programming to accelerate scientific computing models. First the construction of hardware for visualization and computation of scientific models is discussed. Several factors in the construction of the machines focus on the performance impacts related to scientific modeling. Image processing is an embarrassingly parallel problem well suited for GPGPU acceleration. An image processing library was developed to show the processes of recognizing embarrassingly parallel problems and serves as an excellent example of converting from a serial CPU implementation to a GPU accelerated implementation. Genetic algorithms are biologically inspired heuristic search algorithms based on natural selection. The Tetris genetic algorithm with A* pathfinding discusses memory bound limitations that can prevent direct algorithm conversions from the CPU to the GPU. An analysis of an existing landscape evolution model, CHILD, for GPU acceleration explores that even when a model shows promise for GPU acceleration, the underlying data structures can have a significant impact upon that ability to move to a GPU implementation. CHILD also offers an example of creating tighter MATLAB integration between existing models. Lastly, a parallel spatial sorting algorithm is discussed as a possible replacement for current spatial sorting algorithms implemented in models such as smoothed particle hydrodynamics

    A parallel generalized relaxation method for high-performance image segmentation on GPUs

    Get PDF
    Fast and scalable software modules for image segmentation are needed for modern high-throughput screening platforms in Computational Biology. Indeed, accurate segmentation is one of the main steps to be applied in a basic software pipeline aimed to extract accurate measurements from a large amount of images. Image segmentation is often formulated through a variational principle, where the solution is the minimum of a suitable functional, as in the case of the Ambrosio–Tortorelli model. Euler–Lagrange equations associated with the above model are a system of two coupled elliptic partial differential equations whose finite-difference discretization can be efficiently solved by a generalized relaxation method, such as Jacobi or Gauss–Seidel, corresponding to a first-order alternating minimization scheme. In this work we present a parallel software module for image segmentation based on the Parallel Sparse Basic Linear Algebra Subprograms (PSBLAS), a general-purpose library for parallel sparse matrix computations, using its Graphics Processing Unit (GPU) extensions that allow us to exploit in a simple and transparent way the performance capabilities of both multi-core CPUs and of many-core GPUs. We discuss performance results in terms of execution times and speed-up of the segmentation module running on GPU as well as on multi-core CPUs, in the analysis of 2D gray-scale images of mouse embryonic stem cells colonies coming from biological experiment

    Attentive processing improves object recognition

    Get PDF
    The human visual system can recognize several thousand object categories irrespective of their position and size. This combination of selectivity and invariance is built up gradually across several stages of visual processing. However, the recognition of multiple objects in cluttered visual scenes presents a difficult problem for human as well as machine vision systems. The human visual system has evolved to perform two stages of visual processing: a pre-attentive parallel processing stage, in which the entire visual field is processed at once and a slow serial attentive processing stage, in which aregion of interest in an input image is selected for "specialized" analysis by an attentional spotlight. We argue that this strategy evolved to overcome the limitation of purely feed forward processing in the presence of clutter and crowding. Using a Bayesian model of attention along with a hierarchical model of feed forward recognition on a data set of real world images, we show that this two stage attentive processing can improve recognition in cluttered and crowded conditions

    Efficient Fully Bayesian Approach to Brain Activity Mapping with Complex-Valued fMRI Data

    Full text link
    Functional magnetic resonance imaging (fMRI) enables indirect detection of brain activity changes via the blood-oxygen-level-dependent (BOLD) signal. Conventional analysis methods mainly rely on the real-valued magnitude of these signals. In contrast, research suggests that analyzing both real and imaginary components of the complex-valued fMRI (cv-fMRI) signal provides a more holistic approach that can increase power to detect neuronal activation. We propose a fully Bayesian model for brain activity mapping with cv-fMRI data. Our model accommodates temporal and spatial dynamics. Additionally, we propose a computationally efficient sampling algorithm, which enhances processing speed through image partitioning. Our approach is shown to be computationally efficient via image partitioning and parallel computation while being competitive with state-of-the-art methods. We support these claims with both simulated numerical studies and an application to real cv-fMRI data obtained from a finger-tapping experiment
    • …
    corecore