7,685 research outputs found

    Probabilistic Image Models and their Massively Parallel Architectures : A Seamless Simulation- and VLSI Design-Framework Approach

    Get PDF
    Algorithmic robustness in real-world scenarios and real-time processing capabilities are the two essential and at the same time contradictory requirements modern image-processing systems have to fulfill to go significantly beyond state-of-the-art systems. Without suitable image processing and analysis systems at hand, which comply with the before mentioned contradictory requirements, solutions and devices for the application scenarios of the next generation will not become reality. This issue would eventually lead to a serious restraint of innovation for various branches of industry. This thesis presents a coherent approach to the above mentioned problem. The thesis at first describes a massively parallel architecture template and secondly a seamless simulation- and semiconductor-technology-independent design framework for a class of probabilistic image models, which are formulated on a regular Markovian processing grid. The architecture template is composed of different building blocks, which are rigorously derived from Markov Random Field theory with respect to the constraints of \it massively parallel processing \rm and \it technology independence\rm. This systematic derivation procedure leads to many benefits: it decouples the architecture characteristics from constraints of one specific semiconductor technology; it guarantees that the derived massively parallel architecture is in conformity with theory; and it finally guarantees that the derived architecture will be suitable for VLSI implementations. The simulation-framework addresses the unique hardware-relevant simulation needs of MRF based processing architectures. Furthermore the framework ensures a qualified representation for simulation of the image models and their massively parallel architectures by means of their specific simulation modules. This allows for systematic studies with respect to the combination of numerical, architectural, timing and massively parallel processing constraints to disclose novel insights into MRF models and their hardware architectures. The design-framework rests upon a graph theoretical approach, which offers unique capabilities to fulfill the VLSI demands of massively parallel MRF architectures: the semiconductor technology independence guarantees a technology uncommitted architecture for several design steps without restricting the design space too early; the design entry by means of behavioral descriptions allows for a functional representation without determining the architecture at the outset; and the topology-synthesis simplifies and separates the data- and control-path synthesis. Detailed results discussed in the particular chapters together with several additional results collected in the appendix will further substantiate the claims made in this thesis

    Bio-inspired broad-class phonetic labelling

    Get PDF
    Recent studies have shown that the correct labeling of phonetic classes may help current Automatic Speech Recognition (ASR) when combined with classical parsing automata based on Hidden Markov Models (HMM).Through the present paper a method for Phonetic Class Labeling (PCL) based on bio-inspired speech processing is described. The methodology is based in the automatic detection of formants and formant trajectories after a careful separation of the vocal and glottal components of speech and in the operation of CF (Characteristic Frequency) neurons in the cochlear nucleus and cortical complex of the human auditory apparatus. Examples of phonetic class labeling are given and the applicability of the method to Speech Processing is discussed

    PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

    Full text link
    High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.Comment: Submitted to Parallel Computing, Elsevie

    The "MIND" Scalable PIM Architecture

    Get PDF
    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture

    Handwritten digit recognition by bio-inspired hierarchical networks

    Full text link
    The human brain processes information showing learning and prediction abilities but the underlying neuronal mechanisms still remain unknown. Recently, many studies prove that neuronal networks are able of both generalizations and associations of sensory inputs. In this paper, following a set of neurophysiological evidences, we propose a learning framework with a strong biological plausibility that mimics prominent functions of cortical circuitries. We developed the Inductive Conceptual Network (ICN), that is a hierarchical bio-inspired network, able to learn invariant patterns by Variable-order Markov Models implemented in its nodes. The outputs of the top-most node of ICN hierarchy, representing the highest input generalization, allow for automatic classification of inputs. We found that the ICN clusterized MNIST images with an error of 5.73% and USPS images with an error of 12.56%

    Towards High-Level Programming of Multi-GPU Systems Using the SkelCL Library

    Get PDF
    Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches — CUDA and OpenCL — are intrinsically low-level and offer no special support for systems consisting of multiple GPUs. The SkelCL library presented in this paper is built on top of the OpenCL standard and offers preimplemented recurring computation and communication patterns (skeletons) which greatly simplify programming for multiGPU systems. The library also provides an abstract vector data type and a high-level data (re)distribution mechanism to shield the programmer from the low-level data transfers between the system’s main memory and multiple GPUs. In this paper, we focus on the specific support in SkelCL for systems with multiple GPUs and use a real-world application study from the area of medical imaging to demonstrate the reduced programming effort and competitive performance of SkelCL as compared to OpenCL and CUDA. Besides, we illustrate how SkelCL adapts to large-scale, distributed heterogeneous systems in order to simplify their programming
    • 

    corecore