
    An events based algorithm for distributing concurrent tasks on multi-core architectures

    In this paper, a programming model is presented that enables scalable parallel performance on multi-core shared-memory architectures. The model has been developed for application to a wide range of numerical simulation problems. Such problems involve time-stepping or iteration algorithms in which synchronization of multiple threads of execution is required. It is shown that traditional approaches to parallelism, including message passing and scatter-gather, can be improved upon in terms of speed-up and memory management. Using spatial decomposition to create orthogonal computational tasks, a new task management algorithm called H-Dispatch is developed. This algorithm makes efficient use of memory resources by limiting the need for garbage collection and takes optimal advantage of multiple cores by employing a “hungry” pull strategy. The technique is demonstrated on a simple finite difference solver, and results are compared to traditional MPI and scatter-gather approaches. The H-Dispatch approach achieves near-linear speed-up, with a parallel efficiency of 85% on a 24-core machine. It is noted that the H-Dispatch algorithm is quite general and can be applied to a wide class of computational tasks on heterogeneous architectures involving multi-core and GPGPU hardware. (Funding: Schlumberger-Doll Research Center; Saudi Aramco.)
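    The “hungry” pull strategy lends itself to a short illustration. The sketch below (Java; the class, task and method names are hypothetical, and this is not the authors' H-Dispatch implementation) shows the general pattern: worker threads pull spatially decomposed tasks from a shared queue as soon as they become free, rather than having work pushed to them by a central scheduler.

```java
import java.util.concurrent.*;

public class HungryDispatchSketch {
    // One task = the update of one spatial sub-domain for the current time step.
    record Task(int subdomainId) {}

    static void computeSubdomain(Task t) {
        // placeholder for the per-sub-domain work, e.g. one finite-difference stencil sweep
    }

    public static void main(String[] args) throws InterruptedException {
        final int cores = Runtime.getRuntime().availableProcessors();
        final BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
        final Task poison = new Task(-1);                    // shutdown marker

        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int w = 0; w < cores; w++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        Task t = queue.take();               // "hungry" pull: an idle core asks for work
                        if (t.subdomainId() < 0) break;
                        computeSubdomain(t);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        for (int step = 0; step < 100; step++) {             // time-stepping loop
            for (int s = 0; s < 4 * cores; s++) queue.put(new Task(s));
            // a real dispatcher would synchronize here so step n+1 starts only after step n completes
        }
        for (int w = 0; w < cores; w++) queue.put(poison);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```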

    CMOS Vision Sensors: Embedding Computer Vision at Imaging Front-Ends

    CMOS Image Sensors (CIS) are key for imaging technologies. These chips are conceived for capturing optical scenes focused on their surface and for delivering electrical images, commonly in digital format. CISs may incorporate intelligence; however, their smartness basically concerns calibration, error correction and other similar tasks. The term CVIS (CMOS VIsion Sensor) defines another class of sensor front-end aimed at performing vision tasks right at the focal plane. Such devices have been running under names such as computational image sensors, vision sensors and silicon retinas, among others. CVISs and CISs are similar regarding physical implementation. However, while the inputs of both CISs and CVISs are images captured by photo-sensors placed at the focal plane, the primary outputs of CVISs may not be images but either image features or even decisions based on the spatio-temporal analysis of the scenes. We may hence state that CVISs are more “intelligent” than CISs, as they focus on information instead of on raw data. Actually, CVIS architectures capable of extracting and interpreting the information contained in images, and prompting reaction commands thereof, have been explored for years in academia, and industrial applications are recently ramping up. One of the challenges for CVIS architects is incorporating computer vision concepts into the design flow. The endeavor is ambitious because the imaging and computer vision communities are rather disjoint groups talking different languages. The Cellular Nonlinear Network Universal Machine (CNNUM) paradigm, proposed by Profs. Chua and Roska, defined an adequate framework for such conciliation, as it is particularly well suited for hardware-software co-design [1]-[4]. This paper overviews CVIS chips that were conceived and prototyped at the IMSE Vision Lab over the past twenty years. Some of them fit the CNNUM paradigm while others are tangential to it. All of them employ per-pixel mixed-signal processing circuitry to achieve sensor-processing concurrency in the quest for fast operation with a reduced energy budget. (Funding: Junta de Andalucía TIC 2012-2338; Ministerio de Economía y Competitividad TEC 2015-66878-C3-1-R and TEC 2015-66878-C3-3-)
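    Purely to illustrate the functional contrast drawn above between a CIS (raw images out) and a CVIS (features or decisions out), and not the per-pixel mixed-signal circuitry itself, here is a minimal software sketch; all names and thresholds are hypothetical.

```java
/** Conceptual contrast: a CIS delivers the image itself, a CVIS delivers features or decisions. */
public class VisionFrontEndSketch {
    // CIS-style readout: the output of the chip is the captured frame.
    static int[][] readRawFrame(int[][] pixels) {
        return pixels;
    }

    // CVIS-style readout: per-pixel processing reduces the scene to a feature count and a decision,
    // so no full image ever has to leave the focal plane.
    static boolean detectActivity(int[][] pixels, int gradientThreshold, int activePixelThreshold) {
        int active = 0;
        for (int y = 1; y < pixels.length - 1; y++) {
            for (int x = 1; x < pixels[y].length - 1; x++) {
                int gx = pixels[y][x + 1] - pixels[y][x - 1];    // crude spatial gradient as a stand-in
                int gy = pixels[y + 1][x] - pixels[y - 1][x];    // for the per-pixel analog processing
                if (Math.abs(gx) + Math.abs(gy) > gradientThreshold) active++;
            }
        }
        return active > activePixelThreshold;
    }
}
```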

    DReAM: An approach to estimate per-Task DRAM energy in multicore systems

    Accurate per-task energy estimation in multicore systems would allow per-task energy-aware task scheduling and energy-aware billing in data centers, among other applications. Per-task energy estimation is challenged by the interaction between tasks in shared resources, which impacts tasks' energy consumption in uncontrolled ways. Some accurate mechanisms have been devised recently to estimate the per-task energy consumed on-chip in multicores, but there is a lack of such mechanisms for DRAM memories. This article makes the case for accurate per-task DRAM energy metering in multicores, which opens new paths to energy/performance optimizations. In particular, the contributions of this article are (i) an ideal per-task energy metering model for DRAM memories; (ii) DReAM, an accurate yet low-cost implementation of the ideal model (less than 5% accuracy error when 16 tasks share memory); and (iii) a comparison with standard methods (even distribution and access-count based) showing that DReAM is much more accurate than these other methods.
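    For reference, the two standard attribution methods the article compares against (even distribution and access-count based) can be sketched as follows; the DReAM model itself is not reproduced here, and the names and figures below are illustrative only.

```java
/** The two baseline per-task DRAM energy attributions named in the abstract. */
public class DramEnergyBaselines {
    // Even distribution: every co-running task is charged the same share of the total DRAM energy.
    static double[] evenSplit(double totalDramEnergyJ, int tasks) {
        double[] e = new double[tasks];
        java.util.Arrays.fill(e, totalDramEnergyJ / tasks);
        return e;
    }

    // Access-count based: each task is charged in proportion to its DRAM accesses.
    static double[] accessCountSplit(double totalDramEnergyJ, long[] accessesPerTask) {
        long total = 0;
        for (long a : accessesPerTask) total += a;
        double[] e = new double[accessesPerTask.length];
        for (int i = 0; i < e.length; i++) {
            e[i] = (total == 0) ? 0.0 : totalDramEnergyJ * accessesPerTask[i] / total;
        }
        return e;
    }

    public static void main(String[] args) {
        long[] accesses = {1_000_000L, 250_000L, 250_000L, 500_000L};   // made-up access counts
        System.out.println(java.util.Arrays.toString(evenSplit(10.0, 4)));
        System.out.println(java.util.Arrays.toString(accessCountSplit(10.0, accesses)));
    }
}
```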

    High-Throughput Computing on High-Performance Platforms: A Case Study

    The computing systems used by LHC experiments have historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan, a DOE leadership facility, in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next-generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons on how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.

    Statistical Performance Analysis of an Ant-Colony Optimisation Application in S-NET

    Kenneth MacKenzie, Philip K. F. Hölzenspies, Kevin Hammond, Raimund Kirner, Vu Thien Nga Nguyen, Iraneus te Boekhorst, Clemens Grelck, Raphael Poss, Merijn Verstraaten, 'Statistical Performance Analysis of an Ant-Colony Optimisation Application in S-NET'. Paper presented at the 2nd Workshop on Feedback-Directed Compiler Optimization for Multi-Core Architectures, Berlin, Germany, 12 January 2013.
    We consider an ant-colony optimisation problem implemented on a multicore system as a collection of asynchronous stream-processing components under the control of the S-NET coordination language. Statistical analysis and visualisation techniques are used to study the behaviour of the application, and this enables us to discover and correct problems in both the application program and the run-time system underlying S-NET.
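    The sketch below is not S-NET code; it is a rough Java analogue, under stated assumptions, of the decomposition described above: asynchronous “ant” components construct candidate solutions and stream them to a collector stage that reinforces shared pheromone state. The toy problem and all names are hypothetical.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.DoubleAdder;

/** Toy analogue of an ant-colony optimisation application built from asynchronous streaming stages. */
public class AntStreamSketch {
    public static void main(String[] args) throws Exception {
        int ants = 8, iterations = 100;
        BlockingQueue<double[]> results = new LinkedBlockingQueue<>();
        DoubleAdder[] pheromone = new DoubleAdder[16];       // shared trail intensities
        for (int i = 0; i < pheromone.length; i++) pheromone[i] = new DoubleAdder();

        ExecutorService pool = Executors.newFixedThreadPool(ants);
        for (int a = 0; a < ants; a++) {
            pool.submit(() -> {                              // "ant" stage: emit candidate solutions
                var rnd = ThreadLocalRandom.current();
                for (int it = 0; it < iterations; it++) {
                    double quality = rnd.nextDouble();       // stand-in for constructing a tour
                    results.offer(new double[]{rnd.nextInt(pheromone.length), quality});
                }
            });
        }

        // Collector stage: consume the asynchronous stream and reinforce trails of good solutions.
        for (int n = 0; n < ants * iterations; n++) {
            double[] r = results.take();
            pheromone[(int) r[0]].add(r[1]);
        }
        pool.shutdown();
    }
}
```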

    Neural Simulations on Multi-Core Architectures

    Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever-increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high-performance and standard desktop computers. This work introduces strategies for parallelizing biophysically realistic neural simulations based on the compartmental modeling technique, and presents results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing.
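    As a minimal sketch of the kind of user-transparent load balancing mentioned above (not the implementation described in the work; all names are hypothetical), cells with differing compartment counts can simply be handed to a work-stealing pool each time step, so the cores stay busy without the user partitioning the model by hand.

```java
import java.util.*;
import java.util.concurrent.*;

/** Sketch: distribute compartmental-model cells across cores without manual partitioning. */
public class NeuralSimSketch {
    record Cell(int compartments) {
        void advanceOneTimestep(double dtMs) {
            // placeholder for integrating the cable equation over this cell's compartments
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Cell> cells = new ArrayList<>();
        var rnd = new Random(42);
        for (int i = 0; i < 1000; i++) cells.add(new Cell(10 + rnd.nextInt(500)));  // uneven cell sizes

        ExecutorService pool = Executors.newWorkStealingPool();   // work stealing ~ transparent balancing
        for (int step = 0; step < 10; step++) {
            CountDownLatch done = new CountDownLatch(cells.size());
            for (Cell c : cells) {
                pool.submit(() -> { c.advanceOneTimestep(0.025); done.countDown(); });
            }
            done.await();    // all cells reach the same simulation time before the next step begins
        }
        pool.shutdown();
    }
}
```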