Search CORE

Dynamic Trace-Based Data Dependency Analysis for Parallelization of C Programs

Author: Lavagno Luciano
Lazarescu Mihai Teodor
Publication venue: IEEE (Institute of Electrical and Electronics Engineers), Piscataway, NJ
Publication date: 01/01/2012

Writing parallel code is traditionally considered difficult, even when parallelism is planned from the beginning of a project. In this paper, we demonstrate an innovative toolset that tackles this challenge directly. It provides software developers with profile data and points them to possible top-level, pipeline-style parallelization opportunities in an arbitrary sequential C program. The approach complements methods based on static code analysis and automatic code rewriting, and it imposes no restrictions on the structure of the sequential code or on the parallelization style, although it is mostly aimed at coarse-grained, task-level parallelization. The proposed toolset has been used to define parallel code organizations for a number of representative real-world applications and is based on and is provided as free source
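
A minimal sketch of the core idea behind dynamic trace-based dependency analysis, assuming a hypothetical trace of (function, operation, address) records produced by an instrumented run: a shadow map remembers which function last wrote each address, and every later read of that address by a different function is counted as a producer-to-consumer dependency. Functions linked by such edges are candidate stages of a pipeline. The `Access` record and `raw_dependencies` function are illustrative names, not the toolset's interface.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Access(NamedTuple):
    """One memory access from a hypothetical instrumented run."""
    func: str   # function performing the access
    op: str     # "R" for a load, "W" for a store
    addr: int   # byte address touched


def raw_dependencies(trace: Iterable[Access]) -> Counter:
    """Count read-after-write dependencies between distinct functions.

    A shadow map records the function that last wrote each address; a later
    read of that address by a different function is a producer -> consumer
    edge, the kind of data flow that separates candidate pipeline stages.
    """
    last_writer: dict[int, str] = {}
    edges: Counter = Counter()
    for acc in trace:
        if acc.op == "W":
            last_writer[acc.addr] = acc.func
        elif acc.op == "R":
            producer = last_writer.get(acc.addr)
            if producer is not None and producer != acc.func:
                edges[(producer, acc.func)] += 1
        else:
            raise ValueError(f"unknown op {acc.op!r}")
    return edges
```

For example, a trace in which `read_input` writes a buffer that `filter` reads twice yields the edge `("read_input", "filter")` with count 2. Counting only cross-function read-after-write pairs keeps the result at the coarse, task level; a fuller analysis would also track write-after-read and write-after-write conflicts.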

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

The "MIND" Scalable PIM Architecture

Author: Brodowicz Maciej
Sterling Thomas
Publication date: 01/01/2005

MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high-performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture that integrates DRAM bit cells and CMOS logic devices on the same silicon die. MIND is a multicore design with multiple memory/processor nodes on each chip, and it supports global shared memory across systems built from MIND components. MIND is distinguished from other PIM architectures by mechanisms that efficiently support a global parallel execution model based on the semantics of message-driven, multithreaded, split-transaction processing. MIND is designed to operate either in conjunction with conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real-time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture.
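
To make "message-driven, split-transaction processing" concrete, here is a toy single-process model, assuming a global address space partitioned evenly across nodes and a single FIFO queue standing in for the interconnect. It is not MIND's instruction set or message format; `Machine`, `read` and the handler names are hypothetical.

```python
from collections import deque
from typing import Any, Callable


class Machine:
    """Toy message-driven machine whose memory is split across nodes.

    Work happens only when a message is dequeued and its handler runs on the
    destination node. A remote read is a split transaction: a request message
    travels to the node that owns the address, and a separate reply message
    later carries the value back and fires the requester's continuation, so
    the requester never blocks waiting for memory.
    """

    def __init__(self, nodes: int, words_per_node: int) -> None:
        self.words_per_node = words_per_node
        self.memory = [[0] * words_per_node for _ in range(nodes)]
        self.queue: deque = deque()  # pending (node, handler, args) messages

    def home(self, addr: int) -> int:
        """Node whose local memory holds global address addr."""
        return addr // self.words_per_node

    def send(self, node: int, handler: Callable[..., None], *args: Any) -> None:
        self.queue.append((node, handler, args))

    def run(self) -> None:
        """Deliver messages until none remain."""
        while self.queue:
            node, handler, args = self.queue.popleft()
            handler(node, *args)

    def read(self, requester: int, addr: int, then: Callable[[int], None]) -> None:
        """Start a split-transaction read; returns before the value exists."""
        self.send(self.home(addr), self._serve_read, requester, addr, then)

    def _serve_read(self, node: int, requester: int, addr: int,
                    then: Callable[[int], None]) -> None:
        # Runs on the owning node: read locally, then reply in a new message.
        value = self.memory[node][addr % self.words_per_node]
        self.send(requester, self._deliver, then, value)

    def _deliver(self, node: int, then: Callable[[int], None], value: int) -> None:
        # Runs on the requesting node when the reply arrives.
        then(value)
```

Calling `read(0, 6, results.append)` on a two-node machine with four words per node returns immediately; the value stored at node 1, word 2 only appears in `results` once `run()` has delivered both the request and the reply.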

Caltech Authors

Interactive Trace-Based Analysis Toolset for Manual Parallelization of C Programs

Author: Luciano Lavagno
Mihai T. Lazarescu
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2015

Massive amounts of legacy sequential code need to be parallelized to make better use of modern multiprocessor architectures. Nevertheless, writing parallel programs is still difficult. Automated parallelization methods can be effective at the statement and loop levels and, more recently, at the task level, but they are still restricted to specific source code constructs or application domains. In this article, we present an innovative toolset that supports developers in manual code analysis and parallelization decisions. It automatically collects the program profile and data dependencies and presents them in an interactive graphical format that eases the discovery of manual parallelization opportunities. The toolset can be used with arbitrary sequential C programs and parallelization patterns. Its runtime, program-scope data dependency tracing can also complement tools based on static code analysis while benefiting from them. We evaluated the toolset's effectiveness in terms of the time needed to reach parallelization decisions and the quality of those decisions, and measured a significant improvement for several representative real-world applications.
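
One way to present profile and dependency data graphically is sketched below, assuming function-level dependency counts and per-function profile weights (for example, executed instructions) have already been collected from a run. It emits a Graphviz DOT graph whose edge thickness grows with the number of dependent accesses. The toolset's real output format is not described here, so `to_dot` and the graph styling are hypothetical; function names are assumed not to contain double quotes.

```python
def to_dot(edges: dict[tuple[str, str], int], profile: dict[str, int]) -> str:
    """Render function-level data dependencies as a Graphviz DOT digraph.

    Nodes are functions labelled with their profile weight; edges carry the
    number of dependent accesses, and the pen width scales with that count
    relative to the heaviest edge so dominant data flows stand out.
    """
    lines = ["digraph deps {", "  node [shape=box];"]
    for func in sorted(profile):
        # "\\n" emits a literal \n, which DOT renders as a line break.
        lines.append(f'  "{func}" [label="{func}\\n{profile[func]}"];')
    heaviest = max(edges.values(), default=1)
    for (src, dst), count in sorted(edges.items()):
        width = 1 + 4 * count / heaviest  # pen width from 1 to 5
        lines.append(f'  "{src}" -> "{dst}" [label="{count}", penwidth={width:.1f}];')
    lines.append("}")
    return "\n".join(lines)
```

Writing the string to `deps.dot` and running `dot -Tsvg deps.dot -o deps.svg` gives a static picture; an interactive viewer, as the article describes, would additionally let the developer explore the graph, which this sketch does not attempt.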

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Disk-Directed I/O for MIMD Multiprocessors

Author: Kotz David
Publication venue: Dartmouth Digital Commons
Publication date: 01/11/1994

Many scientific applications that run on today's multiprocessors are bottlenecked by their file I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and improved file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, that flips the usual relationship between server and client to allow the disks (actually, disk servers) to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, and close to the maximum disk bandwidth.
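
A minimal sketch of the scheduling idea, assuming a simple model in which seek cost is the linear distance between physical block numbers: the disk server gathers one disk's share of a collective request from all clients and serves it in physical order rather than in arrival order. `disk_directed_schedule` and `seek_distance` are illustrative names; buffering and the transfer of blocks back to clients are omitted.

```python
def seek_distance(order: list[int], start: int = 0) -> int:
    """Total head movement when visiting physical block numbers in order."""
    pos, total = start, 0
    for blk in order:
        total += abs(blk - pos)
        pos = blk
    return total


def disk_directed_schedule(requests: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Order one disk's share of a collective request by physical location.

    requests: (physical_block, client_id) pairs gathered from all clients
    for this disk. The server, not the clients, picks the order: sorting by
    block number turns many small scattered reads into one sweep, and each
    block is then shipped to whichever client asked for it.
    """
    return sorted(requests)
```

With requests arriving as blocks 7, 1, 5, 3 from two interleaved clients, serving them in arrival order moves the head 19 blocks in total, while the sorted schedule moves it 7.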

Dartmouth Digital Commons (Dartmouth College)

A type-checking preprocessor for Cilk 2, a multithreaded C language

Author: Miller Robert C. (Robert Chisolm)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1995

Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (pp. 37-38). By Robert C. Miller.

DSpace@MIT

Disk-directed I/O for MIMD Multiprocessors

Author: Kotz David
Publication venue: Dartmouth Digital Commons
Publication date: 08/11/1994

Many scientific applications that run on today's multiprocessors are bottlenecked by their file I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and improved file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, that flips the usual relationship between server and client to allow the disks (actually, disk servers) to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, and close to the maximum disk bandwidth.

Dartmouth Digital Commons (Dartmouth College)

Translation techniques for distributed-shared memory programming models

Author: Fuller Douglas James
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2005

This thesis argues that a modular, source-to-source translation system for distributed-shared memory programming models would be beneficial to the high-performance computing community. It goes on to present a proof-of-concept example in detail, translating between Global Arrays (GA) and Unified Parallel C (UPC). Some useful extensions to UPC are discussed, along with how they are implemented in the proof-of-concept translator

Digital Repository @ Iowa State University (ISU)

UNT Digital Library

Memory coherence activity prediction in commercial workloads

Author: Ailamaki Anastassia
Falsafi Babak
Hardavellas Nikolaos
Kim Jangwoo
Somogyi Stephen
Wenisch Thomas F.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 23/01/2009

Recent research indicates that prediction-based coherence optimizations offer substantial performance improvements for scientific applications on distributed shared memory multiprocessors. Important commercial applications are also sensitive to coherence latency, and this sensitivity will become more acute as technology scales. It is therefore important to investigate the prediction of memory coherence activity in commercial workloads. This paper studies a trace-based Downgrade Predictor (DGP) for predicting last stores to shared cache blocks, and a pattern-based Consumer Set Predictor (CSP) for predicting subsequent readers. We evaluate this class of predictors on commercial applications for the first time and show that our DGP correctly predicts 47%-76% of last stores. Memory sharing patterns in commercial workloads are inherently non-repetitive; hence, CSP cannot attain high coverage. We also perform an opportunity study of a DGP enhanced with competing underlying predictors and, for both commercial and scientific applications, demonstrate the potential to increase coverage by up to 14%.
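
The abstract does not describe the predictor's internals. As a hedged illustration of what trace-based last-store prediction can mean, the toy below, in the spirit of last-touch prediction, remembers the sequence of store PCs that preceded each observed downgrade of a block and predicts "last store" when that sequence recurs. It keeps full PC tuples per block for clarity, where a hardware predictor would use a compact fixed-width signature; `DowngradePredictor` and its methods are hypothetical names, not the paper's design.

```python
class DowngradePredictor:
    """Toy trace-based last-store predictor for one processor's cache.

    For every cache block it accumulates the store PCs seen since the block
    was last obtained. When the block is actually downgraded (a remote node
    reads it), that sequence is remembered as a last-store trace. A later
    store whose running sequence matches a remembered trace is predicted to
    be the last store before the next downgrade.
    """

    def __init__(self) -> None:
        self.current: dict[int, tuple[int, ...]] = {}       # block -> PCs so far
        self.learned: dict[int, set[tuple[int, ...]]] = {}  # block -> traces

    def on_store(self, block: int, pc: int) -> bool:
        """Record a store; return True if it is predicted to be the last."""
        trace = self.current.get(block, ()) + (pc,)
        self.current[block] = trace
        return trace in self.learned.get(block, set())

    def on_downgrade(self, block: int) -> None:
        """A remote read downgraded the block: learn the trace that ended."""
        trace = self.current.pop(block, ())
        if trace:
            self.learned.setdefault(block, set()).add(trace)
```

After one epoch in which stores at PCs 0x10 and 0x20 precede a downgrade of a block, the store at 0x20 in the next epoch is predicted to be the last one, which is the point at which a system could downgrade the block early.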

Infoscience - École polytechnique fédérale de Lausanne

Challenges in the Design of Cyber-Physical Systems (Desafíos en el diseño de sistemas Ciber-Físicos)

Author: Chandy John C.
Publication venue: Universidad San Buenaventura - USB (Colombia)
Publication date: 01/12/2010

Cyber-physical systems (CPS) integrate computation with physical processes. Embedded computers and networks monitor and control the physical processes, usually with feedback loops in which the physical processes affect the computations and vice versa. This article examines the challenges in designing these systems and asks whether today's computing and networking technologies provide an adequate foundation for them. It concludes that, to improve the design processes for these systems, it will not be sufficient to raise the level of abstraction or to verify (formally or otherwise) designs built on today's abstractions. The social and economic potential of CPS is much greater than has been realized so far, and major investments are being made worldwide to develop the technology, but the challenges are considerable. To realize the full potential of CPS, the abstractions of computing and networking will have to be rebuilt so that they fully embrace the principles of both physical dynamics and computation.

Directory of Open Access Journals

Universidad de San Buenaventura, Bogotá campus: Editorial Bonaventuriana

Sorting on Clusters of SMPs

Author: Helman David R.
JaJa Joseph
Publication date: 01/01/1998

Clusters of symmetric multiprocessors (SMPs) have emerged as the primary candidates for large-scale multiprocessor systems. In this paper, we introduce an efficient sorting algorithm for clusters of SMPs. The algorithm relies on a novel scheme for stable sorting on a single SMP, coupled with balanced, regular communication across the cluster. Our SMP algorithm appears to be asymptotically faster than any published algorithm we are aware of. The algorithms were implemented in C using POSIX threads and the SIMPLE library of communication primitives, and were run on a cluster of DEC AlphaServer 2100A systems. Our experimental results verify the scalability and efficiency of the proposed solution and illustrate the importance of considering both the memory hierarchy and the overhead of shifting to multiple nodes. (Also cross-reference as UMIACS-TR-97-6
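
The abstract does not spell out the new single-SMP scheme, so the sketch below is a generic stand-in rather than the authors' algorithm: worker threads stable-sort contiguous chunks, and a stable k-way merge combines the sorted runs, preserving the relative order of equal keys. In CPython, threads are serialized by the global interpreter lock, so this shows the structure of a chunk-sort-then-merge design, not the speedup of the original C and POSIX threads implementation; `smp_stable_sort` is a hypothetical name.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def smp_stable_sort(data: Sequence[T], key: Callable[[T], object],
                    threads: int = 4) -> list[T]:
    """Stable sort: per-thread chunk sorts followed by a stable k-way merge.

    Each worker sorts one contiguous chunk with the stable built-in sorted();
    heapq.merge breaks ties in favour of earlier inputs, and the chunks are
    passed in input order, so equal keys keep their original relative order.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    n = len(data)
    step = -(-n // threads) or 1  # ceiling division, at least 1
    chunks = [data[i:i + step] for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        runs = list(pool.map(lambda c: sorted(c, key=key), chunks))
    return list(heapq.merge(*runs, key=key))
```

Because the merge step only compares the heads of already-sorted runs, the result equals `sorted(data, key=key)`, including the order of records with equal keys.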

CiteSeerX

Digital Repository at the University of Maryland