Search CORE

39,129 research outputs found

The "MIND" Scalable PIM Architecture

Author: Brodowicz Maciej
Sterling Thomas
Publication venue
Publication date: 01/01/2005
Field of study

MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture

Caltech Authors

Recommended from our members

UPC++ v1.0 Programmer’s Guide, Revision 2020.3.0

Author: Bachan J
Baden S
Bonachea Dan
Grossman J
Hargrove P
Hofmeyr S
Jacquelin M
Kamil A
Van Straalen B
Publication venue: eScholarship, University of California
Publication date: 12/03/2020
Field of study

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores

eScholarship - University of California

Recommended from our members

UPC++ v1.0 Specification, Revision 2020.3.0

Author: Bachan J
Bonachea Dan
Kamil A
Publication venue: eScholarship, University of California
Publication date: 12/03/2020
Field of study

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability

eScholarship - University of California

Redesigning OP2 Compiler to Use HPX Runtime Asynchronous Techniques

Author: Kaiser Hartmut
Khatami Zahra
Ramanujam J.
Publication venue
Publication date: 27/03/2017
Field of study

Maximizing parallelism level in applications can be achieved by minimizing overheads due to load imbalances and waiting time due to memory latencies. Compiler optimization is one of the most effective solutions to tackle this problem. The compiler is able to detect the data dependencies in an application and is able to analyze the specific sections of code for parallelization potential. However, all of these techniques provided with a compiler are usually applied at compile time, so they rely on static analysis, which is insufficient for achieving maximum parallelism and producing desired application scalability. One solution to address this challenge is the use of runtime methods. This strategy can be implemented by delaying certain amount of code analysis to be done at runtime. In this research, we improve the parallel application performance generated by the OP2 compiler by leveraging HPX, a C++ runtime system, to provide runtime optimizations. These optimizations include asynchronous tasking, loop interleaving, dynamic chunk sizing, and data prefetching. The results of the research were evaluated using an Airfoil application which showed a 40-50% improvement in parallel performance.Comment: 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017

arXiv.org e-Print Archive

Crossref

Commercial-off-the-shelf simulation package interoperability: Issues and futures

Author: A. Dunkin
B. Johansson
Ke Pan
M. D. Rossetti
Navonil Mustafee
R. G. Ingalls
R. R. Hill
Simon J. E. Taylor
Steffen Strassburger
Stephen J. Turner
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2009
Field of study

Commercial-Off-The-Shelf Simulation Packages (CSPs) are widely used in industry to simulate discrete-event models. Interoperability of CSPs requires the use of distributed simulation techniques. Literature presents us with many examples of achieving CSP interoperability using bespoke solutions. However, for the wider adoption of CSP-based distributed simulation it is essential that, first and foremost, a standard for CSP interoperability be created, and secondly, these standards are adhered to by the CSP vendors. This advanced tutorial is on an emerging standard relating to CSP interoperability. It gives an overview of this standard and presents case studies that implement some of the proposed standards. Furthermore, interoperability is discussed in relation to large and complex models developed using CSPs that require large amount of computing resources. It is hoped that this tutorial will inform the simulation community of the issues associated with CSP interoperability, the importance of these standards and its future

CiteSeerX

Crossref

Brunel University Research Archive

Actors vs Shared Memory: two models at work on Big Data application frameworks

Author: Crafa Silvia
Tronchin Luca
Publication venue
Publication date: 01/01/2015
Field of study

This work aims at analyzing how two different concurrency models, namely the shared memory model and the actor model, can influence the development of applications that manage huge masses of data, distinctive of Big Data applications. The paper compares the two models by analyzing a couple of concrete projects based on the MapReduce and Bulk Synchronous Parallel algorithmic schemes. Both projects are doubly implemented on two concrete platforms: Akka Cluster and Managed X10. The result is both a conceptual comparison of models in the Big Data Analytics scenario, and an experimental analysis based on concrete executions on a cluster platform

arXiv.org e-Print Archive

CiteSeerX

Archivio istituzionale della ricerca - Università di Padova

Prototyping Formal System Models with Active Objects

Author: Hähnle Reiner
Kamburjan Eduard
Publication venue: 'Open Publishing Association'
Publication date: 01/01/2018
Field of study

We propose active object languages as a development tool for formal system models of distributed systems. Additionally to a formalization based on a term rewriting system, we use established Software Engineering concepts, including software product lines and object orientation that come with extensive tool support. We illustrate our modeling approach by prototyping a weak memory model. The resulting executable model is modular and has clear interfaces between communicating participants through object-oriented modeling. Relaxations of the basic memory model are expressed as self-contained variants of a software product line. As a modeling language we use the formal active object language ABS which comes with an extensive tool set. This permits rapid formalization of core ideas, early validity checks in terms of formal invariant proofs, and debugging support by executing test runs. Hence, our approach supports the prototyping of formal system models with early feedback.Comment: In Proceedings ICE 2018, arXiv:1810.0205

arXiv.org e-Print Archive

TUbiblio

Digital Ecosystems: Ecosystem-Oriented Architectures

Author: Briscoe Gerard
De Wilde Philippe
Sadedin Suzanne
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/08/2011
Field of study

We view Digital Ecosystems to be the digital counterparts of biological ecosystems. Here, we are concerned with the creation of these Digital Ecosystems, exploiting the self-organising properties of biological ecosystems to evolve high-level software applications. Therefore, we created the Digital Ecosystem, a novel optimisation technique inspired by biological ecosystems, where the optimisation works at two levels: a first optimisation, migration of agents which are distributed in a decentralised peer-to-peer network, operating continuously in time; this process feeds a second optimisation based on evolutionary computing that operates locally on single peers and is aimed at finding solutions to satisfy locally relevant constraints. The Digital Ecosystem was then measured experimentally through simulations, with measures originating from theoretical ecology, evaluating its likeness to biological ecosystems. This included its responsiveness to requests for applications from the user base, as a measure of the ecological succession (ecosystem maturity). Overall, we have advanced the understanding of Digital Ecosystems, creating Ecosystem-Oriented Architectures where the word ecosystem is more than just a metaphor.Comment: 39 pages, 26 figures, journa

arXiv.org e-Print Archive

Heriot Watt Pure

Crossref

Kent Academic Repository