Search CORE

1,545 research outputs found

The "MIND" Scalable PIM Architecture

Author: Brodowicz Maciej
Sterling Thomas
Publication venue
Publication date: 01/01/2005
Field of study

MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture

Caltech Authors

Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays

Author: Sterling Thomas L.
Zima Hans P.
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/2000
Field of study

The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributuions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism

CiteSeerX

Caltech Authors

EPOS : evolving personal to organizational knowledge spaces

Author: Bähr Jan-Thies
Dannenmann Peter
Hust Armin
Lauer Andreas
Maus Heiko
Schwarz Sven
van Elst Ludger
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/2004
Field of study

EPOS will leverage the user´s personal workspace with its manyfold native information structures to his personal knowledge space and in cooperation with other personal workspaces contribute to the organizational knowledge space which is represented in the organizational memory. This first milestone presents results from the project´s first year in the areas of the personal informational model, user observation for context elicitation, collaborative information retrieval and information visualization

A Phase Change Memory and DRAM Based Framework For Energy-Efficient and High-Speed In-Memory Stochastic Computing

Author: Mysore Supreeth
Publication venue: UKnowledge
Publication date: 01/01/2023
Field of study

Convolutional Neural Networks (CNNs) have proven to be highly effective in various fields related to Artificial Intelligence (AI) and Machine Learning (ML). However, the significant computational and memory requirements of CNNs make their processing highly compute and memory-intensive. In particular, the multiply-accumulate (MAC) operation, which is a fundamental building block of CNNs, requires enormous arithmetic operations. As the input dataset size increases, the traditional processor-centric von-Neumann computing architecture becomes ill-suited for CNN-based applications. This results in exponentially higher latency and energy costs, making the processing of CNNs highly challenging. To overcome these challenges, researchers have explored the Processing-In Memory (PIM) technique, which involves placing the processing unit inside or near the memory unit. This approach reduces data migration length and utilizes the internal memory bandwidth at the memory chip level. However, developing a reliable PIM-based system with minimal hardware modifications and design complexity remains a significant challenge. The proposed solution in the report suggests utilizing different memory technologies, such as Dynamic RAM (DRAM) and phase change memory (PCM), with Stochastic arithmetic and minimal add-on logic. Stochastic computing is a technique that uses random numbers to perform arithmetic operations instead of traditional binary representation. This technique reduces hardware requirements for CNN\u27s arithmetic operations, making it possible to implement them with minimal add-on logic. The report details the workflow for performing arithmetical operations used by CNNs, including MAC, activation, and floating-point functions. The proposed solution includes designs for scalable Stochastic Number Generator (SNG), DRAM CNN accelerator, non-volatile memory (NVM) class PCRAM-based CNN accelerator, and DRAM-based stochastic to binary conversion (StoB) for in-situ deep learning. These designs utilize stochastic computing to reduce the hardware requirements for CNN\u27s arithmetic operations and enable energy and time-efficient processing of CNNs. The report also identifies future research directions for the proposed designs, including in-situ PCRAM-based SNG, ODIN (A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-Situ Neural Network Processing in Phase Change RAM), ATRIA (Bit-Parallel Stochastic Arithmetic Based Accelerator for In-DRAM CNN Processing), and AGNI (In-Situ, Iso-Latency Stochastic-to-Binary Number Conversion for In-DRAM Deep Learning), and presents initial findings for these ideas. In summary, the proposed solution in the report offers a comprehensive approach to address the challenges of processing CNNs, and the proposed designs have the potential to improve the energy and time efficiency of CNNs significantly. Using Stochastic Computing and different memory technologies enables the development of reliable PIM-based systems with minimal hardware modifications and design complexity, providing a promising path for the future of CNN-based applications

University of Kentucky

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

Author: Fernandez Ivan
Ghose Saugata
Gómez-Luna Juan
Mutlu Onur
Oliveira Geraldo F.
Orosa Lois
Sadrosadati Mohammad
Vijaykumar Nandita
Publication venue
Publication date: 01/01/2021
Field of study

Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data Processing (NDP), where some computation is moved close to memory. Our goal is to methodically identify potential sources of data movement over a broad set of applications and to comprehensively compare traditional compute-centric data movement mitigation techniques to more memory-centric techniques, thereby developing a rigorous understanding of the best techniques to mitigate each source of data movement. With this goal in mind, we perform the first large-scale characterization of a wide variety of applications, across a wide range of application domains, to identify fundamental program properties that lead to data movement to/from main memory. We develop the first systematic methodology to classify applications based on the sources contributing to data movement bottlenecks. From our large-scale characterization of 77K functions across 345 applications, we select 144 functions to form the first open-source benchmark suite (DAMOV) for main memory data movement studies. We select a diverse range of functions that (1) represent different types of data movement bottlenecks, and (2) come from a wide range of application domains. Using NDP as a case study, we identify new insights about the different data movement bottlenecks and use these insights to determine the most suitable data movement mitigation mechanism for a particular application. We open-source DAMOV and the complete source code for our new characterization methodology at https://github.com/CMU-SAFARI/DAMOV.Comment: Our open source software is available at https://github.com/CMU-SAFARI/DAMO

arXiv.org e-Print Archive

Repository for Publications and Research Data

Directory of Open Access Journals

Middleware services for distributed virtual environments

Author: Lu Fengyun
Publication venue: Newcastle University
Publication date: 01/01/2006
Field of study

PhD ThesisDistributed Virtual Environments (DVEs) are virtual environments which allow dispersed users to interact with each other and the virtual world through the underlying network. Scalability is a major challenge in building a successful DVE, which is directly affected by the volume of message exchange. Different techniques have been deployed to reduce the volume of message exchange in order to support large numbers of simultaneous participants in a DVE. Interest management is a popular technique for filtering unnecessary message exchange between users. The rationale behind interest management is to resolve the "interests" of users and decide whether messages should be exchanged between them. There are three basic interest management approaches: region-based, aura-based and hybrid approaches. However, if the time taken for an interest management approach to determine interests is greater than the duration of the interaction, it is not possible to guarantee interactions will occur correctly or at all. This is termed the Missed Interaction Problem, which all existing interest management approaches are susceptible to. This thesis provides a new aura-based interest management approach, termed Predictive Interest management (PIM), to alleviate the missed interaction problem. PIM uses an enlarged aura to detect potential aura-intersections and iii initiate message exchange. It utilises variable message exchange frequencies, proportional to the intersection degree of the objects' expanded auras, to restrict bandwidth usage. This thesis provides an experimental system, the PIM system, which couples predictive interest management with the de-centralised server communication model. It utilises the Common Object Request Broker Architecture (CORBA) middleware standard to provide an interoperable middleware for DVEs. Experimental results are provided to demonstrate that PIM provides a scalable interest management approach which alleviates the missed interaction problem

Newcastle University eTheses

OpenGrey Repository