
    PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications

    Energy efficiency is a major concern in modern high-performance computing system design. In the past few years, there has been mounting evidence that power usage limits system scale and computing density, and thus, ultimately, system performance. However, despite the impact of power and energy on the computer systems community, few studies provide insight into where and how power is consumed on high-performance systems and applications. In previous work, we designed a framework called PowerPack that was the first tool to isolate the power consumption of devices including disks, memory, NICs, and processors in a high-performance cluster and to correlate these measurements to application functions. In this work, we extend our framework to support systems with multicore, multiprocessor-based nodes, and then provide in-depth analyses of the energy consumption of parallel applications on clusters of these systems. These analyses include the impact of chip multiprocessing on power and energy efficiency and its interaction with application execution. In addition, we use PowerPack to study the power dynamics and energy efficiency of dynamic voltage and frequency scaling (DVFS) techniques on clusters. Our experiments reveal conclusively how intelligent DVFS scheduling can enhance system energy efficiency while maintaining performance.
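
    As an illustration of the kind of phase-aware DVFS scheduling discussed above, the sketch below lowers the CPU frequency governor during a communication- or memory-bound phase and restores it for compute-bound work, using the Linux cpufreq sysfs interface. The paths, governor names, and phase markers are assumptions for illustration; this is not PowerPack's interface or the paper's scheduler.

```python
# Illustrative DVFS sketch (not PowerPack's API): switch the Linux cpufreq
# governor around application phases. Communication/memory-bound phases
# tolerate a lower clock, trading little runtime for energy savings.
# Assumes a standard Linux cpufreq sysfs layout and root privileges.
from pathlib import Path

GOVERNOR_PATH = "/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_governor"

def set_governor(governor: str, cpus=range(4)) -> None:
    """Write the requested governor for each CPU core."""
    for cpu in cpus:
        Path(GOVERNOR_PATH.format(cpu=cpu)).write_text(governor)

def enter_phase(compute_bound: bool) -> None:
    # Drop to "powersave" for communication/memory-bound phases,
    # restore "performance" for compute-bound phases.
    set_governor("performance" if compute_bound else "powersave")

# Hypothetical use inside an MPI-style iteration:
#   enter_phase(compute_bound=True);  local_stencil_update()
#   enter_phase(compute_bound=False); exchange_halos_with_neighbors()
```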

    PULP-HD: Accelerating Brain-Inspired High-Dimensional Computing on a Parallel Ultra-Low Power Platform

    Computing with high-dimensional (HD) vectors, also referred to as hypervectors, is a brain-inspired alternative to computing with scalars. Key properties of HD computing include a well-defined set of arithmetic operations on hypervectors, generality, scalability, robustness, fast learning, and ubiquitous parallel operations. HD computing is about manipulating and comparing large patterns (binary hypervectors with 10,000 dimensions), making its efficient realization on minimalistic ultra-low-power platforms challenging. This paper describes the acceleration of HD computing and the optimization of its memory accesses and operations on a silicon prototype of the PULPv3 4-core platform (1.5 mm², 2 mW), surpassing the state-of-the-art classification accuracy (92.4% on average) with a simultaneous 3.7× end-to-end speed-up and 2× energy saving compared to single-core execution. We further explore the scalability of our accelerator by increasing the number of inputs and the classification window on a new generation of the PULP architecture featuring bit-manipulation instruction extensions and a larger number of cores (8). Together, these enable a near-ideal speed-up of 18.4× compared to the single-core PULPv3.
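
    The hypervector arithmetic the abstract refers to can be sketched compactly. The minimal example below uses dense binary hypervectors, binds pairs with element-wise XOR, bundles sets by majority vote, and compares patterns with normalized Hamming similarity; the dimensionality and operation choices are the textbook ones, not the PULP-HD kernels themselves.

```python
# Minimal sketch of binary hyperdimensional (HD) computing primitives.
# Dimension and operation choices are illustrative, not the PULP-HD kernels.
import numpy as np

D = 10_000                      # hypervector dimensionality

def random_hv(rng):
    """Random dense binary hypervector."""
    return rng.integers(0, 2, size=D, dtype=np.uint8)

def bind(a, b):
    """Binding (association) of two hypervectors: element-wise XOR."""
    return a ^ b

def bundle(hvs):
    """Bundling (superposition): element-wise majority vote."""
    return (np.sum(hvs, axis=0) > len(hvs) / 2).astype(np.uint8)

def similarity(a, b):
    """Normalized Hamming similarity in [0, 1]; ~0.5 means unrelated vectors."""
    return 1.0 - np.count_nonzero(a ^ b) / D

rng = np.random.default_rng(0)
item, value = random_hv(rng), random_hv(rng)
record = bind(item, value)           # encode the (item, value) pair
recovered = bind(record, item)       # unbinding with the item recovers the value
print(similarity(recovered, value))  # -> 1.0
```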

    Memory performance of and-parallel prolog on shared-memory architectures

    The goal of the RAP-WAM AND-parallel Prolog abstract architecture is to provide inference speeds significantly beyond those of sequential systems, while supporting Prolog semantics and preserving sequential performance and storage efficiency. This paper presents simulation results supporting these claims, with special emphasis on memory performance on a two-level shared-memory multiprocessor organization. Several solutions to the cache coherency problem are analyzed. It is shown that RAP-WAM offers good locality and storage efficiency and that it can effectively take advantage of broadcast caches. It is argued that speeds in excess of 2 MLIPS on real applications exhibiting medium parallelism can be attained with current technology.
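
    For readers unfamiliar with the execution model, the toy sketch below runs two goals of a conjunction that share no unbound variables in parallel worker processes and then combines their bindings, which is the essence of independent AND-parallelism; it says nothing about the RAP-WAM abstract machine or its memory organization, and all predicate names and facts are made up for illustration.

```python
# Toy illustration of independent AND-parallelism: two goals of a conjunction
# that share no unbound variables are solved concurrently and their bindings
# merged. This sketches the execution model only, not the RAP-WAM design.
from concurrent.futures import ProcessPoolExecutor

def goal_ancestor(person):
    """Stand-in for solving ancestor(X, mary): returns bindings for X."""
    facts = {"mary": ["ann", "tom"]}
    return [{"X": a} for a in facts.get(person, [])]

def goal_works_at(company):
    """Stand-in for solving works_at(Y, acme): returns bindings for Y."""
    facts = {"acme": ["ann", "bob"]}
    return [{"Y": w} for w in facts.get(company, [])]

if __name__ == "__main__":
    # ?- ancestor(X, mary), works_at(Y, acme).   (X and Y are independent)
    with ProcessPoolExecutor(max_workers=2) as pool:
        xs = pool.submit(goal_ancestor, "mary")
        ys = pool.submit(goal_works_at, "acme")
        # Cross product of the independent solution sets.
        answers = [{**bx, **by} for bx in xs.result() for by in ys.result()]
    print(answers)
```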

    A review on Reliability, Security and Memory Management of Numerous Operating Systems

    With the improvement of technology and the growing needs of computer systems, it is necessary to ensure that operating systems can provide the required functionality. To do so, operating systems are designed around factors such as scalability, security, reliability, performance, memory management, and energy efficiency. However, none of these factors can be achieved without facing challenges. This research studies several design issues that are connected to each other in terms of producing an effective result. This review article therefore tries to reveal the major issues, which are individually too complex to solve at once. Finally, by studying many research articles on these design issues, this research provides a guideline to help future researchers overcome the challenges.

    Reconfiguration for Fault Tolerance and Performance Analysis

    Architecture reconfiguration, the ability of a system to alter the active interconnection among modules, has a history of different purposes and strategies. Its purposes range from the relatively simple desire to formalize procedures that all processes have in common, to reconfiguration for improved fault tolerance, to reconfiguration for performance enhancement, whether through simply maximizing system use or through sophisticated notions of wedding topology to the specific needs of a given process. Strategies range from straightforward redundancy by means of an identical backup system to intricate structures employing multistage interconnection networks. The present discussion surveys the more important contributions to developments in reconfigurable architecture. The strategy here is, in a sense, to approach the field from a historical perspective, with the goal of developing a more coherent theory of reconfiguration. First, the Turing and von Neumann machines are discussed from the perspective of system reconfiguration, and it is seen that this early, important theoretical work contains little that anticipates reconfiguration. Then some early developments in reconfiguration are analyzed, including the work of Estrin and associates on the fixed-plus-variable restructurable computer system, the attempt to theorize about configurable computers by Miller and Cocke, and the work of Reddi and Feustel on their restructurable computer system. The discussion then focuses on the most sustained systems for fault tolerance and performance enhancement that have been proposed. An attempt is made to define fault tolerance and to investigate some of the strategies used to achieve it. By investigating four different systems, the Tandem computer, the C.vmp system, the Extra Stage Cube, and the Gamma network, the move from dynamic redundancy to reconfiguration is observed. Then reconfiguration for performance enhancement is discussed. A survey of some proposals is attempted, and the discussion then focuses on the most sustained systems that have been proposed: PASM, the DC architecture, the Star local network, and the NYU Ultracomputer. The discussion is organized around a comparison of control, scheduling, communication, and network topology. Finally, comparisons are drawn between fault tolerance and performance enhancement, in order to clarify the notion of reconfiguration and to reveal the common ground of fault tolerance and performance enhancement as well as the areas in which they diverge. The conclusion attempts to derive from this survey and analysis some observations on the nature of reconfiguration, as well as some remarks on necessary further areas of research.

    Exploration of communication strategies for computation intensive Systems-On-Chip


    Laws for Communicating Parallel Processes

    Key Words and Phrases: parallel processes, parallel or asynchronous computations, partial orders of events, Actor theory. CR Categories: 5.21, 5.24, 5.26. This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for the laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N00014-75-C-0522. This paper presents some laws that must be satisfied by computations involving communicating parallel processes. The laws are stated in the context of actor theory, a model for distributed parallel computation, and take the form of plausible restrictions on the histories of parallel computations that make them physically realizable. The laws are justified by appeal to physical intuition and are to be regarded as falsifiable assertions about the kinds of computations that occur in nature rather than as proven theorems in mathematics. The laws are used to analyze the mechanisms by which multiple processes can communicate to work effectively together to solve difficult problems. Since the causal relations among the events in a parallel computation do not specify a total order on events, the actor model generalizes the notion of computation from a sequence of states to a partial order of events. The interpretation of unordered events in this partial order is that they proceed concurrently. The utility of partial orders is demonstrated by using them to express our laws for distributed computation.
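
    To make the partial-order view concrete, the small sketch below encodes a history as a set of events with a causal precedence relation and checks one physically motivated restriction of the kind the paper formulates: every receipt of a message must be causally preceded by a corresponding send. The event encoding and the specific check are illustrative assumptions, not the paper's formal laws.

```python
# Minimal sketch: a history as events plus a causal precedence relation,
# with one realizability check in the spirit of the paper's laws: every
# receipt event must be causally preceded by its corresponding send.
# The event encoding and the check are illustrative assumptions.
from itertools import product

def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs` (happened-before)."""
    closure = set(pairs)
    changed = True
    while changed:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        changed = not new <= closure
        closure |= new
    return closure

# Events of two processes exchanging one message; unordered event pairs
# are interpreted as proceeding concurrently.
events = {"p1_send_m", "p2_receive_m", "p2_local_step"}
precedes = transitive_closure({("p1_send_m", "p2_receive_m")})

def receipts_have_prior_sends(events, precedes):
    """Law-style check: each receipt of message m has some send of m before it."""
    for e in events:
        if "receive" in e:
            msg = e.split("_")[-1]
            if not any("send" in s and s.endswith(msg) and (s, e) in precedes
                       for s in events):
                return False
    return True

print(receipts_have_prior_sends(events, precedes))  # -> True
```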