Search CORE

112,615 research outputs found

A program-driven parallel machine simulation environment

Author: Chou Chien-chun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

[[abstract]]In recent years, it has been very popular to employ discrete-event simulation as a hardware architecture analytical tool to study distributed-memory multicomputers and shared-memory multiprocessors. After the hardware architecture prototype has been completed, a complete and detailed machine simulation environment can be utilized to evaluate the architecture's efficiency under real operating systems and application software. In this article, we discuss all the development and implementation of a program-executable Transputer network multicomputer as well as 80x86 series multiprocessors, and how they can be operated. On another level, owing to the extreme complexity of the simulated computer systems, parallel discrete-event simulation has also been used to shorten the time of running the simulation. In practice, this simulator can solve problems through a network connection with many workstations. Some of the workstations may be in charge of computing, while others can be responsible for the management of memory, thus making it simpler to establish a parallel machine simulation environment. In addition to providing an environment for programs to execute on it, such a simulator also calculates the time spent in running these programs, so as to evaluate the feasibility for these application programs to run on a hardware system.[[conferencetype]]國際[[conferencedate]]19981214~19981216[[conferencelocation]]Tainan, Taiwa

Tamkang University Institutional Repository

Performance Debugging and Tuning using an Instruction-Set Simulator

Author: Magnusson Peter S.
Montelius Johan
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/1997
Field of study

Instruction-set simulators allow programmers a detailed level of insight into, and control over, the execution of a program, including parallel programs and operating systems. In principle, instruction set simulation can model any target computer and gather any statistic. Furthermore, such simulators are usually portable, independent of compiler tools, and deterministic-allowing bugs to be recreated or measurements repeated. Though often viewed as being too slow for use as a general programming tool, in the last several years their performance has improved considerably. We describe SIMICS, an instruction set simulator of SPARC-based multiprocessors developed at SICS, in its rôle as a general programming tool. We discuss some of the benefits of using a tool such as SIMICS to support various tasks in software engineering, including debugging, testing, analysis, and performance tuning. We present in some detail two test cases, where we've used SimICS to support analysis and performance tuning of two applications, Penny and EQNTOTT. This work resulted in improved parallelism in, and understanding of, Penny, as well as a performance improvement for EQNTOTT of over a magnitude. We also present some early work on analyzing SPARC/Linux, demonstrating the ability of tools like SimICS to analyze operating systems

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Thermal System Oriented Simulation of Aircraft Electrical Environmental Control Systems Including its Electric Coupling

Author: Ablanque Mejía Nicolás
Oliet Casasayas Carles
Pérez Segarra Carlos David
Rigola Serrano Joaquim
Torras Ortiz Santiago
Publication venue: European Conference for AeroSpace Sciences (EUCASS)
Publication date: 01/01/2019
Field of study

A flexible numerical platform based on libraries has been developed within the Dymola/Modelica framework to simulate Environmental Control Systems (ECS). The goal was to build up a flexible tool to analyse complex systems including their thermal and electrical perimeters at both steady and transient conditions focusing on three key characteristics: numerical robustness, optimal time consumption, and high accuracy. This document aims to underline both the most relevant features of the numerical tool and the main challenges addressed during its development. Some illustrative simulations are shown in order to highlight the tool capabilities.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Memory and information processing in neuromorphic systems

Author: Indiveri Giacomo
Liu Shih-Chii
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

A striking difference between brain-inspired neuromorphic processors and current von Neumann processors architectures is the way in which memory and processing is organized. As Information and Communication Technologies continue to address the need for increased computational power through the increase of cores within a digital processor, neuromorphic engineers and scientists can complement this need by building processor architectures where memory is distributed with the processing. In this paper we present a survey of brain-inspired processor architectures that support models of cortical networks and deep neural networks. These architectures range from serial clocked implementations of multi-neuron systems to massively parallel asynchronous ones and from purely digital systems to mixed analog/digital systems which implement more biological-like models of neurons and synapses together with a suite of adaptation and learning mechanisms analogous to the ones found in biological nervous systems. We describe the advantages of the different approaches being pursued and present the challenges that need to be addressed for building artificial neural processing systems that can display the richness of behaviors seen in biological systems.Comment: Submitted to Proceedings of IEEE, review of recently proposed neuromorphic computing platforms and system

arXiv.org e-Print Archive

ZORA

Topology-aware GPU scheduling for learning workloads in cloud environments

Author: Amaral Marcelo
Carrera David
Polo Bardés Jordà
Seelam Seetharami
Steinder Malgorzata
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2017
Field of study

Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on the earlier drafts of this paper.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Parallel machine architecture and compiler design facilities

Author: Kuck David J.
Padua David
Sameh Ahmed
Veidenbaum Alex
Yew Pen-Chung
Publication venue
Publication date
Field of study

The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of Delta project (which objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target toward different machine architectures) is summarized. Included are the surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role

NASA Technical Reports Server

Simulation of networks of spiking neurons: A review of tools and strategies

Author: Beeman D.
Boustani S. El
Bower J. M.
Brette R.
Carnevale T.
Davison A. P.
Destexhe A.
Diesmann M.
Djurfeldt M.
Ermentrout B.
Goodman P. H.
Harris Jr. F. C.
Hines M.
Lansner A.
Morrison A.
Muller E.
Natschlager T.
Pecevski D.
Rochel O.
Rudolph M.
Vieville T.
Zirpe M.
Publication venue
Publication date: 01/01/2007
Field of study

We review different aspects of the simulation of spiking neural networks. We start by reviewing the different types of simulation strategies and algorithms that are currently implemented. We next review the precision of those simulation strategies, in particular in cases where plasticity depends on the exact timing of the spikes. We overview different simulators and simulation environments presently available (restricted to those freely available, open source and documented). For each simulation tool, its advantages and pitfalls are reviewed, with an aim to allow the reader to identify which simulator is appropriate for a given task. Finally, we provide a series of benchmark simulations of different types of networks of spiking neurons, including Hodgkin-Huxley type, integrate-and-fire models, interacting with current-based or conductance-based synapses, using clock-driven or event-driven integration strategies. The same set of models are implemented on the different simulators, and the codes are made available. The ultimate goal of this review is to provide a resource to facilitate identifying the appropriate integration strategy and simulation tool to use for a given modeling problem related to spiking neural networks.Comment: 49 pages, 24 figures, 1 table; review article, Journal of Computational Neuroscience, in press (2007

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

PubMed Central

Juelich Shared Electronic Resources

HAL-Ecole des Ponts ParisTech

Cross-level sensor network simulation with COOJA

Author: Dunkels Adam
Eriksson Joakim
Finne Niclas
Voigt Thiemo
Österlind Fredrik
Publication venue
Publication date: 01/01/2006
Field of study

Simulators for wireless sensor networks are a valuable tool for system development. However, current simulators can only simulate a single level of a system at once. This makes system development and evolution difficult since developers cannot use the same simulator for both high-level algorithm development and low-level development such as device-driver implementations. We propose cross-level simulation, a novel type of wireless sensor network simulation that enables holistic simultaneous simulation at different levels. We present an implementation of such a simulator, COOJA, a simulator for the Contiki sensor node operating system. COOJA allows for simultaneous simulation at the network level, the operating system level, and the machine code instruction set level. With COOJA, we show the feasibility of the cross-level simulation approach

CiteSeerX

Crossref

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

RELEASE: A High-level Paradigm for Reliable Large-scale Server Software

Author: A. Leung
C. Hewitt
D. Dewolfs
D. Ungar
G. Agha
G. Germain
H. Rajan
J. Zhao
K. Sagonas
L. Seiler
M. Snir
R. Chandra
R.K. Karmani
S. Srinivasan
T. Arts
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project, and describes the progress in the first six months. The project aim is to scale the Erlang’s radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state of the art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the effectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene

CiteSeerX

Crossref

Kent Academic Repository

Janus II: a new generation application-driven computer for spin-system simulations

This paper describes the architecture, the development and the implementation of Janus II, a new generation application-driven number cruncher optimized for Monte Carlo simulations of spin systems (mainly spin glasses). This domain of computational physics is a recognized grand challenge of high-performance computing: the resources necessary to study in detail theoretical models that can make contact with experimental data are by far beyond those available using commodity computer systems. On the other hand, several specific features of the associated algorithms suggest that unconventional computer architectures, which can be implemented with available electronics technologies, may lead to order of magnitude increases in performance, reducing to acceptable values on human scales the time needed to carry out simulation campaigns that would take centuries on commercially available machines. Janus II is one such machine, recently developed and commissioned, that builds upon and improves on the successful JANUS machine, which has been used for physics since 2008 and is still in operation today. This paper describes in detail the motivations behind the project, the computational requirements, the architecture and the implementation of this new machine and compares its expected performances with those of currently available commercial systems.Comment: 28 pages, 6 figure

arXiv.org e-Print Archive

Docta Complutense

Crossref

Directory of Open Access Journals

La Colmena

Turismo y patrimonio (E-Journal)

Archivio istituzionale della ricerca - Università di Ferrara

DIALNET

Archivio della ricerca- Università di Roma La Sapienza