Topology-aware GPU scheduling for learning workloads in cloud environments
Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments.
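To make the idea of topology-aware placement concrete, here is a minimal sketch in Python. It is not the algorithm from the paper: the link-cost table, the `place` function, and the exhaustive search are all assumptions chosen for illustration. It only shows the basic intuition that a scheduler can prefer GPU sets whose members are connected by faster links.

```python
from itertools import combinations

# Hypothetical link costs between GPU pairs (lower means a faster link).
# The numbers are illustrative, not measurements from any real machine.
LINK_COST = {
    (0, 1): 1, (2, 3): 1,              # pairs assumed to share a direct link
    (0, 2): 4, (0, 3): 4,
    (1, 2): 4, (1, 3): 4,              # pairs assumed to cross a CPU socket
}


def pair_cost(a, b):
    """Cost of communication between two GPUs, independent of argument order."""
    return LINK_COST[(min(a, b), max(a, b))]


def group_cost(gpus):
    """Total pairwise link cost inside a candidate group of GPUs."""
    return sum(pair_cost(a, b) for a, b in combinations(gpus, 2))


def place(free_gpus, n):
    """Return the n free GPUs with the lowest total link cost, or None if too few are free."""
    candidates = combinations(sorted(free_gpus), n)
    return min(candidates, key=group_cost, default=None)


if __name__ == "__main__":
    print(place({0, 1, 2, 3}, 2))   # a directly linked pair is preferred
    print(place({1, 2, 3}, 2))      # GPU 0 is busy, so the other linked pair is chosen
```

Exhaustive search is only practical for a handful of GPUs per node; a production scheduler would also need to account for interference between co-located jobs, which this sketch ignores.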
This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.
This project is supported by the IBM/BSC Technology Center for Supercomputing
collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union's Horizon
2020 research and innovation programme (grant agreement No 639595). It is
also partially supported by the Ministry of Economy of Spain under contract
TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051,
by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program
(SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef
and Asser Tantawi for the valuable discussions. We also thank SC17 committee
member Blair Bethwaite of Monash University for his constructive feedback on the earlier drafts of this paper.
Peer Reviewed. Postprint (published version)
A short curriculum of the robotics and technology of computer lab
Our research Lab is directed by Prof. Anton Civit. It is an interdisciplinary group of 23
researchers who carry out their teaching and research work at the Escuela
Politécnica Superior (Higher Polytechnic School) and the Escuela de Ingeniería
Informática (Computer Engineering School). The main research fields are: a)
Industrial and mobile robotics, b) Neuro-inspired processing using electronic spikes,
c) Embedded and real-time systems, d) Parallel and massive processing computer
architecture, e) Information technologies for rehabilitation, disabled and elderly
people, f) Web accessibility and usability.
In this paper, the Lab history is presented and its main publications and research
projects over the last few years are summarized.
Our research group is led by Professor Civit. We are a multidisciplinary group
of 23 researchers who carry out their teaching and research work
at the Escuela Politécnica Superior and the Escuela de Ingeniería Informática. The
main lines of research are: a) Industrial and mobile robotics. b)
Neuro-inspired processing based on electronic pulses. c) Embedded
and real-time systems. d) Parallel and massive-processing architectures.
e) Information technology applied to disability, rehabilitation and
elderly people. f) Web usability and accessibility.
This article reviews the history of the group and summarizes the main
publications and projects it has achieved in recent years.
UEFI BIOS Accessibility for the Visually Impaired
People with some kind of disability face a high level of difficulty for
everyday tasks because, in many cases, accessibility was not considered
necessary when the task or process was designed. An example of this scenario is
a computer's BIOS configuration screens, which do not consider the specific
needs, such as screen readers, of visually impaired people. This paper proposes
the idea that it is possible to make the pre-operating system environment
accessible to visually impaired people. We report our work-in-progress in
creating a screen reader prototype, accessing audio cards compatible with the
High Definition Audio specification in systems running UEFI-compliant firmware. Comment: 6 pages
Tree Parity Machine Rekeying Architectures
The necessity to secure the communication between hardware components in
embedded systems becomes increasingly important with regard to the secrecy of
data and particularly its commercial use. We suggest a low-cost (i.e. small
logic-area) solution for flexible security levels and short key lifetimes. The
basis is an approach for symmetric key exchange using the synchronisation of
Tree Parity Machines. Fast successive key generation enables a key exchange
within a few milliseconds, given realistic communication channels with a
limited bandwidth. For demonstration we evaluate characteristics of a
standard-cell ASIC design realisation as IP-core in 0.18-micrometer
CMOS-technology.
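For readers unfamiliar with Tree Parity Machine synchronisation, the following is a minimal software sketch of the commonly described scheme with a Hebbian learning rule. It is not the hardware architecture from the paper: the parameter values `K`, `N`, `L`, the class and function names, and the use of a single shared random generator are all assumptions made for illustration. Two machines receive the same public random inputs, exchange only their output bits, and update their weights when those bits agree; once the weights are identical they can serve as shared key material.

```python
import random

K, N, L = 3, 10, 3  # hidden units, inputs per unit, weight bound (illustrative values)


class TreeParityMachine:
    def __init__(self, rng):
        # Weights are integers in [-L, L], one row per hidden unit.
        self.w = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
        self.sigma = [1] * K
        self.tau = 1

    def output(self, x):
        # sigma_k is the sign of hidden unit k's local field (a field of 0 maps to -1 here).
        self.sigma = [1 if sum(wi * xi for wi, xi in zip(wk, xk)) > 0 else -1
                      for wk, xk in zip(self.w, x)]
        # tau is the product of the hidden-unit outputs.
        self.tau = 1
        for s in self.sigma:
            self.tau *= s
        return self.tau

    def update(self, x):
        # Hebbian step: only hidden units that agree with tau move, then clip to [-L, L].
        for k in range(K):
            if self.sigma[k] == self.tau:
                self.w[k] = [max(-L, min(L, wi + self.tau * xi))
                             for wi, xi in zip(self.w[k], x[k])]


def synchronise(seed=0, max_rounds=100_000):
    """Run mutual learning until both weight sets match or max_rounds is reached."""
    rng = random.Random(seed)  # one generator for weights and inputs: a simplification
    a, b = TreeParityMachine(rng), TreeParityMachine(rng)
    rounds = 0
    while a.w != b.w and rounds < max_rounds:
        # Both parties see the same public random input vector.
        x = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
        # Only the output bits are exchanged; the weights stay private.
        if a.output(x) == b.output(x):
            a.update(x)
            b.update(x)
        rounds += 1
    return a, b, rounds


if __name__ == "__main__":
    a, b, rounds = synchronise(seed=1)
    print("synchronised:", a.w == b.w, "after", rounds, "rounds")
```

Synchronisation is a probabilistic process, so the number of rounds varies with the seed and the parameters; the `max_rounds` cap is there so the loop always terminates.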
BRAHMS: Novel middleware for integrated systems computation
Biological computational modellers are becoming increasingly interested in building large, eclectic models, including components on many different computational substrates, both biological and non-biological. At the same time, the rise of the philosophy of embodied modelling is generating a need to deploy biological models as controllers for robots in real-world environments. Finally, robotics engineers are beginning to find value in seconding biomimetic control strategies for use on practical robots. Together with the ubiquitous desire to make good on past software development effort, these trends are throwing up new challenges of intellectual and technological integration (for example across scales, across disciplines, and even across time), challenges that are unmet by existing software frameworks. Here, we outline these challenges in detail, and go on to describe a newly developed software framework, BRAHMS, that meets them. BRAHMS is a tool for integrating computational process modules into a viable, computable system: its generality and flexibility facilitate integration across barriers, such as those described above, in a coherent and effective way. We go on to describe several cases where BRAHMS has been successfully deployed in practical situations. We also show excellent performance in comparison with a monolithic development approach. Additional benefits of developing in the framework include source code self-documentation, automatic coarse-grained parallelisation, cross-language integration, data logging, performance monitoring, and will include dynamic load-balancing and 'pause and continue' execution. BRAHMS is built on the nascent, and similarly general purpose, model markup language, SystemML. This will, in future, also facilitate repeatability and accountability (same answers ten years from now), transparent automatic software distribution, and interfacing with other SystemML tools. (C) 2009 Elsevier Ltd. All rights reserved
Using XDAQ in Application Scenarios of the CMS Experiment
XDAQ is a generic data acquisition software environment that emerged from a
rich set of use-cases encountered in the CMS experiment. They cover not only the
deployment for multiple sub-detectors and the operation of different processing
and networking equipment but also a distributed collaboration of users with
different needs. The use of the software in various application scenarios
demonstrated the viability of the approach. We discuss two applications, the
tracker local DAQ system for front-end commissioning and the muon chamber
validation system. The description is completed by a brief overview of XDAQ.
Comment: Conference CHEP 2003 (Computing in High Energy and Nuclear Physics,
La Jolla, CA
A Graph-Partition-Based Scheduling Policy for Heterogeneous Architectures
In order to improve system performance efficiently, a number of systems
are equipped with both multi-core and many-core processors (such as GPUs). Due to
their discrete memory, these heterogeneous architectures form a distributed
system within a computer. A data-flow programming model is attractive in this
setting for its ease of expressing concurrency. Programmers only need to define
task dependencies without considering how to schedule them on the hardware.
However, mapping the resulting task graph onto hardware efficiently remains a
challenge. In this paper, we propose a graph-partition scheduling policy for
mapping data-flow workloads to heterogeneous hardware. According to our
experiments, our graph-partition-based scheduling achieves comparable
performance to conventional queue-based approaches.
Comment: Presented at DATE Friday Workshop on Heterogeneous Architectures and
Design Methods for Embedded Image Systems (HIS 2015) (arXiv:1502.07241
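As a rough illustration of what partitioning a task graph for a heterogeneous machine involves, the sketch below splits a small data-flow graph into two parts so that the total weight of edges crossing the split (a stand-in for data transferred between discrete memories) is as small as possible. This is not the policy proposed in the paper: the task names, edge weights, and the exhaustive search over balanced bipartitions are all hypothetical and chosen only to keep the example short.

```python
from itertools import combinations

# Hypothetical data-flow graph: each edge weight stands for data moved between two tasks.
EDGES = {
    ("load", "filter"): 8,
    ("filter", "fft"): 8,
    ("fft", "reduce"): 2,
    ("reduce", "store"): 1,
    ("load", "stats"): 1,
    ("stats", "store"): 1,
}


def cut_weight(edges, part_a):
    """Sum the weights of edges whose endpoints fall in different parts."""
    return sum(w for (u, v), w in edges.items() if (u in part_a) != (v in part_a))


def bipartition(edges, size_a):
    """Exhaustively pick the size_a tasks that minimise the cut to the remaining tasks."""
    tasks = sorted({t for edge in edges for t in edge})
    best = min(combinations(tasks, size_a),
               key=lambda part: cut_weight(edges, set(part)))
    part_a = set(best)
    return part_a, set(tasks) - part_a


if __name__ == "__main__":
    part_a, part_b = bipartition(EDGES, 3)
    # Which part maps to which device is a separate decision, not modelled here.
    print("part A:", sorted(part_a))
    print("part B:", sorted(part_b))
    print("cut weight:", cut_weight(EDGES, part_a))
```

Exhaustive search grows combinatorially with the number of tasks, so a real scheduler would rely on a heuristic graph partitioner and would also weigh per-device execution times, which this sketch leaves out.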