    Topology-aware GPU scheduling for learning workloads in cloud environments

    Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and the Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.

    This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on the earlier drafts of this paper.
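
    The abstract does not spell out the placement algorithm, so the following is only a hedged illustration of the general idea: score every candidate set of free GPUs by the communication cost of the links between them and pick the cheapest. The slot numbers, link costs, and greedy all-pairs objective below are invented for the sketch, not taken from the paper.

        # Illustrative sketch of topology-aware placement (assumed costs,
        # not the paper's algorithm): choose the set of free GPUs whose
        # pairwise interconnect "distance" is smallest.
        from itertools import combinations

        # Hypothetical pairwise cost between GPU slots (lower is better),
        # e.g. 1 = NVLink peers, 2 = same CPU socket via PCIe, 4 = across sockets.
        DIST = {(0, 1): 1, (0, 2): 2, (0, 3): 4,
                (1, 2): 4, (1, 3): 2, (2, 3): 1}

        def pair_cost(a, b):
            return DIST[(min(a, b), max(a, b))]

        def place(job_gpus, free_gpus):
            """Return the cheapest set of job_gpus GPUs among the free ones."""
            return min(combinations(sorted(free_gpus), job_gpus),
                       key=lambda s: sum(pair_cost(a, b)
                                         for a, b in combinations(s, 2)))

        print(place(2, {0, 1, 2, 3}))   # -> (0, 1), the NVLink-connected pair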

    A short curriculum of the robotics and technology of computer lab

    Our research lab is directed by Prof. Anton Civit. It is an interdisciplinary group of 23 researchers who carry out their teaching and research work at the Escuela Politécnica Superior (Higher Polytechnic School) and the Escuela de Ingeniería Informática (Computer Engineering School). The main research fields are: a) industrial and mobile robotics; b) neuro-inspired processing using electronic spikes; c) embedded and real-time systems; d) parallel and massive processing computer architectures; e) information technologies applied to disability, rehabilitation, and elderly people; f) Web accessibility and usability. In this paper, the lab's history is presented and its main publications and research projects over the last few years are summarized.

    UEFI BIOS Accessibility for the Visually Impaired

    People with some kind of disability face a high level of difficulty in everyday tasks because, in many cases, accessibility was not considered when the task or process was designed. An example of this scenario is a computer's BIOS configuration screens, which do not consider the specific needs, such as screen readers, of visually impaired people. This paper proposes the idea that it is possible to make the pre-operating-system environment accessible to visually impaired people. We report our work in progress on creating a screen reader prototype that accesses audio cards compatible with the High Definition Audio specification on systems running UEFI-compliant firmware. Comment: 6 pages

    Tree Parity Machine Rekeying Architectures

    The necessity to secure the communication between hardware components in embedded systems becomes increasingly important with regard to the secrecy of data and particularly its commercial use. We suggest a low-cost (i.e. small logic-area) solution for flexible security levels and short key lifetimes. The basis is an approach for symmetric key exchange using the synchronisation of Tree Parity Machines. Fast successive key generation enables a key exchange within a few milliseconds, given realistic communication channels with limited bandwidth. For demonstration, we evaluate the characteristics of a standard-cell ASIC design realised as an IP core in 0.18 µm CMOS technology.
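
    As a rough sketch of the key-exchange mechanism the abstract builds on (the standard Tree Parity Machine synchronisation, not the ASIC realisation itself), the following fragment synchronises two machines with the usual Hebbian update; the parameters K, N, and L are common textbook choices, not values from the paper.

        # Bidirectional Tree Parity Machine synchronisation (illustrative
        # parameters): both parties end up with identical weights, which
        # then serve as shared key material.
        import numpy as np

        K, N, L = 3, 101, 3   # hidden units, inputs per unit, weight bound

        class TPM:
            def __init__(self, rng):
                self.w = rng.integers(-L, L + 1, size=(K, N))

            def output(self, x):
                # sigma_k: sign of hidden unit k; tau: product of all sigmas
                self.sigma = np.sign(np.sum(self.w * x, axis=1))
                self.sigma[self.sigma == 0] = -1
                return int(np.prod(self.sigma))

            def update(self, x, tau):
                # Hebbian rule: only units that agreed with tau move;
                # weights stay clipped to [-L, L]
                for k in range(K):
                    if self.sigma[k] == tau:
                        self.w[k] = np.clip(self.w[k] + tau * x[k], -L, L)

        rng = np.random.default_rng(42)
        a, b = TPM(rng), TPM(rng)                  # distinct initial weights
        while not np.array_equal(a.w, b.w):
            x = rng.choice([-1, 1], size=(K, N))   # public random input
            ta, tb = a.output(x), b.output(x)
            if ta == tb:                           # learn only on agreement
                a.update(x, ta)
                b.update(x, tb)
        key = a.w.tobytes()                        # shared weights -> key material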

    BRAHMS: Novel middleware for integrated systems computation

    Biological computational modellers are becoming increasingly interested in building large, eclectic models, including components on many different computational substrates, both biological and non-biological. At the same time, the rise of the philosophy of embodied modelling is generating a need to deploy biological models as controllers for robots in real-world environments. Finally, robotics engineers are beginning to find value in seconding biomimetic control strategies for use on practical robots. Together with the ubiquitous desire to make good on past software development effort, these trends are throwing up new challenges of intellectual and technological integration (for example across scales, across disciplines, and even across time) - challenges that are unmet by existing software frameworks. Here, we outline these challenges in detail, and go on to describe a newly developed software framework, BRAHMS, that meets them. BRAHMS is a tool for integrating computational process modules into a viable, computable system: its generality and flexibility facilitate integration across barriers, such as those described above, in a coherent and effective way. We go on to describe several cases where BRAHMS has been successfully deployed in practical situations. We also show excellent performance in comparison with a monolithic development approach. Additional benefits of developing in the framework include source-code self-documentation, automatic coarse-grained parallelisation, cross-language integration, data logging, and performance monitoring; dynamic load-balancing and 'pause and continue' execution are planned. BRAHMS is built on the nascent, and similarly general-purpose, model markup language, SystemML. This will, in future, also facilitate repeatability and accountability (same answers ten years from now), transparent automatic software distribution, and interfacing with other SystemML tools.
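
    The BRAHMS API itself is not described in the abstract; the toy sketch below only illustrates the general pattern such middleware implements: independent process modules exposing a common interface, with the framework owning scheduling and routing data between them. All names are hypothetical.

        # Hypothetical module-integration framework (not the BRAHMS API):
        # modules implement one step() interface; the framework routes each
        # module's outputs to the inputs of the modules wired to it.
        import math

        class Oscillator:
            def step(self, t, inputs):
                return {"x": math.sin(0.1 * t)}      # produces a signal

        class Integrator:
            def __init__(self):
                self.acc = 0.0
            def step(self, t, inputs):
                self.acc += inputs.get("x", 0.0)     # consumes the wired signal
                return {"sum": self.acc}

        class Framework:
            def __init__(self):
                self.modules, self.wires, self.out = {}, [], {}
            def add(self, name, module):
                self.modules[name] = module
                self.out[name] = {}
            def connect(self, src, dst):
                self.wires.append((src, dst))
            def run(self, steps):
                for t in range(steps):
                    for name, m in self.modules.items():   # fixed order
                        ins = {}
                        for src, dst in self.wires:
                            if dst == name:
                                ins.update(self.out[src])  # route outputs
                        self.out[name] = m.step(t, ins)
                return self.out

        fw = Framework()
        fw.add("osc", Oscillator())
        fw.add("int", Integrator())
        fw.connect("osc", "int")
        print(fw.run(100)["int"])   # accumulated oscillator output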

    Using XDAQ in Application Scenarios of the CMS Experiment

    XDAQ is a generic data acquisition software environment that emerged from a rich set of use-cases encountered in the CMS experiment. They cover not only the deployment for multiple sub-detectors and the operation of different processing and networking equipment, but also a distributed collaboration of users with different needs. The use of the software in various application scenarios demonstrated the viability of the approach. We discuss two applications: the tracker local DAQ system for front-end commissioning and the muon chamber validation system. The description is completed by a brief overview of XDAQ. Comment: Conference CHEP 2003 (Computing in High Energy and Nuclear Physics, La Jolla, CA)

    A Graph-Partition-Based Scheduling Policy for Heterogeneous Architectures

    In order to improve system performance efficiently, a number of systems choose to equip multi-core and many-core processors (such as GPUs). Due to their discrete memories, these heterogeneous architectures comprise a distributed system within a computer. A data-flow programming model is attractive in this setting for its ease of expressing concurrency: programmers only need to define task dependencies without considering how to schedule them on the hardware. However, mapping the resulting task graph onto hardware efficiently remains a challenge. In this paper, we propose a graph-partition scheduling policy for mapping data-flow workloads to heterogeneous hardware. According to our experiments, our graph-partition-based scheduling achieves performance comparable to conventional queue-based approaches. Comment: Presented at the DATE Friday Workshop on Heterogeneous Architectures and Design Methods for Embedded Image Systems (HIS 2015) (arXiv:1502.07241)
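
    The policy's details are not given in the abstract; as a hedged sketch of the underlying idea only, one can bisect the task graph so that few dependency edges cross the CPU/GPU boundary, since each crossing edge implies a device-to-device transfer. The example below applies networkx's Kernighan-Lin heuristic to a made-up task graph.

        # Sketch only: partition a data-flow task graph across two devices,
        # minimising cut edges (i.e. cross-device data transfers).
        import networkx as nx
        from networkx.algorithms.community import kernighan_lin_bisection

        # Hypothetical task graph; edges are data dependencies.
        g = nx.Graph()
        g.add_edges_from([("load", "fft"), ("fft", "filter"),
                          ("filter", "ifft"), ("load", "stats"),
                          ("stats", "report"), ("ifft", "report")])

        cpu_part, gpu_part = kernighan_lin_bisection(g, seed=0)
        transfers = [e for e in g.edges
                     if (e[0] in cpu_part) != (e[1] in cpu_part)]
        print("CPU tasks:", sorted(cpu_part))
        print("GPU tasks:", sorted(gpu_part))
        print("cross-device transfers:", transfers)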