49 research outputs found

    Cross-Layer System Design for Autonomous Driving

    Autonomous driving has gained tremendous popularity and has become one of the most prominent emerging applications in recent years, allowing a vehicle to drive itself without help from a human. Demand for this application continues to grow, leading to ever-increasing industry investment over the last decade. Unfortunately, autonomous driving systems remain unavailable to the public and are still under development, even with the considerable recent advances achieved in our community. Several key challenges are observed across the stack of autonomous driving systems and must be addressed to bridge the gap. This dissertation investigates cross-layer autonomous driving systems, from hardware architecture and software algorithms to human-vehicle interaction. In the hardware architecture layer, we investigate and present the design constraints of autonomous driving systems. Using an end-to-end autonomous driving system framework we built, we accelerate the identified computational bottlenecks and thoroughly investigate the implications and trade-offs across various accelerator platforms. In the software algorithm layer, we propose an acceleration technique for object recognition, one of the critical bottlenecks in autonomous driving systems. We exploit the similarity across frames in the streaming videos seen by autonomous vehicles and reuse intermediate outputs computed by the algorithm, reducing the required computation and improving performance. In the human-vehicle interaction layer, we design a conversational in-vehicle interface framework that enables drivers to interact with vehicles using natural human language, improving the usability of autonomous driving features. We also integrate this framework into a commercially available vehicle and conduct a real-world driving study.
    PHD
    Computer Science & Engineering
    University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/149951/1/shihclin_1.pd
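The frame-reuse idea sketched in the abstract can be illustrated as follows. This is a hedged reconstruction, not the dissertation's actual system: the `backbone` callable, the pixel-difference similarity metric, and the 0.95 threshold are all placeholder assumptions.

```python
import numpy as np

def frame_similarity(a, b):
    # Similarity in [0, 1] from normalized mean absolute pixel difference.
    return 1.0 - np.mean(np.abs(a.astype(float) - b.astype(float))) / 255.0

class ReusingRecognizer:
    """Caches intermediate outputs and reuses them for near-identical frames,
    skipping the expensive feature extractor when consecutive frames match."""

    def __init__(self, backbone, threshold=0.95):
        self.backbone = backbone        # expensive per-frame computation
        self.threshold = threshold      # similarity needed to allow reuse
        self.prev_frame = None
        self.cached = None

    def infer(self, frame):
        if (self.prev_frame is not None
                and frame_similarity(frame, self.prev_frame) >= self.threshold):
            return self.cached, True    # reused cached intermediate output
        self.cached = self.backbone(frame)
        self.prev_frame = frame
        return self.cached, False       # recomputed from scratch
```

With a dummy backbone such as `lambda f: f.sum()`, feeding the same frame twice returns the cached result on the second call; a sufficiently different frame triggers recomputation.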

    Fundamentals

    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, summarization, and clustering to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and how to enhance scalability on diverse computing architectures, ranging from embedded systems to large computing clusters.

    Computer Science & Technology Series : XIX Argentine Congress of Computer Science. Selected papers

    CACIC’13 was the nineteenth Congress in the CACIC series. It was organized by the Department of Computer Systems at the CAECE University in Mar del Plata. The Congress included 13 Workshops with 165 accepted papers, 5 Conferences, 3 invited tutorials, different meetings related to Computer Science Education (Professors, PhD students, Curricula), and an International School with 5 courses. CACIC 2013 was organized following the traditional Congress format, with 13 Workshops covering a diversity of dimensions of Computer Science Research. Each topic was supervised by a committee of 3-5 chairs from different Universities. The call for papers attracted a total of 247 submissions. An average of 2.5 review reports were collected for each paper, for a grand total of 676 review reports involving about 210 different reviewers. A total of 165 full papers, involving 489 authors and 80 Universities, were accepted, and 25 of them were selected for this book.
    Red de Universidades con Carreras en Informática (RedUNCI)

    Compiler-centric across-stack deep learning acceleration

    Optimizing the deployment of Deep Neural Networks (DNNs) is hard. Despite deep learning approaches increasingly providing state-of-the-art solutions to a variety of difficult problems, such as computer vision and natural language processing, DNNs can be prohibitively expensive, for example, in terms of inference time or memory usage. Effective exploration of the design space requires a holistic approach spanning a range of topics from machine learning, systems, and hardware. The rapid proliferation of deep learning applications has raised demand for efficient exploration and acceleration of deep learning based solutions. However, managing the range of optimization techniques, as well as how they interact with each other across the stack, is a non-trivial task. A family of emerging specialized compilers for deep learning, tensor compilers, appears to be a strong candidate to help manage the complexity of across-stack optimization choices and to enable new approaches. This thesis presents new techniques and explorations of the Deep Learning Acceleration Stack (DLAS), with the perspective that the tensor compiler will increasingly be the center of this stack. First, we motivate the challenges in exploring DLAS by describing the experience of running a perturbation study varying parameters at every layer of the stack. The core of the study is implemented using a tensor compiler, which reduces the complexity of evaluating the wide range of variants, though it still requires significant engineering effort to realize. Next, we develop a new algorithm for grouped convolution, a model optimization technique for which existing solutions scaled poorly in inference time. We implement and optimize our algorithm using a tensor compiler, outperforming existing approaches by 5.1× on average (arithmetic mean).
    Finally, we propose a technique, transfer-tuning, to reduce the search time required for automatic tensor compiler code optimization, cutting it by 6.5× on average. The techniques and contributions of this thesis across these interconnected domains demonstrate the exciting potential of tensor compilers to simplify and improve design space exploration for DNNs and their deployment. The outcomes of this thesis open new lines of research that help machine learning developers keep up with the rapidly evolving landscape of neural architectures and hardware.
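Grouped convolution, the operation targeted by the new algorithm above, partitions input and output channels into independent groups, each convolved separately. A minimal, unoptimized NumPy sketch of the operation itself (not the thesis's accelerated implementation) might look like this:

```python
import numpy as np

def grouped_conv2d(x, w, groups):
    """Naive grouped 2D convolution (no padding, stride 1).

    x: input of shape (C_in, H, W)
    w: weights of shape (C_out, C_in // groups, kH, kW)
    Each of the `groups` channel groups is convolved independently,
    reducing the work by a factor of `groups` versus a full convolution.
    """
    c_in, h, wd = x.shape
    c_out, c_in_g, kh, kw = w.shape
    assert c_in % groups == 0 and c_out % groups == 0
    assert c_in_g == c_in // groups
    c_out_g = c_out // groups
    oh, ow = h - kh + 1, wd - kw + 1
    out = np.zeros((c_out, oh, ow))
    for g in range(groups):
        xs = x[g * c_in_g:(g + 1) * c_in_g]          # this group's input channels
        ws = w[g * c_out_g:(g + 1) * c_out_g]        # this group's filters
        for oc in range(c_out_g):
            for i in range(oh):
                for j in range(ow):
                    out[g * c_out_g + oc, i, j] = np.sum(
                        xs[:, i:i + kh, j:j + kw] * ws[oc])
    return out
```

With `groups=1` this degenerates to an ordinary convolution; larger `groups` values trade accuracy for proportionally fewer multiply-accumulates, which is why inference-time scaling of the underlying kernels matters.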

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to afford better discernment of the domain at hand, their representations become increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication, and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. Seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Electronic Systems with High Energy Efficiency for Embedded Computer Vision

    Electronic systems are now widely adopted in everyday use. Moreover, embedded wearable and portable devices are used extensively today, from industrial to consumer applications. The growing demand for embedded devices and applications has opened several new research fields, driven by the need for low power consumption and real-time responsiveness. For this class of devices, computer vision algorithms are a challenging application target: in embedded computer vision, hardware and software design have to interact to meet application-specific requirements. The focus of this thesis is the study of computer vision algorithms for embedded systems. The work starts by presenting a novel algorithm for an IoT stationary use case targeting a high-end embedded device class, where power can be supplied to the platform through wires. Further contributions focus on algorithmic design and optimization on low- and ultra-low-power devices. Solutions are presented for gesture recognition and context change detection on wearable devices, focusing on first-person wearable devices (Ego-Centric Vision), with the aim of exploiting systems that are more constrained in terms of available power budget and computational resources. A novel gesture recognition algorithm is presented that improves on state-of-the-art approaches. We then demonstrate the effectiveness of exploiting low-resolution images for context change detection with real-world ultra-low-power imagers. The last part of the thesis deals with more flexible software models to support multiple applications linked at runtime and executed on the Cortex-M device class, supporting critical isolation features typical of virtualization-ready CPUs on low-cost, low-power microcontrollers and addressing shortcomings in the security and deployment capabilities of current firmware.
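The idea of detecting context changes from low-resolution imagery can be illustrated with a minimal sketch. The downsampling factor, the difference metric, and the decision threshold below are illustrative assumptions, not the thesis's actual design:

```python
import numpy as np

def downsample(img, factor):
    # Block-average a grayscale image down by `factor` in each dimension,
    # mimicking the coarse output of an ultra-low-power imager.
    h, w = img.shape
    img = img[:h - h % factor, :w - w % factor]
    return img.reshape(img.shape[0] // factor, factor,
                       img.shape[1] // factor, factor).mean(axis=(1, 3))

def context_changed(prev, cur, factor=8, threshold=0.2):
    """Flag a context change when coarse thumbnails differ enough.

    Working on tiny thumbnails keeps both the memory footprint and the
    per-frame compute low, which is the point of using low resolution.
    """
    a, b = downsample(prev.astype(float), factor), downsample(cur.astype(float), factor)
    diff = np.mean(np.abs(a - b)) / 255.0   # normalized mean absolute difference
    return bool(diff > threshold)
```

Feeding two frames of the same scene keeps the normalized difference near zero, while a scene change (e.g., moving indoors from outdoors) pushes it above the threshold.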