
    Reconfigurable Computing Systems for Robotics using a Component-Oriented Approach

    Robotic platforms are becoming more complex due to the wide range of modern applications, which involve multiple heterogeneous sensors and actuators. To comply with real-time and power-consumption constraints, these systems must process large amounts of heterogeneous data from multiple sensors and take action via actuators, which is challenging because their resources are limited in memory storage, bandwidth, and computational power. Field Programmable Gate Arrays (FPGAs) are programmable logic devices that offer high-speed parallel processing and are particularly well suited for applications that require real-time processing, high bandwidth, and low latency. A fundamental advantage of FPGAs is the flexibility to design hardware tailored to specific needs, making them adaptable to a wide range of applications. They can be programmed to pre-process data close to the sensors, which reduces the amount of data that must be transferred to other computing resources and improves overall system efficiency. Additionally, their reprogrammability allows FPGAs to be repurposed for different applications, providing a cost-effective solution for systems that must adapt quickly to changing demands. The performance per watt of FPGAs is close to that of Application-Specific Integrated Circuits (ASICs), with the added advantage of reprogrammability. Despite these advantages (e.g., energy efficiency, computing capabilities), the robotics community has so far not fully adopted FPGAs in its systems, for several reasons. First, designing FPGA-based solutions requires hardware knowledge and longer development times, as FPGAs are more challenging to program than Central Processing Units (CPUs) or Graphics Processing Units (GPUs). Second, porting a robotics application (or parts of it) from software to an accelerator requires adequate interfaces between the software and the FPGA. Third, the robotics workflow is already complex on its own, combining several fields such as mechanics, electronics, and software. There have been partial contributions in the state of the art on FPGAs as part of robotics systems; however, a study of FPGAs for robotics systems as a whole is missing from the literature, and providing one is the primary goal of this dissertation. Three main objectives have been established to accomplish this: (1) define all components required for an FPGA-based robotics system as a whole; (2) establish how the defined components are related; and (3) generate, deploy, and integrate these components into existing solutions with the help of Model-Driven Engineering (MDE) techniques. The component-oriented approach proposed in this dissertation provides a sound solution for designing and implementing FPGA-based designs for robotics applications. The modular architecture, the tool 'FPGA Interfaces for Robotics Middlewares' (FIRM), and the toolchain 'FPGA Architectures for Robotics' (FAR) provide a set of tools and a comprehensive design process that make the development of complex FPGA-based designs more straightforward and efficient. The component-oriented approach contributes significantly to the state of the art in FPGA-based design for robotics applications and helps promote wider adoption and use by specialists with little FPGA knowledge.
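    To make the component-oriented idea concrete, the following is a minimal illustrative sketch, in Python, of how a model-driven flow might describe an FPGA component and generate a software-side interface stub from it. All names and structures here are hypothetical assumptions for illustration; this is not the actual FIRM or FAR API.

```python
# Illustrative sketch only: a hypothetical model-driven flow that turns a
# declarative FPGA component description into a software-side interface stub.
# Names and structure are assumptions, not the actual FIRM/FAR APIs.
from dataclasses import dataclass, field

@dataclass
class Port:
    name: str       # signal name on the FPGA side
    direction: str  # "in" (software -> FPGA) or "out" (FPGA -> software)
    width: int      # bit width of the streamed data

@dataclass
class FpgaComponent:
    name: str
    clock_mhz: int
    ports: list[Port] = field(default_factory=list)

def generate_middleware_stub(comp: FpgaComponent) -> str:
    """Emit a (hypothetical) software-side wrapper class for the component."""
    lines = [f"class {comp.name}Interface:"]
    for p in comp.ports:
        if p.direction == "in":
            lines.append(f"    def write_{p.name}(self, value): ...  # {p.width}-bit")
        else:
            lines.append(f"    def read_{p.name}(self): ...  # {p.width}-bit")
    return "\n".join(lines)

# A component that pre-processes sensor data close to the sensor, as in the text.
lidar_filter = FpgaComponent(
    name="LidarPreFilter", clock_mhz=200,
    ports=[Port("raw_scan", "in", 32), Port("filtered_scan", "out", 32)],
)
print(generate_middleware_stub(lidar_filter))
```

    In an MDE setting, the same declarative description could also drive the generation of the hardware-side templates and the middleware bindings, which is the kind of relation between components the dissertation's objectives describe.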

    Compiler-centric across-stack deep learning acceleration

    Optimizing the deployment of Deep Neural Networks (DNNs) is hard. Although deep learning approaches increasingly provide state-of-the-art solutions to a variety of difficult problems, such as computer vision and natural language processing, DNNs can be prohibitively expensive, for example in terms of inference time or memory usage. Effective exploration of the design space requires a holistic approach spanning a range of topics from machine learning, systems, and hardware. The rapid proliferation of deep learning applications has raised demand for efficient exploration and acceleration of deep-learning-based solutions. However, managing the range of optimization techniques, as well as how they interact with each other across the stack, is a non-trivial task. A family of emerging specialized compilers for deep learning, tensor compilers, appears to be a strong candidate to help manage the complexity of across-stack optimization choices and enable new approaches. This thesis presents new techniques and explorations of the Deep Learning Acceleration Stack (DLAS), with the perspective that the tensor compiler will increasingly be the center of this stack. First, we motivate the challenges in exploring DLAS by describing the experience of running a perturbation study varying parameters at every layer of the stack. The core of the study is implemented using a tensor compiler, which reduces the complexity of evaluating the wide range of variants, although it still requires significant engineering effort to realize. Next, we develop a new algorithm for grouped convolution, a model optimization technique for which existing solutions provided poor inference-time scaling. We implement and optimize our algorithm using a tensor compiler, outperforming existing approaches by 5.1× on average (arithmetic mean). Finally, we propose a technique, transfer-tuning, to reduce the search time required for automatic tensor compiler code optimization, reducing it by 6.5× on average. The techniques and contributions of this thesis across these interconnected domains demonstrate the exciting potential of tensor compilers to simplify and improve design space exploration for DNNs and their deployment. The outcomes of this thesis open new lines of research to help machine learning developers keep up with the rapidly evolving landscape of neural architectures and hardware.
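    For readers unfamiliar with the operation, the sketch below is a naive reference implementation of grouped convolution in plain Python/NumPy. It only clarifies the computation being accelerated; it is not the optimized algorithm developed in the thesis.

```python
# Naive reference for grouped convolution (NCHW layout), to clarify the
# operation the thesis accelerates; not the thesis's optimized algorithm.
import numpy as np

def grouped_conv2d(x, w, groups):
    """x: (N, C_in, H, W); w: (C_out, C_in // groups, KH, KW); stride 1, no padding."""
    n, c_in, h, wd = x.shape
    c_out, c_in_g, kh, kw = w.shape
    assert c_in % groups == 0 and c_out % groups == 0 and c_in_g == c_in // groups
    c_out_g = c_out // groups
    oh, ow = h - kh + 1, wd - kw + 1
    out = np.zeros((n, c_out, oh, ow), dtype=x.dtype)
    for g in range(groups):  # each group convolves only its own channel slice
        xs = x[:, g * c_in_g:(g + 1) * c_in_g]
        ws = w[g * c_out_g:(g + 1) * c_out_g]
        for i in range(oh):
            for j in range(ow):
                patch = xs[:, :, i:i + kh, j:j + kw]  # (N, C_in_g, KH, KW)
                # contract over channel and kernel axes -> (N, C_out_g)
                out[:, g * c_out_g:(g + 1) * c_out_g, i, j] = np.tensordot(
                    patch, ws, axes=([1, 2, 3], [1, 2, 3]))
    return out

# groups=1 recovers standard convolution; groups=C_in gives depthwise convolution.
x = np.random.rand(2, 4, 8, 8)
w = np.random.rand(6, 2, 3, 3)  # C_out=6, groups=2 -> C_in/groups = 2
print(grouped_conv2d(x, w, groups=2).shape)  # (2, 6, 6, 6)
```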

    Consistent View-Based Management of Variability in Space and Time

    Developing variable systems faces many challenges. Dependencies between interrelated artifacts, such as code or diagrams, within a product variant, across product variants, and across their revisions quickly lead to inconsistencies during evolution. This work provides a unification of common concepts and operations for variability management, identifies variability-related inconsistencies, and presents an approach for view-based consistency preservation of variable systems.
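    As a toy illustration of variability in space (product variants) and time (revisions), the sketch below models each configuration as a set of artifacts and applies a naive dependency-based consistency check. This is our own simplification for illustration, not the thesis's formalism.

```python
# Toy illustration (not the thesis's formalism) of variability in space
# (product variants) and time (revisions), with a naive consistency check:
# every artifact's dependencies must be present in the same configuration.
artifacts = {
    # (variant, revision) -> set of artifact names in that configuration
    ("base", 1):    {"engine", "ui"},
    ("base", 2):    {"engine", "ui", "logger"},
    ("premium", 1): {"engine", "ui", "analytics"},   # evolved without "logger"
    ("premium", 2): {"engine", "ui", "logger", "analytics"},
}
depends_on = {"ui": {"engine"}, "analytics": {"logger"}, "logger": set(), "engine": set()}

def inconsistencies(config):
    present = artifacts[config]
    return {(a, d) for a in present for d in depends_on.get(a, set()) if d not in present}

for cfg in artifacts:
    print(cfg, inconsistencies(cfg) or "consistent")
# ("premium", 1) reports ("analytics", "logger"): a dependency broken by evolution.
```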

    Functional-safety analysis of ASIL decomposition for redundant automotive systems


    Certifications of Critical Systems – The CECRIS Experience

    In recent years, a considerable amount of effort has been devoted, both in industry and academia, to the development, validation and verification of critical systems, i.e. those systems whose malfunctions or failures can endanger human life or cause large economic damage. Certifications of Critical Systems – The CECRIS Experience documents the main insights on cost-effective verification and validation processes that were gained during work in the European research project CECRIS (acronym for Certification of Critical Systems). The objective of the research was to tackle the challenges of certification by focusing on those aspects that are most difficult and most important for the current and future critical-systems industry: the effective use of methodologies, processes and tools. The CECRIS project took a step forward in the growing field of development, verification, validation and certification of critical systems. Starting from both the scientific and industrial state-of-the-art methodologies for system development and the impact of their usage on the verification, validation and certification of critical systems, the project developed strategies and techniques, supported by automatic or semi-automatic tools and methods, and set guidelines to support engineers during the planning of the verification and validation phases.

    Automated CNN pipeline generation for heterogeneous architectures

    Heterogeneity is a vital feature in emerging processor chip design. Asymmetric multicore clusters, such as a high-performance cluster paired with a power-efficient cluster, are common in modern edge devices; one example is Intel's Alder Lake, featuring Golden Cove high-performance cores and Gracemont power-efficient cores. Chiplet-based technology allows the organization of multiple cores in the form of multi-chip modules, thus housing a large number of cores in a processor. Interposer-based packaging has enabled embedding High Bandwidth Memory (HBM) on chip and has reduced the transmission latency and energy consumption of the chiplet-to-chiplet interconnect; for instance, Intel's XeHPC Ponte Vecchio package integrates a multi-chip GPU organization along with HBM modules. Since new devices feature heterogeneity at the level of cores, memory, and on-chip interconnect, it has become important to steer optimization at the application level in order to leverage the heterogeneous, high-performance, and power-efficient features of the underlying computing platforms. An important high-performance application paradigm is Convolutional Neural Networks (CNNs). CNNs are widely used in many practical applications, and pipelined parallel implementations of CNNs are favored for inference on edge devices. In this Licentiate thesis we present a novel scheme for the automatic scheduling of CNN pipelines on heterogeneous devices. A pipeline schedule is a configuration that specifies the depth of the pipeline, the grouping of CNN layers into pipeline stages, and the mapping of pipeline stages onto computing units. We utilize simple compile-time hints, which consist of workload information for individual CNN layers and performance hints for the computing units. The proposed approach provides a near-optimal solution for a throughput-maximizing pipeline. We model the problem as a design-space exploration and develop time-efficient navigation of the design space through heuristics extracted from knowledge of the CNN structure and the underlying computing platform. The proposed search scheme converges faster, utilizes real-time performance measurements as fitness values, and scales when used with larger networks and computing platforms. Since the scheme utilizes online performance measurements, one of the challenges is to avoid expensive configurations during online tuning; the results demonstrate that, on average, ~80% of the tested configurations are sub-optimal solutions. Another challenge is to reduce convergence time: the experiments show that the proposed approach is 35× faster than stochastic optimization algorithms. Although the design space is large and complex, we show that the proposed scheme explores only ~0.1% of the total design space for large CNNs (50+ layers) and still results in a near-optimal solution.
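    The sketch below illustrates the kind of hint-driven schedule the thesis describes: consecutive CNN layers are grouped into one stage per computing unit so that the slowest stage, which bounds pipeline throughput, is as fast as possible. The exhaustive split search and the fixed stage-to-unit mapping are our own toy simplifications, not the thesis's search scheme.

```python
# Simplified sketch of hint-driven pipeline scheduling: group consecutive CNN
# layers into one stage per computing unit so that the slowest stage (which
# bounds pipeline throughput) is as fast as possible. The exhaustive split
# search is a toy stand-in for the thesis's heuristic navigation.
from itertools import combinations

layer_work = [4.0, 9.0, 7.0, 2.0, 6.0, 3.0]  # compile-time workload hints per layer
unit_perf = [1.0, 2.0, 1.5]                   # performance hints per computing unit

def best_schedule(work, perf):
    n, k = len(work), len(perf)
    best = (float("inf"), None)
    for cuts in combinations(range(1, n), k - 1):   # all contiguous k-way partitions
        bounds = (0, *cuts, n)
        stages = [sum(work[bounds[i]:bounds[i + 1]]) for i in range(k)]
        # stage latency = stage workload / assigned unit's performance hint
        bottleneck = max(s / p for s, p in zip(stages, perf))
        if bottleneck < best[0]:
            best = (bottleneck, bounds)
    return best

latency, bounds = best_schedule(layer_work, unit_perf)
print(f"bottleneck stage latency: {latency}, stage boundaries: {bounds}")
```

    A fuller search would also permute the stage-to-unit mapping and vary the pipeline depth, which blows up the design space and is where heuristic navigation and online performance measurements, as described above, become necessary.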

    Consistency Preservation in the Development Process of Automotive Software


    Lifecycle Management of Automotive Safety-Critical Over the Air Updates: A Systems Approach

    With the increasing importance of Over The Air (OTA) updates in the automotive field, maintaining safety standards becomes more challenging, as frequent incremental changes to embedded software are regularly integrated into a wide range of vehicle variants. This necessitates new processes and methodologies with a holistic view of the backend, where the updates are developed and released.