4,426 research outputs found
Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles
The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has
received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking
received support from the European Unionâs Horizon 2020 research and innovation programme and Germany,
Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy,
Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL
Joint Undertaking under grant agreement No. 692455-2
SLIM: A Language for Microcode Description and Simulation in VLSI
SLIM (Stanford Language for Implementing Microcode) is a programming language based system for
specifying and simulating microcode in a VLSI chip. The language is oriented towards PLA
implementations of microcoded machines using either a microprogram counter or a finite state
machine. The system supports simulation of the microcode and will drive a PLA layout program to
automatically create the PLA
Using ACL2 to Verify Loop Pipelining in Behavioral Synthesis
Behavioral synthesis involves compiling an Electronic System-Level (ESL)
design into its Register-Transfer Level (RTL) implementation. Loop pipelining
is one of the most critical and complex transformations employed in behavioral
synthesis. Certifying the loop pipelining algorithm is challenging because
there is a huge semantic gap between the input sequential design and the output
pipelined implementation making it infeasible to verify their equivalence with
automated sequential equivalence checking techniques. We discuss our ongoing
effort using ACL2 to certify loop pipelining transformation. The completion of
the proof is work in progress. However, some of the insights developed so far
may already be of value to the ACL2 community. In particular, we discuss the
key invariant we formalized, which is very different from that used in most
pipeline proofs. We discuss the needs for this invariant, its formalization in
ACL2, and our envisioned proof using the invariant. We also discuss some
trade-offs, challenges, and insights developed in course of the project.Comment: In Proceedings ACL2 2014, arXiv:1406.123
High performance FPGA implementation of the mersenne twister
Efficient generation of random and pseudorandom sequences is of great importance to a number of applications [4]. In this paper, an efficient implementation of the Mersenne Twister is presented. The proposed architecture has the smallest footprint of all published architectures to date and occupies only 330 FPGA slices. Partial pipelining and sub-expression simplification has been used to improve throughput per clock cycle. The proposed architecture is implemented on an RC1000 FPGA Development platform equipped with a Xilinx XCV2000E FPGA, and can generate 20 million 32 bit random numbers per second at a clock rate of 24.234 MHz. A through performance analysis has been performed, and it is observed that the proposed architecture clearly outperforms other existing implementations in key comparable performance metrics
A C++-embedded Domain-Specific Language for programming the MORA soft processor array
MORA is a novel platform for high-level FPGA programming of streaming vector and matrix operations, aimed at multimedia applications. It consists of soft array of pipelined low-complexity SIMD processors-in-memory (PIM). We present a Domain-Specific Language (DSL) for high-level programming of the MORA soft processor array. The DSL is embedded in C++, providing designers with a familiar language framework and the ability to compile designs using a standard compiler for functional testing before generating the FPGA bitstream using the MORA toolchain. The paper discusses the MORA-C++ DSL and the compilation route into the assembly for the MORA machine and provides examples to illustrate the programming model and performance
On-Line Instruction-checking in Pipelined Microprocessors
Microprocessors performances have increased by more than five orders of magnitude in the last three decades. As technology scales down, these components become inherently unreliable posing major design and test challenges. This paper proposes an instruction-checking architecture to detect erroneous instruction executions caused by both permanent and transient errors in the internal logic of a microprocessor. Monitoring the correct activation sequence of a set of predefined microprocessor control/status signals allow distinguishing between correctly and not correctly executed instruction
Adaptive Latency Insensitive Protocols andElastic Circuits with Early Evaluation: A Comparative Analysis
AbstractLatency Insensitive Protocols (LIP) and Elastic Circuits (EC) solve the same problem of rendering a design tolerant to additional latencies caused by wires or computational elements. They are performance-limited by a firing semantics that enforces coherency through a lazy evaluation rule: Computation is enabled if all inputs to a block are simultaneously available. Adaptive LIP's (ALIP) and EC with early evaluation (ECEE) increase the performance by relaxing the evaluation rule: Computation is enabled as soon as the subset of inputs needed at a given time is available. Their difference in terms of implementation and behavior in selected cases justifies the need for the comparative analysis reported in this paper. Results have been obtained through simple examples, a single representative case-study already used in the context of both LIP's and EC and through extensive simulations over a suite of benchmarks
- âŠ