1,376 research outputs found
Hardware-software co-design for low-cost AI processing in space processors
In the recent years there has been an increasing interest in artificial intelligence (AI) and machine learning (ML). The advantages of such applications are widespread across many areas and have drawn the attention of different sectors, such as aerospace. However, these applications require much more performance than the one provided by space processors. In space the environment is not ideal for high-performance cutting-edge processors, due to radiation. For this reason, radiation hardened or radiation tolerant processors are required, which use older technologies and redundant logic, reducing the available die resources that can be exploited. In order to accelerate demanding AI applications in space processors, this thesis presents SPARROW, a low-cost SIMD accelerator for AI operations. SPARROW has been designed following a hardware-software co-design approach by analyzing the requirements of common AI applications in order to improve the efficiency of the module. The design of such module does not use any existing vector extension and instead has in its portability one of the key advantages over other implementations. Furthermore, SPARROW reuses the integer register file of the processor avoiding complex managing of the data while reducing significantly the hardware cost of the module, which is specially interesting in the space domain due to the constraints in the processor area. SPARROW operates with 8-bit integer vector components in two different stages, performing parallel computations in the first and reduction operations in the second. This design is integrated within the baseline processor not requiring any additional pipeline stage nor a modification of the processor frequency. SPARROW also includes swizzling and masking capabilities for the input vectors as well as saturation to work with 8 bits without overflow. SPARROW has been integrated with the LEON3 and NOEL-V space-grade processors, both distributed by Cobham Gaisler. Since each of the baseline processors has a different architecture set, software support for SPARROW has been provided for both SPARC v8 and RISC-V ISAs, showing the portability of the design. Software support been developed using two well established compilers, LLVM and GCC allowing for a comparison of the cost of developing support for each of them. The modifications have included the SPARROW instructions in the assembly language of each architecture and with the use of inline assembly and macros allow a programming model similar to SIMD intrinsics. LEON3 and NOEL-V extended with SPARROW have been implemented on a FPGA to evaluate the performance increase provided by our proposal. In order to compare the performance with the scalar version of the processor, different AI related applications have been tested such as matrix multiplication and image filters, which are essential building blocks for convolutional neural networks. With the use of SPARROW a speed-ups of 6x and up to 15x have been achieved
Verification of Magnitude and Phase Responses in Fixed-Point Digital Filters
In the digital signal processing (DSP) area, one of the most important tasks
is digital filter design. Currently, this procedure is performed with the aid
of computational tools, which generally assume filter coefficients represented
with floating-point arithmetic. Nonetheless, during the implementation phase,
which is often done in digital signal processors or field programmable gate
arrays, the representation of the obtained coefficients can be carried out
through integer or fixed-point arithmetic, which often results in unexpected
behavior or even unstable filters. The present work addresses this issue and
proposes a verification methodology based on the digital-system verifier
(DSVerifier), with the goal of checking fixed-point digital filters w.r.t.
implementation aspects. In particular, DSVerifier checks whether the number of
bits used in coefficient representation will result in a filter with the same
features specified during the design phase. Experimental results show that
errors regarding frequency response and overflow are likely to be identified
with the proposed methodology, which thus improves overall system's
reliability
SMT-Based Bounded Model Checking of Fixed-Point Digital Controllers
Digital controllers have several advantages with respect to their flexibility
and design's simplicity. However, they are subject to problems that are not
faced by analog controllers. In particular, these problems are related to the
finite word-length implementation that might lead to overflows, limit cycles,
and time constraints in fixed-point processors. This paper proposes a new
method to detect design's errors in digital controllers using a state-of-the
art bounded model checker based on satisfiability modulo theories. The
experiments with digital controllers for a ball and beam plant demonstrate that
the proposed method can be very effective in finding errors in digital
controllers than other existing approaches based on traditional simulations
tools
Fast Linear Programming through Transprecision Computing on Small and Sparse Data
A plethora of program analysis and optimization techniques rely on linear programming at their heart. However, such techniques are often considered too slow for production use. While today’s best solvers are optimized for complex problems with thousands of dimensions, linear programming, as used in compilers, is typically applied to small and seemingly trivial problems, but to many instances in a single compilation run. As a result, compilers do not benefit from decades of research on optimizing large-scale linear programming. We design a simplex solver targeted at compilers. A novel theory of transprecision computation applied from individual elements to full data-structures provides the computational foundation. By carefully combining it with optimized representations for small and sparse matrices and specialized small-coefficient algorithms, we (1) reduce memory traffic, (2) exploit wide vectors, and (3) use low-precision arithmetic units effectively. We evaluate our work by embedding our solver into a state-of-the-art integer set library and implement one essential operation, coalescing, on top of our transprecision solver. Our evaluation shows more than an order-of-magnitude speedup on the core simplex pivot operation and a mean speedup of 3.2x (vs. GMP) and 4.6x (vs. IMath) for the optimized coalescing operation. Our results demonstrate that our optimizations exploit the wide SIMD instructions of modern microarchitectures effectively. We expect our work to provide foundations for a future integer set library that uses transprecision arithmetic to accelerate compiler analyses.ISSN:2475-142
- …