99 research outputs found

    Dynamically reconfigurable management of energy, performance, and accuracy applied to digital signal, image, and video Processing Applications

    Get PDF
    There is strong interest in the development of dynamically reconfigurable systems that can meet real-time constraints in energy/power-performance-accuracy (EPA/PPA). In this dissertation, I introduce a framework for implementing dynamically reconfigurable digital signal, image, and video processing systems. The basic idea is to first generate a collection of Pareto-optimal realizations in the EPA/PPA space. Dynamic EPA/PPA management is then achieved by selecting the Pareto-optimal implementations that can meet the real-time constraints. The systems are then demonstrated using Dynamic Partial Reconfiguration (DPR) and dynamic frequency control on FPGAs. The framework is demonstrated on: i) a dynamic pixel processor, ii) a dynamically reconfigurable 1-D digital filtering architecture, and iii) a dynamically reconfigurable 2-D separable digital filtering system. Efficient implementations of the pixel processor are based on the use of look-up tables and local-multiplexes to minimize FPGA resources. For the pixel-processor, different realizations are generated based on the number of input bits, the number of cores, the number of output bits, and the frequency of operation. For each parameters combination, there is a different pixel-processor realization. Pareto-optimal realizations are selected based on measurements of energy per frame, PSNR accuracy, and performance in terms of frames per second. Dynamic EPA/PPA management is demonstrated for a sequential list of real-time constraints by selecting optimal realizations and implementing using DPR and dynamic frequency control. Efficient FPGA implementations for the 1-D and 2-D FIR filters are based on the use a distributed arithmetic technique. Different realizations are generated by varying the number of coefficients, coefficient bitwidth, and output bitwidth. Pareto-optimal realizations are selected in the EPA space. Dynamic EPA management is demonstrated on the application of real-time EPA constraints on a digital video. The results suggest that the general framework can be applied to a variety of digital signal, image, and video processing systems. It is based on the use of offline-processing that is used to determine the Pareto-optimal realizations. Real-time constraints are met by selecting Pareto-optimal realizations pre-loaded in memory that are then implemented efficiently using DPR and/or dynamic frequency control

    A methodology for the design of dynamic accuracy operators by runtime back bias

    Get PDF
    Mobile and IoT applications must balance increasing processing demands with limited power and cost budgets. Approximate computing achieves this goal leveraging the error tolerance features common in many emerging applications to reduce power consumption. In particular, adequate (i.e., energy/quality-configurable) hardware operators are key components in an error tolerant system. Existing implementations of these operators require significant architectural modifications, hence they are often design-specific and tend to have large overheads compared to accurate units. In this paper, we propose a methodology to design adequate data-path operators in an automatic way, which uses threshold voltage scaling as a knob to dynamically control the power/accuracy tradeoff. The method overcomes the limitations of previous solutions based on supply voltage scaling, in that it introduces lower overheads and it allows fine-grain regulation of this tradeoff. We demonstrate our approach on a state-of-the-art 28nm FDSOI technology, exploiting the strong effect of back biasing on threshold voltage. Results show a power consumption reduction of as much as 39% compared to solutions based only on supply voltage scaling, at iso-accuracy

    Exploiting partial reconfiguration through PCIe for a microphone array network emulator

    Get PDF
    The current Microelectromechanical Systems (MEMS) technology enables the deployment of relatively low-cost wireless sensor networks composed of MEMS microphone arrays for accurate sound source localization. However, the evaluation and the selection of the most accurate and power-efficient network’s topology are not trivial when considering dynamic MEMS microphone arrays. Although software simulators are usually considered, they consist of high-computational intensive tasks, which require hours to days to be completed. In this paper, we present an FPGA-based platform to emulate a network of microphone arrays. Our platform provides a controlled simulated acoustic environment, able to evaluate the impact of different network configurations such as the number of microphones per array, the network’s topology, or the used detection method. Data fusion techniques, combining the data collected by each node, are used in this platform. The platform is designed to exploit the FPGA’s partial reconfiguration feature to increase the flexibility of the network emulator as well as to increase performance thanks to the use of the PCI-express high-bandwidth interface. On the one hand, the network emulator presents a higher flexibility by partially reconfiguring the nodes’ architecture in runtime. On the other hand, a set of strategies and heuristics to properly use partial reconfiguration allows the acceleration of the emulation by exploiting the execution parallelism. Several experiments are presented to demonstrate some of the capabilities of our platform and the benefits of using partial reconfiguration

    Designing Flexible, Energy Efficient and Secure Wireless Solutions for the Internet of Things

    Full text link
    The Internet of Things (IoT) is an emerging concept where ubiquitous physical objects (things) consisting of sensor, transceiver, processing hardware and software are interconnected via the Internet. The information collected by individual IoT nodes is shared among other often heterogeneous devices and over the Internet. This dissertation presents flexible, energy efficient and secure wireless solutions in the IoT application domain. System design and architecture designs are discussed envisioning a near-future world where wireless communication among heterogeneous IoT devices are seamlessly enabled. Firstly, an energy-autonomous wireless communication system for ultra-small, ultra-low power IoT platforms is presented. To achieve orders of magnitude energy efficiency improvement, a comprehensive system-level framework that jointly optimizes various system parameters is developed. A new synchronization protocol and modulation schemes are specified for energy-scarce ultra-small IoT nodes. The dynamic link adaptation is proposed to guarantee the ultra-small node to always operate in the most energy efficiency mode, given an operating scenario. The outcome is a truly energy-optimized wireless communication system to enable various new applications such as implanted smart-dust devices. Secondly, a configurable Software Defined Radio (SDR) baseband processor is designed and shown to be an efficient platform on which to execute several IoT wireless standards. It is a custom SIMD execution model coupled with a scalar unit and several architectural optimizations: streaming registers, variable bitwidth, dedicated ALUs, and an optimized reduction network. Voltage scaling and clock gating are employed to further reduce the power, with a more than a 100% time margin reserved for reliable operation in the near-threshold region. Two upper bound systems are evaluated. A comprehensive power/area estimation indicates that the overhead of realizing SDR flexibility is insignificant. The benefit of baseband SDR is quantified and evaluated. To further augment the benefits of a flexible baseband solution and to address the security issue of IoT connectivity, a light-weight Galois Field (GF) processor is proposed. This processor enables both energy-efficient block coding and symmetric/asymmetric cryptography kernel processing for a wide range of GF sizes (2^m, m = 2, 3, ..., 233) and arbitrary irreducible polynomials. Program directed connections among primitive GF arithmetic units enable dynamically configured parallelism to efficiently perform either four-way SIMD GF operations, including multiplicative inverse, or a long bit-width GF product in a single cycle. This demonstrates the feasibility of a unified architecture to enable error correction coding flexibility and secure wireless communication in the low power IoT domain.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/137164/1/yajchen_1.pd

    Wordlength optimization for linear digital signal processing

    No full text
    Published versio

    Multiple Real-Constant Multiplication for Computationally Efficient Implementation of Digital Transforms

    Get PDF
    The need to multiply signals by vectors (or matrices) of constants is fundamental and frequently arises in many areas of electrical and computer engineering.In their hardware implementations, performance issues such as circuit area, delay, and power consumption heavily influence the design process.It is well known that multiplication of a signal by a constant can be implemented multiplierless as a network of shifts and additions, and that these computational networks, termed shift-add networks, can lend to higher performing circuit implementations than when using general multipliers.There is a rich body of work on the optimization of shift-add networks, known as the multiple constant multiplication (MCM) problem.However, the optimization strategies that have been developed for the MCM traditionally assume that the vector multiplications being optimized always stem from integer constants.This assumption breaks down for many real-world applications, where the target constants for MCM optimization are real numbers rather than integers.In these situations there is flexibility in how constants are quantized in digital circuits that can be leveraged.Thus, it is desirable to have a method of jointly optimizing both the constant quantization error and the shift-add network simultaneously.This dissertation addresses this need by providing a problem framework and algorithms for joint quantization/MCM optimization and, through a series of experiments, shows that there is a potential for tremendous benefit when the optimization of quantization and shift-add networks is executed in one unified problem framework.After reviewing the relevant work, this dissertation rigorously develops the aforementioned joint optimization framework, describing the metrics used for quantization error, and culminates in a formal problem statement.We call this joint optimization problem the multiple real-constant multiplication problem (MRCM) in order to distinguish it from the traditional MCM problem that operates exclusively with integer constants, which we hereafter refer to as the multiple integer-constant multiplication (MICM) problem.Then, we consider three different cost models used for evaluating shift-add networks and, with each model, we determine the potential advantages of using our MRCM framework over the traditional MICM approach.First, we consider the traditional adder-count cost model.We start by formally defining the MRCM problem in the context of this cost model, and then describe a series of theoretical developments centered around finitizing and pruning the search space, leading to an efficient algorithm for solving the problem.Next, via extensive randomized experiments, we show that our joint framework leads to a reduction on the number of adders by 15%–60% on moderate size problems.In particular, for vectors of arbitrary constants, we show a possibility for 20%–60% reduction with less than 10% vector approximation error for both frameworks, whereas for vectors of low-pass filter coefficients, a 15%–30% reduction is possible without exceeding 10% error in frequency response.Second, we consider an adder-bits cost model, whereby instead of simply counting the number of adders, we compute the combined bitwidth of all the adders.To solve the MRCM problem in this context, we introduce two search algorithms—one greedy and one optimal, each guided by a novel MRCM-aware heuristic.Next, we discuss a randomized experiment, in which we compare both algorithms to an MICM-targeted heuristic.We observe that the greedy search finds solutions with an average cost improvement of 13% over the MICM solution with the trials considered, and the optimal search finds an additional improvement of 6%.Third, we consider a prominent gate-level cost model from the literature that.This gate-level model consider the bitwidths of an adder's inputs and output along with the relative alignment of the inputs/output due to bit shifting, when computing the adder cost.To solve the MRCM problem in this context, a novel greedy algorithm is developed that uses a functional programming approach to solving the MRCM problem.Next, we experimentally show this algorithm to offer an improvement of up to 18%, over a competing MICM algorithm, on small instances having 20 8-bit constants, increasing to up to 59% improvement on larger instances having 80 5-bit constants.Finally, we conclude the work by offering recommendations for possible future work in the development of efficient MRCM algorithms and novel problem formulations for optimizing MCM circuits

    An Automated Design Flow for Approximate Circuits based on Reduced Precision Redundancy

    Get PDF
    Reduced Precision Redundancy (RPR) is a popular Approximate Computing technique, in which a circuit operated in Voltage Over-Scaling (VOS) is paired to a reduced-bitwidth and faster replica so that VOS-induced timing errors are partially recovered by the replica, and their impact is mitigated. Previous works have provided various examples of effective implementations of RPR, which however suffer from three limitations: first, these circuits are designed using ad-hoc procedures, and no generalization is provided; second, error impact analysis is carried out statistically, thus neglecting issues like non-elementary data distribution and temporal correlation. Last, only dynamic power was considered in the optimization. In this work we propose a new generalized approach to RPR that allows to overcome all these limitations, leveraging the capabilities of state-of-the-art synthesis and simulation tools. By sacrificing theoretical provability in favor of an empirical input-based analysis, we build a design tool able to automatically add RPR to a preexisting gate-level netlist. Thanks to this method, we are able to confute some of the conclusions drawn in previous works, in particular those related to statistical assumptions on inputs; we show that a given inputs distribution may yield extremely different results depending on their temporal behavior

    Approximate Computing for Energy Efficiency

    Get PDF
    • …
    corecore