1,912 research outputs found
Study of the dynamics and ionization of the upper atmosphere Final report
Wind effects on nighttime E region ionization layer distribution
Investigation of the time variation of sporadic-E layers Final report
Analysis of wind and electron density profiles of E-region obtained from Nike-Apache rockets launched 22 Feb. 196
Optimal combined word-length allocation and architectural synthesis of digital signal processing circuits
Published version
Concurrency-aware thread scheduling for high-level synthesis
When mapping C programs to hardware, high-level synthesis (HLS) tools seek to reorder instructions so they can be packed into as few clock cycles as possible. However, when synthesising multi-threaded C, instruction reordering is inhibited by the presence of atomic operations (‘atomics’), such as compare-and-swap. Atomics, the fundamental concurrency primitive in C, are the basis of more abstract concurrency mechanisms such as locks, and also of efficient lock-free data structures. Whether a particular atomic can be legally reordered within a thread can depend on the memory access patterns of other threads. Existing HLS tools that support atomics typically schedule each thread independently, and so must be conservative when optimising around atomics. Yet HLS tools are distinguished from conventional compilers by having the entire program available. Can this information be exploited to allow more reorderings within each thread, and hence to obtain more efficient schedules? In this work, we propose a global analysis that determines, for each thread, which pairs of instructions must not be reordered. Our analysis is sensitive to the C consistency mode of the atomics involved (e.g. relaxed, release, acquire, and sequentially-consistent). We have used the Alloy model checker to validate our analysis against the C language standard, and have implemented it in the LegUp HLS tool. An evaluation on several lock-free data structure benchmarks indicates that our analysis leads to a 1.6× average global speedup
Modeling round-off error in the fast gradient method for predictive control
We present a method for determining the smallest precision required to have algorithmic stability of an implementation of the Fast Gradient Method (FGM) when solving a linear Model Predictive Control (MPC) problem in fixed-point arithmetic. We derive two models for the round-off error present in fixed-point arithmetic. The first is a generic model with no assumptions on the predicted system or weight matrices. The second is a parametric model that exploits the Toeplitz structure of the MPC problem for a Schur-stable system. We also propose a metric for measuring the amount of round-off error the FGM iteration can tolerate before becoming unstable. This metric is combined with the round-off error models to compute the minimum number of fractional bits needed for the fixed-point data type. Using these models, we show that exploiting the MPC problem structure nearly halves the number of fractional bits needed to implement an example problem. We show that this results in significant decreases in resource usage, computational energy and execution time for an implementation on a Field Programmable Gate Array
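As a rough illustration of the round-off error that the abstract above refers to, the following sketch rounds values to a fixed number of fractional bits and accumulates a dot product. It is not the paper's error model: the helper names (`quantize`, `max_roundoff`, `fixed_point_dot`) are hypothetical, round-to-nearest is assumed, and overflow of the integer part is ignored.

```python
def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (hypothetical helper)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def max_roundoff(frac_bits):
    """Worst-case error of a single round-to-nearest operation."""
    return 2.0 ** -(frac_bits + 1)


def fixed_point_dot(xs, ys, frac_bits):
    """Dot product in which every product is rounded before accumulation.

    Assumption: the accumulator holds sums of already-rounded values
    exactly, so only the multiplications introduce round-off.
    """
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += quantize(x * y, frac_bits)
    return acc


xs = [0.1, 0.2, 0.3]
ys = [0.4, 0.5, 0.6]
exact = sum(x * y for x, y in zip(xs, ys))
approx = fixed_point_dot(xs, ys, frac_bits=8)
# The accumulated error is bounded by one worst-case rounding per product.
bound = len(xs) * max_roundoff(8)
print(abs(approx - exact) <= bound)
```

Each additional fractional bit halves the per-operation bound, which is why a tighter error model can translate directly into a narrower data type.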
Horizon-independent preconditioner design for linear predictive control
First-order optimization solvers, such as the Fast Gradient Method, are increasingly being used to solve Model Predictive Control problems in resource-constrained environments. Unfortunately, the convergence rate of these solvers is significantly affected by the conditioning of the problem data, with ill-conditioned problems requiring a large number of iterations. To reduce the number of iterations required, we present a simple method for computing a horizon-independent preconditioning matrix for the Hessian of the condensed problem. The preconditioner is based on the block Toeplitz structure of the Hessian. Horizon-independence allows one to use only the predicted system and cost matrices to compute the preconditioner, instead of the full Hessian. The proposed preconditioner has equivalent performance to an optimal preconditioner in numerical examples, producing speedups between 2x and 9x for the Fast Gradient Method. Additionally, we derive horizon-independent spectral bounds for the Hessian in terms of the transfer function of the predicted system, and show how these can be used to compute a novel horizon-independent bound on the condition number for the preconditioned Hessian
Approximate logic synthesis: a survey
Approximate computing is an emerging paradigm that, by relaxing the requirement for full accuracy, offers benefits in terms of design area and power consumption. This paradigm is particularly attractive in applications where the underlying computation has inherent resilience to small errors. Such applications are abundant in many domains, including machine learning, computer vision, and signal processing. In circuit design, a major challenge is the capability to synthesize the approximate circuits automatically without manually relying on the expertise of designers. In this work, we review methods devised to synthesize approximate circuits, given their exact functionality and an approximability threshold. We summarize strategies for evaluating the error that circuit simplification can induce on the output, which guides synthesis techniques in choosing the circuit transformations that lead to the largest benefit for a given amount of induced error. We then review circuit simplification methods that operate at the gate or Boolean level, including those that leverage classical Boolean synthesis techniques to realize the approximations. We also summarize strategies that take high-level descriptions, such as C or behavioral Verilog, and synthesize approximate circuits from these descriptions
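To make the idea of error evaluation for an approximate circuit concrete, here is a small sketch that is not taken from the survey. It models a simplified approximate adder in which the lowest `k` bits are combined with a bitwise OR and no carry is passed to the exact upper part, then measures the error exhaustively. The function names and the two metrics (maximum and mean error distance) are illustrative assumptions.

```python
def approx_add(a, b, k):
    """Add a and b exactly in the upper bits; OR the lowest k bits.

    Simplifying assumption: no carry propagates from the approximate
    lower part into the exact upper part.
    """
    mask = (1 << k) - 1
    low = (a | b) & mask
    high = ((a >> k) + (b >> k)) << k
    return high | low


def error_stats(width, k):
    """Exhaustively compare against exact addition for all width-bit inputs.

    Returns (maximum error distance, mean error distance).
    """
    errors = [
        abs((a + b) - approx_add(a, b, k))
        for a in range(1 << width)
        for b in range(1 << width)
    ]
    return max(errors), sum(errors) / len(errors)


max_err, mean_err = error_stats(width=4, k=2)
print(max_err, mean_err)
```

A synthesis flow of the kind the abstract describes would compare such metrics against the approximability threshold before accepting a simplification. Exhaustive enumeration is only feasible for small bit widths.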
System-level linking of synthesised hardware and compiled software using a higher-order type system
Devices with tightly coupled CPUs and FPGA logic allow for the implementation of heterogeneous applications which combine multiple components written in hardware and software languages, including first-party source code and third-party IP. Flexibility in component relationships is important, so that the system designer can move components between software and hardware as the application design evolves. This paper presents a system-level type system and linker, which allows functions in software and hardware components to be directly linked at link time, without requiring any modification or recompilation of the components. The type system is designed to be language agnostic, and exhibits higher-order features, to enable design patterns such as notifications and callbacks to software from within hardware functions. We demonstrate the system through a number of case studies which link compiled software against synthesised hardware in the Xilinx Zynq platform
- …