318 research outputs found

    Domain-specific and reconfigurable instruction cells based architectures for low-power SoC

    Serial-data computation in VLSI

    Dynamically reconfigurable asynchronous processor

    The main design requirements for today's mobile applications are:
    · high throughput performance
    · high energy efficiency
    · high programmability

    Until now, the choice of platform has often been limited to Application-Specific Integrated Circuits (ASICs), due to their best-of-breed performance and power consumption. The economies of scale possible with these high-volume markets have traditionally been able to hide the high Non-Recurring Engineering (NRE) costs required for designing and fabricating new ASICs. However, with NREs and design time escalating with each generation of mobile applications, this practice may be reaching its limit. Designers today are looking at programmable solutions, so that they can respond more rapidly to changes in the market and spread costs over several generations of mobile applications. However, there have been few feasible alternatives to ASICs: Digital Signal Processors (DSPs) and microprocessors cannot meet the throughput requirements, whereas Field-Programmable Gate Arrays (FPGAs) require too much area and power.

    Coarse-grained dynamically reconfigurable architectures offer better solutions for high-throughput applications when power and area considerations are taken into account. One promising example is the Reconfigurable Instruction Cell Array (RICA). RICA consists of an array of cells with an interconnect that can be dynamically reconfigured on every cycle. This allows quite complex datapaths to be rendered onto the fabric and executed in a single configuration, making these architectures particularly suitable for stream processing. Furthermore, RICA can be programmed from C, making it a good fit with existing design methodologies. However, the RICA architecture has a drawback: poor scalability in terms of area and power. As the core gets bigger, the number of sequential elements in the array must be increased significantly to maintain the ability to achieve high throughputs through pipelining. As a result, a larger clock tree is required to synchronise the increased number of sequential elements, and the clock tree takes up a larger percentage of the area and power consumption of the core.

    This thesis presents a novel Dynamically Reconfigurable Asynchronous Processor (DRAP), aimed at high-throughput mobile applications. DRAP is based on the RICA architecture, but uses asynchronous design techniques: methods of designing digital systems without clocks. The absence of a global clock signal makes DRAP more scalable in terms of power and area overhead than its synchronous counterpart. The DRAP architecture maintains most of the benefits of custom asynchronous design, whilst also providing programmability via conventional high-level languages. Results show that the DRAP processor delivers considerably lower power consumption than a market-leading Very Long Instruction Word (VLIW) processor and a low-power ARM processor. For example, DRAP reduced power consumption by 20 times compared to the ARM7 processor and by 29 times compared to the TI C64x VLIW, when running the same benchmark capped to the same throughput and on the same process technology (0.13 μm). When compared to an equivalent RICA design, DRAP was up to 22% larger than RICA but reduced power by up to 1.9 times. It was also capable of achieving up to 2.8 times higher throughputs than RICA for the same benchmarks.
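    To make the execution model concrete, the sketch below is a minimal, hypothetical Python model of an instruction-cell array: one configuration maps a whole dataflow graph onto the fabric and executes it in a single step, which is why such architectures suit stream processing. The cell encoding and all names are illustrative assumptions, not the actual RICA or DRAP toolflow.

```python
# Minimal, hypothetical model of an instruction-cell array (illustrative
# only; not the actual RICA or DRAP toolflow). Each configuration maps a
# small dataflow graph onto the cell fabric, and the whole graph executes
# in one step: the property that suits these arrays to stream processing.

OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "shl": lambda a, b: a << b,
}

def run_configuration(config, inputs):
    """Evaluate one configuration, given as (dest, op, src_a, src_b) cells.

    A source is either an input name or the destination of an earlier
    cell, standing in for the dynamically routed interconnect.
    """
    values = dict(inputs)
    for dest, op, a, b in config:
        values[dest] = OPS[op](values[a], values[b])
    return values

# The datapath y = (x * c) + (x << 2), rendered as a single configuration.
CONFIG = [
    ("t0", "mul", "x", "c"),
    ("t1", "shl", "x", "two"),
    ("y",  "add", "t0", "t1"),
]

for x in range(4):  # a stream of input samples
    out = run_configuration(CONFIG, {"x": x, "c": 3, "two": 2})
    print(x, out["y"])
```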

    Toatie: functional hardware description with dependent types

    Describing correct circuits remains a tall order, despite four decades of evolution in Hardware Description Languages (HDLs). Many enticing circuit architectures require recursive structures or complex compile-time computation: two patterns that prove difficult to capture in traditional HDLs. In a signal processing context, the Fast FIR Algorithm (FFA) structure for efficient parallel filtering proves to be naturally recursive, and most Multiple Constant Multiplication (MCM) blocks decompose multiplications into graphs of simple shifts and adds using demanding compile-time computation. Generalised versions of both remain mostly in academic folklore. The implementations which do exist are often ad hoc circuit generators, written in software languages. These pose challenges for verification and are resistant to composition. Embedded functional HDLs, which represent circuits as data, allow for these descriptions at the cost of forcing the designer to work at the gate level. A promising alternative is to use a stand-alone compiler, representing circuits as plain functions, exemplified by the CλaSH HDL. This, however, raises new challenges in capturing a circuit's staging: which expressions in the single language should be reduced during compile-time elaboration, and which should remain in the circuit's run-time? To better reflect the physical separation between circuit phases, this work proposes a new functional HDL (representing circuits as functions) with first-class staging constructs.

    Orthogonal to this, there are also long-standing challenges in the verification of parameterised circuit families. Industry surveys have consistently reported that only a slim minority of FPGA projects reach production without non-trivial bugs. While healthy growth in the adoption of automatic formal methods is also reported, the majority of testing remains dynamic, presenting difficulties for testing entire circuit families at once. This research offers an alternative verification methodology via the combination of dependent types and automatic synthesis of user-defined data types. Given precise enough types for synthesisable data, this environment can be used to develop circuit families with full functional verification in a correct-by-construction fashion. This approach allows for verification of entire circuit families (not just one concrete member) and side-steps the state-space explosion of model checking methods.

    Beyond the existing work, this research offers synthesis of combinatorial circuits, not just a software model of their behaviour. This additional step requires careful consideration of staging, erasure and irrelevance, deriving bit representations of user-defined data types, and a new synthesis scheme. This thesis contributes steps towards HDLs with sufficient expressivity for awkward, combinatorial signal processing structures, allowing for a correct-by-construction approach, and a prototype compiler for netlist synthesis.
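    As a flavour of the compile-time computation involved, the sketch below is a toy Python generator for the shift-and-add decomposition that MCM blocks rely on. It is a deliberate simplification: real MCM algorithms share intermediate terms across several constants, whereas this expands each constant independently from its binary representation.

```python
# Toy shift-and-add decomposition of constant multiplication, the pattern
# that MCM blocks generalise (illustrative only: real MCM algorithms share
# intermediate terms across several constants, while this expands each
# constant independently from its binary representation).

def shift_add_terms(constant):
    """Shift amounts whose summed powers of two equal a positive constant."""
    return [shift for shift in range(constant.bit_length())
            if (constant >> shift) & 1]

def multiply(x, constant):
    """Multiply using only shifts and adds, mirroring the hardware graph."""
    return sum(x << s for s in shift_add_terms(constant))

print(shift_add_terms(13))   # [0, 2, 3], i.e. x*13 = x + (x << 2) + (x << 3)
assert multiply(5, 13) == 65
```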

    Application of multimode HLS techniques to ASIP synthesis

    Automatic generation of ASIPs is still insufficiently resource-efficient compared to human design. The current best effort relies on extending existing processor cores with custom instruction pathways, an approach that has several drawbacks: resources already existing in the core cannot be reused, and data access is restricted to the original register reads and writes, creating a throughput bottleneck. We present an ASIP synthesis algorithm that efficiently turns a semantic description of an instruction set into a synthesizable description of a processor's datapath, based on a modification of Moreano's datapath merge algorithm. Support for operator mobility across pipeline stages, combinatorial loop prevention, and realistic multiplexer resource-usage estimation for FPGAs is added. The translation chain is not yet complete, but preliminary results show efficient resource reuse between instruction datapaths.
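    The sketch below illustrates, in deliberately simplified Python, the resource-sharing intuition behind datapath merging: functional units of the same type can be time-shared between instruction datapaths, so the merged datapath needs the maximum, not the sum, of each unit count. Moreano's published algorithm solves the full problem, including interconnect cost, via a compatibility graph and clique search; the function and instruction names here are hypothetical.

```python
# Deliberately simplified view of datapath merging: functional units of the
# same type can be time-shared between instruction datapaths, so a merged
# datapath needs max(count) of each unit rather than sum(count). Moreano's
# algorithm solves the full problem (including interconnect and multiplexer
# cost) via a compatibility graph and clique search; this conveys the idea.

from collections import Counter

def merged_units(*datapaths):
    """Functional units needed by one datapath implementing every instruction.

    Each datapath is a list of operator names, e.g. ["mul", "add"].
    """
    need = Counter()
    for dp in datapaths:
        for op, n in Counter(dp).items():
            need[op] = max(need[op], n)  # units are reused across instructions
    return need

mac     = ["mul", "add"]   # hypothetical multiply-accumulate instruction
add_sub = ["add", "add"]   # hypothetical packed add/subtract instruction
print(merged_units(mac, add_sub))  # Counter({'add': 2, 'mul': 1}): 3 units, not 4
```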

    Accurate depth from defocus estimation with video-rate implementation

    The science of measuring depth from images at video rate using 'defocus' has been investigated. The method required two differently focused images acquired from a single viewpoint using a single camera. The relative blur between the images was used to determine the in-focus axial point of each pixel and hence its depth. The depth estimation algorithm of Watanabe and Nayar was employed to recover the depth estimates, but the broadband filters, referred to as Rational filters, were designed using a new procedure: the Two Step Polynomial Approach. The filters designed by the new model were largely insensitive to object texture and were shown to model the blur more precisely than the previous method. Experiments with real planar images demonstrated a maximum RMS depth error of 1.18% for the proposed filters, compared to 1.54% for the previous design.

    The software implementation required five 2D convolutions to be processed in parallel, and these convolutions were effectively implemented on an FPGA using a two-channel, five-stage pipelined architecture; however, the precision of the filter coefficients and the variables had to be limited within the processor. The number of multipliers required for each convolution was reduced from 49 to 10 (a 79.5% reduction) using a Triangular design procedure. Experimental results suggested that the pipelined processor provided depth estimates comparable in accuracy to the full-precision Matlab output, and generated depth maps of 400 × 400 pixels in 13.06 msec, faster than video rate.

    The defocused images (near- and far-focused) were optically registered for magnification using telecentric optics. A frequency domain approach based on phase correlation was employed to measure the radial shifts due to magnification and to optimally position the external aperture. The telecentric optics ensured correct pixel-to-pixel registration between the defocused images and provided more accurate depth estimates.
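    For intuition about the underlying measurement, the sketch below is a greatly simplified Python model of depth from defocus. It substitutes a single Laplacian band-pass for the thesis's Rational filters and synthesises the two differently focused images by blurring one texture with different Gaussian widths; everything here is an illustrative assumption rather than the implemented pipeline.

```python
# Greatly simplified model of depth from defocus (illustrative assumptions
# throughout: a single Laplacian band-pass stands in for the thesis's
# Rational filters). Two images of one scene, near- and far-focused, differ
# only in per-pixel blur; the normalised contrast ratio between them varies
# monotonically with depth between the two focal planes.

import numpy as np
from scipy.ndimage import convolve, gaussian_filter

LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def defocus_ratio(near, far, eps=1e-6):
    """Per-pixel normalised blur ratio in [-1, 1]."""
    bn = np.abs(convolve(near.astype(float), LAPLACIAN))
    bf = np.abs(convolve(far.astype(float), LAPLACIAN))
    return (bn - bf) / (bn + bf + eps)

# Synthetic check: one texture blurred by two Gaussian widths stands in for
# the near- and far-focused images of a fronto-parallel plane.
rng = np.random.default_rng(0)
texture = rng.standard_normal((64, 64))
near = gaussian_filter(texture, sigma=0.5)  # sharper in the near image
far = gaussian_filter(texture, sigma=2.0)   # blurrier in the far image
print(defocus_ratio(near, far).mean())      # positive: nearer the near-focus plane
```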

    Reconfigurable architectures for beyond 3G wireless communication systems

    Spectrum Optimisation in Wireless Communication Systems: Technology Evaluation, System Design and Practical Implementation

    Two key technology enablers for next generation networks are examined in this thesis, namely Cognitive Radio (CR) and Spectrally Efficient Frequency Division Multiplexing (SEFDM). The first part proposes the use of traffic prediction in CR systems to improve the Quality of Service (QoS) for CR users. A framework is presented which allows CR users to capture a frequency slot in an idle licensed channel occupied by primary users. This is achieved by using CR to sense and select target spectrum bands, combined with traffic prediction to determine the optimum channel-sensing order.

    The latter part of this thesis considers the design, practical implementation and performance evaluation of SEFDM. The key challenge that arises in SEFDM is the self-created interference, which complicates the design of receiver architectures. Previous work has focused on the development of sophisticated detection algorithms; however, these suffer from impractical computational complexity. Consequently, the aim of this work is twofold: first, to reduce the complexity of existing algorithms to make them better suited for application in the real world; second, to develop hardware prototypes to assess the feasibility of employing SEFDM in practical systems. The impact of oversampling and fixed-point effects on the performance of SEFDM is initially determined, followed by the design and implementation of linear detection techniques using Field-Programmable Gate Arrays (FPGAs). The performance of these FPGA-based linear receivers is evaluated in terms of throughput, resource utilisation and Bit Error Rate (BER). Finally, variants of the Sphere Decoding (SD) algorithm are investigated to ameliorate the error performance of SEFDM systems with a targeted reduction in complexity. The Fixed SD (FSD) algorithm is implemented on a Digital Signal Processor (DSP) to measure its computational complexity. Modified sorting and decomposition strategies are then applied to this FSD algorithm, offering trade-offs between execution speed and BER.
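    To make the self-created interference concrete, the sketch below builds SEFDM subcarriers in Python under the standard formulation, with subcarrier spacing compressed by a factor alpha < 1, and prints their correlation matrix: for alpha = 1 it is the identity (plain OFDM), while for alpha < 1 the nonzero off-diagonal terms are precisely the interference that complicates receiver design. Parameter names are illustrative, not from the thesis.

```python
# Minimal sketch of SEFDM subcarrier generation under the standard
# formulation (parameter names are illustrative, not from the thesis).
# Subcarrier spacing is compressed by alpha < 1, so the carriers are no
# longer orthogonal; the off-diagonal terms of their correlation matrix
# are the self-created interference that complicates receiver design.

import numpy as np

def sefdm_carriers(n_carriers, n_samples, alpha):
    """Matrix whose columns are the sampled SEFDM subcarriers."""
    n = np.arange(n_samples)[:, None]
    k = np.arange(n_carriers)[None, :]
    return np.exp(2j * np.pi * alpha * n * k / n_samples) / np.sqrt(n_samples)

def modulate(symbols, alpha):
    """One SEFDM symbol from a vector of constellation points."""
    carriers = sefdm_carriers(len(symbols), len(symbols), alpha)
    return carriers @ symbols

C = sefdm_carriers(8, 8, alpha=0.8)
corr = C.conj().T @ C              # identity when alpha = 1 (plain OFDM)
print(np.round(np.abs(corr), 2))   # nonzero off-diagonals for alpha < 1

tx = modulate(np.ones(8, dtype=complex), alpha=0.8)  # e.g. all-ones BPSK
```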