231 research outputs found

    Instruction-set architecture synthesis for VLIW processors

    Get PDF

    Further Specialization of Clustered VLIW Processors: A MAP Decoder for Software Defined Radio

    Get PDF
    Turbo codes are extensively used in current communications standards and have a promising outlook for future generations. The advantages of software defined radio, especially dynamic reconfiguration, make it very attractive in this multi-standard scenario. However, the complex and power consuming implementation of the maximum a posteriori (MAP) algorithm, employed by turbo decoders, sets hurdles to this goal. This work introduces an ASIP architecture for the MAP algorithm, based on a dual-clustered VLIW processor. It displays the good performance of application specific designs along with the versatility of processors, which makes it compliant with leading edge standards. The machine deals with multi-operand instructions in an innovative way, the fetching and assertion of data is serialized and the addressing is automatized and transparent for the programmer. The performance-area trade-off of the proposed architecture achieves a throughput of 8 cycles per symbol with very low power dissipation

    A Micro Power Hardware Fabric for Embedded Computing

    Get PDF
    Field Programmable Gate Arrays (FPGAs) mitigate many of the problemsencountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named as domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Different optimization techniques like local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate and generate a tailored architectural instance of the fabric.The fabric has been synthesized on 160 nm cell-based ASIC fabrication process from OKI and 130 nm from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA and 2016X better than an Intel XScale processor

    Computer aided design of cluster-based ASIPs

    Get PDF

    On the Feasibility and Limitations of Just-in-Time Instruction Set Extension for FPGA-Based Reconfigurable Processors

    Get PDF
    Reconfigurable instruction set processors provide the possibility of tailor the instruction set of a CPU to a particular application. While this customization process could be performed during runtime in order to adapt the CPU to the currently executed workload, this use case has been hardly investigated. In this paper, we study the feasibility of moving the customization process to runtime and evaluate the relation of the expected speedups and the associated overheads. To this end, we present a tool flow that is tailored to the requirements of this just-in-time ASIP specialization scenario. We evaluate our methods by targeting our previously introduced Woolcano reconfigurable ASIP architecture for a set of applications from the SPEC2006, SPEC2000, MiBench, and SciMark2 benchmark suites. Our results show that just-in-time ASIP specialization is promising for embedded computing applications, where average speedups of 5x can be achieved by spending 50 minutes for custom instruction identification and hardware generation. These overheads will be compensated if the applications execute for more than 2 hours. For the scientific computing benchmarks, the achievable speedup is only 1.2x, which requires significant execution times in the order of days to amortize the overheads

    10281 Abstracts Collection -- Dynamically Reconfigurable Architectures

    Get PDF
    From 11.07.10 to 16.07.10, Dagstuhl Seminar 10281 ``Dynamically Reconfigurable Architectures \u27\u27 was held in Schloss Dagstuhl~--~Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available

    KAVUAKA: a low-power application-specific processor architecture for digital hearing aids

    Get PDF
    The power consumption of digital hearing aids is very restricted due to their small physical size and the available hardware resources for signal processing are limited. However, there is a demand for more processing performance to make future hearing aids more useful and smarter. Future hearing aids should be able to detect, localize, and recognize target speakers in complex acoustic environments to further improve the speech intelligibility of the individual hearing aid user. Computationally intensive algorithms are required for this task. To maintain acceptable battery life, the hearing aid processing architecture must be highly optimized for extremely low-power consumption and high processing performance.The integration of application-specific instruction-set processors (ASIPs) into hearing aids enables a wide range of architectural customizations to meet the stringent power consumption and performance requirements. In this thesis, the application-specific hearing aid processor KAVUAKA is presented, which is customized and optimized with state-of-the-art hearing aid algorithms such as speaker localization, noise reduction, beamforming algorithms, and speech recognition. Specialized and application-specific instructions are designed and added to the baseline instruction set architecture (ISA). Among the major contributions are a multiply-accumulate (MAC) unit for real- and complex-valued numbers, architectures for power reduction during register accesses, co-processors and a low-latency audio interface. With the proposed MAC architecture, the KAVUAKA processor requires 16 % less cycles for the computation of a 128-point fast Fourier transform (FFT) compared to related programmable digital signal processors. The power consumption during register file accesses is decreased by 6 %to 17 % with isolation and by-pass techniques. The hardware-induced audio latency is 34 %lower compared to related audio interfaces for frame size of 64 samples.The final hearing aid system-on-chip (SoC) with four KAVUAKA processor cores and ten co-processors is integrated as an application-specific integrated circuit (ASIC) using a 40 nm low-power technology. The die size is 3.6 mm2. Each of the processors and co-processors contains individual customizations and hardware features with a varying datapath width between 24-bit to 64-bit. The core area of the 64-bit processor configuration is 0.134 mm2. The processors are organized in two clusters that share memory, an audio interface, co-processors and serial interfaces. The average power consumption at a clock speed of 10 MHz is 2.4 mW for SoC and 0.6 mW for the 64-bit processor.Case studies with four reference hearing aid algorithms are used to present and evaluate the proposed hardware architectures and optimizations. The program code for each processor and co-processor is generated and optimized with evolutionary algorithms for operation merging,instruction scheduling and register allocation. The KAVUAKA processor architecture is com-pared to related processor architectures in terms of processing performance, average power consumption, and silicon area requirements

    Sovelluskohtainen käskykantaprosessori tulevaisuuden radiomikropiireihin

    Get PDF
    Licensed Assisted Access is a 3GPP specified feature, for using the unlicensed frequen-cy band as a supplemental transmission medium to the licensed band. LAA uses clear channel assessment, for discovering the channel state and accessing the medium. LAA provides a contention based algorithm, featuring a conservative listen-before-talk scheme, and random back-off. This CCA scheme is thought to increase co-existence with existing technologies in the unlicensed band, namely, WLAN and Bluetooth. Application-specific instruction-set processors can be tailored to fit most applications, and offer increased flexibility to hardware design through, programmable solutions. ASIP architecture is defined by the designer, while the ASIP tools provide retargetable compiler generation and automatic hardware description generation, for faster design exploration. In this thesis, we explore the 3GPP LAA downlink requirements, and identify the key processing challenges as FFT, energy detection and carrier state maintenance. To design an efficient ASIP for LAA, we explore the different architectural choices we have available and arrive at a statically scheduled, multi-issue architecture. We evaluate dif-ferent design approaches, and choose a Nokia internal ASIP design as the basis for our solution. We modify the design, to meet our requirements and conclude that the pro-posed solution should fit the LAA use case well
    corecore