A Low Power 32 Bit CMOS ROM Using A Novel ATD Circuit

Author: Kukreti Siddhant
Singh Gurmohan
Sulochana Vemu
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/08/2013

This paper presents a low-power, high-speed 32-bit ROM circuit implemented in a 0.18 µm CMOS process. The circuit is built using a parallel ROM core structure and runs on a 1.8 V supply. A novel Address Transition Decoder (ATD) circuit is proposed which energizes the ROM components, such as the row decoder, column decoder and ROM core, for short time intervals whenever the input address bits change. The power consumed by the ROM with the proposed ATD circuit is 0.78 mW, an 82.27% reduction compared with the ROM without the ATD circuit (4.46 mW). Nearly full signal swing is achieved at the output without any sense amplifier. The implemented ROM has a very low latency of 0.56 ns.
DOI: http://dx.doi.org/10.11591/ijece.v3i4.316
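The abstract does not detail the internals of the proposed circuit. For reference, conventional address transition detection XORs each address bit with a delayed copy of itself and ORs the per-bit results, giving a short enable pulse after every address change that can gate the decoders and core. The Python sketch below models that textbook behaviour in discrete time; the function name `atd_pulse`, the unit time step and the 32-bit width are illustrative assumptions, not the authors' design.

```python
def atd_pulse(addresses, delay, width=32):
    """Discrete-time model of a conventional address-transition detector.

    Each address bit is XORed with a copy of itself delayed by `delay` steps,
    and the per-bit XOR outputs are ORed. The output is high for `delay`
    steps after every address change (assumed model, not the paper's circuit).
    """
    pulse = []
    for t, addr in enumerate(addresses):
        # Before `delay` steps have elapsed, the delayed copy holds the initial address.
        delayed = addresses[t - delay] if t >= delay else addresses[0]
        transition = 0
        for bit in range(width):
            # XOR one bit with its delayed copy, then OR into the combined output.
            transition |= ((addr >> bit) & 1) ^ ((delayed >> bit) & 1)
        pulse.append(transition)
    return pulse
```

For example, `atd_pulse([0, 0, 0, 5, 5, 5, 5, 5], delay=2)` yields a two-step pulse starting at the step where the address changes; outside such pulses the gated blocks could stay idle, which is the intuition behind the power saving.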

IAES journal

Institute of Advanced Engineering and Science

A general framework for efficient FPGA implementation of matrix product

Author: Amira A.
Bensaali F.
Sotudeh R.
Publication date: 01/01/2007

Original article can be found at: http://www.medjcn.com/ Copyright Softmotor Limited.
High-performance systems are required by developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Field-Programmable Gate Arrays (FPGAs) have been proposed as viable building blocks for constructing high-performance systems at an economical price. Given the importance and widespread use of matrix algorithms in scientific computing applications, they seem ideal candidates to harness the advantages offered by FPGAs. In this paper, a system for generating matrix algorithm cores is described. The system provides a catalog of efficient, user-customizable cores designed for FPGA implementation, spanning three matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. Each generated core can be either general-purpose or application-specific. The methodology used to design and implement two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles, while the second is a parallel floating-point matrix multiplier designed for 3D affine transformations.
Peer reviewed
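The abstract names distributed arithmetic (DA) for the colour space conversion core without giving details. The sketch below illustrates the general DA idea for unsigned inputs: a look-up table stores every partial sum of the constant coefficients, and the dot product is formed bit-serially by shift-and-accumulate instead of multiplication. The coefficients (a common 8-bit approximation of BT.601 luma) and all function names are illustrative assumptions, not the paper's core.

```python
# Assumed example coefficients: Y ~= (77*R + 150*G + 29*B) >> 8 (8-bit BT.601 luma approximation).
COEFFS = (77, 150, 29)


def build_da_lut(coeffs):
    """Precompute the coefficient sum for every combination of one bit per input."""
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n)]


def da_dot(inputs, coeffs, bits=8):
    """Distributed-arithmetic dot product of unsigned `bits`-bit inputs."""
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):
        # LUT address: bit b of input i goes to address bit i.
        addr = 0
        for i, x in enumerate(inputs):
            addr |= ((x >> b) & 1) << i
        # Shift-and-accumulate replaces the multipliers.
        acc += lut[addr] << b
    return acc


def rgb_to_luma(r, g, b):
    """Luma from 8-bit RGB using the DA dot product and an 8-bit right shift."""
    return da_dot((r, g, b), COEFFS) >> 8
```

Because the coefficients are constant, the table is fixed at design time, which is why DA maps naturally onto FPGA look-up tables.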

University of Hertfordshire Research Archive

Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding

Author: Casula M.
Fanucci L.
Martina Maurizio
Masera Guido
Saponara S.
Publication venue: Elsevier
Publication date: 01/01/2010

Real-time, high-quality video coding is attracting wide interest in the research and industrial communities for a range of applications. H.264/AVC, a recent standard for high-performance video coding, can be exploited in several scenarios, including digital video broadcasting, high-definition TV and DVD-based systems, which must sustain bit rates of up to tens of Mbit/s. To that end, this paper proposes optimized architectures for the most critical H.264/AVC tasks: motion estimation and context-adaptive binary arithmetic coding (CABAC). Post-synthesis results in sub-micron CMOS standard-cell technologies show that the proposed architectures can process 720 × 480 video sequences at 30 frames/s in real time and sustain more than 50 Mbit/s. The achieved circuit complexity and power consumption budgets are suitable for integration in complex VLSI multimedia systems based either on AHB bus-centric on-chip communication or on novel Network-on-Chip (NoC) infrastructures for MPSoCs (Multi-Processor Systems-on-Chip).
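The abstract does not say which motion estimation algorithm the co-processor implements. For orientation, the sketch below shows full-search block matching with the sum of absolute differences (SAD), a common baseline for motion estimation; the block size, search range and function name are illustrative assumptions.

```python
import numpy as np


def full_search_me(cur, ref, bx, by, block=8, search=4):
    """Full-search block matching with the sum of absolute differences (SAD).

    Returns the motion vector (dy, dx) that minimizes the SAD between the
    current block at (by, bx) and candidate blocks in `ref` within
    +/- `search` pixels, together with that SAD.
    """
    target = cur[by:by + block, bx:bx + block].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue
            cand = ref[y:y + block, x:x + block].astype(np.int32)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

The exhaustive double loop has a regular, data-independent structure, which is one reason it is often used as a reference when evaluating faster hardware search strategies.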

Archivio della Ricerca - Università di Pisa

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Configurable and Scalable Turbo Decoder for 4G Wireless Receivers

Author: Cavallaro Joseph R.
Manish Goel
Sun Yang
Zhu Yuming
Publication venue: IGI-Global Press
Publication date: 01/01/2010

The increasing requirements for high data rates and quality of service (QoS) in fourth-generation (4G) wireless communication call for practical capacity-approaching codes. In this chapter, the Turbo coding schemes recently adopted in the IEEE 802.16e WiMAX standard and the 3GPP Long Term Evolution (LTE) standard are reviewed. To process several 4G wireless standards with a common hardware module, a reconfigurable and scalable Turbo decoder architecture is presented. A parallel Turbo decoding scheme, with parallelism scaled to the target throughput, is applied to support high data rates in 4G applications. High-level decoding parallelism is achieved by employing contention-free interleavers. A multi-banked memory structure and a routing network between the memories and the MAP decoders are designed to operate at full speed with parallel interleavers. A new on-line address generation technique is introduced to support multiple Turbo interleaving patterns; it avoids the interleaver address memory that is typically necessary in traditional designs. Design trade-offs in terms of area and power efficiency are analyzed for different parallelism and clock frequency goals.
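The abstract does not give the address-generation equations. For the LTE case, the turbo interleaver is a quadratic permutation polynomial (QPP), π(i) = (f1·i + f2·i²) mod K, which is contention-free for any number of parallel windows that divides K. The sketch below shows one well-known way to generate QPP addresses on the fly using additions only, plus a check of the contention-free property; whether this matches the chapter's technique is an assumption. K = 40, f1 = 3, f2 = 10 is the first entry of the LTE parameter table (3GPP TS 36.212).

```python
def qpp_direct(K, f1, f2):
    """Direct evaluation of the QPP interleaver pi(i) = (f1*i + f2*i^2) mod K."""
    return [(f1 * i + f2 * i * i) % K for i in range(K)]


def qpp_online(K, f1, f2):
    """Generate the same addresses recursively, with additions and mod only."""
    addrs = []
    pi = 0                      # pi(0) = 0
    g = (f1 + f2) % K           # g(0) = pi(1) - pi(0)
    for _ in range(K):
        addrs.append(pi)
        pi = (pi + g) % K       # pi(i+1) = pi(i) + g(i)
        g = (g + 2 * f2) % K    # g(i+1) = g(i) + 2*f2
    return addrs


def is_contention_free(perm, P):
    """True if P decoders, each reading one window of K/P, never hit the same bank at once."""
    W = len(perm) // P
    for j in range(W):
        # Memory bank accessed by each of the P decoders at step j.
        banks = {perm[j + t * W] // W for t in range(P)}
        if len(banks) != P:
            return False
    return True
```

Generating addresses incrementally like this removes the need to store a full address table per block size, which is the kind of saving the abstract refers to.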

DSpace at Rice University

Multicarrier Faster-than-Nyquist Signaling Transceivers: From Theory to Practice

Author: Dasalukunte Deepak
Publication date: 01/01/2011

The demand for spectrum resources in cellular systems worldwide has escalated tremendously in the recent past. Today's mobile phones can take pictures and video, browse the Internet, make video calls and do much more than a computer of yesteryear. Because of the variety and amount of information being transmitted, the demand for spectrum resources is continuously increasing. Efficient use of bandwidth has hence become a key parameter in the design and realization of wireless communication systems. Faster-than-Nyquist (FTN) signaling is one such technique: it achieves bandwidth efficiency by making better use of the available spectrum at the expense of higher processing complexity in the transceiver. This thesis addresses the challenges and design trade-offs that arise in the hardware realization of FTN signaling transceivers. The achievable performance of the FTN system has been evaluated against the processing overhead in the transmitter and the receiver. Coexistence with OFDM, the more popular multicarrier scheme in existing and upcoming wireless standards, has been considered by designing FTN-specific processing blocks as add-ons to the conventional transceiver chain. A multicarrier system capable of operating under both orthogonal and FTN signaling has been developed. The performance of the receiver was evaluated for AWGN and fading channels. The FTN system achieved a 2x improvement in bandwidth usage with performance similar to that of an OFDM system. The extra processing in the receiver consists of an iterative decoder for FTN-modulated signals. An efficient hardware architecture for this iterative decoder, which reuses the FTN-specific processing blocks to realize different functions, has been designed.
This decoder was implemented as an ASIC in 65 nm CMOS technology, and the fabricated chip has been successfully verified for functionality.
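In FTN signaling, pulses designed for a symbol period T are transmitted every τT with τ < 1, so more symbols fit into the same time and bandwidth at the cost of intentional intersymbol interference (ISI) that the receiver must remove. The sketch below samples a single-carrier signal built from ideal sinc pulses at its own symbol instants to show where that ISI comes from; the single carrier, the sinc pulse and the function name are simplifying assumptions, since the thesis considers a multicarrier system.

```python
import numpy as np


def ftn_samples(symbols, tau):
    """Sample a sinc-pulse FTN signal at its symbol instants t = k*tau*T (T = 1).

    s(k*tau) = sum_n a_n * sinc((k - n) * tau). With tau = 1 (Nyquist
    signaling) each sample equals its symbol; with tau < 1 each sample also
    contains contributions from neighbouring symbols (ISI).
    """
    a = np.asarray(symbols, dtype=float)
    k = np.arange(len(a))
    # np.sinc(x) = sin(pi*x)/(pi*x): the ideal Nyquist pulse for T = 1.
    G = np.sinc((k[:, None] - k[None, :]) * tau)
    return G @ a
```

The matrix `G` is the ISI pattern a detector must undo; with τ = 1 it reduces to the identity, which is why orthogonal signaling needs no such processing.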

Lund University Publications

Efficient Architecture of Variable Size HEVC 2D-DCT for FPGA Platforms

Author: Chen Min
Lu Chao
Zhang Yuanzhi
Publication venue: OpenSIUC
Publication date: 01/03/2017

This study presents a two-dimensional (2D) discrete cosine transform (DCT) hardware architecture dedicated to High Efficiency Video Coding (HEVC) on field-programmable gate array (FPGA) platforms. The proposed methodology organizes the 2D-DCT computation to fit the internal components and characteristics of FPGA resources efficiently. A four-stage circuit architecture is developed to implement the proposed methodology. This architecture supports variable DCT sizes: 4×4, 8×8, 16×16 and 32×32. The proposed architecture has been implemented in SystemVerilog and synthesized for various FPGA platforms. Compared with related work in the literature, it demonstrates significant advantages in hardware cost and performance. The proposed architecture can sustain 4K@30fps ultra-high-definition (UHD) TV real-time encoding applications with a 31-64% reduction in hardware cost.
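A 2D DCT is separable: it can be computed as a 1D transform over the columns followed by a 1D transform over the rows, which is a common basis for hardware designs that reuse 1D datapaths. The sketch below uses HEVC's 4×4 integer core-transform matrix and omits the standard's intermediate scaling shifts; it illustrates separability only and is not the paper's four-stage FPGA architecture.

```python
import numpy as np

# HEVC 4x4 integer core-transform matrix (rows are scaled DCT basis vectors).
C4 = np.array([[64,  64,  64,  64],
               [83,  36, -36, -83],
               [64, -64, -64,  64],
               [36, -83,  83, -36]], dtype=np.int64)


def dct2d_separable(block, C=C4):
    """2D transform as two 1D passes, Y = C X C^T.

    The normative HEVC intermediate scaling shifts are omitted for clarity.
    """
    X = np.asarray(block, dtype=np.int64)
    first_pass = C @ X          # 1D transform applied to every column
    return first_pass @ C.T     # 1D transform applied to every row
```

A flat 4×4 block produces a single non-zero (DC) coefficient, as expected from a DCT.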

OpenSIUC

The Space and Earth Science Data Compression Workshop

Author: Tilton James C.

This document is the proceedings of a Space and Earth Science Data Compression Workshop, which was held on March 27, 1992, at the Snowbird Conference Center in Snowbird, Utah. This workshop was held in conjunction with the 1992 Data Compression Conference (DCC '92), which was held at the same location on March 24-26, 1992. The workshop explored opportunities for data compression to enhance the collection and analysis of space and Earth science data. The workshop consisted of eleven papers presented in four sessions. These papers describe research that is integrated into, or has the potential of being integrated into, a particular space and/or Earth science data information system. Presenters were encouraged to take into account the scientists' data requirements and the constraints imposed by the data collection, transmission, distribution, and archival system.

NASA Technical Reports Server

A Data Driven Modeling Approach for Store Distributed Load and Trajectory Prediction

Author: Peters Nicholas
Publication venue: Scholarly Commons
Publication date: 01/10/2022

The task of achieving successful store separation from aircraft and spacecraft has historically been, and continues to be, a critical issue for the aerospace industry. Whether it stems from store-on-store wake interactions, store-parent body interactions or free-stream turbulence, a failed store separation poses a serious risk to aircraft operators. A failed store separation does not simply mean missing an intended target; it also brings the risk of collision with, and destruction of, the parent vehicle. Given this risk, numerous well-tested procedures have been developed to analyze store separation within the safe confines of wind tunnels. However, due to the increased complexity of store separation configurations, such as rotorcraft and cavity-based separation, there is a growing desire to incorporate computational fluid dynamics (CFD) into the early stages of store separation analysis. A viable method for achieving this objective is data-driven surrogate modeling of store distributed loads. This dissertation investigates the practicality of applying various data-driven modeling techniques to the field of store separation. These modeling methods are applied to four demonstration scenarios: reduced-order modeling of a moving store, design optimization, supersonic store separation, and rotorcraft store separation. For the first demonstration scenario, results are presented for three sub-tasks. In the first sub-task, proper orthogonal decomposition (POD), dynamic mode decomposition (DMD) and convolutional neural networks (CNN) were compared for their capability to replicate the distributed pressure loads of a pitching-up prolate spheroid. Results indicated that POD was the most efficient approach for surrogate model generation. For the second sub-task, a POD-based surrogate model was derived from CFD simulations of an oscillating prolate spheroid subject to varying reduced frequency and amplitude of oscillation.
The obtained surrogate model was shown to provide high-fidelity predictions for new combinations of reduced frequency and amplitude, with a maximum percent error in integrated loads of less than 3%. This demonstrated that the surrogate model could predict accurately at intermediate states. Further analysis showed that a similar surrogate model could be generated to provide accurate store trajectory modeling under subsonic, transonic and supersonic conditions. In the second demonstration scenario, a POD-based surrogate model is derived from a series of CFD simulations of isolated rotors in hover and forward flight. The derived surrogate models for hover and forward flight were shown to provide integrated load predictions within 1% of direct CFD simulation. Additionally, results indicated that the computational expense could be reduced from 20 hours on 440 CPUs to less than a second on a single CPU. Given the reduced cost and high fidelity of the surrogate model, it was leveraged to optimize the twist and taper ratio of the rotor so as to maximize rotor efficiency. For the third demonstration scenario, POD and CNN surrogate models were derived for fixed-wing supersonic store separation. Results demonstrated that both models were capable of providing high-fidelity predictions of the store's distributed loads and subsequent trajectory. For the final demonstration scenario, a POD-based surrogate model was derived for the case of a store launching from a rotorcraft. The surrogate model was derived from three CFD simulations with varying ejection force and then validated against a CFD simulation with a new ejection force. Results indicated that, while the surrogate model struggled to provide detailed predictions of store distributed loads, mean load variations could be modeled well at a massively reduced computational cost.
Each rotorcraft store separation CFD simulation required 10 days of simulation time across 880. Using the surrogate model, comparable predictions could be produced in under a minute on a single core. Overall, the findings of this study indicate that massive CFD-generated data sets can be efficiently leveraged to create meaningful surrogate models that can be deployed in highly iterative design tasks relevant to store separation. With further improvements, similar surrogate models can be combined with a control strategy to achieve trajectory optimization and control.
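The basic POD surrogate recipe described above can be sketched as follows: stack CFD snapshots as columns, extract POD modes with a singular value decomposition, and predict a new case by interpolating the modal coefficients in the varied parameter. The linear interpolation of coefficients, the single scalar parameter and the function names are simplifying assumptions; the dissertation's surrogates may use a different regression.

```python
import numpy as np


def pod_basis(snapshots, rank):
    """POD of a snapshot matrix (one column per CFD case).

    Returns the snapshot mean, the leading `rank` POD modes, and the modal
    coefficients of every snapshot.
    """
    mean = snapshots.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    modes = U[:, :rank]
    coeffs = modes.T @ (snapshots - mean)   # shape (rank, n_snapshots)
    return mean, modes, coeffs


def predict(mean, modes, coeffs, params, new_param):
    """Interpolate each modal coefficient in the parameter, then reconstruct the field."""
    new_coeffs = np.array([np.interp(new_param, params, c) for c in coeffs])
    return mean[:, 0] + modes @ new_coeffs
```

Once the modes are computed, a prediction costs one small interpolation and one matrix-vector product, which is the source of the large speed-ups reported relative to direct CFD.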

Embry-Riddle Aeronautical University

Direct data-driven forecast of local turbulent heat flux in Rayleigh-Bénard convection

Author: Mäder Patrick
Pandey Sandeep
Schumacher Jörg
Teutsch Philipp
Publication venue: AIP Publishing
Publication date: 26/02/2022

A combined convolutional autoencoder-recurrent neural network machine learning model is presented to directly analyze and forecast the dynamics and low-order statistics of the local convective heat flux field in a two-dimensional turbulent Rayleigh-Bénard convection flow at Prandtl number Pr = 7 and Rayleigh number Ra = 10^7. Two recurrent neural networks, an echo state network and a gated recurrent unit, are applied to advance the turbulent heat transfer data in time within the reduced latent data space. Our work thereby exploits the modular combination of three different machine learning algorithms to build a fully data-driven, reduced model of the turbulent heat transfer dynamics in a complex thermally driven flow. The convolutional autoencoder with 12 hidden layers reduces the dimensionality of the turbulence data to about 0.2% of their original size. Our results indicate fairly good accuracy in the first- and second-order statistics of the convective heat flux. The algorithm also reproduces the intermittent plume-mixing dynamics at the upper edges of the thermal boundary layers, with some deviations. The same holds for the probability density function of the local convective heat flux, with differences in the far tails. Furthermore, we demonstrate the noise resilience of the framework. This suggests that the present model might be applicable as a reduced dynamical model that delivers transport fluxes and their variations to coarse grids of larger-scale computational models, such as global circulation models for atmosphere and ocean.
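One of the two temporal models named above is an echo state network (ESN). The sketch below shows the core ESN idea on a scalar toy series: a fixed random recurrent reservoir is driven by the input, and only a linear readout is trained, by ridge regression, for one-step-ahead prediction. In the paper the inputs would be the autoencoder's latent vectors rather than a scalar; the reservoir size, spectral radius and other hyperparameters here are illustrative assumptions.

```python
import numpy as np


def train_esn(u, n_res=100, rho=0.9, input_scale=0.5, ridge=1e-6, washout=50, seed=0):
    """Train an echo state network for one-step-ahead prediction of a scalar series u.

    Only the linear readout W_out is trained (ridge regression); the random
    input and reservoir weights stay fixed. Returns W_out and the training
    states/targets so the fit can be checked.
    """
    rng = np.random.default_rng(seed)
    W_in = input_scale * rng.uniform(-1, 1, n_res)
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius rho
    x = np.zeros(n_res)
    states = []
    for t in range(len(u) - 1):
        x = np.tanh(W @ x + W_in * u[t])   # reservoir update driven by u[t]
        states.append(x.copy())
    X = np.array(states[washout:])         # discard the initial transient
    Y = np.asarray(u[washout + 1:])        # target: next value of the series
    W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)
    return W_out, X, Y
```

Because training reduces to a single linear solve, ESNs are cheap to fit compared with recurrent networks trained by backpropagation through time.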

arXiv.org e-Print Archive

Digitale Bibliothek Thüringen

VLSI low-power digital signal processing

Author: Farag Emad N.
Publication venue: University of Waterloo
Publication date: 01/01/1997

University of Waterloo's Institutional Repository