
Pipelined Two-Operand Modular Adders

Author: Czyzak M.
Horiszny J.
Smyk R.
Publication venue: Brno University of Technology
Publication date: 01/04/2015

The pipelined two-operand modular adder (TOMA) is one of the basic components of digital signal processing (DSP) systems that use the residue number system (RNS). Such modular adders are used in binary/residue and residue/binary converters, residue multipliers, and scalers, as well as within residue processing channels. A pipelined TOMA is usually obtained by inserting an appropriate number of latch layers into a nonpipelined TOMA structure. Hence, its area is also determined by the number of latches, and its delay by the number of latch layers. In this paper, we propose a new pipelined TOMA based on a new TOMA that has a smaller area and smaller delay than other known structures. Comparisons are made using data from a very-large-scale integration (VLSI) standard cell library.
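To make the "nonpipelined TOMA plus latch layers" idea concrete, here is a minimal sketch of a generic two-path modular adder, which computes a + b and a + b + (2^n - m) in parallel and uses the carry out of the second path to select the result. This is a textbook structure, not the new TOMA proposed in the paper; the function names `toma` and `pipelined_toma` are hypothetical, and the pipelined version is only a cycle-level model with a single latch layer between an "add" stage and a "select" stage.

```python
def toma(a, b, m, n):
    """Combinational two-path modular adder: returns (a + b) mod m.

    Assumes 0 <= a, b < m <= 2**n (operands are valid residues).
    """
    mask = (1 << n) - 1
    s1 = a + b                       # path 1: plain sum
    s2 = a + b + ((1 << n) - m)      # path 2: sum plus the constant 2**n - m
    # A carry out of bit n on path 2 means a + b >= m, so path 2 holds a + b - m.
    return s2 & mask if s2 >> n else s1


def pipelined_toma(pairs, m, n):
    """Cycle-level model of a two-stage TOMA with one latch layer between stages."""
    mask = (1 << n) - 1
    latch = None                     # pipeline register between stage 1 and stage 2
    results = []
    for item in list(pairs) + [None]:    # one extra cycle flushes the pipeline
        # Stage 2 (this cycle): select the correct path from the latched sums.
        if latch is not None:
            s1, s2 = latch
            results.append(s2 & mask if s2 >> n else s1)
        # Stage 1 (this cycle): compute both candidate sums and latch them.
        if item is not None:
            a, b = item
            latch = (a + b, a + b + ((1 << n) - m))
        else:
            latch = None
    return results
```

Each loop iteration stands for one clock cycle, so every result appears one cycle after its operands enter; adding more latch layers would split the adders themselves and increase that latency while shortening the clock period.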

Directory of Open Access Journals

Digital library of Brno University of Technology

Maximizing resource utilization by slicing of superscalar architecture

Author: Patil Shruti Ravikant
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/2006

Superscalar architectural techniques increase instruction throughput from one instruction per cycle to more than one instruction per cycle. Modern processors use several processing resources to achieve this throughput. Control units perform various functions to minimize stalls and to ensure a continuous feed of instructions to the execution units. It is vital to ensure that instructions ready for execution do not encounter a bottleneck in the execution stage. This thesis proposes a dynamic scheme, called block slicing, to increase the efficiency of the execution stage. Implementing this concept in a wide superscalar pipelined architecture introduces minimal additional hardware and delay in the pipeline. The hardware required to implement the proposed scheme is designed and assessed in terms of cost and delay. Speed-up, throughput, and efficiency have been evaluated and analyzed for the resulting pipeline.
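The three performance measures named above can be illustrated with the standard ideal-pipeline formulas. The sketch below assumes a stall-free k-stage pipeline that issues `width` instructions per cycle and compares it with a non-pipelined scalar unit taking k cycles per instruction; the thesis may define these measures differently, and `pipeline_metrics` is a hypothetical helper.

```python
import math


def pipeline_metrics(n_instr, k_stages, tau, width=1):
    """Ideal (stall-free) metrics for a k-stage pipeline issuing `width` instructions per cycle."""
    cycles = k_stages + math.ceil(n_instr / width) - 1   # fill time plus one issue group per cycle
    speedup = (n_instr * k_stages) / cycles              # vs. a non-pipelined scalar unit
    efficiency = speedup / (k_stages * width)            # fraction of stage slots kept busy
    throughput = n_instr / (cycles * tau)                # instructions per unit time
    return cycles, speedup, efficiency, throughput
```

For width = 1 these reduce to the familiar S = nk / (k + n - 1) and E = n / (k + n - 1); real superscalar pipelines fall below these bounds whenever the execution stage stalls, which is the bottleneck the proposed scheme targets.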

University of Nevada, Las Vegas Repository

VLSI Architecture for Configurable and Low-Complexity Design of Hard-Decision Viterbi Decoding Algorithm

Author: Adiono Trio
Putra Rachmad Vidya Wicaksana
Publication venue: LPPM ITBis Lembah Dempo
Publication date: 01/06/2016

Convolutional encoding and data decoding are fundamental processes in convolutional error correction. One of the most popular decoding methods is the Viterbi algorithm, which is extensively implemented in many digital communication applications. Its VLSI design challenges concern area, speed, power, complexity, and configurability. In this research, we propose a VLSI architecture for a configurable and low-complexity design of a hard-decision Viterbi decoding algorithm. Configurability and low complexity are achieved by designing a generic VLSI architecture, optimizing each processing element (PE) at the logical operation level, and designing a conditional adapter. The proposed design can be configured for any predefined number of trace-backs simply by changing the trace-back parameter value. Its computation requires only N + 2 clock cycles of latency, where N is the number of trace-backs. Its configurability has been proven for N = 8, N = 16, N = 32, and N = 64. Furthermore, the proposed design was synthesized and evaluated on Xilinx and Altera FPGA target boards for area consumption and speed performance.
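For readers unfamiliar with hard-decision Viterbi decoding, here is a minimal software sketch. It assumes a rate-1/2, constraint-length-3 convolutional code with generators (7, 5) in octal (a common textbook choice, not necessarily the one used in the paper), uses Hamming distance as the branch metric, and traces back once over the whole block instead of using the fixed trace-back depth N of the hardware design. All names are hypothetical.

```python
INF = float("inf")
G = (0b111, 0b101)   # assumed generator polynomials (7, 5 octal), K = 3
N_STATES = 4         # 2**(K - 1) encoder states


def parity(x):
    return bin(x).count("1") & 1


def branch(state, u):
    """Next state and output bits when input bit u enters the encoder in `state`."""
    reg = (u << 2) | state                      # shift register [u, prev1, prev2]
    out = tuple(parity(reg & g) for g in G)
    return (u << 1) | (state >> 1), out


def encode(bits):
    state, out = 0, []
    for u in bits:
        state, o = branch(state, u)
        out.extend(o)
    return out


def viterbi_decode(received):
    metrics = [0] + [INF] * (N_STATES - 1)     # encoder starts in state 0
    history = []                                # per step: survivor (prev_state, bit) per state
    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new = [INF] * N_STATES
        choice = [None] * N_STATES
        for s in range(N_STATES):
            if metrics[s] == INF:
                continue
            for u in (0, 1):
                ns, out = branch(s, u)
                m = metrics[s] + sum(a != b for a, b in zip(out, r))  # Hamming branch metric
                if m < new[ns]:                 # add-compare-select
                    new[ns] = m
                    choice[ns] = (s, u)
        metrics = new
        history.append(choice)
    # Trace back from the state with the smallest path metric.
    state = min(range(N_STATES), key=lambda s: metrics[s])
    bits = []
    for choice in reversed(history):
        prev, u = choice[state]
        bits.append(u)
        state = prev
    return bits[::-1]
```

The add-compare-select loop corresponds to the processing elements of a hardware decoder, and the `history` list plays the role of the survivor memory that the trace-back reads.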

Journal of ICT Research and Applications

Directory of Open Access Journals

ITB Journal

A New RTL Design Approach for a DCT/IDCT-Based Image Compression Architecture using the mCBE Algorithm

Author: Adiono Trio
Anbarsanti Nurfitri
Mareta Rella
Putra Rachmad Vidya Wicaksana
Publication venue: LPPM ITBis Lembah Dempo
Publication date: 01/09/2012

In the literature, several approaches to designing a DCT/IDCT-based image compression system have been proposed. In this paper, we present a new RTL design approach whose main focus is developing a DCT/IDCT-based image compression architecture using a self-created algorithm. This algorithm efficiently minimizes the number of shifter-adders used to substitute for multipliers. We call this new algorithm the multiplication from Common Binary Expression (mCBE) algorithm. Besides this algorithm, we propose alternative quantization numbers that can be implemented simply as shifters in digital hardware. In most cases, these numbers retain good compressed-image quality compared with the JPEG recommendations. These ideas make our design small in circuit area, multiplierless, and low in complexity. The proposed 8-point 1D-DCT design has only six stages, while the 8-point 1D-IDCT design has only seven stages (one stage being defined as the delay of one shifter or 2-input adder). Pipelining yields a high-speed architecture at the cost of added latency. The synthesized design achieves a critical path delay of 1.41 ns (709.22 MHz).
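The general idea of replacing a constant multiplier with shifter-adders, and of sharing a common binary subexpression between terms, can be sketched as follows. This is not the mCBE algorithm itself. The constant 181 (binary 10110101, roughly 256 · cos(π/4), a coefficient that appears in the 8-point DCT) is a hypothetical example: its naive shift-and-add form needs four adders for its five set bits, while reusing the shared pattern 101 (that is, 5x) needs three. The power-of-two quantizer mirrors the idea of quantization numbers implemented as shifters.

```python
def times_181(x):
    """Multiply x by 181 (0b10110101) using only shifts and adds."""
    t = x + (x << 2)                 # shared subexpression: 5x (bit pattern 101)
    return (t << 5) + (x << 4) + t   # 160x + 16x + 5x = 181x, three adders in total


def quantize_pow2(coeff, k):
    """Quantize by a step of 2**k with an arithmetic right shift (floor division)."""
    return coeff >> k
```

In hardware each `+` becomes an adder and each shift is just wiring, so minimizing the number of additions directly reduces area and the number of adder stages on the critical path.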

Journal of ICT Research and Applications

Directory of Open Access Journals

ITB Journal

Baseband Processing for 5G and Beyond: Algorithms, VLSI Architectures, and Co-design

Author: Mahdavi Mojtaba
Publication venue: Dpt. of Electrical and Information Technology, Lund University, Sweden
Publication date: 20/03/2021

In recent years, the number of connected devices and the demand for high data rates have increased significantly. This growth is further amplified by the Internet of Things (IoT), in which many devices are interconnected to exchange data for applications such as smart homes and smart cities. Moreover, new applications such as eHealth, autonomous vehicles, and connected ambulances set new demands on the reliability, latency, and data rate of wireless communication systems, pushing technology development forward. Massive multiple-input multiple-output (MIMO), a technology employed in the 5G standard, offers the benefits needed to fulfill these requirements. In massive MIMO systems, the base station (BS) is equipped with a very large number of antennas and serves several user equipments (UEs) simultaneously in the same time-frequency resource. The high degree of spatial multiplexing in massive MIMO systems improves the data rate, energy and spectral efficiencies, and link reliability of wireless communication systems. Link reliability can be improved further by employing channel coding techniques. Spatially coupled serially concatenated codes (SC-SCCs) are promising channel coding schemes that can meet the high-reliability demands of wireless communication systems beyond 5G (B5G). Given their close-to-capacity error-correction performance and the potential to implement a high-throughput decoder, this class of codes can be a good candidate for B5G wireless systems. Achieving these advantages requires sophisticated algorithms, which impose challenges on baseband signal processing. In massive MIMO systems, the processing is much more computationally intensive, and the memory needed to store channel data is significantly larger than in conventional MIMO systems, owing to the large size of the channel state information (CSI) matrix.
In addition to the high computational complexity, meeting latency requirements is also crucial. Similarly, the decoding-performance gain of SC-SCCs comes at the expense of increased implementation complexity. Moreover, selecting the proper design parameters, decoding algorithm, and architecture is challenging, since spatial coupling provides new degrees of freedom in code design and the design space therefore becomes huge. The focus of this thesis is to perform co-optimization across different design levels to address these challenges and requirements. To this end, we use system-level characteristics to develop efficient algorithms and architectures for the following functional blocks of digital baseband processing. First, we present a fast Fourier transform (FFT), an inverse FFT (IFFT), and a corresponding reordering scheme that significantly reduce the latency of orthogonal frequency-division multiplexing (OFDM) demodulation and modulation as well as the size of the reordering memory. The corresponding VLSI architectures, along with application-specific integrated circuit (ASIC) implementation results in a 28 nm CMOS technology, are introduced. For a 2048-point FFT/IFFT, the proposed design reduces the latency and the size of the reordering memory by 42%. Second, we propose a low-complexity massive MIMO detection scheme. The key idea is to exploit channel sparsity to reduce the size of the CSI matrix and then perform linear detection followed by non-linear post-processing in the angular domain using the compressed CSI matrix. The VLSI architecture for a massive MIMO system with 128 BS antennas and 16 UEs is presented along with synthesis results in a 28 nm technology. The proposed scheme reduces complexity and required memory by 35%–73% compared with traditional detectors while achieving better detection performance.
Finally, we perform a comprehensive design space exploration for SC-SCCs to investigate the effect of different design parameters on decoding performance, latency, complexity, and hardware cost. We then develop different decoding algorithms for SC-SCCs and discuss their decoding performance and complexity. Several high-level VLSI architectures, with the corresponding synthesis results in a 12 nm process, are also presented, and various design trade-offs are provided for these decoding schemes.
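The FFT reordering problem mentioned in the abstract comes from the fact that radix-2 FFTs naturally consume or produce data in bit-reversed order. The sketch below shows the conventional baseline: an iterative radix-2 decimation-in-time FFT/IFFT that explicitly permutes its input by bit reversal. It is not the thesis's low-latency reordering scheme, and the function names are hypothetical; in hardware, this permutation is what the reordering memory implements.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x, inverse=False):
    """Iterative radix-2 DIT FFT (or IFFT); input is reordered by bit reversal first."""
    n = len(x)
    bits = n.bit_length() - 1
    assert n == 1 << bits, "length must be a power of two"
    a = [x[bit_reverse(i, bits)] for i in range(n)]   # reordering step
    sign = 1 if inverse else -1
    size = 2
    while size <= n:
        w_step = cmath.exp(sign * 2j * cmath.pi / size)  # twiddle factor for this stage
        half = size // 2
        for start in range(0, n, size):
            w = 1
            for j in range(half):
                u = a[start + j]
                v = a[start + j + half] * w
                a[start + j] = u + v                     # butterfly
                a[start + j + half] = u - v
                w *= w_step
        size *= 2
    if inverse:
        a = [v / n for v in a]
    return a
```

Because the whole permutation must be available before the first butterfly stage can finish, a streaming implementation has to buffer roughly a full frame, which is why reordering contributes to both latency and memory for long transforms such as the 2048-point case.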

Lund University Publications

A System for Compressive Sensing Signal Reconstruction

Author: Draganic Andjela
Lekic Nedjeljko
Orovic Irena
Stankovic Srdjan
Publication date: 28/11/2016

An architecture for the hardware realization of a sparse signal reconstruction system is presented. The threshold-based reconstruction method is considered and further modified in this paper to reduce system complexity and ease hardware realization. Instead of using the partial random Fourier transform matrix, the minimization problem is reformulated using only the triangular R matrix from the QR decomposition. The triangular R matrix can be implemented efficiently in hardware without calculating the orthogonal Q matrix. A flexible and scalable realization of matrix R is proposed, in which the size of R changes with the number of available samples and the sparsity level. Comment: 6 pages
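Why R alone can suffice is easiest to see in the least-squares step. For a full-rank A with A = QR, the normal equations A^H A c = A^H y become R^H R c = A^H y, so the solution needs only two triangular solves with R and never touches Q. The sketch below combines that observation with a single-pass threshold reconstruction of a signal assumed to be sparse in the DFT domain. The threshold rule (half of the largest initial DFT magnitude) and all function names are hypothetical, and R is computed here as the Cholesky factor of A^H A (which equals the QR factor with a positive diagonal); this is an illustration, not the authors' hardware architecture.

```python
import cmath
import math


def dft_bins(samples, idx, N):
    """Initial DFT estimate using only the available samples (missing ones treated as zero)."""
    return [sum(s * cmath.exp(-2j * math.pi * k * n / N) for s, n in zip(samples, idx))
            for k in range(N)]


def cholesky_upper(G):
    """Return upper-triangular R with G = R^H R (equal to the R factor of QR for full-rank A)."""
    n = len(G)
    L = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            L[i][j] = math.sqrt(s.real) if i == j else s / L[j][j]
    return [[L[j][i].conjugate() for j in range(n)] for i in range(n)]   # R = L^H


def solve_with_r(R, b):
    """Solve R^H R c = b using only triangular substitutions with R."""
    n = len(R)
    z = [0j] * n
    for i in range(n):                   # forward substitution with R^H (lower triangular)
        z[i] = (b[i] - sum(R[k][i].conjugate() * z[k] for k in range(i))) / R[i][i].conjugate()
    c = [0j] * n
    for i in reversed(range(n)):         # back substitution with R (upper triangular)
        c[i] = (z[i] - sum(R[i][k] * c[k] for k in range(i + 1, n))) / R[i][i]
    return c


def reconstruct(samples, idx, N, frac=0.5):
    X0 = dft_bins(samples, idx, N)
    T = frac * max(abs(v) for v in X0)                  # hypothetical threshold rule
    support = [k for k in range(N) if abs(X0[k]) > T]   # detected nonzero DFT bins
    # Partial inverse-DFT matrix restricted to available samples and detected support.
    A = [[cmath.exp(2j * math.pi * k * n / N) for k in support] for n in idx]
    K, M = len(support), len(idx)
    G = [[sum(A[r][i].conjugate() * A[r][j] for r in range(M)) for j in range(K)] for i in range(K)]
    b = [sum(A[r][i].conjugate() * samples[r] for r in range(M)) for i in range(K)]
    c = solve_with_r(cholesky_upper(G), b)
    return [sum(ci * cmath.exp(2j * math.pi * k * n / N) for ci, k in zip(c, support))
            for n in range(N)]
```

The size of R here is set by the number of detected components, which echoes the abstract's point that the realization of R should scale with the sparsity level and the number of available samples.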

arXiv.org e-Print Archive

Crossref

A Low-Complexity Decision Feedforward Equalizer Architecture for High-Speed Receivers on Highly Dispersive Channels

Author: Agazzi Oscar E.
Cousseau Juan Edmundo
Hueda Mario Rafael
Pola Ariel Luis
Publication venue: Hindawi Limited
Publication date: 01/01/2013

This paper presents an improved decision feedforward equalizer (DFFE) for high-speed receivers operating over highly dispersive channels. This decision-aided equalization technique has recently been proposed for multigigabit communication receivers, where parallel processing is mandatory. Well-known parallel architectures for the typical decision feedback equalizer (DFE) have a complexity that grows exponentially with the channel memory. The new DFFE avoids this exponential growth by using tentative decisions to cancel the intersymbol interference (ISI) iteratively. Here, we demonstrate that the DFFE not only achieves performance similar to that of the typical DFE but also reduces complexity in channels with long memory. Additionally, we propose a theoretical approximation of the error probability in each iteration. In fact, as the number of iterations increases, the error probability of the DFFE approaches that of the DFE. These benefits make the DFFE an excellent choice for the next generation of high-speed receivers.

Affiliation: Pola, Ariel Luis. Universidad Nacional de Cordoba. Facultad de Cs.exactas Fisicas y Naturales. Departamento de Electronica. Laboratorio de Comunicaciones; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Cousseau, Juan Edmundo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Bahía Blanca. Instituto de Investigación En Ingeniería Eléctrica; Argentina. Universidad Nacional del Sur; Argentina
Affiliation: Agazzi, Oscar E. Irvine Center Drive. ClariPhy Communications; United States
Affiliation: Hueda, Mario Rafael. Universidad Nacional de Cordoba. Facultad de Cs.exactas Fisicas y Naturales. Departamento de Electronica. Laboratorio de Comunicaciones; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
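The difference between a DFE and an iterative decision feedforward scheme can be seen in a toy model. The sketch below assumes BPSK symbols, a noiseless channel with only post-cursor ISI and h[0] > 0, and zero symbols before the block; the channel taps and all function names are hypothetical, and this is a simplified illustration of iterative ISI cancellation with tentative decisions, not the paper's DFFE architecture. The DFE must finish symbol n before it can decide symbol n + 1, whereas each DFFE iteration updates every position in parallel from the previous iteration's decisions.

```python
def received(symbols, h):
    """Noiseless channel output y[n] = sum_k h[k] * s[n - k] (symbols before n = 0 are zero)."""
    return [sum(h[k] * symbols[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(symbols))]


def sign(v):
    return 1 if v >= 0 else -1


def dfe(y, h):
    """Decision feedback equalizer: sequential, each decision waits for the previous ones."""
    d = []
    for n in range(len(y)):
        isi = sum(h[k] * d[n - k] for k in range(1, len(h)) if n - k >= 0)
        d.append(sign(y[n] - isi))
    return d


def dffe(y, h, iters):
    """Iterative feedforward cancellation: every position updated in parallel per iteration."""
    d = [sign(v) for v in y]            # tentative decisions, no ISI cancellation
    for _ in range(iters):
        d = [sign(y[n] - sum(h[k] * d[n - k] for k in range(1, len(h)) if n - k >= 0))
             for n in range(len(y))]
    return d
```

In this noiseless model, each iteration fixes at least one more leading symbol, so after as many iterations as there are symbols the result matches the DFE; with noise, the abstract's observation that the DFFE error probability approaches the DFE's as iterations increase is the analogous behaviour.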

CONICET Digital

Directory of Open Access Journals

Achieving Energy Efficiency in Analogue and Mixed Signal Integrated Circuit Design

Author: C.I. Luján-Martínez
E. López-Morillo
F. Munoz
F. Márquez
T. Sánchez-Rodríguez
Publication venue: IntechOpen
Publication date: 04/04/2012

IntechOpen

On the Distribution of Control in Asynchronous Processor Architectures

Author: Rebello Vinod
Publication venue: University of Edinburgh. College of Science and Engineering. School of Informatics.
Publication date: 01/07/1997

Institute for Computing Systems Architecture

The effective performance of computer systems is, to a large measure, determined by the synergy between the processor architecture, the instruction set, and the compiler. In the past, the sequencing of information within processor architectures has normally been synchronous: controlled centrally by a clock. However, this global signal may limit future performance gains achievable through improvements in implementation technology. This thesis investigates the effects of relaxing this strict synchrony by distributing control within processor architectures through a novel asynchronous design model known as a micronet. The impact of asynchronous control on the performance of a RISC-style processor is explored at different levels. First, improvements in the performance of individual instructions are demonstrated by exploiting actual run-time behaviours. Second, micronets are shown to exploit further instruction-level parallelism (ILP), both spatial and temporal, efficiently by distributing control to datapath resources. Finally, exposing fine-grain concurrency within a datapath benefits a computer system only if the compiler can easily exploit it. Although compilers for micronet-based asynchronous processors may be considered more complex than their synchronous counterparts, it is shown that the variable execution time of an instruction does not adversely affect the compiler's ability to schedule code efficiently. In conclusion, modelling a processor's datapath as a micronet permits the exploitation of both fine-grain ILP and actual run-time delays, leading to efficient utilisation of functional units and, in turn, an improvement in overall system performance.
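The benefit of exploiting actual run-time delays rather than a worst-case clock can be illustrated with a deliberately simple timing model. The sketch below is not the micronet model: it assumes a linear pipeline in which instruction i takes `delays[i][j]` time units in stage j, ignores handshake overhead, and assumes a stage can accept the next instruction as soon as it finishes the current one (no back-pressure). The clocked version must use the largest delay as its period, while the self-timed version lets each stage start as soon as its input data and the stage itself are ready. The function names are hypothetical.

```python
def sync_finish_time(delays):
    """Clocked pipeline: every stage in every cycle waits for the worst-case delay."""
    n, s = len(delays), len(delays[0])
    period = max(max(row) for row in delays)
    return (n + s - 1) * period


def async_finish_time(delays):
    """Self-timed pipeline: a stage starts as soon as its input is ready and it is free."""
    n, s = len(delays), len(delays[0])
    finish = [[0.0] * s for _ in range(n)]
    for i in range(n):
        for j in range(s):
            ready = finish[i][j - 1] if j > 0 else 0.0   # data from the previous stage
            free = finish[i - 1][j] if i > 0 else 0.0    # stage done with the previous instruction
            finish[i][j] = max(ready, free) + delays[i][j]
    return finish[-1][-1]
```

Under these assumptions the self-timed finish time never exceeds the clocked one, and the two coincide when every stage delay equals the clock period; the gap grows when only a few instructions or stages are slow, which is the kind of run-time variation the thesis exploits.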

Edinburgh Research Archive