1,283 research outputs found
Quantum Dot Cellular Automata Check Node Implementation for LDPC Decoders
The quantum dot Cellular Automata (QCA) is an emerging nanotechnology that has gained significant research interest in recent years. Extremely small feature sizes, ultralow power consumption, and high clock frequency make QCA a potentially attractive solution for implementing computing architectures at the nanoscale. To be considered as a suitable CMOS substitute, the QCA technology must be able to implement complex real-time applications with affordable complexity. Low density parity check (LDPC) decoding is one of such applications. The core of LDPC decoding lies in the check node (CN) processing element which executes actual decoding algorithm and contributes toward overall performance and complexity of the LDPC decoder. This study presents a novel QCA architecture for partial parallel, layered LDPC check node. The CN executes Normalized Min Sum decoding algorithm and is flexible to support CN degree dc up to 20. The CN is constructed using a VHDL behavioral model of QCA elementary circuits which provides a hierarchical bottom up approach to evaluate the logical behavior, area, and power dissipation of the whole design. Performance evaluations are reported for the two main implementations of QCA i.e. molecular and magneti
Exploiting generalized de-Bruijn/Kautz topologies for flexible iterative channel code decoder architectures
Modern iterative channel code decoder architectures have tight constrains on the throughput but require flexibility to support different modes and standards. Unfortunately, flexibility often comes at the expense of increasing the number of clock cycles required to complete the decoding of a data-frame, thus reducing the sustained throughput. The Network- on-Chip (NoC) paradigm is an interesting option to achieve flexibility, but several design choices, including the topology and the routing algorithm, can affect the decoder throughput. In this work logarithmic diameter topologies, in particular generalized de-Bruijn and Kautz topologies, are addressed as possible solutions to achieve both flexible and high throughput architectures for iterative channel code decoding. In particular, this work shows that the optimal shortest-path routing algorithm for these topologies, that is still available in the open literature, can be efficiently implemented resorting to a very simple circuit. Experimental results show that the proposed architecture features a reduction of about 14% and 10% for area and power consumption respectively, with respect to a previous shortest-path routing-table-based desig
Result-Biased Distributed-Arithmetic-Based Filter Architectures for Approximately Computing the DWT
The discrete wavelet transform is a fundamental block in several schemes for image compression. Its implementation relies on filters that usually require multiplications leading to a relevant hardware complexity. Distributed arithmetic is a general and effective technique to implement multiplierless filters and has been exploited in the past to implement the discrete wavelet transform as well. This work proposes a general method to implement a discrete wavelet transform architecture based on distributed arithmetic to produce approximate results. The novelty of the proposed method relies on the use of result-biasing techniques (inspired by the ones used in fixed-width multiplier architectures), which cause a very small loss of quality of the compressed image (average loss of 0.11 dB and 0.20 dB in terms of PSNR for the 9/7 and 10/18 wavelet filters, respectively). Compared with previously proposed distributed-arithmetic-based architectures for the computation of the discrete wavelet transform, this technique saves from about 20% to 25% of hardware complexity
Low-Complexity Reconfigurable DCT-V Architecture
This brief presents a low-complexity, reconfigurable architecture for the Discrete Cosine Transform (DCT) of type V (DCT-V) of length 32. The proposed architecture can be reconfigured to compute five DCT-V of length 4 with negligible area overhead. As the DCT-V is one of the odd type transforms employed in the Adaptive Multiple Transform (AMT) scheme, the effect of fixed point implementation has been assessed in the Joint Exploration Model (JEM) developed by the JVET group for the Versatile-Video-Coding (VVC) forthcoming standard. Simulation results show that the proposed architecture is not only low-complexity and reconfigurable, but features also imperceptible quality loss. Moreover, when implemented in 90 nm CMOS technology it occupies only 90k eq. gates running at 187 MHz
Quantum Dot Cellular Automata Check Node Implementation for LDPC Decoders
The quantum dot Cellular Automata (QCA) is an emerging nanotechnology that has gained significant research interest in recent years. Extremely small feature sizes, ultralow power consumption, and high clock frequency make QCA a potentially attractive solution for implementing computing architectures at the nanoscale. To be considered as a suitable CMOS substitute, the QCA technology must be able to implement complex real-time applications with affordable complexity. Low density parity check (LDPC) decoding is one of such applications. The core of LDPC decoding lies in the check node (CN) processing element which executes actual decoding algorithm and contributes toward overall performance and complexity of the LDPC decoder. This study presents a novel QCA architecture for partial parallel, layered LDPC check node. The CN executes Normalized Min Sum decoding algorithm and is flexible to support CN degree dc up to 20. The CN is constructed using a VHDL behavioral model of QCA elementary circuits which provides a hierarchical bottom up approach to evaluate the logical behavior, area, and power dissipation of the whole design. Performance evaluations are reported for the two main implementations of QCA i.e. molecular and magnetic
VLSI architectures of a wiener filter for video coding
In the modern age, the use of video has become fundamental in communication and this has led to its use through an increasing number of devices. The higher resolution required for images and videos leads to more memory space and more efficient data compression, obtained by improving video coding techniques. For this reason, the Alliance for Open Media (AOMedia) developed a new open-source and royalty-free codec, named AOMedia Video 1 (AV1). This work focuses on the Wiener filter, a specific loop restoration tool of the AV1 video coding format, which features a significant amount of computational complexity. A new hardware architecture implementing the separable symmetric normalized Wiener filter is presented. Furthermore, the paper details possible optimizations starting from the basic architecture. These optimizations allow the Wiener filter to achieve a 100Ă reduction in processing time, compared to existing works, and 5Ă improvement in megasamples per second
EE-BESD: Molecular FET Modeling for Efficient and Effective Nanocomputing Design
Molecular transistor is a good candidate as
substitute of CMOS device due to small size, expected
good performance and suitability to be included in high
density-circuits.
To date a lot of effort has been carried out to under-
stand the conduction properties in molecular devices.
However, minor effort has been devoted to reduce their
computational complexity to obtain a compact molec-
ular model.
First-principle based methods frequently used are
highly computational demanding for a single device,
thus they are not suitable for complex circuit design.
In this paper we present an accurate and at the same
time computationally efficient method (named Efficient
and Effective model based on Broadening level, Evalua-
tion of peaks, Scf and Discrete levels, ee-besd) to calcu-
late the electron transport characteristics of molecular
transistors in presence of applied bias and gate voltages.
The results obtained show a remarkable improve-
ment in terms of computational time with respect to
existing approaches, while maintaining a very good ac-
curacy. Finally, the ee-besd model has been embedded
in a circuit level simulator in order to show its function-
alities and, particularly, its computational cost. This is
shown to be affordable even for circuits based on a high
number of devices
- âŠ