
    On The Design Of Low-Complexity High-Speed Arithmetic Circuits In Quantum-Dot Cellular Automata Nanotechnology

    For the last four decades, the implementation of very large-scale integrated systems has largely been based on complementary metal-oxide semiconductor (CMOS) technology. However, this technology has reached its physical limitations. Emerging nanoscale technologies such as quantum-dot cellular automata (QCA), single electron tunneling (SET), and tunneling phase logic (TPL) are major candidates for replacing CMOS. These nanotechnologies use majority and/or minority logic and inverters as circuit primitives. In this dissertation, a comprehensive methodology for majority/minority logic network synthesis is developed. This method is capable of processing any arbitrary multi-output Boolean function to find its equivalent optimal majority logic network, targeting either the number of gates or the number of levels. The proposed method produces several primary equivalent majority expression networks, and the most optimized network is generated as the final solution. The obtained results for 15 MCNC benchmark circuits show that when the number of majority gates is the first optimization priority, there is an average reduction of 45.3% in the number of gates and 15.1% in the number of levels. They also show that when the first priority is the number of levels, an average reduction of 23.5% in the number of levels and 43.1% in the number of gates is possible, compared to the majority AND/OR mapping method. These results are better than those obtained from the best existing methods. In this dissertation, our approach is to exploit QCA technology because of its capability to implement high-density, very high-speed switching and tremendously low-power integrated systems, and because it is more amenable to digital circuit design. In particular, we have developed algorithms for the QCA designs of various single- and multi-operation arithmetic arrays.
Even though majority/minority logic gates are the basic units in promising nanotechnologies, an XOR function can be constructed in QCA as a single device. The basic cells of the proposed arrays are developed based on the fundamental logic devices in QCA and a single-layer structure of the three-input XOR function. This process leads to QCA arithmetic circuits with better results in terms of cell count, area, and latency, compared to their best counterparts. The proposed arrays can be formed in a pipeline manner to perform the arithmetic operations for any number of bits, which could be quite valuable when considering the future design of large-scale QCA circuits.
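
To make the majority-logic primitives in the abstract above concrete, here is a minimal Python sketch. It is not the dissertation's synthesis method or its QCA layout; it only models the logic: a three-input majority gate, AND/OR obtained by tying one input to a constant, and a one-bit full adder whose carry is a single majority gate and whose sum is a three-input XOR. All function names are assumptions made for illustration.

```python
def maj(a: int, b: int, c: int) -> int:
    """Three-input majority gate: 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)


def and2(a: int, b: int) -> int:
    """AND is a majority gate with one input fixed to 0."""
    return maj(a, b, 0)


def or2(a: int, b: int) -> int:
    """OR is a majority gate with one input fixed to 1."""
    return maj(a, b, 1)


def xor3(a: int, b: int, c: int) -> int:
    """Three-input XOR, modelled here as a single primitive."""
    return a ^ b ^ c


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: sum = XOR3, carry-out = MAJ."""
    return xor3(a, b, cin), maj(a, b, cin)


def ripple_add(x: int, y: int, n_bits: int) -> tuple[int, int]:
    """Chain n_bits full adders; returns (sum modulo 2**n_bits, carry-out)."""
    carry = 0
    result = 0
    for i in range(n_bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

For example, `ripple_add(5, 3, 4)` chains four such cells, one per bit, which is the software analogue of extending an array to more bits.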

    Emerging Design Methodology And Its Implementation Through RNS And QCA

    Digital logic technology has been changing dramatically, from integrated circuits to Very Large Scale Integrated (VLSI) circuits and on to nanotechnology logic circuits. Research has focused on increasing the speed and reducing the size of the circuit design. Residue Number System (RNS) architecture has the ability to support high-speed concurrent arithmetic applications. To reduce the size, Quantum-Dot Cellular Automata (QCA) has become one of the new nanotechnology research fields and has received a lot of attention within the engineering community due to its small size and ultralow power. In the last decade, the residue number system has received increased attention due to its ability to support high-speed concurrent arithmetic applications such as the Fast Fourier Transform (FFT), image processing and digital filters, utilizing the efficiencies of RNS arithmetic in addition and multiplication. In spite of its effectiveness, RNS has remained more an academic challenge and has had very little impact in practical applications due to the complexity involved in the conversion process, magnitude comparison, overflow detection, sign detection, parity detection, scaling and division. The advancements in very large scale integration technology and the demand for parallel computation have enabled researchers to consider RNS as an alternative approach to high-speed concurrent arithmetic. A novel parallel-prefix binary-to-RNS conversion method and a novel RNS scaling method are presented in this thesis. Quantum-dot cellular automata has become one of the new nanotechnology research fields and has received a lot of attention within the engineering community due to its extremely small feature size and ultralow power consumption compared to CMOS technology. A novel methodology for generating QCA Boolean circuits from multi-output Boolean circuits is presented.
Our methodology takes a Boolean circuit as its input, generates a simplified XOR-AND equivalent circuit, and outputs an equivalent majority gate circuit. During the past decade, quantum-dot cellular automata have shown the ability to implement both combinational and sequential logic devices. Unlike conventional Boolean AND-OR-NOT based circuits, the fundamental logical device in QCA Boolean networks is the majority gate. By combining these QCA gates with NOT gates, any combinational or sequential logical device can be constructed from QCA cells. We present an implementation of a generalized pipeline cellular array using quantum-dot cellular automata cells. The proposed QCA pipeline array can perform all basic operations such as multiplication, division, squaring and square rooting. The different modes of operation are controlled by a single control line.
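
The RNS properties described in this abstract (carry-free, per-channel addition and multiplication, with the cost concentrated in conversion) can be sketched in a few lines of Python. This is a textbook illustration, not the thesis's parallel-prefix converter or scaling method. The moduli set `(7, 8, 9)` and all function names are assumptions chosen for the example; the reverse conversion uses the Chinese Remainder Theorem.

```python
from math import prod

# Hypothetical pairwise-coprime moduli of the form (2^n - 1, 2^n, 2^n + 1), n = 3.
MODULI = (7, 8, 9)  # dynamic range M = 7 * 8 * 9 = 504


def to_rns(x: int, moduli=MODULI) -> tuple:
    """Binary-to-residue conversion: one independent remainder per modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a: tuple, b: tuple, moduli=MODULI) -> tuple:
    """Addition is carry-free across channels: each residue is added separately."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))


def rns_mul(a: tuple, b: tuple, moduli=MODULI) -> tuple:
    """Multiplication is likewise performed channel by channel."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


def from_rns(residues: tuple, moduli=MODULI) -> int:
    """Residue-to-binary conversion via the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M
```

As a usage example, `from_rns(rns_mul(to_rns(12), to_rns(34)))` recovers 408, which is valid because the product stays below the dynamic range of 504. Results that exceed the range wrap around silently, which is one reason overflow detection is listed among the hard RNS operations.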

    HIGH PERFORMANCE, LOW COST SUBSPACE DECOMPOSITION AND POLYNOMIAL ROOTING FOR REAL TIME DIRECTION OF ARRIVAL ESTIMATION: ANALYSIS AND IMPLEMENTATION

    This thesis develops high-performance real-time signal processing modules for direction of arrival (DOA) estimation in localization systems. It proposes highly parallel algorithms for performing subspace decomposition and polynomial rooting, which are traditionally implemented using sequential algorithms. The proposed algorithms address the emerging need for real-time localization in a wide range of applications. As the antenna array size increases, the complexity of signal processing algorithms increases, making it increasingly difficult to satisfy real-time constraints. This thesis addresses real-time implementation by proposing parallel algorithms that maintain a considerable improvement over traditional algorithms, especially for systems with a larger number of antenna array elements. Singular value decomposition (SVD) and polynomial rooting are two computationally complex steps and act as the bottleneck to achieving real-time performance. The proposed algorithms are suitable for implementation on field-programmable gate arrays (FPGAs), single instruction multiple data (SIMD) hardware or application-specific integrated circuits (ASICs), which offer a large number of processing elements that can be exploited for parallel processing. The designs proposed in this thesis are modular, easily expandable and easy to implement. Firstly, this thesis proposes a fast-converging SVD algorithm. The proposed method reduces the number of iterations it takes to converge to the correct singular values, thus achieving closer to real-time performance. A general algorithm and a modular system design are provided, making it easy for designers to replicate and extend the design to larger matrix sizes. Moreover, the method is highly parallel, which can be exploited on the various hardware platforms mentioned earlier. A fixed-point implementation of the proposed SVD algorithm is presented.
The FPGA design is pipelined to the maximum extent to increase the maximum achievable frequency of operation. The system was developed with the objective of achieving high throughput. Various modern cores available in FPGAs were used to maximize the performance, and these modules are described in detail. Finally, a parallel polynomial rooting technique based on Newton’s method, applicable exclusively to root-MUSIC polynomials, is proposed. Unique characteristics of the root-MUSIC polynomial’s complex dynamics were exploited to derive this polynomial rooting method. The technique exhibits parallelism and converges to the desired root within a fixed number of iterations, making it suitable for rooting large-degree polynomials. We believe this is the first time that the complex dynamics of root-MUSIC polynomials have been analyzed to propose an algorithm. In all, the thesis addresses two major bottlenecks in a direction of arrival estimation system by providing simple, high-throughput, parallel algorithms.
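
The general idea of running Newton's method from several starting points for a fixed number of iterations can be sketched as follows. This is plain Newton iteration on an arbitrary complex polynomial, not the thesis's root-MUSIC-specific technique, whose choice of starting points and convergence guarantees are not described in the abstract. The example polynomial `z^3 - 1`, the starting points, and the function names are assumptions for illustration only.

```python
import cmath


def horner(coeffs, z):
    """Evaluate p(z) and p'(z) together; coeffs are ordered highest degree first."""
    p = 0j
    dp = 0j
    for c in coeffs:
        dp = dp * z + p  # derivative accumulates the previous value of p
        p = p * z + c
    return p, dp


def newton_root(coeffs, z0, iterations=20):
    """Run a fixed number of Newton steps z <- z - p(z)/p'(z) from z0."""
    z = complex(z0)
    for _ in range(iterations):
        p, dp = horner(coeffs, z)
        if dp == 0:
            break  # flat spot: stop rather than divide by zero
        z -= p / dp
    return z


# Illustrative polynomial z^3 - 1, whose roots lie on the unit circle.
coeffs = [1, 0, 0, -1]

# Each start is independent of the others, so the runs could execute in parallel.
starts = [0.9 * cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]
roots = [newton_root(coeffs, s) for s in starts]
```

Because every starting point is iterated independently with the same fixed iteration count, the loop body maps naturally onto identical parallel hardware units, which is the property the abstract emphasises.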

    Design and FPGA Implementation of CORDIC-based 8-point 1D DCT Processor

    CORDIC, or CO-ordinate Rotation DIgital Computer, is a fast, simple, efficient and powerful algorithm used for diverse Digital Signal Processing applications. Primarily developed for real-time airborne computations, it uses a unique computing technique which is especially suitable for solving the trigonometric relationships involved in plane co-ordinate rotation and conversion from rectangular to polar form. It comprises a special serial arithmetic unit having three shift registers, three adders/subtractors, a look-up table and special interconnections. Using a prescribed sequence of conditional additions or subtractions, the CORDIC arithmetic unit can be controlled to solve either of the following equations, where K is a constant:
    Y’ = K (Y cos λ + X sin λ)
    X’ = K (X cos λ - Y sin λ)
    In this project:
    • A CORDIC-based processor for sine/cosine calculation was designed using VHDL programming in Xilinx ISE 10.1. The CORDIC module was tested for its functionality and correctness by test-bench analysis. Subsequently, FPGA implementation of the CORDIC core was performed, followed by ChipScopePro analysis of the output logic waveforms.
    • Using this CORDIC core, a DCT processor was designed to calculate the 8-point 1D DCT. The functionality and operational correctness of this processor were tested, first on the test-bench and then via ChipScopePro analysis after FPGA implementation. The output obtained in both cases was compared with the actual values to test for consistency, and the percentage of accuracy was established. Power consumption and FPGA resource utilization were observed. The results obtained were discussed.
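
The rotation equations above can be illustrated with a short floating-point sketch of rotation-mode CORDIC for sine and cosine. This is a software model under stated assumptions, not the project's VHDL design: multiplications by `2.0 ** -i` stand in for the hardware shifts, the list `angles` plays the role of the look-up table, and the function name, iteration count and input range are choices made for the example.

```python
import math


def cordic_sin_cos(theta: float, n_iter: int = 32):
    """Return (sin(theta), cos(theta)) for theta in [-pi/2, pi/2] using CORDIC rotations."""
    # Look-up table of elementary rotation angles atan(2^-i).
    angles = [math.atan(2.0 ** -i) for i in range(n_iter)]

    # K is the constant gain accumulated by the un-normalised micro-rotations.
    K = 1.0
    for i in range(n_iter):
        K *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Pre-scaling the start vector by 1/K cancels the gain at the end.
    x, y, z = 1.0 / K, 0.0, theta
    for i in range(n_iter):
        d = 1.0 if z >= 0 else -1.0  # rotate toward the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x
```

Calling `cordic_sin_cos(0.5)` should agree closely with `math.sin(0.5)` and `math.cos(0.5)`; a fixed-point hardware version would trade some of that accuracy for word length.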

    QCD simulations with staggered fermions on GPUs

    We report on our implementation of the RHMC algorithm for the simulation of lattice QCD with two staggered flavors on Graphics Processing Units, using the NVIDIA CUDA programming language. The main feature of our code is that the GPU is not used just as an accelerator; instead, the whole Molecular Dynamics trajectory is performed on it. After pointing out the main bottlenecks and how to circumvent them, we discuss the performance obtained. We present some preliminary results regarding OpenCL and multi-GPU extensions of our code and discuss future perspectives. Comment: 22 pages, 14 eps figures, final version to be published in Computer Physics Communications

    Array signal processing robust to pointing errors

    The objective of this thesis is to design computationally efficient DOA (direction-of-arrival) estimation algorithms and beamformers robust to pointing errors, by harnessing the antenna geometrical information and received signals. Initially, two fast root-MUSIC-type DOA estimation algorithms are developed, which can be applied to arbitrary arrays. Instead of computing all roots, the first proposed iterative algorithm calculates only the wanted roots. The second, IDFT-based method obtains the DOAs by scanning a few circles in parallel, and thus the rooting is avoided. Both proposed algorithms, with less computational burden, have asymptotically similar performance to the extended root-MUSIC. The second main contribution of this thesis is concerned with the matched direction beamformer (MDB), without using the interference subspace. The manifold vector of the desired signal is modeled as a vector lying in a known linear subspace, but the associated linear combination vector is otherwise unknown due to pointing errors. This vector can be found by computing the principal eigenvector of a certain rank-one matrix. Then an MDB is constructed which is robust to both pointing errors and overestimation of the signal subspace dimension. Finally, an interference cancellation beamformer robust to pointing errors is considered. By means of vector space projections, much of the pointing error can be eliminated. A one-step power estimation is derived by using the theory of covariance fitting. Then an estimate-and-subtract interference canceller beamformer is proposed, in which the power inversion problem is avoided and the interferences can be cancelled completely.

    Learning Deep SPD Visual Representation for Image Classification

    Symmetric positive definite (SPD) visual representations are effective due to their ability to capture high-order statistics to describe images. Reliable and efficient calculation of an SPD matrix representation from small-sized feature maps with a high number of channels in a CNN is a challenging issue. This thesis presents three novel methods to address the above challenge. The first method, called Relation Dropout (ReDro), is inspired by the fact that the eigen-decomposition of a block diagonal matrix can be efficiently obtained by eigen-decomposition of each block separately. Thus, instead of using a full covariance matrix as in the literature, this thesis randomly groups the channels and forms a covariance matrix per group. ReDro is inserted as an additional layer preceding the matrix normalisation step, and the random grouping is made transparent to all subsequent layers. ReDro can be seen as a dropout-related regularisation which discards some pair-wise channel relationships across each group. The second method, called FastCOV, exploits the intrinsic connection between the eigen-systems of XX^T and X^TX. Specifically, it computes a position-wise covariance matrix upon convolutional feature maps instead of the typical channel-wise covariance matrix. As the spatial size of feature maps is usually much smaller than the channel number, conducting eigen-decomposition of the position-wise covariance matrix avoids rank-deficiency and is faster than the decomposition of the channel-wise covariance matrix. The eigenvalues and eigenvectors of the normalised channel-wise covariance matrix can be retrieved through the connection between the XX^T and X^TX eigen-systems. The third method, iSICE, deals with reliable covariance estimation from small-sized and high-dimensional CNN feature maps. It exploits the prior structure of the covariance matrix to estimate a sparse inverse covariance, an approach developed in the literature to deal with the covariance matrix’s small sample issue.
Given a covariance matrix, this thesis iteratively minimises its log-likelihood penalised by sparsity, using gradient descent. The resultant representation characterises partial correlation instead of the indirect correlation characterised in the covariance representation. As experimentally demonstrated, all three proposed methods improve image classification performance, whereas the first two proposed methods also reduce the computational cost of learning large SPD visual representations.
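
The connection between the two eigen-systems mentioned in this abstract rests on a standard identity: if (X^T X) v = λ v, then (X X^T)(X v) = λ (X v). The sketch below checks this numerically in pure Python. It is not FastCOV itself; it assumes X has one row per channel and one column per spatial position, omits centring and normalisation, and uses simple power iteration in place of a full eigen-decomposition. The matrix values and helper names are made up for illustration.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def top_eigenpair(S, iterations=200):
    """Power iteration for the dominant eigenpair of a symmetric matrix S."""
    v = [1.0] + [0.0] * (len(S) - 1)
    for _ in range(iterations):
        w = matvec(S, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    lam = sum(vi * wi for vi, wi in zip(v, matvec(S, v)))  # Rayleigh quotient
    return lam, v


# Hypothetical feature matrix: 4 channels (rows) by 2 spatial positions (columns).
X = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
small = matmul(transpose(X), X)  # 2 x 2, cheap to decompose
large = matmul(X, transpose(X))  # 4 x 4, the matrix we actually care about

lam, v = top_eigenpair(small)
u = matvec(X, v)  # candidate eigenvector of the large matrix

# Largest entry of (large @ u - lam * u); close to zero when the identity holds.
residual = max(abs(lu - lam * ui) for lu, ui in zip(matvec(large, u), u))
```

Here the decomposition is done on the 2 x 2 matrix only, and the eigenvector of the 4 x 4 matrix is recovered with a single matrix-vector product, which is where the saving comes from when the number of positions is much smaller than the number of channels.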

    Comparison and validation of three versions of a forest wind risk model

    Predicting the probability of wind damage in both natural and managed forests is important for understanding forest ecosystem functioning and the environmental impact of storms, and for forest risk management. We undertook a thorough validation of three versions of the hybrid-mechanistic wind risk model, ForestGALES, and a statistical logistic regression model, against observed damage in a Scottish upland conifer forest following a major storm. Statistical analysis demonstrated that increasing tree height and local wind speed during the storm were the main factors associated with increased damage levels. All models provided acceptable discrimination between damaged and undamaged forest stands, but there were trade-offs between the accuracy of the mechanistic models and model bias. The two versions of the mechanistic model with the lowest bias gave very comparable overall results at the forest scale and could form part of a decision support system for managing forest wind damage risk.