
    Least-Squares Approximation and Polyphase Decomposition for Pipelining Recursive Filters

    Current techniques for pipelining recursive filters require high hardware complexity because they attempt to preserve the exact frequency response of the original circuit while constructing a pipelined architecture. We present a technique that relaxes this requirement and instead uses a least-squares formulation in conjunction with the pipelined architecture. This greatly reduces the complexity of the pipelined circuit while enabling a simple pipelined architecture based on a polyphase decomposition of the original filter.
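
As a minimal illustration of the polyphase decomposition mentioned above, the Python sketch below splits an FIR impulse response h into M components e_k[m] = h[k + mM], so that H(z) = sum_k z^-k E_k(z^M), and checks that filtering through the M branches matches direct convolution. It shows only the decomposition itself; the recursive (IIR) case and the least-squares design from the abstract are not reproduced, and the function names are hypothetical.

```python
def convolve(h, x):
    """Direct-form FIR filtering: y[n] = sum_i h[i] * x[n - i], truncated to len(x)."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def polyphase_components(h, M):
    """Split the taps of h into M polyphase components e_k[m] = h[k + m*M]."""
    return [h[k::M] for k in range(M)]


def polyphase_filter(h, x, M):
    """Filter x with H(z) = sum_k z^-k E_k(z^M), evaluating one branch per component."""
    y = [0.0] * len(x)
    for k, e_k in enumerate(polyphase_components(h, M)):
        for n in range(len(x)):
            # Branch k: delay by k samples, then apply E_k, whose taps are M samples apart.
            for m, c in enumerate(e_k):
                idx = n - k - m * M
                if idx >= 0:
                    y[n] += c * x[idx]
    return y
```

Each branch E_k(z^M) only combines samples spaced M apart; in the recursive case, a denominator expressed in powers of z^-M similarly leaves M delays inside the feedback loop, which is the usual reason such forms are attractive for pipelining.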

    Protein Alignment Systolic Array Throughput Optimization

    Protein comparison is gaining importance year after year, since it has been shown that biologists can use it to find correlations between different species or to identify genetic mutations that can lead to cancer and genetic diseases. Protein sequence alignment is the most computationally intensive task in protein comparison. To speed up alignment, dedicated processors that perform many computations in parallel have been designed; among them, Systolic Arrays have achieved the best performance. However, when the Processing Elements of a Systolic Array contain an internal loop, performance can be greatly reduced. In this work we present an architectural strategy that addresses this problem by applying pipeline interleaving, and we apply it to a Systolic Array we designed for the Smith-Waterman algorithm. The results encourage the adoption of pipeline interleaving in parallel circuits with loop-based Processing Elements: we demonstrate that a substantially higher operating frequency can be obtained without significant costs in complexity, area, or power.
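
For reference, the Smith-Waterman recurrence that such arrays evaluate can be sketched in a few lines of Python. The scoring here (match +2, mismatch -1, linear gap penalty 1) is a hypothetical placeholder; protein aligners normally use a substitution matrix such as BLOSUM62 and affine gaps. In a common linear systolic mapping, with one Processing Element per character of `a`, the `cur[j - 1]` term feeds a PE's own output back into it on the next cycle; we assume this is the kind of internal loop the abstract refers to.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=1):
    """Best local alignment score of a and b with a linear gap penalty (hypothetical scoring)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the dynamic-programming matrix H
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0,                  # start a new local alignment
                         prev[j - 1] + s,    # align a[i-1] with b[j-1]
                         prev[j] - gap,      # gap in b
                         cur[j - 1] - gap)   # gap in a: the loop-carried dependence
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, `smith_waterman_score("XXACGTYY", "ACGT")` finds the embedded exact match and scores it as four matches.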

    Implementation of the Trigonometric LMS Algorithm using Original Cordic Rotation

    The LMS algorithm is one of the most successful adaptive filtering algorithms. It uses the instantaneous value of the squared error signal as an estimate of the mean-square error (MSE), and it adapts the filter tap weights so that the error signal is minimized in the mean-square sense. Trigonometric LMS (TLMS) and Hyperbolic LMS (HLMS) are two newer variants that perform the same computations as LMS, except that the filter tap weights are expressed using trigonometric and hyperbolic formulations, respectively. This is where the CORDIC algorithm comes in, since it can efficiently compute trigonometric, hyperbolic, linear and logarithmic functions. Although hardware-efficient algorithms often exist, the dominance of software systems has kept them out of the spotlight. Among these hardware-efficient algorithms, CORDIC is an iterative solution for trigonometric and other transcendental functions. Previous research used the CORDIC algorithm to study the convergence behavior of the TLMS algorithm and reported satisfactory convergence performance. However, that work used the output of the CORDIC block directly in its simulations, ignoring the internal step-by-step rotations of the CORDIC processor. It therefore remained to be verified whether TLMS still converges satisfactorily when implemented with step-by-step CORDIC rotations. This work addresses that question: it focuses on the internal operations of the CORDIC hardware and implements the TLMS and HLMS algorithms using actual CORDIC rotations. The simulation results are highly satisfactory and show that HLMS converges much better than TLMS.
    Comment: 12 pages, 5 figures, 1 table. Published in IJCNC; http://airccse.org/journal/cnc/0710ijcnc08.pdf, http://airccse.org/journal/ijc2010.htm
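
The step-by-step rotations discussed above can be illustrated with a floating-point model of textbook rotation-mode CORDIC. This is a generic sketch, not the authors' simulator: the iteration count is an assumption, and the multiplications by 2^-i stand in for the shift operations a fixed-point hardware implementation would use. Each iteration rotates the vector by plus or minus atan(2^-i), steering the residual angle z toward zero, and the pre-computed gain K removes the length growth of the micro-rotations.

```python
import math


def cordic_sin_cos(theta, iterations=32):
    """Rotation-mode CORDIC: returns (cos(theta), sin(theta)) for |theta| up to about 1.74 rad."""
    # Micro-rotation angles atan(2^-i) and the aggregate gain correction K.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, theta  # start from a unit vector pre-scaled by K
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate in the direction that reduces |z|
        p = 2.0 ** -i                # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * p, y + d * x * p
        z -= d * angles[i]
    return x, y
```

Angles outside the convergence range (the sum of all atan(2^-i), roughly 1.74 rad) would first need a quadrant reduction, which this sketch omits.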

    Interleaving in Systolic-Arrays: a Throughput Breakthrough

    In past years, the most common way to improve computer performance was to increase the clock frequency. In recent years this approach has run into the limits of technology scaling, so computer architectures are shifting toward parallel computing to further improve performance. Besides GPU-based architectures, which are becoming widespread, Systolic Arrays are particularly well suited to certain classes of algorithms. An important point in favor of Systolic Arrays is that their regular circuit layout makes them appealing for many emerging and very promising technologies, such as Quantum-dot Cellular Automata and nanoarrays based on Silicon NanoWires or Carbon Nanotube Field Effect Transistors. In this work we present a systematic method to improve Systolic Array performance by exploiting Pipelining and Input Data Interleaving. We first tackle the problem from a theoretical point of view and then apply the method to both CMOS technology and emerging technologies. On CMOS we demonstrate that the overall throughput of the circuit can be vastly improved. Applying the technique to emerging technologies, we show that some of their limitations can be overcome, greatly improving throughput and taking a considerable step toward the post-CMOS era.
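
The effect of input data interleaving on a loop can be sketched with a behavioral Python model. It assumes a first-order recursion y[n] = a*y[n-1] + x[n] whose feedback path has been pipelined into P registers, and P independent input streams scheduled round-robin. Because each result re-enters the loop exactly P cycles later, when the same stream is scheduled again, every stream receives the same outputs as an unpipelined implementation. The model only checks functional equivalence; the throughput gain (a shorter critical path between registers) is not captured, and this is a generic illustration rather than the authors' method.

```python
from collections import deque


def reference(a, xs):
    """Unpipelined first-order recursion y[n] = a*y[n-1] + x[n] for a single stream."""
    y, out = 0.0, []
    for x in xs:
        y = a * y + x
        out.append(y)
    return out


def interleaved_pipelined(a, streams):
    """Process P equal-length streams through one loop that holds P registers.

    The feedback path is modelled as a P-stage delay line, so a result
    re-enters the loop exactly P cycles after it was produced.
    """
    p = len(streams)
    loop = deque([0.0] * p)  # the P pipeline registers inside the loop
    outs = [[] for _ in range(p)]
    for t in range(p * len(streams[0])):
        s, n = t % p, t // p  # round-robin input interleaving
        y = a * loop.popleft() + streams[s][n]
        loop.append(y)
        outs[s].append(y)
    return outs
```

With a single stream (P = 1) the delay line has one register and the model reduces to the unpipelined loop.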

    An efficient implementation of Forward-Backward Least-Mean-Square Adaptive Line Enhancers

    An efficient implementation of the forward-backward least-mean-square (FBLMS) adaptive line enhancer is presented in this article. Without changing the characteristics of the FBLMS adaptive line enhancer, the proposed implementation technique reduces the number of multiplications by 25% and additions by 12.5% over two successive time samples, compared with a direct implementation, in both prediction and weight control. The proposed FBLMS architecture and algorithm can be applied in digital receivers to enhance the signal-to-noise ratio, allowing fast carrier acquisition and tracking in both stationary and nonstationary environments.
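
For orientation, the conventional LMS adaptive line enhancer can be sketched in Python as a baseline. It is not the FBLMS structure: the forward-backward scheme and its operation savings are not described in enough detail here to reproduce. The sketch predicts x[n] from a delayed window of past samples; a narrowband component stays predictable across the delay while broadband noise does not, so the prediction is the enhanced output. The tap count, delay, step size and test signal are assumptions.

```python
import math
import random


def lms_line_enhancer(x, taps=16, delay=1, mu=0.01):
    """Standard LMS adaptive line enhancer (not the FBLMS variant).

    Predicts x[n] from x[n-delay], ..., x[n-delay-taps+1]; the prediction y[n]
    is the enhanced (narrowband) output and e[n] = x[n] - y[n] is the error.
    """
    w = [0.0] * taps
    y_out, e_out = [], []
    for n in range(len(x)):
        # Delayed reference vector; samples before n = 0 are taken as zero.
        u = [x[n - delay - k] if n - delay - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))
        e = x[n] - y
        # LMS weight update driven by the instantaneous error.
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]
        y_out.append(y)
        e_out.append(e)
    return y_out, e_out


# Example: a sinusoid in white Gaussian noise (hypothetical test signal).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 0.05 * n) for n in range(3000)]
noisy = [s + rng.gauss(0.0, 0.1) for s in clean]
y, e = lms_line_enhancer(noisy)
```

After convergence, the error power approaches the noise variance, and the enhanced output y tracks the clean sinusoid much more closely than the noisy input does.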

    Finite worldlength effects in fixed-point implementations of linear systems

    Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. By Vinay Mohta. Includes bibliographical references (p. 173-194).

    PiCo: A Domain-Specific Language for Data Analytics Pipelines

    In the world of Big Data analytics, there are a number of tools that aim to simplify programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models (for which generally only informal, and often confusing, semantics is provided), all share a common underlying model, namely the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects of Big Data analytics tools from a high-level perspective. This analysis can be considered a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By clearly separating all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low-level with high-level aspects, as often happens in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that a clearer and simpler semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model, implemented as a Domain-Specific Language on top of a stack of layers that form a prototypical framework for Big Data analytics. The contribution of this thesis is twofold. First, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit in each level.
    Second, we propose a programming environment based on this layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, essentially a DAG composition of processing elements. The model is intended to give the user a single interface for both stream and batch processing, completely hiding data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared-memory and distributed parallelism, and implemented in C++11/14 with the aim of bringing C++ into the Big Data world.
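
PiCo itself is a C++ DSL built on FastFlow, and its API is not reproduced here. As a purely hypothetical Python sketch of the idea of a Pipeline as a composition of stages with one interface for batch and stream input, each stage below maps an iterator to an iterator, so the same pipeline can run over a finite list or an unbounded generator. Only linear composition is modelled, not general DAGs, and the class and method names are invented for illustration.

```python
from typing import Callable, Iterable, Iterator


class Pipeline:
    """Hypothetical linear pipeline; each stage maps an iterator to an iterator."""

    def __init__(self, stages=None):
        self.stages = list(stages or [])

    def map(self, f: Callable) -> "Pipeline":
        """Append a stage that applies f to every element."""
        return Pipeline(self.stages + [lambda it: (f(v) for v in it)])

    def filter(self, pred: Callable) -> "Pipeline":
        """Append a stage that keeps only elements satisfying pred."""
        return Pipeline(self.stages + [lambda it: (v for v in it if pred(v))])

    def then(self, other: "Pipeline") -> "Pipeline":
        """Sequential composition: run this pipeline's stages, then other's."""
        return Pipeline(self.stages + other.stages)

    def run(self, source: Iterable) -> Iterator:
        """Lazily apply all stages to a finite (batch) or unbounded (stream) source."""
        it = iter(source)
        for stage in self.stages:
            it = stage(it)
        return it
```

For example, `list(Pipeline().map(str.lower).filter(lambda w: len(w) > 3).run(["Big", "Data", "Pipeline"]))` gives `["data", "pipeline"]`; because every stage is a lazy generator, the same pipeline object can also consume an infinite generator, from which a caller takes only as many results as it needs.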

    Architecture design of video processing systems on a chip


    Digital Signal Processing Techniques For Coherent Optical Communication

    Coherent detection with subsequent digital signal processing (DSP) is developed, analyzed theoretically and numerically, and demonstrated experimentally in various fiber-optic transmission scenarios. Using DSP in conjunction with coherent detection unlocks the benefits of coherent detection, which rely on preserving the full information of the incoming field. These benefits include high receiver sensitivity, the ability to achieve high spectral efficiency, and the use of advanced modulation formats. With the immense advances in DSP speed, many of the problems hindering the use of coherent detection in optical transmission systems have been eliminated. Most notably, DSP removes the need for hardware phase locking and polarization tracking, which can now be achieved in the digital domain. The complexity previously associated with coherent detection is therefore significantly reduced, and coherent detection is once again considered a feasible detection alternative. In this thesis, several aspects of coherent detection (with or without subsequent DSP) are addressed. Coherent detection is presented as a means to extend the dispersion limit of a duobinary signal using an analog decision-directed phase-locked loop. An analytical bit-error ratio estimate for quadrature phase-shift keying signals is derived. To validate the promise of high spectral efficiency, an orthogonal wavelength-division multiplexing (WDM) scheme is proposed, in which the WDM channels are spaced at the symbol rate, thus achieving the spectral efficiency limit. Theory, simulation and experimental results demonstrate the feasibility of this approach. Infinite impulse response filtering is shown to be an efficient alternative to finite impulse response filtering for chromatic dispersion compensation; theory, design considerations, simulation and experimental results on this topic are presented.
    The interaction between fiber dispersion and nonlinearity remains the last major challenge that deterministic effects pose for long-haul optical data transmission. Experimental results demonstrating that both dispersion and nonlinearity can be mitigated digitally are presented. Impairment compensation is achieved through backward propagation, implemented with the split-step method. Efficient realizations of the dispersion compensation operator used in this implementation are considered, and infinite impulse response and wavelet-based filtering are both investigated as means of reducing the computational load of signal backward propagation. The dissertation concludes with possible future research directions.
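
The idea of backward propagation with the split-step method can be sketched in Python under simplifying assumptions: dimensionless units, a lossless single-polarization channel, one common sign convention for the nonlinear Schrödinger equation, and the simple (non-symmetrized) split-step scheme with a naive DFT so the sketch needs only the standard library. The forward model alternates a dispersion step, applied as a frequency-domain phase, with a Kerr nonlinear phase rotation in the time domain; backward propagation applies the inverse operators in reverse order with a negated step size. This is a generic illustration, not the thesis's implementation, which instead investigates IIR and wavelet-based realizations of the dispersion operator.

```python
import cmath
import math


def dft(a, inverse=False):
    """Naive O(N^2) DFT (stdlib only); the inverse includes the 1/N factor."""
    n = len(a)
    sign = 1 if inverse else -1
    out = [sum(a[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def angular_freqs(n, dt):
    """Angular frequency of each DFT bin in the usual (0, +, ..., -) ordering."""
    return [2 * math.pi * (m if m < n // 2 else m - n) / (n * dt) for m in range(n)]


def linear_step(a, h, beta2, dt):
    """Dispersion over a distance h, applied as a phase in the frequency domain."""
    w = angular_freqs(len(a), dt)
    spec = [s * cmath.exp(0.5j * beta2 * wk ** 2 * h) for s, wk in zip(dft(a), w)]
    return dft(spec, inverse=True)


def nonlinear_step(a, h, gamma):
    """Kerr (self-phase modulation) rotation over a distance h, in the time domain."""
    return [v * cmath.exp(1j * gamma * abs(v) ** 2 * h) for v in a]


def propagate(a, steps, h, beta2, gamma, dt):
    """Forward channel model: each step applies dispersion, then nonlinearity."""
    for _ in range(steps):
        a = nonlinear_step(linear_step(a, h, beta2, dt), h, gamma)
    return a


def back_propagate(a, steps, h, beta2, gamma, dt):
    """Digital backward propagation: inverse operators, reverse order, distance -h."""
    for _ in range(steps):
        a = linear_step(nonlinear_step(a, -h, gamma), -h, beta2, dt)
    return a
```

In this noiseless toy setting, with back_propagate using exactly the forward parameters, the transmitted waveform is recovered up to rounding error; in a real link, noise, fiber loss, parameter uncertainty and a limited number of steps restrict how much of the impairment can be undone.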