1,992 research outputs found

    Signal constellation and carrier recovery technique for voice-band modems

    Get PDF

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Get PDF
    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object based paradigm has many undeniable benefits, numerous technical challenges remain before the applications becomes pervasive, particularly on computational constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery powered mobile computing devices, the additional algorithmic complexity of semantic object based processing compared to conventional video processing is highly undesirable both from a real-time operation and battery life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to be offloaded from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object based shape encoding, a novel energy efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourable with the relevant prior art

    High sample-rate Givens rotations for recursive least squares

    Get PDF
    The design of an application-specific integrated circuit of a parallel array processor is considered for recursive least squares by QR decomposition using Givens rotations, applicable in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation, which, for this recursive algorithm, means that the time to perform arithmetic operations is critical. The algorithm, architecture and arithmetic are considered in a single integrated design procedure to achieve optimum results. A realisation approach using standard arithmetic operators, add, multiply and divide is adopted. The design of high-throughput operators with low delay is addressed for fixed- and floating-point number formats, and the application of redundant arithmetic considered. New redundant multiplier architectures are presented enabling reductions in area of up to 25%, whilst maintaining low delay. A technique is presented enabling the use of a conventional tree multiplier in recursive applications, allowing savings in area and delay. Two new divider architectures are presented showing benefits compared with the radix-2 modified SRT algorithm. Givens rotation algorithms are examined to determine their suitability for VLSI implementation. A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed enabling the sample-rate to be increased by a factor of approximately 6 and offering area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of 136 MHz could be achieved using a standard cell approach and O.35pm CMOS technology. The enhanced SGR algorithm has been compared with a CORDIC approach and shown to benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation on a parallel array of general purpose (GP) DSP chips, it is estimated that a single application specific chip could offer up to 1,500 times the computation obtained from a single OP DSP chip

    A Study on Efficient Designs of Approximate Arithmetic Circuits

    Get PDF
    Approximate computing is a popular field where accuracy is traded with energy. It can benefit applications such as multimedia, mobile computing and machine learning which are inherently error resilient. Error introduced in these applications to a certain degree is beyond human perception. This flexibility can be exploited to design area, delay and power efficient architectures. However, care must be taken on how approximation compromises the correctness of results. This research work aims to provide approximate hardware architectures with error metrics and design metrics analyzed and their effects in image processing applications. Firstly, we study and propose unsigned array multipliers based on probability statistics and with approximate 4-2 compressors, full adders and half adders. This work deals with a new design approach for approximation of multipliers. The partial products of the multiplier are altered to introduce varying probability terms. Logic complexity of approximation is varied for the accumulation of altered partial products based on their probability. The proposed approximation is utilized in two variants of 16-bit multipliers. Synthesis results reveal that two proposed multipliers achieve power savings of 72% and 38% respectively compared to an exact multiplier. They have better precision when compared to existing approximate multipliers. Mean relative error distance (MRED) figures are as low as 7.6% and 0.02% for the proposed approximate multipliers, which are better than the previous state-of-the-art works. Performance of the proposed multipliers is evaluated with geometric mean filtering application, where one of the proposed models achieves the highest peak signal to noise ratio (PSNR). Second, approximation is proposed for signed Booth multiplication. Approximation is introduced in partial product generation and partial product accumulation circuits. In this work, three multipliers (ABM-M1, ABM-M2, and ABM-M3) are proposed in which the modified Booth algorithm is approximated. In all three designs, approximate Booth partial product generators are designed with different variations of approximation. The approximations are performed by reducing the logic complexity of the Booth partial product generator, and the accumulation of partial products is slightly modified to improve circuit performance. Compared to the exact Booth multiplier, ABM-M1 achieves up to 15% reduction in power consumption with an MRED value of 7.9 × 10-4. ABM-M2 has power savings of up to 60% with an MRED of 1.1 × 10-1. ABM-M3 has power savings of up to 50% with an MRED of 3.4 × 10-3. Compared to existing approximate Booth multipliers, the proposed multipliers ABM-M1 and ABM-M3 achieve up to a 41% reduction in power consumption while exhibiting very similar error metrics. Image multiplication and matrix multiplication are used as case studies to illustrate the high performance of the proposed approximate multipliers. Third, distributed arithmetic based sum of products units approximation is analyzed. Sum of products units are key elements in many digital signal processing applications. Three approximate sum of products models which are based on distributed arithmetic are proposed. They are designed for different levels of accuracy. First model of approximate sum of products achieves an improvement up to 64% on area and 70% on power, when compared to conventional unit. Other two models provide an improvement of 32% and 48% on area and 54% and 58% on power, respectively, with a reduced error rate compared to the first model. Third model achieves MRED and normalized mean error distance (NMED) as low as 0.05% and 0.009%. Performance of approximate units is evaluated with a noisy image smoothing application, where the proposed models are capable of achieving higher PSNR than existing state of the art techniques. Fourth, approximation is applied in division architecture. Two approximation models are proposed for restoring divider. In the first design, approximation is performed at circuit level, where approximate divider cells are utilized in place of exact ones by simplifying the logic equations. In the second model, restoring divider is analyzed strategically and number of restoring divider cells are reduced by finding the portions of divisor and dividend with significant information. An approximation factor pp is used in both designs. In model 1, the design with p=8 has a 58% reduction in both area and power consumption compared to exact design, with a Q-MRED of 1.909 × 10-2 and Q-NMED of 0.449 × 10-2. The second model with an approximation factor p=4 has 54% area savings and 62% power savings compared to exact design. The proposed models are found to have better error metrics compared to existing designs, with better performance at similar error values. A change detection image processing application is used for real time assessment of proposed and existing approximate dividers and one of the models achieves a PSNR of 54.27 dB

    On the design of fast and efficient wavelet image coders with reduced memory usage

    Full text link
    Image compression is of great importance in multimedia systems and applications because it drastically reduces bandwidth requirements for transmission and memory requirements for storage. Although earlier standards for image compression were based on the Discrete Cosine Transform (DCT), a recently developed mathematical technique, called Discrete Wavelet Transform (DWT), has been found to be more efficient for image coding. Despite improvements in compression efficiency, wavelet image coders significantly increase memory usage and complexity when compared with DCT-based coders. A major reason for the high memory requirements is that the usual algorithm to compute the wavelet transform requires the entire image to be in memory. Although some proposals reduce the memory usage, they present problems that hinder their implementation. In addition, some wavelet image coders, like SPIHT (which has become a benchmark for wavelet coding), always need to hold the entire image in memory. Regarding the complexity of the coders, SPIHT can be considered quite complex because it performs bit-plane coding with multiple image scans. The wavelet-based JPEG 2000 standard is still more complex because it improves coding efficiency through time-consuming methods, such as an iterative optimization algorithm based on the Lagrange multiplier method, and high-order context modeling. In this thesis, we aim to reduce memory usage and complexity in wavelet-based image coding, while preserving compression efficiency. To this end, a run-length encoder and a tree-based wavelet encoder are proposed. In addition, a new algorithm to efficiently compute the wavelet transform is presented. This algorithm achieves low memory consumption using line-by-line processing, and it employs recursion to automatically place the order in which the wavelet transform is computed, solving some synchronization problems that have not been tackled by previous proposals. The proposed encodeOliver Gil, JS. (2006). On the design of fast and efficient wavelet image coders with reduced memory usage [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1826Palanci

    Temporal Lossy In-Situ Compression for Computational Fluid Dynamics Simulations

    Get PDF
    Während CFD Simulationen für Metallschmelze im Rahmen des SFB920 fallen auf dem Taurus HPC Cluster in Dresden sehr große Datenmengen an, deren Handhabung den wissenschaftlichen Arbeitsablauf stark verlangsamen. Zum einen ist der Transfer in Visualisierungssysteme nur unter hohem Zeitaufwand möglich. Zum anderen ist interaktive Analyse von zeitlich abhängigen Prozessen auf Grund des Speicherflaschenhalses nahezu unmöglich. Aus diesen Gründen beschäftigt sich die vorliegende Dissertation mit der Entwicklung sog. Temporaler In-Situ Kompression für wissenschaftliche Daten direkt innerhalb von CFD Simulationen. Dabei werden mittels neuer Quantisierungsverfahren die Daten auf ~10% komprimiert, wobei dekomprimierte Daten einen Fehler von maximal 1% aufweisen. Im Gegensatz zu nicht-temporaler Kompression, wird bei temporaler Kompression der Unterschied zwischen Zeitschritten komprimiert, um den Kompressionsgrad zu erhöhen. Da die Datenmenge um ein Vielfaches kleiner ist, werden Kosten für die Speicherung und die Übertragung gesenkt. Da Kompression, Transfer und Dekompression bis zu 4 mal schneller ablaufen als der Transfer von unkomprimierten Daten, wird der wissenschaftliche Arbeitsablauf beschleunigt

    Compressed-domain transcoding of H.264/AVC and SVC video streams

    Get PDF

    Low bit-rate image sequence coding

    Get PDF

    Digit-slicing architectures for real-time digital filters

    Get PDF
    One of the many important algorithmic techniques in digital signal processing is real-time digital filtering. Modular sliced structures for digital filters have been proposed before, but the nature of implementation has been mainly constrained to non-recursive second order digital filters with positive values of coefficients. The aim of this research project is to extend this modular digit slicing concept to more practical higher order digital filters which are recursive and are of many forms (direct, nondirect, canonic, non-canonic). [Continues.
    corecore