18 research outputs found

The Fifth NASA Symposium on VLSI Design

The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

NASA Technical Reports Server

Implementation of a MPEG 1 layer I audio decoder with variable bit lengths

Author: OCallaghan D
Publication venue: RMIT University
Publication date: 01/01/2008

One of the most popular forms of audio compression is MPEG (Moving Picture Experts Group). By using a VHDL (Very high-speed integrated circuit Hardware Description Language) implementation of an MPEG audio decoder, varying the word length of the constants and the multiplications used in the decoding process, and comparing the resulting error, the minimum word length required can be determined. In general, the smaller the word length, the fewer the hardware resources required. This thesis investigates the minimum bit lengths required for each of the four multiplication sections used in an MPEG audio decoder that will still meet the quality levels specified in the MPEG standard. Using the minimum bit lengths minimizes the area resources used on an FPGA (Field Programmable Gate Array). An FPGA model was designed that allowed the number of bits used to represent four constants, and the results of the multiplications using these constants, to vary. To limit the amount of data generated, testing was restricted to a single channel of audio data sampled at 32 kHz. The output was then compared to that of the C model distributed with the MPEG Audio Standard. It was found that, for the MPEG audio coder to be fully compliant with the standard, the bit lengths of the constants and the multiplications could be reduced by 75%, and to be partially compliant with the standard, they could be reduced by up to 82%. An implementation of an MPEG audio decoder in VHDL has the advantage of specific hardware, optimised for all the different complex mathematical operations, thereby reducing the repetitive operations and therefore the power consumption and the time required to perform these complex operations

RMIT Research Repository
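
The word-length search described in the MPEG audio decoder abstract above can be illustrated with a small sketch. This is not the thesis's VHDL model or the standard's C reference model: the cosine constants, the sample grid, and the tolerance below are hypothetical stand-ins, and the code only shows the general idea of shrinking the number of fractional bits until the multiplication error exceeds a chosen bound.

```python
import math


def quantize(value: float, frac_bits: int) -> float:
    """Round value to the nearest multiple of 2**-frac_bits (fixed-point)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def max_product_error(constants, samples, frac_bits):
    """Largest absolute error when the constant and the product are both
    quantized to frac_bits fractional bits."""
    worst = 0.0
    for c in constants:
        cq = quantize(c, frac_bits)
        for s in samples:
            exact = c * s
            approx = quantize(cq * s, frac_bits)
            worst = max(worst, abs(exact - approx))
    return worst


def min_frac_bits(constants, samples, tolerance, max_bits=32):
    """Smallest number of fractional bits whose worst-case error is within
    tolerance, or None if no width up to max_bits qualifies."""
    for bits in range(1, max_bits + 1):
        if max_product_error(constants, samples, bits) <= tolerance:
            return bits
    return None


# Hypothetical test data: cosine-like constants and samples in [-1, 1].
CONSTANTS = [math.cos(math.pi * k / 64) for k in range(32)]
SAMPLES = [n / 16 - 1.0 for n in range(33)]
TOLERANCE = 2 ** -15  # assumed error bound, not taken from the MPEG standard

if __name__ == "__main__":
    print(min_frac_bits(CONSTANTS, SAMPLES, TOLERANCE))
```

In a real compliance test the error would be measured on decoded audio against the reference decoder rather than on isolated products, so this sketch only conveys the shape of the search.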

Power management techniques for conserving energy in multiple system components

Author: Abou Gazala Neven M.
Publication date: 10/06/2008

Energy consumption is a limiting constraint for both embedded and high-performance systems. The CPU core, caches and memory contribute a large fraction of the energy consumption in most computing systems. As a result, reducing the energy consumption of these components can significantly reduce the system's overall energy consumption. However, multiple independent power management policies applied in the same system (one for each component) may interfere with each other and, on some occasions, increase the combined energy consumption. In this dissertation, I present three power management techniques that target more than a single component in a system. The focus is on reducing the total energy consumption in combinations of processors, caches and memory. First, I present a memory-aware processor power management technique that uses collaboration between the OS and the compiler. The technique's objectives are to (1) finish the application execution before its deadline and (2) minimize the combined energy consumption of the processor and memory. Second, I present an Integrated Dynamic Voltage Scaling (IDVS) technique for processor and on-chip cache power management in systems with multiple voltage domains. IDVS coordinates power management decisions across voltage domains rather than applying them in isolation in each domain. Third, I present a Power-Aware Cached DRAM (PA-CDRAM) memory organization for reducing the energy consumption of DRAM memory and off-chip caches. PA-CDRAM exploits the high internal memory bandwidth by bringing the off-chip caches "closer" to the memory, which also improves overall performance. The techniques in my thesis highlight the importance of designing power management schemes that consider multiple components and their interactions (in terms of power and performance) in the system, rather than applying multiple isolated power management policies.
This study should lay the foundation for further research in the domain of integrated power management, where a single power manager controls many system components

D-Scholarship@Pitt

Discrete Wavelet Transforms

Publication venue: IntechOpen
Publication date: 20/04/2021

Discrete wavelet transform (DWT) algorithms have a firm position in signal processing in several areas of research and industry. As the DWT provides both octave-scale frequency and spatial timing of the analyzed signal, it is constantly used to solve and treat more and more advanced problems. The present book, Discrete Wavelet Transforms: Algorithms and Applications, reviews recent progress in discrete wavelet transform algorithms and applications. The book covers a wide range of methods (e.g. lifting, shift invariance, multi-scale analysis) for constructing DWTs. The book chapters are organized into four major parts. Part I describes progress in hardware implementations of the DWT algorithms. Applications include multitone modulation for ADSL and equalization techniques, a scalable architecture for FPGA implementation, a lifting-based algorithm for VLSI implementation, a comparison between DWT-based and FFT-based OFDM, and a modified SPIHT codec. Part II addresses image processing algorithms such as a multiresolution approach for edge detection, low bit rate image compression, low-complexity implementation of CQF wavelets and compression of multi-component images. Part III focuses on watermarking DWT algorithms. Finally, Part IV describes shift-invariant DWTs, the DC lossless property, DWT-based analysis and estimation of colored noise, and an application of the wavelet Galerkin method. The chapters of the present book consist of both tutorial and highly advanced material. Therefore, the book is intended to be a reference text for graduate students and researchers to obtain state-of-the-art knowledge on specific applications

Directory of Open Access Books (DOAB)

High Performance Multiview Video Coding

Author: Jiang Caoyang
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2017

Following the standardization of the latest video coding standard, High Efficiency Video Coding (HEVC), in 2013, the multiview extension of HEVC (MV-HEVC) was published in 2014 and brought significantly better compression performance, of around 50%, for multiview and 3D videos compared to multiple independent single-view HEVC coding. However, the extremely high computational complexity of MV-HEVC demands significant optimization of the encoder. To tackle this problem, this work investigates the possibilities of using modern parallel computing platforms and tools, such as single-instruction-multiple-data (SIMD) instructions, multi-core CPUs, massively parallel GPUs, and computer clusters, to significantly enhance MVC encoder performance. These computing tools have very different computing characteristics, and misusing them may result in poor performance improvement and sometimes even a reduction. To achieve the best possible encoding performance from modern computing tools, different levels of parallelism inside a typical MVC encoder are identified and analyzed. Novel optimization techniques at various levels of abstraction are proposed: non-aggregation massively parallel motion estimation (ME) and disparity estimation (DE) in the prediction unit (PU), fractional and bi-directional ME/DE acceleration through SIMD, quantization parameter (QP)-based early termination for the coding tree unit (CTU), optimized resource-scheduled wave-front parallel processing for CTUs, and workload-balanced, cluster-based multiple-view parallelism. The results show that the proposed parallel optimization techniques significantly improve execution time with insignificant loss of coding efficiency. This, in turn, demonstrates that modern parallel computing platforms, with appropriate platform-specific algorithm design, are valuable tools for improving the performance of computationally intensive applications

Michigan Technological University

Better Admission Control and Disk Scheduling for Multimedia Applications

Author: Venkatachari Badrinath
Publication venue: Digital WPI
Publication date: 01/05/2002

General-purpose operating systems have been designed to provide fast, loss-free disk service to all applications. However, multimedia applications are capable of tolerating some data loss, but are very sensitive to variation in disk service timing. Present research efforts to handle multimedia applications assume pessimistic disk behaviour when deciding whether to admit new multimedia connections, so as not to violate the real-time application constraints. However, since multimedia applications are "soft" real-time applications that can tolerate some loss, we propose an optimistic scheme for admission control which uses average-case values for disk access. Typically, disk scheduling mechanisms for multimedia applications reduce disk access times only by trying to minimize movement to subsequent blocks after sequencing requests based on Earliest Deadline First. We propose to implement a disk scheduling algorithm that uses knowledge of the media stored and the permissible loss and jitter for each client, in addition to the physical parameters used by the other scheduling algorithms. We will evaluate our approach by implementing our admission control policy and disk scheduling algorithm in Linux and measuring the quality of various multimedia streams. If successful, the contributions of this thesis will be the development of a new admission control policy and a flexible disk scheduling algorithm for improved multimedia quality of service

DigitalCommons@WPI
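
The contrast between pessimistic and optimistic admission control in the abstract above can be sketched as follows. This is a minimal illustration under assumed numbers, not the thesis's Linux implementation: the `Stream` type, the round length, and the worst-case and average-case access times are all hypothetical. A new stream is admitted only if the estimated disk time for all streams still fits in one service round, and the only difference between the two policies is which per-block access time is used in that estimate.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    name: str
    blocks_per_round: int  # disk blocks the stream needs in each service round


def round_load_ms(streams, access_ms):
    """Estimated disk time per round if every block costs access_ms."""
    return sum(s.blocks_per_round for s in streams) * access_ms


def admit(streams, candidate, access_ms, round_ms):
    """Admit the candidate only if the estimated load still fits in a round."""
    return round_load_ms(streams + [candidate], access_ms) <= round_ms


def admitted_count(candidates, access_ms, round_ms):
    """Offer candidates one by one and count how many are admitted."""
    admitted = []
    for c in candidates:
        if admit(admitted, c, access_ms, round_ms):
            admitted.append(c)
    return len(admitted)


# Hypothetical parameters (assumptions, not measurements from the thesis).
WORST_CASE_MS = 20.0  # pessimistic per-block access time
AVERAGE_MS = 8.0      # optimistic (average-case) per-block access time
ROUND_MS = 1000.0     # length of one service round
CANDIDATES = [Stream(f"s{i}", blocks_per_round=4) for i in range(40)]

if __name__ == "__main__":
    print(admitted_count(CANDIDATES, WORST_CASE_MS, ROUND_MS))
    print(admitted_count(CANDIDATES, AVERAGE_MS, ROUND_MS))
```

With these assumed values the average-case estimate admits more streams than the worst-case estimate; the price is that some rounds may overrun, which is acceptable only because the streams are assumed to tolerate occasional loss.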

Low-power and high-performance SRAM design in high variability advanced CMOS technology

Author: Ataei Fatemeh
Publication date: 01/05/2017

As process technologies shrink, the size and number of memories on a chip are increasing exponentially. Embedded SRAMs are a critical component in modern digital systems, and they strongly impact the overall power, performance, and area. To promote memory-related research in academia, this dissertation introduces OpenRAM, a flexible, portable and open-source memory compiler and characterization methodology for generating and verifying memory designs across different technologies. In addition, SRAM designs that focus on improving power consumption, access time and bitcell stability are explored in high-variability advanced CMOS technologies. To provide stable read/write operation for SRAM in high-variability process nodes, a differential-ended single-port 8T bitcell is proposed that improves the read noise margin, write noise margin and readout bitcell current by 45%, 48% and 21%, respectively, compared to a conventional 6T bitcell. Also, a differential-ended single-port 12T bitcell for subthreshold operation is proposed that solves the half-select disturbance and allows efficient bit-interleaving. The 12T bitcell has a leakage control mechanism which helps to reduce the power consumption and provides operation down to 0.3 V. Both the 8T and 12T bitcells are analyzed in a 64 kb SRAM array using 32 nm technology. To further improve the access time and power consumption, two tracking circuits (multi replica bitline delay and reconfigurable replica bitline delay techniques) are proposed to aid the generation of an accurate and optimum sense amplifier set time. An error-tolerant SRAM architecture suitable for low-voltage video applications with dynamic power-quality management is also proposed in this dissertation. This memory uses three power supplies to improve SRAM stability at low voltages. The proposed triple-supply approach achieves a 63% improvement in image quality and a 69% reduction in power consumption compared to a single-supply 64 kb SRAM array at 0.70 V

SHAREOK repository

A Low Power Variable Length Decoder for MPEG-2 Based on Nonuniform Fine-Grain Table Partitioning

Author: Anantha P. Chandrakasan
Seong Hwan Cho
Thucydides Xanthopoulos
Publication date: 01/01/1999

Variable length coding is a widely used technique in digital video compression systems. Previous work related to variable length decoders (VLDs) is primarily aimed at high-throughput applications, but the increased demand for portable multimedia systems has made power a very important factor. In this paper, a data-driven variable length decoding architecture is presented, which exploits the signal statistics of variable length codes to reduce power. The approach uses fine-grain lookup table (LUT) partitioning to reduce switched capacitance based on codeword frequency. The complete VLD for MPEG-2 has been fabricated and consumes 530 µW at 1.35 V with a video rate of 48 M discrete cosine transform samples/s using a 0.6-µm CMOS technology. More than an order of magnitude power reduction is demonstrated without performance loss compared to a conventional parallel decoding scheme with a single LUT

CiteSeerX
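
The idea of partitioning a lookup table by codeword frequency, as described in the variable length decoder abstract above, can be sketched in software. This is only an illustration and not the fabricated architecture: the five-symbol prefix code, the 2-bit and 4-bit table widths, and all names below are hypothetical. Short (frequent) codewords are resolved in a small table, and only the remaining codewords fall through to a larger one, so most lookups touch the small table.

```python
# Hypothetical prefix code: shorter codewords stand for more frequent symbols.
CODE = {"0": "A", "10": "B", "110": "C", "1110": "D", "1111": "E"}


def build_lut(code, width, max_len):
    """Table indexed by `width` bits covering codewords of length <= max_len.
    Each entry maps a bit pattern to (symbol, codeword length)."""
    table = {}
    for word, sym in code.items():
        if len(word) > max_len:
            continue
        pad = width - len(word)
        for i in range(1 << pad):
            suffix = format(i, "b").zfill(pad) if pad else ""
            table[word + suffix] = (sym, len(word))
    return table


SMALL = build_lut(CODE, width=2, max_len=2)  # frequent, short codewords only
LARGE = build_lut(CODE, width=4, max_len=4)  # every codeword


def encode(text):
    """Concatenate the codewords for each symbol in text."""
    inverse = {sym: word for word, sym in CODE.items()}
    return "".join(inverse[ch] for ch in text)


def decode(bits):
    """Decode a well-formed bit string, trying the small table first.
    Returns (decoded text, small-table hits, large-table lookups)."""
    symbols, pos, small_hits, large_hits = [], 0, 0, 0
    while pos < len(bits):
        window = bits[pos:pos + 4].ljust(4, "0")  # pad the tail with zeros
        entry = SMALL.get(window[:2])
        if entry is not None:
            small_hits += 1
        else:
            entry = LARGE[window]
            large_hits += 1
        sym, length = entry
        symbols.append(sym)
        pos += length
    return "".join(symbols), small_hits, large_hits


if __name__ == "__main__":
    print(decode(encode("AABACABD")))
```

In hardware, the benefit reported in the abstract comes from reduced switched capacitance when the small partition is accessed; the hit counters here merely show how often each partition would be used for a given symbol distribution.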

A low power variable length decoder for MPEG-2 based on nonuniform fine-grain table partitioning

Author: A.P. Chandrakasan
Seong Hwan Cho
T. Xanthopoulos
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Design of Efficient TLB-based Data Classification Mechanisms in Chip Multiprocessors

Author: Esteve García Albert
Publication venue: Universitat Politecnica de Valencia
Publication date: 01/09/2017

Most of the data referenced by sequential and parallel applications running in current chip multiprocessors (CMPs) are referenced by a single thread, i.e., they are private. Recent proposals leverage this observation to improve many aspects of chip multiprocessors, such as reducing coherence overhead or the access latency to distributed caches. The effectiveness of those proposals depends to a large extent on the amount of detected private data. However, the mechanisms proposed so far either do not consider thread migration or the private use of data within different application phases, or entail high overhead. As a result, a considerable amount of private data is not detected. In order to increase the detection of private data, this thesis proposes a TLB-based mechanism that is able to account for both thread migration and private application phases with low overhead. Classification status in the proposed TLB-based classification mechanisms is determined by the presence of the page translation in other cores' TLBs. The classification schemes are analyzed in multilevel TLB hierarchies, for systems with both private and distributed shared last-level TLBs. This thesis introduces a page classification approach based on inspecting other cores' TLBs upon every TLB miss. In particular, the proposed classification approach is based on the exchange and counting of tokens. Token counting on TLBs is a natural and efficient way of classifying memory pages. It does not require the use of complex and undesirable persistent requests or arbitration, since when two or more TLBs race to access a page, tokens are appropriately distributed, classifying the page as shared. However, the ability of TLB-based mechanisms to classify private pages is strongly dependent on TLB size, as it relies on the presence of a page translation in the system TLBs. To overcome that, different TLB usage predictors (UP) have been proposed, which allow a page classification unaffected by TLB size. Specifically, this thesis introduces a predictor that obtains system-wide page usage information by employing either a shared last-level TLB structure (SUP) or cooperative TLBs working together (CUP).
Esteve García, A. (2017). Design of Efficient TLB-based Data Classification Mechanisms in Chip Multiprocessors [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/86136

Crossref

RiuNet