18 research outputs found

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    Implementation of a MPEG 1 layer I audio decoder with variable bit lengths

    Get PDF
    One of the most popular forms of audio compression is MPEG (Moving Picture Experts Group). By using a VHDL (Very high-speed integrated circuit Hardware Description Language) implementation of a MPEG audio decoder and varying the word length of the constants and the multiplications used in the decoding process, and comparing the error, the minimum word length required can be determined. In general, the smaller the word length, the smaller the hardware resources required. This thesis is an investigation to find the minimum bit lengths required for each of the four multiplication sections used in a MPEG Audio decoder, that will still meet the quality levels specified in the MPEG standard. The use of the minimum bit lengths allows the minimum area resources of a FPGA (Field Programmable Gate Array) to be used. A FPGA model was designed that allowed the number of bits used to represent four constants and the results of the multiplications using these constants to vary. In order to limit the amount of data generated, testing was restricted to a single channel of audio data sampled at a frequency of 32kHz. This was then compared to the supplied C model distributed with the MPEG Audio Standard. It was found that for the MPEG audio coder to be fully compliant with the standard the bit lengths of the constants and the multiplications could be reduced by 75% and to be partial compliant with the standard, the bit lengths of the constants and the multiplications could be reduced by up to 82%. An implementation of a MPEG audio decoder in VHDL has the advantage of specific hardware, optimised, for all the different complex mathematical operations thereby reducing the repetitive operations and therefore power consumption and the time required performing these complex operations

    Power management techniques for conserving energy in multiple system components

    Get PDF
    Energy consumption is a limiting constraint for both embedded and high performance systems. CPU-core, caches and memory contribute a large fraction of energy consumption in most computing systems. As a result, reducing the energy consumption in these components can significantly reduce the system's overall energy consumption. However, applying multiple independent power management policies in the system (one for each component) may interfere with each other and in some occasions increase the combined energy consumption.In this dissertation, I present three power management techniques that target more than a single component in a system. The focus is on reducing the total energy consumption in processors, caches and memory combinations. First, I present a memory-aware processor power management using collaboration between the OS and the compiler. The technique objectives are: (1) finish the application execution before its deadline and (2) minimize the combined energy consumption in processor and memory. Second, I present an Integrated Dynamic Voltage Scaling (IDVS) techniques for processor and on-chip cache power management in multi-voltage domains systems. IDVS co-ordinates power management decisions across voltage domains rather that being applied in isolation in each domain. Third, I present a Power-Aware Cached DRAM (PA-CDRAM) memory organization for reducing the energy consumption in DRAM memory and off-chip caches. PA-CDRAM exploits the high internal memory bandwidth by bringing the off-chip caches "closer" to the memory, which also improves the overall performance.The techniques in my thesis highlight the importance of designing power management schemes that consider multiple components and their interactions (in terms of power and performance) in the system rather than applying multiple isolated power management polices. This study should lay the foundation for further research in the domain of integrated power management, where a single power manager controls many system components

    Discrete Wavelet Transforms

    Get PDF
    The discrete wavelet transform (DWT) algorithms have a firm position in processing of signals in several areas of research and industry. As DWT provides both octave-scale frequency and spatial timing of the analyzed signal, it is constantly used to solve and treat more and more advanced problems. The present book: Discrete Wavelet Transforms: Algorithms and Applications reviews the recent progress in discrete wavelet transform algorithms and applications. The book covers a wide range of methods (e.g. lifting, shift invariance, multi-scale analysis) for constructing DWTs. The book chapters are organized into four major parts. Part I describes the progress in hardware implementations of the DWT algorithms. Applications include multitone modulation for ADSL and equalization techniques, a scalable architecture for FPGA-implementation, lifting based algorithm for VLSI implementation, comparison between DWT and FFT based OFDM and modified SPIHT codec. Part II addresses image processing algorithms such as multiresolution approach for edge detection, low bit rate image compression, low complexity implementation of CQF wavelets and compression of multi-component images. Part III focuses watermaking DWT algorithms. Finally, Part IV describes shift invariant DWTs, DC lossless property, DWT based analysis and estimation of colored noise and an application of the wavelet Galerkin method. The chapters of the present book consist of both tutorial and highly advanced material. Therefore, the book is intended to be a reference text for graduate students and researchers to obtain state-of-the-art knowledge on specific applications

    High Performance Multiview Video Coding

    Get PDF
    Following the standardization of the latest video coding standard High Efficiency Video Coding in 2013, in 2014, multiview extension of HEVC (MV-HEVC) was published and brought significantly better compression performance of around 50% for multiview and 3D videos compared to multiple independent single-view HEVC coding. However, the extremely high computational complexity of MV-HEVC demands significant optimization of the encoder. To tackle this problem, this work investigates the possibilities of using modern parallel computing platforms and tools such as single-instruction-multiple-data (SIMD) instructions, multi-core CPU, massively parallel GPU, and computer cluster to significantly enhance the MVC encoder performance. The aforementioned computing tools have very different computing characteristics and misuse of the tools may result in poor performance improvement and sometimes even reduction. To achieve the best possible encoding performance from modern computing tools, different levels of parallelism inside a typical MVC encoder are identified and analyzed. Novel optimization techniques at various levels of abstraction are proposed, non-aggregation massively parallel motion estimation (ME) and disparity estimation (DE) in prediction unit (PU), fractional and bi-directional ME/DE acceleration through SIMD, quantization parameter (QP)-based early termination for coding tree unit (CTU), optimized resource-scheduled wave-front parallel processing for CTU, and workload balanced, cluster-based multiple-view parallel are proposed. The result shows proposed parallel optimization techniques, with insignificant loss to coding efficiency, significantly improves the execution time performance. This , in turn, proves modern parallel computing platforms, with appropriate platform-specific algorithm design, are valuable tools for improving the performance of computationally intensive applications

    Better Admission Control and Disk Scheduling for Multimedia Applications

    Get PDF
    General purpose operating systems have been designed to provide fast, loss-free disk service to all applications. However, multimedia applications are capable of tolerating some data loss, but are very sensitive to variation in disk service timing. Present research efforts to handle multimedia applications assume pessimistic disk behaviour when deciding to admit new multimedia connections so as not to violate the real-time application constraints. However, since multimedia applications are ``soft\u27 real-time applications that can tolerate some loss, we propose an optimistic scheme for admission control which uses average case values for disk access. Typically, disk scheduling mechanisms for multimedia applications reduce disk access times by only trying to minimize movement to subsequent blocks after sequencing based on Earliest Deadline First. We propose to implement a disk scheduling algorithm that uses knowledge of the media stored and permissible loss and jitter for each client, in addition to the physical parameters used by the other scheduling algorithms. We will evaluate our approach by implementing our admission control policy and disk scheduling algorithm in Linux and measuring the quality of various multimedia streams. If successful, the contributions of this thesis are the development of new admission control and flexible disk scheduling algorithm for improved multimedia quality of service

    Low-power and high-performance SRAM design in high variability advanced CMOS technology

    Get PDF
    As process technologies shrink, the size and number of memories on a chip are exponentially increasing. Embedded SRAMs are a critical component in modern digital systems, and they strongly impact the overall power, performance, and area. To promote memory-related research in academia, this dissertation introduces OpenRAM, a flexible, portable and open-source memory compiler and characterization methodology for generating and verifying memory designs across different technologies.In addition, SRAM designs, focusing on improving power consumption, access time and bitcell stability are explored in high variability advanced CMOS technologies. To have a stable read/write operation for SRAM in high variability process nodes, a differential-ended single-port 8T bitcell is proposed that improves the read noise margin, write noise margin and readout bitcell current by 45%, 48% and 21%, respectively, compared to a conventional 6T bitcell. Also, a differential-ended single-port 12T bitcell for subthreshold operation is proposed that solves the half-select disturbance and allows efficient bit-interleaving. 12T bitcell has a leakage control mechanism which helps to reduce the power consumption and provides operation down to 0.3 V. Both 8T and 12T bitcells are analyzed in a 64 kb SRAM array using 32 nm technology. Besides, to further improve the access time and power consumption, two tracking circuits (multi replica bitline delay and reconfigurable replica bitline delay techniques) are proposed to aid the generation of accurate and optimum sense amplifier set time.An error tolerant SRAM architecture suitable for low voltage video application with dynamic power-quality management is also proposed in this dissertation. This memory uses three power supplies to improve the SRAM stability in low voltages. The proposed triple-supply approach achieves 63% improvement in image quality and 69% reduction in power consumption compared to a single-supply 64 kb SRAM array at 0.70 V

    A Low Power Variable Length Decoder for MPEG-2 Based on Nonuniform Fine-Grain Table Partitioning

    No full text
    Variable length coding is a widely used technique in digital video compression systems. Previous work related to variable length decoders (VLD's) are primarily aimed at high throughput applications, but the increased demand for portable multimedia systems has made power a very important factor. In this paper, a data-driven variable length decoding architecture is presented, which exploits the signal statistics of variable length codes to reduce power. The approach uses fine-grain lookup table (LUT) partitioning to reduce switched capacitance based on codeword frequency. The complete VLD for MPEG-2 has been fabricated and consumes 530 W at 1.35 V with a video rate of 48-M discrete cosine transform samples/s using a 0.6-m CMOS technology. More than an order of magnitude power reduction is demonstrated without performance loss compared to a conventional parallel decoding scheme with a single LUT

    A low power variable length decoder for MPEG-2 based on nonuniform fine-grain table partitioning

    No full text

    Design of Efficient TLB-based Data Classification Mechanisms in Chip Multiprocessors

    Full text link
    Most of the data referenced by sequential and parallel applications running in current chip multiprocessors are referenced by a single thread, i.e., private. Recent proposals leverage this observation to improve many aspects of chip multiprocessors, such as reducing coherence overhead or the access latency to distributed caches. The effectiveness of those proposals depends to a large extent on the amount of detected private data. However, the mechanisms proposed so far either do not consider either thread migration or the private use of data within different application phases, or do entail high overhead. As a result, a considerable amount of private data is not detected. In order to increase the detection of private data, this thesis proposes a TLB-based mechanism that is able to account for both thread migration and private application phases with low overhead. Classification status in the proposed TLB-based classification mechanisms is determined by the presence of the page translation stored in other core's TLBs. The classification schemes are analyzed in multilevel TLB hierarchies, for systems with both private and distributed shared last-level TLBs. This thesis introduces a page classification approach based on inspecting other core's TLBs upon every TLB miss. In particular, the proposed classification approach is based on exchange and count of tokens. Token counting on TLBs is a natural and efficient way for classifying memory pages. It does not require the use of complex and undesirable persistent requests or arbitration, since when two ormore TLBs race for accessing a page, tokens are appropriately distributed classifying the page as shared. However, TLB-based ability to classify private pages is strongly dependent on TLB size, as it relies on the presence of a page translation in the system TLBs. To overcome that, different TLB usage predictors (UP) have been proposed, which allow a page classification unaffected by TLB size. Specifically, this thesis introduces a predictor that obtains system-wide page usage information by either employing a shared last-level TLB structure (SUP) or cooperative TLBs working together (CUP).La mayor parte de los datos referenciados por aplicaciones paralelas y secuenciales que se ejecutan enCMPs actuales son referenciadas por un 煤nico hilo, es decir, son privados. Recientemente, algunas propuestas aprovechan esta observaci贸n para mejorar muchos aspectos de los CMPs, como por ejemplo reducir el sobrecoste de la coherencia o la latencia de los accesos a cach茅s distribuidas. La efectividad de estas propuestas depende en gran medida de la cantidad de datos que son considerados privados. Sin embargo, los mecanismos propuestos hasta la fecha no consideran la migraci贸n de hilos de ejecuci贸n ni las fases de una aplicaci贸n. Por tanto, una cantidad considerable de datos privados no se detecta apropiadamente. Con el fin de aumentar la detecci贸n de datos privados, proponemos un mecanismo basado en las TLBs, capaz de reclasificar los datos a privado, y que detecta la migraci贸n de los hilos de ejecuci贸n sin a帽adir complejidad al sistema. Los mecanismos de clasificaci贸n en las TLBs se han analizado en estructuras de varios niveles, incluyendo TLBs privadas y con un 煤ltimo nivel de TLB compartido y distribuido. Esta tesis tambi茅n presenta un mecanismo de clasificaci贸n de p谩ginas basado en la inspecci贸n de las TLBs de otros n煤cleos tras cada fallo de TLB. De forma particular, el mecanismo propuesto se basa en el intercambio y el cuenteo de tokens (testigos). Contar tokens en las TLBs supone una forma natural y eficiente para la clasificaci贸n de p谩ginas de memoria. Adem谩s, evita el uso de solicitudes persistentes o arbitraje alguno, ya que si dos o m谩s TLBs compiten para acceder a una p谩gina, los tokens se distribuyen apropiadamente y la clasifican como compartida. Sin embargo, la habilidad de los mecanismos basados en TLB para clasificar p谩ginas privadas depende del tama帽o de las TLBs. La clasificaci贸n basada en las TLBs se basa en la presencia de una traducci贸n en las TLBs del sistema. Para evitarlo, se han propuesto diversos predictores de uso en las TLBs (UP), los cuales permiten una clasificaci贸n independiente del tama帽o de las TLBs. En concreto, esta tesis presenta un sistema mediante el que se obtiene informaci贸n de uso de p谩gina a nivel de sistema con la ayuda de un nivel de TLB compartida (SUP) o mediante TLBs cooperando juntas (CUP).La major part de les dades referenciades per aplicacions paral路leles i seq眉encials que s'executen en CMPs actuals s贸n referenciades per un sol fil, 茅s a dir, s贸n privades. Recentment, algunes propostes aprofiten aquesta observaci贸 per a millorar molts aspectes dels CMPs, com 茅s reduir el sobrecost de la coher猫ncia o la lat猫ncia d'acc茅s a mem貌ries cau distribu茂des. L'efectivitat d'aquestes propostes depen en gran mesura de la quantitat de dades detectades com a privades. No obstant aix貌, els mecanismes proposats fins a la data no consideren la migraci贸 de fils d'execuci贸 ni les fases d'una aplicaci贸. Per tant, una quantitat considerable de dades privades no es detecta apropiadament. A fi d'augmentar la detecci贸 de dades privades, aquesta tesi proposa un mecanisme basat en les TLBs, capa莽 de reclassificar les dades com a privades, i que detecta la migraci贸 dels fils d'execuci贸 sense afegir complexitat al sistema. Els mecanismes de classificaci贸 en les TLBs s'han analitzat en estructures de diversos nivells, incloent-hi sistemes amb TLBs d'煤ltimnivell compartides i distribu茂des. Aquesta tesi presenta un mecanisme de classificaci贸 de p脿gines basat en inspeccionar les TLBs d'altres nuclis despr茅s de cada fallada de TLB. Concretament, el mecanisme proposat es basa en l'intercanvi i el compte de tokens. Comptar tokens en les TLBs suposa una forma natural i eficient per a la classificaci贸 de p脿gines de mem貌ria. A m茅s, evita l'煤s de sol路licituds persistents o arbitratge, ja que si dues o m茅s TLBs competeixen per a accedir a una p脿gina, els tokens es distribueixen apropiadament i la classifiquen com a compartida. No obstant aix貌, l'habilitat dels mecanismes basats en TLB per a classificar p脿gines privades depenen de la grand脿ria de les TLBs. La classificaci贸 basada en les TLBs resta en la pres猫ncia d'una traducci贸 en les TLBs del sistema. Per a evitar-ho, s'han proposat diversos predictors d'煤s en les TLBs (UP), els quals permeten una classificaci贸 independent de la grand脿ria de les TLBs. Espec铆ficament, aquesta tesi introdueix un predictor que obt茅 informaci贸 d'煤s de la p脿gina a escala de sistema mitjan莽ant un nivell de TLB compartida (SUP) or mitjan莽ant TLBs cooperant juntes (CUP).Esteve Garc铆a, A. (2017). Design of Efficient TLB-based Data Classification Mechanisms in Chip Multiprocessors [Tesis doctoral no publicada]. Universitat Polit猫cnica de Val猫ncia. https://doi.org/10.4995/Thesis/10251/86136TESI
    corecore