152 research outputs found

    DATA COMPRESSION USING EFFICIENT DICTIONARY SELECTION METHOD

    Get PDF
    With the increase in silicon densities, it is becoming feasible for compression systems to be implemented in chip. A system with distributed memory architecture is based on having data compression and decompression engines working independently on different data at the same time. This data is stored in memory distributed to each processor. The objective of the project is to design a lossless data compression system which operates in high-speed to achieve high compression rate. By using the architecture of compressors, the data compression rates are significantly improved. Also inherent scalability of architecture is possible. The main parts of the system are the data compressors and the control blocks providing control signals for the Data compressors, allowing appropriate control of the routing of data into and from the system. Each Data compressor can process four bytes of data into and from a block of data in every clock cycle. The data entering the system needs to be clocked in at a rate of 4 bytes in every clock cycle. This is to ensure that adequate data is present for all compressors to process rather than being in an idle state

    Reducing Switching Activity by Test Slice Difference Technique for Test Volume Compression

    Get PDF
    [[abstract]]This paper presents a test slice difference (TSD) technique to improve test data compression. It is an efficient method and only needs one scan cell. Consequently, hardware overhead is much lower than cyclical scan chains (CSR). As the complexity of VLSI continues to grow, excessive power supply noise has become seriously. We propose a new compression scheme which smooth down the switching activity and reduce the test data volume simultaneously.[[conferencetype]]國際[[conferencelocation]]Taipei, Taiwa

    Test Slice Difference Technique for Low-Transition Test Data Compression

    Get PDF
    [[notice]]補正完畢[[incitationindex]]EI[[booktype]]電子

    Effective power saving method by on-chip traffic compression in noc-based embedded systems

    Full text link
    [EN] of components, relying on an efficient on-chip network (network-on-chip; NoC). As the size of the system increases, NoC performance and power consumption become a central issue. In this project, we design compression strategies at the NoC level reducing the number of transmitted flits and consequently the energy consumed. The provided mechanism relies on the abundance of memory data blocks filled with zeros in the analysed applications, thus easily compressible by using a zero-elimination strategy. We provide a hardware implementation for both compression and decompression end points at a generic network interface (NI). The mechanisms have been designed in isolated mode in order to make them modular and easily adapted to any NI protocol. Results show the effectiveness of the compression and decompression mechanisms and the low overhead they introduce. The percentage of traffic reduced by the compression strategy (it is reduced by a factor of 3) justifies the added resources. This work reflects some parts of the main research directions we tackle in the wider PhD framework. In particular, we propose a method for power efficient memory traffic management. The work presented here represents the initial research directions in simulation development, traffic pattern characterization and initial solutions development[ES] Con los avances de la tecnología, los sistemas en chip multiprocesador (MPSoC) aumentan en número de componentes, apoyándose en una red en el chip (NoC) eficiente. Según crece el tamaño de estos sistemas, la eficiencia de la red tanto temporal como energética se convierte en una parte primordial. En este proyecto diseñamos estrategias de compresión a nivel de red (en la NoC) reduciendo el número de flits transmitidos y por tanto la energía consumida. El método propuesto se basa en la abundancia de bloques de memoria con largas cadenas de ceros que se detectaron en las aplicaciones analizadas. Esta abundancia de ceros facilita la compresión mediante estrategias de eliminación de ceros. Ofrecemos una implementación hardware tanto de la parte de compresión como de la de descompresión sobre un interfaz de red (NI) genérico. Los mecanismos propuestos han sido diseñados de forma aislada para hacerlos modulares y fácilmente adaptables a cualquier protocolo de NI. Los resultados muestran la efectividad de los mecanismos de compresión y descompresión y la escasa penalización que introducen. El porcentaje de tráfico reducido mediante la estrategia de compresión (se reduce con un factor de 3) justifica los recursos extra requeridos. Este trabajo refleja parte de la línea de investigación global que se pretende abordar en el marco más amplio de un doctorado. En particular proponemos un método de gestión del tráfico de memoria energéticamente eficiente. El trabajo presentado aquí representa pues una primera aproximación a la investigación realizando un desarrollo parcial del simulador, caracterización de patrones de tráfico y el desarrollo de una solución parcialSoler Heredia, M. (2013). Effective power saving method by on-chip traffic compression in noc-based embedded systems. http://hdl.handle.net/10251/43774Archivo delegad

    An innovative two-stage data compression scheme using adaptive block merging technique

    Get PDF
    Test data has increased enormously owing to the rising on-chip complexity of integrated circuits. It further increases the test data transportation time and tester memory. The non-correlated test bits increase the issue of the test power. This paper presents a two-stage block merging based test data minimization scheme which reduces the test bits, test time and test power. A test data is partitioned into blocks of fixed sizes which are compressed using two-stage encoding technique. In stage one, successive blocks are merged to retain a representative block. In stage two, the retained pattern block is further encoding based on the existence of ten different subcases between the sub-block formed by splitting the retained pattern block into two halves. Non-compatible blocks are also split into two sub-blocks and tried for encoded using lesser bits. Decompression architecture to retrieve the original test data is presented. Simulation results obtained corresponding to different ISCAS′89 benchmarks circuits reflect its effectiveness in achieving better compression

    An Efficient Test Vector Compression Technique Based on Block Merging

    Get PDF
    In this paper, we present a new test data compression technique based on block merging. The technique capitalizes on the fact that many consecutive blocks of the test data can be merged together. Compression is achieved by storing the merged block and the number of blocks merged. It also takes advantage of cases where the merged block can be filled by all 0’s or all 1’s. Test data decompression is performed on chip using a simple circuitry that repeats the merged block the required number of times. The decompression circuitry has the advantage of being test data independent. Experimental results on benchmark circuits demonstrate the effectiveness of the proposed technique compared to previous approaches

    Deep learning in edge: evaluation of models and frameworks in ARM architecture

    Get PDF
    The boom and popularization of edge devices have molded its market due to stiff compe tition that provides better functionalities at low energy costs. The ARM architecture has been unanimously unopposed in the huge market segment of smartphones and still makes a presence beyond that: in drones, surveillance systems, cars, and robots. Also, it has been used successfully for the development of solutions for chains that supply food, fuel, and other services. Up until recently, ARM did not show much promise for high-level compu tation, i.e., thanks to its limited RISC instruction set, it was considered power efficient but weak in performance compared to x86 architecture. However, most recent advancements in ARM architecture pivoted that inflection point up thanks to the introduction of embed ded GPUs with DMA into LPDDR memory boards. Since this development in boards such as NVIDIA TK1, NVIDIA Jetson TX1, and NVIDIA TX2, perhaps it finally be came feasible to study and perform more challenging parallel and distributed workloads directly on a RISC-based architecture. On the other hand, the novelty of this technology poses a fundamental question of whether these boards are gaining a meaningful ratio be tween processing power and power consumption over conventional architectures or if they are bound to have reached their limitations. This work explores the Parallel Processing of Deep Learning on embedded GPUs of NVIDIA Jetson TX2 to evaluate the question above comprehensively. Thus, it uses 4 ARM boards, with 2 Deep Learning frameworks, 7 CNN models, and one medium-sized dataset combined into six board settings to con duct experiments. The experiments were conducted under similar environments, all built from the source. Altogether, the experiments ran for a total of 4,804 hours and revealed a slight advantage for MxNet on GPU-reliant training and a PyTorch overall advantage in total execution time and power, but especially for CPU-only executions. The experi ments also showed that the NVIDIA Jetson TX2 already makes feasible some complex workloads directly on its SoC

    An Efficient Test Vector Compression Technique Based on Block Merging

    Get PDF
    In this paper, we present a new test data compression technique based on block merging. The technique capitalizes on the fact that many consecutive blocks of the test data can be merged together. Compression is achieved by storing the merged block and the number of blocks merged. It also takes advantage of cases where the merged block can be filled by all 0’s or all 1’s. Test data decompression is performed on chip using a simple circuitry that repeats the merged block the required number of times. The decompression circuitry has the advantage of being test data independent. Experimental results on benchmark circuits demonstrate the effectiveness of the proposed technique compared to previous approaches

    Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing

    Get PDF
    Machine Learning (ML) a subset of Artificial Intelligence (AI) is driving the industrial and technological revolution of the present and future. We envision a world with smart devices that are able to mimic human behavior (sense, process, and act) and perform tasks that at one time we thought could only be carried out by humans. The vision is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware platforms. However, embedding machine learning algorithms in many application domains such as the internet of things (IoT), prostheses, robotics, and wearable devices is an ongoing challenge. A challenge that is controlled by the computational complexity of ML algorithms, the performance/availability of hardware platforms, and the application\u2019s budget (power constraint, real-time operation, etc.). In this dissertation, we focus on the design and implementation of efficient ML algorithms to handle the aforementioned challenges. First, we apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of ML algorithms. Then, we design custom Hardware Accelerators to improve the performance of the implementation within a specified budget. Finally, a tactile data processing application is adopted for the validation of the proposed exact and approximate embedded machine learning accelerators. The dissertation starts with the introduction of the various ML algorithms used for tactile data processing. These algorithms are assessed in terms of their computational complexity and the available hardware platforms which could be used for implementation. Afterward, a survey on the existing approximate computing techniques and hardware accelerators design methodologies is presented. Based on the findings of the survey, an approach for applying algorithmic-level ACTs on machine learning algorithms is provided. Then three novel hardware accelerators are proposed: (1) k-Nearest Neighbor (kNN) based on a selection-based sorter, (2) Tensorial Support Vector Machine (TSVM) based on Shallow Neural Networks, and (3) Hybrid Precision Binary Convolution Neural Network (BCNN). The three accelerators offer a real-time classification with monumental reductions in the hardware resources and power consumption compared to existing implementations targeting the same tactile data processing application on FPGA. Moreover, the approximate accelerators maintain a high classification accuracy with a loss of at most 5%
    corecore