64 research outputs found

    FPGA向けの効率的なハードウエアアルゴリズム

    Get PDF
    広島大学(Hiroshima University)博士(工学)Doctor of Engineeringdoctora

    Efficient data encoder for endoscopic imaging applications

    Get PDF
    The invention of medical imaging technology revolved the process of diagnosing diseases and opened a new world for better studying inside of the human body. In order to capture images from different human organs, different devices have been developed. Gastro-Endoscopy is an example of a medical imaging device which captures images from human gastrointestinal. With the advancement of technology, the issues regarding such devices started to get rectified. For example, with the invention of swallow-able pill photographer which is called Wireless Capsule Endoscopy (WCE); pain, time, and bleeding risk for patients are radically decreased. The development of such technologies and devices has been increased and the demands for instruments providing better performance are grown along the time. In case ofWCE, the special feature requirements such as a small size (as small as an ordinary pill) and wireless transmission of the captured images dictate restrictions in power consumption and area usage. In this research, the reduction of image encoder hardware cost for endoscopic imaging application has been focused. Several encoding algorithms have been studied and the comparative results are discussed. An efficient data encoder based on Lempel-Ziv-Welch (LZW) algorithm is presented. The encoder is a library-based one where the size of library can be modified by the user, and hence, the output data rate can be controlled according to the bandwidth requirement. The simulation is carried out with several endoscopic images and the results show that a minimum compression ratio of 92.5 % can be achieved with a minimum reconstruction quality of 30 dB. The hardware architecture and implementation result in Field-Programmable Gate Array (FPGA) for the proposed window-based LZW are also presented. A new lossy LZW algorithm is proposed and implemented in FPGA which provides promising results for such an application

    Data stream compression and decompression methods.

    Get PDF
    Cílem tento práce je prostudovat metody bezztrátové komprese a zmenšit datový tok ve komunikačním kanále, prováděním bezztrátové algoritmu komprese, který může být použity na FPGA desce s teoretickým dosahem rychlosti 1Gbit/s.Goal of this work is to study lossless compression methods and to reduce data flow in communication channel by implementing lossless compression algorithm that can be useable on FPGA board with theoretical achievement of speed 1 Gbit/s.

    GIF image hardware compressors

    Get PDF
    Increasing requirements for data transfer and storage is one of the crucial questions now. There are several ways of high-speed data transmission, but they meet limited requirements applied to their narrowly focused specific target. The data compression approach gives the solution to the problems of high-speed transfer and low-volume data storage. This paper is devoted to the compression of GIF images, using a modified LZW algorithm with a tree-based dictionary. It has led to a decrease in lookup time and an increase in the speed of data compression, and in turn, allows developing the method of constructing a hardware compression accelerator during the future research

    Extended Bit-Plane Compression for Convolutional Neural Network Accelerators

    Get PDF
    After the tremendous success of convolutional neural networks in image classification, object detection, speech recognition, etc., there is now rising demand for deployment of these compute-intensive ML models on tightly power constrained embedded and mobile systems at low cost as well as for pushing the throughput in data centers. This has triggered a wave of research towards specialized hardware accelerators. Their performance is often constrained by I/O bandwidth and the energy consumption is dominated by I/O transfers to off-chip memory. We introduce and evaluate a novel, hardware-friendly compression scheme for the feature maps present within convolutional neural networks. We show that an average compression ratio of 4.4x relative to uncompressed data and a gain of 60% over existing method can be achieved for ResNet-34 with a compression block requiring <300 bit of sequential cells and minimal combinational logic

    Performance and area evaluations of processor-based benchmarks on FPGA devices

    Get PDF
    The computing system on SoCs is being long-term research since the FPGA technology has emerged due to its personality of re-programmable fabric, reconfigurable computing, and fast development time to market. During the last decade, uni-processor in a SoC is no longer to deal with the high growing market for complex applications such as Mobile Phones audio and video encoding, image and network processing. Due to the number of transistors on a silicon wafer is increasing, the recent FPGAs or embedded systems are advancing toward multi-processor-based design to meet tremendous performance and benefit this kind of systems are possible. Therefore, is an upcoming age of the MPSoC. In addition, most of the embedded processors are soft-cores, because they are flexible and reconfigurable for specific software functions and easy to build homogenous multi-processor systems for parallel programming. Moreover, behavioural synthesis tools are becoming a lot more powerful and enable to create datapath of logic units from high-level algorithms such as C to HDL and available for partitioning a HW/SW concurrent methodology. A range of embedded processors is able to implement on a FPGA-based prototyping to integrate the CPUs on a programmable device. This research is, firstly represent different types of computer architectures in modern embedded processors that are followed in different type of software applications (eg. Multi-threading Operations or Complex Functions) on FPGA-based SoCs; and secondly investigate their capability by executing a wide-range of multimedia software codes (Integer-algometric only) in different models of the processor-systems (uni-processor or multi-processor or Co-design), and finally compare those results in terms of the benchmarks and resource utilizations within FPGAs. All the examined programs were written in standard C and executed in a variety numbers of soft-core processors or hardware units to obtain the execution times. However, the number of processors and their customizable configuration or hardware datapath being generated are limited by a target FPGA resource, and designers need to understand the FPGA-based tradeoffs that have been considered - Speed versus Area. For this experimental purpose, I defined benchmarks into DLP / HLS catalogues, which are "data" and "function" intensive respectively. The programs of DLP will be executed in LEON3 MP and LE1 CMP multi-processor systems and the programs of HLS in the LegUp Co-design system on target FPGAs. In preliminary, the performance of the soft-core processors will be examined by executing all the benchmarks. The whole story of this thesis work centres on the issue of the execute times or the speed-up and area breakdown on FPGA devices in terms of different programs

    Automated Debugging Methodology for FPGA-based Systems

    Get PDF
    Electronic devices make up a vital part of our lives. These are seen from mobiles, laptops, computers, home automation, etc. to name a few. The modern designs constitute billions of transistors. However, with this evolution, ensuring that the devices fulfill the designer’s expectation under variable conditions has also become a great challenge. This requires a lot of design time and effort. Whenever an error is encountered, the process is re-started. Hence, it is desired to minimize the number of spins required to achieve an error-free product, as each spin results in loss of time and effort. Software-based simulation systems present the main technique to ensure the verification of the design before fabrication. However, few design errors (bugs) are likely to escape the simulation process. Such bugs subsequently appear during the post-silicon phase. Finding such bugs is time-consuming due to inherent invisibility of the hardware. Instead of software simulation of the design in the pre-silicon phase, post-silicon techniques permit the designers to verify the functionality through the physical implementations of the design. The main benefit of the methodology is that the implemented design in the post-silicon phase runs many order-of-magnitude faster than its counterpart in pre-silicon. This allows the designers to validate their design more exhaustively. This thesis presents five main contributions to enable a fast and automated debugging solution for reconfigurable hardware. During the research work, we used an obstacle avoidance system for robotic vehicles as a use case to illustrate how to apply the proposed debugging solution in practical environments. The first contribution presents a debugging system capable of providing a lossless trace of debugging data which permits a cycle-accurate replay. This methodology ensures capturing permanent as well as intermittent errors in the implemented design. The contribution also describes a solution to enhance hardware observability. It is proposed to utilize processor-configurable concentration networks, employ debug data compression to transmit the data more efficiently, and partially reconfiguring the debugging system at run-time to save the time required for design re-compilation as well as preserve the timing closure. The second contribution presents a solution for communication-centric designs. Furthermore, solutions for designs with multi-clock domains are also discussed. The third contribution presents a priority-based signal selection methodology to identify the signals which can be more helpful during the debugging process. A connectivity generation tool is also presented which can map the identified signals to the debugging system. The fourth contribution presents an automated error detection solution which can help in capturing the permanent as well as intermittent errors without continuous monitoring of debugging data. The proposed solution works for designs even in the absence of golden reference. The fifth contribution proposes to use artificial intelligence for post-silicon debugging. We presented a novel idea of using a recurrent neural network for debugging when a golden reference is present for training the network. Furthermore, the idea was also extended to designs where golden reference is not present

    Image Compression Using Tap 9/7 Wavelet Transform and Quadtree Coding Scheme

    Get PDF
    This paper is concerned with the design and implementation of an image compression method based on biorthogonal tap-9/7 discrete wavelet transform (DWT) and quadtree coding method. As a first step the color correlation is handled using YUV color representation instead of RGB. Then, the chromatic sub-bands are downsampled, and the data of each color band is transformed using wavelet transform. The produced wavelet sub-bands are quantized using hierarchal scalar quantization method. The detail quantized coefficient is coded using quadtree coding followed by Lempel-Ziv-Welch (LZW) encoding. While the approximation coefficients are coded using delta coding followed by LZW encoding. The test results indicated that the compression results are comparable to those gained by standard compression schemes

    Real-Time Lossless Compression of SoC Trace Data

    Get PDF
    Nowadays, with the increasing complexity of System-on-Chip (SoC), traditional debugging approaches are not enough in multi-core architecture systems. Hardware tracing becomes necessary for performance analysis in these systems. The problem is that the size of collected trace data through hardware-based tracing techniques is usually extremely large due to the increasing complexity of System-on-Chips. Hence on-chip trace compression performed in hardware is needed to reduce the amount of transferred or stored data. In this dissertation, the feasibility of different types of lossless data compression algorithms in hardware implementation are investigated and examined. A lossless data compression algorithm LZ77 is selected, analyzed, and optimized to Nexus traces data. In order to meet the hardware cost and compression performances requirements for the real-time compression, an optimized LZ77 compression algorithm is proposed based on the characteristics of Nexus trace data. This thesis presents a hardware implementation of LZ77 encoder described in Very High Speed Integrated Circuit Hardware Description Language (VHDL). Test results demonstrate that the compression speed can achieve16 bits/clock cycle and the average compression ratio is 1.35 for the minimal hardware cost case, which is a suitable trade-off between the hardware cost and the compression performances effectively
    corecore