2,158 research outputs found
Recommended from our members
Parallel data compression
Data compression schemes remove data redundancy in communicated and stored data and increase the effective capacities of communication and storage devices. Parallel algorithms and implementations for textual data compression are surveyed. Related concepts from parallel computation and information theory are briefly discussed. Static and dynamic methods for codeword construction and transmission on various models of parallel computation are described. Included are parallel methods which boost system speed by coding data concurrently, and approaches which employ multiple compression techniques to improve compression ratios. Theoretical and empirical comparisons are reported and areas for future research are suggested
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
Real-Time Lossless Compression of SoC Trace Data
Nowadays, with the increasing complexity of System-on-Chip (SoC), traditional debugging approaches are not enough in multi-core architecture systems. Hardware tracing becomes necessary for performance analysis in these systems. The problem is that the size of collected trace data through hardware-based tracing techniques is usually extremely large due to the increasing complexity of System-on-Chips. Hence on-chip trace compression performed in hardware is needed to reduce the amount of transferred or stored data. In this dissertation, the feasibility of different types of lossless data compression algorithms in hardware implementation are investigated and examined. A lossless data compression algorithm LZ77 is selected, analyzed, and optimized to Nexus traces data. In order to meet the hardware cost and compression performances requirements for the real-time compression, an optimized LZ77 compression algorithm is proposed based on the characteristics of Nexus trace data. This thesis presents a hardware implementation of LZ77 encoder described in Very High Speed Integrated Circuit Hardware Description Language (VHDL). Test results demonstrate that the compression speed can achieve16 bits/clock cycle and the average compression ratio is 1.35 for the minimal hardware cost case, which is a suitable trade-off between the hardware cost and the compression performances effectively
VLSI architecture design approaches for real-time video processing
This paper discusses the programmable and dedicated approaches for real-time video processing applications. Various VLSI architecture including the design examples of both approaches are reviewed. Finally, discussions of several practical designs in real-time video processing applications are then considered in VLSI architectures to provide significant guidelines to VLSI designers for any further real-time video processing design works
- …