
    A Standalone FPGA-based Miner for Lyra2REv2 Cryptocurrencies

    Lyra2REv2 is a hashing algorithm that consists of a chain of individual hashing algorithms, and it is used as a proof-of-work function in several cryptocurrencies. The most crucial and exotic hashing algorithm in the Lyra2REv2 chain is a specific instance of the general Lyra2 algorithm. This work presents the first hardware implementation of the specific instance of Lyra2 that is used in Lyra2REv2. Several properties of the aforementioned algorithm are exploited in order to optimize the design. In addition, an FPGA-based hardware implementation of a standalone miner for Lyra2REv2 on a Xilinx Multi-Processor System on Chip is presented. The proposed Lyra2REv2 miner is shown to be significantly more energy efficient than both a GPU and a commercially available FPGA-based miner. Finally, we also explain how the simplified Lyra2 and Lyra2REv2 architectures can be modified with minimal effort to also support the recent Lyra2REv3 chained hashing algorithm. Comment: 13 pages, accepted for publication in IEEE Trans. Circuits Syst. I. arXiv admin note: substantial text overlap with arXiv:1807.0576
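    The idea of a chained hash used as a proof-of-work function can be illustrated with a minimal sketch. It is not the Lyra2REv2 chain itself: the stages below are stand-ins taken from Python's `hashlib`, and the names `chained_hash`, `meets_target`, and `mine`, the 4-byte nonce layout, and the big-endian target comparison are all assumptions made for illustration.

```python
import hashlib

# Stand-in stages from hashlib. These are NOT the Lyra2REv2 stages;
# they only illustrate feeding each digest into the next hash.
STAGES = [hashlib.blake2s, hashlib.sha3_256, hashlib.sha256]


def chained_hash(data: bytes, stages=STAGES) -> bytes:
    """Apply each hash in order, passing the digest to the next stage."""
    digest = data
    for stage in stages:
        digest = stage(digest).digest()
    return digest


def meets_target(digest: bytes, target: int) -> bool:
    """Proof-of-work check: the digest, read as an integer, must not exceed the target."""
    # Big-endian interpretation is an assumption made for this sketch.
    return int.from_bytes(digest, "big") <= target


def mine(header: bytes, target: int, max_nonce: int):
    """Try nonces 0..max_nonce-1; return the first one that meets the target, else None."""
    for nonce in range(max_nonce):
        # Hypothetical layout: header followed by a 4-byte little-endian nonce.
        candidate = header + nonce.to_bytes(4, "little")
        if meets_target(chained_hash(candidate), target):
            return nonce
    return None
```

    A lower target makes the search harder, because fewer digests qualify. A hardware miner performs the same nonce search, with each stage of the chain implemented as a dedicated circuit.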

    New primitives of controlled elements F2/4 for block ciphers

    This paper develops a cipher design approach based on data-dependent operations (DDOs). A new class of DDOs based on advanced controlled elements (CEs) is introduced and proven to be well suited to hardware implementation on FPGA devices. To increase the hardware implementation efficiency of block ciphers on contemporary FPGA devices, an approach to the synthesis of fast block ciphers is proposed. It uses a substitution-permutation network built from controlled elements F2/4, which implement 2 x 2 substitutions under the control of a four-bit vector. Criteria for selecting F2/4 elements are proposed, and results of investigating their main cryptographic properties are given. A new fast 128-bit block cipher, MM-128, is designed that uses F2/4 elements as its elementary building block. The cipher achieves higher performance and requires fewer hardware resources when implemented on FPGA devices than known block ciphers. Results are presented on differential analysis of the cipher MM-12

    Comparison of high level design methodologies for algorithmic IPs : Bluespec and C-based synthesis

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (leaves 37-39). High-level hardware design of Digital Signal Processing algorithms is an important design problem for decreasing design time and allowing more algorithmic exploration. Bluespec is a Hardware Design Language (HDL) that allows designers to express intended microarchitecture through high-level constructs. C-based design tools directly generate hardware from algorithms expressed in C/C++. This research compares these two design methodologies in developing hardware for a Reed-Solomon decoding algorithm under area and performance metrics. This work illustrates that a C-based design flow may be effective in the early stages of design development for fast prototyping. However, the Bluespec design flow produces hardware that is more customized for performance and resource constraints. This is because in later stages, designers need close control over the generated hardware structure, which is a part of HDLs like Bluespec but is difficult to express under the constraints of sequential C semantics. by Abhinav Agarwal. S.M

    Algorithms and Hardware Co-Design of HEVC Intra Encoders

    The importance of digital video has increased greatly over the last two decades. Due to the rapid development of information and communication technologies, the demand for Ultra-High Definition (UHD) video applications is becoming stronger. However, the most prevalent video compression standard, H.264/AVC, released in 2003, is inefficient when it comes to UHD videos. The increasing desire for compression efficiency superior to H.264/AVC led to the standardization of High Efficiency Video Coding (HEVC). Compared with the H.264/AVC standard, HEVC offers double the compression ratio at the same level of video quality, or a substantial improvement in video quality at the same video bitrate. Although HEVC/H.265 possesses superior compression efficiency, its complexity is several times that of H.264/AVC, which impedes high-throughput implementation. Currently, most researchers have focused merely on algorithm-level adaptations of the HEVC/H.265 standard to reduce computational intensity, without considering hardware feasibility. What’s more, the exploration of efficient hardware architecture design is not exhaustive. Only a few research works have explored efficient hardware architectures for the HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design of HEVC intra encoders. We also explore the deep learning approach in mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations: mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction aims to reduce the mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming. Fast CU cost estimation is applied to reduce the complexity of the rate-distortion (RD) calculation of each CU. Group-based CABAC rate estimation is proposed to parallelize the processing of syntax elements and greatly improve rate estimation throughput. From the hardware design perspective, a fully parallel hardware architecture of an HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PEs), and each PE independently performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, and rate-distortion estimation. PU blocks with different PU sizes are processed by different PEs simultaneously. Also, an efficient hardware implementation of a group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate and high-throughput rate estimation. To take advantage of the deep learning approach, we also propose a fully connected layer based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes of luma prediction blocks. All angular prediction modes are classified into 7 prediction groups. Each group contains 3-5 prediction modes that exhibit a similar prediction angle. A rough angle detection algorithm is designed to determine the prediction direction of the current block, then a small-scale FCLNN is exploited to refine the mode prediction
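    The general idea behind reducing RDO mode candidates can be sketched as follows. This is a generic illustration and not the dissertation's method, which the abstract does not spell out. It assumes the common Lagrangian cost J = D + lambda * R, a cheap "rough cost" per mode for pruning, and hypothetical names `rd_cost`, `reduce_modes`, and `best_mode`.

```python
def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed cost model)."""
    return distortion + lam * rate_bits


def reduce_modes(rough_costs: dict, keep: int) -> list:
    """Keep the `keep` modes with the lowest rough cost, so full RDO runs on fewer modes."""
    return sorted(rough_costs, key=rough_costs.get)[:keep]


def best_mode(candidates, evaluate, lam: float):
    """Run the full RD evaluation only on the surviving candidates.

    `evaluate(mode)` is a hypothetical callback returning (distortion, rate_bits).
    """
    return min(candidates, key=lambda mode: rd_cost(*evaluate(mode), lam))


# Usage with made-up numbers: four modes are pruned to two before full RDO.
rough = {0: 50.0, 1: 10.0, 10: 30.0, 26: 20.0}
full = {1: (100.0, 20.0), 26: (90.0, 40.0)}
survivors = reduce_modes(rough, keep=2)
chosen = best_mode(survivors, lambda mode: full[mode], lam=1.0)
```

    The expensive evaluation runs once per surviving mode, so shrinking the candidate list directly reduces the work in the RDO loop.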

    Low complexity hardware oriented H.264/AVC motion estimation algorithm and related low power and low cost architecture design

    System: new; Report number: Kō No. 2999; Degree type: Doctor of Engineering; Date conferred: 2010/3/15; Waseda University diploma number: Shin 525

    An Area-Optimized Chip of Ant Colony Algorithm Design in Hardware Platform Using the Address-Based Method

    The ant colony algorithm is a nature-inspired algorithm widely used for solving complex problems and finding optimal solutions. However, it has a major flaw: the vast number of calculations it requires. If a proper correction algorithm and architectural design are not provided, the high volume of operations leads to increasing use of the hardware platform, and at higher scales it may render the chip area unusable because of the high number of problems. The purpose of this paper is therefore to save the hardware platform as far as possible and use it optimally by providing a particular algorithm running on a reconfigurable chip driven by the address-based method. A comparison of synthesis operations with similar works shows significant improvements, as much as 1/3 times greater than other similar hardware methods. DOI: http://dx.doi.org/10.11591/ijece.v4i6.692