    Hybrid Machine Translation with Multi-Source Encoder-Decoder Long Short-Term Memory in English-Malay Translation

    Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) are the state-of-the-art approaches in machine translation (MT). The translation produced by an SMT system is based on statistical analysis of text corpora, while NMT uses a deep neural network to model and generate a translation. SMT and NMT each have strengths and weaknesses. SMT may produce a better translation than NMT when only a small parallel text corpus is available. Nevertheless, when the amount of parallel text available is large, the quality of the translation produced by NMT is often higher than that of SMT. Studies have also shown that the translation produced by SMT is better than NMT's in cases where there is a domain mismatch between training and testing, and SMT also has an advantage on long sentences. In addition, when a translation produced by an NMT system is wrong, it is very difficult to locate the error. In this paper, we investigate a hybrid approach that combines SMT and NMT to perform English-to-Malay translation. The motivation for using hybrid machine translation is to combine the strengths of both approaches to produce a more accurate translation. Our approach uses the multi-source encoder-decoder long short-term memory (LSTM) architecture. The architecture uses two encoders: one to embed the sentence to be translated, and another to embed the initial translation produced by the SMT system. The translation from the SMT can be viewed as a “suggestion translation” for the neural MT. Our experiments show that the hybrid MT increases the BLEU scores of our best baseline machine translation in the computer science and news domains from 21.21 and 48.35 to 35.97 and 61.81, respectively.
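    The two-encoder arrangement can be made concrete. Below is a minimal PyTorch sketch, entirely our own illustration (class and parameter names are not from the paper), of a multi-source encoder-decoder LSTM in which one encoder reads the source sentence, a second reads the SMT suggestion, and their final states are fused to initialize the decoder.

```python
import torch
import torch.nn as nn

class MultiSourceLSTM(nn.Module):
    """Illustrative two-encoder LSTM: encodes the source sentence and the
    SMT 'suggestion translation' separately, then fuses both final states
    to initialize a single decoder."""
    def __init__(self, src_vocab, sug_vocab, tgt_vocab, emb=256, hid=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.sug_emb = nn.Embedding(sug_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.src_enc = nn.LSTM(emb, hid, batch_first=True)
        self.sug_enc = nn.LSTM(emb, hid, batch_first=True)
        # Fuse the two encoders' final states into one decoder initial state.
        self.fuse_h = nn.Linear(2 * hid, hid)
        self.fuse_c = nn.Linear(2 * hid, hid)
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src, sug, tgt_in):
        _, (h_s, c_s) = self.src_enc(self.src_emb(src))
        _, (h_g, c_g) = self.sug_enc(self.sug_emb(sug))
        h0 = torch.tanh(self.fuse_h(torch.cat([h_s, h_g], dim=-1)))
        c0 = torch.tanh(self.fuse_c(torch.cat([c_s, c_g], dim=-1)))
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), (h0, c0))
        return self.out(dec_out)  # logits over the target vocabulary

# Toy usage: a batch of 2 sequences of length 5.
model = MultiSourceLSTM(src_vocab=1000, sug_vocab=1200, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 5))     # source sentence token ids
sug = torch.randint(0, 1200, (2, 5))     # SMT suggestion token ids
tgt_in = torch.randint(0, 1200, (2, 5))  # shifted target for teacher forcing
logits = model(src, sug, tgt_in)         # shape (2, 5, 1200)
```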

    A Flexible BCH decoder for Flash Memory Systems using Cascaded BCH codes

    NAND flash memories are widely used in consumer electronics, such as tablets, personal computers, smartphones, and gaming systems. However, unlike other standard storage devices, these flash memories suffer from various random errors. To address these reliability issues, various error correction codes (ECC) are employed. The Bose-Chaudhuri-Hocquenghem (BCH) code is the most common ECC used to address errors in modern flash memories. Because of the limitations of realizing BCH codes for more extensive error correction, modern flash memory devices use low-density parity-check (LDPC) codes as the error correction scheme. LDPC decoders are more complex to realize than BCH decoders, so these ECC decoders are implemented within the flash memory device. This thesis analyzes the limitations imposed by state-of-the-art implementations of BCH decoders and proposes a cascaded BCH code to address them. To support a variety of flash memory devices, three main challenges must be addressed for BCH decoders. First, the latency of the BCH decoder in the no-error case should be less than 100 µs. Second, there should be flexibility in supporting different ECC block sizes; more precisely, the solution should support ECC blocks of 256, 512, 1024, and 2048 bytes. Third, there should be flexibility in supporting different numbers of bit errors. Recent developments in graphics processing units (GPUs) have attracted many researchers to use GPUs for non-graphical computation, and GPUs are used in many consumer electronics as part of the system-on-chip (SoC) configuration. In this thesis we study the limitations imposed by different implementations (VLSI, GPU, and CPU) of BCH decoders, and we propose a cascaded BCH code implemented using a hybrid approach to overcome the limitations of BCH codes. By splitting the implementation across VLSI and GPUs, we show that this method provides flexibility in both the block size and the number of bit errors to be corrected.
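    For readers less familiar with BCH decoding, the following self-contained Python sketch computes the syndromes of a toy binary BCH(15, 7) code with t = 2 over GF(2^4); it illustrates only the arithmetic involved, not the thesis's cascaded design. All-zero syndromes correspond to the no-error fast path whose latency the thesis bounds.

```python
# Build the GF(2^4) exponential table with primitive polynomial x^4 + x + 1.
PRIM = 0b10011
exp_t = [0] * 30
x = 1
for i in range(15):
    exp_t[i] = exp_t[i + 15] = x
    x <<= 1
    if x & 0b10000:
        x ^= PRIM

def gf_pow(i, j):
    """alpha^(i*j) in GF(16), returned as a 4-bit field element."""
    return exp_t[(i * j) % 15]

def syndromes(received_bits, t=2):
    """Compute S_1..S_2t for a binary BCH code of length 15, where
    received_bits[i] is the coefficient of x^i. All-zero syndromes mean
    'no detectable error' -- the fast path a flash controller must
    confirm well within its latency budget."""
    synd = []
    for j in range(1, 2 * t + 1):
        s = 0
        for i, bit in enumerate(received_bits):
            if bit:
                s ^= gf_pow(i, j)  # GF(2^4) addition is XOR
        synd.append(s)
    return synd

# A valid codeword has zero syndromes; flipping one bit does not.
word = [0] * 15            # the all-zero word is always a codeword
print(syndromes(word))     # [0, 0, 0, 0]
word[4] ^= 1               # inject a single bit error
print(syndromes(word))     # nonzero syndromes flag the error
```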

    Unified turbo/LDPC code decoder architecture for deep-space communications

    Deep-space communications are characterized by extremely critical channel conditions; current standards foresee the usage of both turbo and low-density parity-check (LDPC) codes to ensure recovery from received errors, but each of them has substantial drawbacks. Code concatenation is widely used in all kinds of communication to boost the error correction capability of single codes; serial concatenation of turbo and LDPC codes has recently been proven effective for deep-space communications, being able to overcome the shortcomings of both code types. This work extends the performance analysis of this scheme and proposes a novel hardware decoder architecture for concatenated turbo and LDPC codes based on the same decoding algorithm. This choice leads to a high degree of datapath and memory sharing; post-layout implementation results obtained with complementary metal-oxide-semiconductor (CMOS) 90 nm technology show small area occupation (0.98 mm²) and very low power consumption (2.1 mW).
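    As background on the LDPC half of the concatenation, here is a purely illustrative Python min-sum decoder for a toy parity-check matrix. The paper's contribution is the shared hardware datapath for both code types; this sketch only shows the message-passing arithmetic that one LDPC decoding iteration performs.

```python
import numpy as np

def minsum_decode(H, llr, max_iter=20):
    """Illustrative min-sum LDPC decoding. H is an (m, n) binary
    parity-check matrix; llr holds channel LLRs (positive => bit 0).
    Returns the hard-decision codeword estimate."""
    m, n = H.shape
    M = np.zeros((m, n))  # check-to-variable messages, initially zero
    for _ in range(max_iter):
        # Variable-to-check: total belief minus the incoming edge message.
        total = llr + M.sum(axis=0)
        V = np.where(H, total - M, 0.0)
        # Check-to-variable min-sum update, row by row.
        for c in range(m):
            idx = np.flatnonzero(H[c])
            v = V[c, idx]
            sign = np.prod(np.sign(v))
            mags = np.abs(v)
            for k, j in enumerate(idx):
                # Product of the *other* signs (assumes no exact zeros).
                s = sign * np.sign(v[k])
                M[c, j] = s * np.delete(mags, k).min()
        hard = ((llr + M.sum(axis=0)) < 0).astype(int)
        if not ((H @ hard) % 2).any():  # all parity checks satisfied
            break
    return hard

# Toy usage: (7,4) Hamming parity-check matrix, all-zero codeword sent,
# bit 1 received unreliably; one iteration corrects the hard decision.
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])
llr = np.array([2.1, -0.5, 1.5, 3.0, 1.7, 2.2, 1.1])
print(minsum_decode(H, llr))  # [0 0 0 0 0 0 0]
```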

    NengoFPGA: an FPGA Backend for the Nengo Neural Simulator

    Low-power, high-speed neural networks are critical for providing deployable embedded AI applications at the edge. We describe a Xilinx FPGA implementation of Neural Engineering Framework (NEF) networks with online learning that outperforms mobile Nvidia GPU implementations by an order of magnitude or more. Specifically, we provide an embedded Python-capable PYNQ FPGA implementation, supported by a Xilinx Vivado High-Level Synthesis (HLS) workflow, that allows sub-millisecond implementation of adaptive neural networks with low-latency, direct I/O access to the physical world. The outcome of this work is NengoFPGA, a seamless and user-friendly extension to the neural compiler Python package Nengo. To reduce memory requirements and improve performance, we tune the precision of the different intermediate variables in the code to achieve competitive absolute accuracy against slower and larger floating-point reference designs. The online learning component of the neural network exploits immediate feedback to adjust the network weights to best support a given arithmetic precision. Because the space of possible design configurations of such quantized networks is vast and is subject to a target accuracy constraint, we use the Hyperopt hyperparameter tuning tool instead of manual search to find Pareto-optimal designs. Specifically, we are able to generate the optimized designs in under 500 short iterations of Vivado HLS C synthesis before running the complete Vivado place-and-route phase on that subset, a much longer process not conducive to rapid exploration. For neural network populations of 64–4096 neurons and 1–8 representational dimensions, our optimized FPGA implementation generated by Hyperopt achieves a speedup of 10–484× over a competing cuBLAS implementation on the Jetson TX1 GPU while using 2.4–9.5× less power. Our speedups result from HLS-specific reformulation (15× improvement), precision adaptation (3× improvement), and low-latency direct I/O access (1000× improvement).
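    The Hyperopt-driven precision search can be sketched as follows. The hyperopt calls are the library's real API, but the objective below is a toy surrogate of our own invention so the example runs end to end; in the paper's flow it would instead launch Vivado HLS C synthesis on a candidate precision configuration and measure accuracy against the floating-point reference.

```python
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

TARGET_ACC = 0.98  # illustrative accuracy constraint

def objective(cfg):
    """Toy surrogate for the real evaluation step: score a candidate
    (weight bits, activation bits) configuration by a synthetic accuracy
    model and a crude bit-width cost proxy for area/latency."""
    w, a = int(cfg["weight_bits"]), int(cfg["act_bits"])
    accuracy = 1.0 - 0.5 ** (min(w, a) - 3)  # more bits -> closer to reference
    cost = w + a
    # Penalize configurations that violate the accuracy constraint.
    loss = cost if accuracy >= TARGET_ACC else cost + 100 * (TARGET_ACC - accuracy)
    return {"loss": loss, "status": STATUS_OK, "accuracy": accuracy}

# Search space over integer bit widths for weights and activations.
space = {
    "weight_bits": hp.quniform("weight_bits", 4, 16, 1),
    "act_bits": hp.quniform("act_bits", 4, 16, 1),
}

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)  # smallest bit widths found that still meet the accuracy target
```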

    Turbo decoder VLSI implementations for multi-standards wireless communication systems
