456 research outputs found

    Better synchronous binarization for machine translation

    Binarization of Synchronous Context-Free Grammars (SCFG) is essential for achieving polynomial time complexity of decoding for SCFG-parsing-based machine translation systems. In this paper, we first investigate the excess edge competition issue caused by a left-heavy binary SCFG derived with the method of Zhang et al. (2006). Then we propose a new binarization method to mitigate the problem by exploring other alternative equivalent binary SCFGs. We present an algorithm that iteratively improves the resulting binary SCFG, and empirically show that our method can improve a string-to-tree statistical machine translation system based on the synchronous binarization method of Zhang et al. (2006) on the NIST machine translation evaluation tasks.
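
    As a minimal illustration of the contiguity check that synchronous binarization rests on (a sketch only, not the algorithm of Zhang et al. (2006) or the paper's iterative method; the function name is hypothetical), the snippet below finds which adjacent nonterminal pairs on the source side of a synchronous rule can be merged into a virtual nonterminal, i.e. which pairs also occupy contiguous positions on the target side. Different choices among these pairs lead to the alternative equivalent binary SCFGs that the paper explores.

        # Sketch: one step of synchronous binarization viewed as a contiguity check.
        def binarizable_pairs(source_order, target_order):
            """source_order / target_order: the rule's nonterminals listed in
            source-side and target-side order, respectively."""
            pos_on_target = {nt: i for i, nt in enumerate(target_order)}
            pairs = []
            for i in range(len(source_order) - 1):
                a, b = source_order[i], source_order[i + 1]
                # adjacent on the source side; also contiguous on the target side?
                if abs(pos_on_target[a] - pos_on_target[b]) == 1:
                    pairs.append((a, b))
            return pairs

        # Example: source side X1 X2 X3, target side X2 X3 X1.
        print(binarizable_pairs(["X1", "X2", "X3"], ["X2", "X3", "X1"]))
        # -> [('X2', 'X3')]  (only X2 X3 can be merged in a single step)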

    Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding

    Real-time, high-quality video coding is gaining wide interest in the research and industrial communities for different applications. H.264/AVC, a recent standard for high-performance video coding, can be successfully exploited in several scenarios, including digital video broadcasting, high-definition TV and DVD-based systems, which require sustaining bitrates of up to tens of Mbit/s. To that end, this paper proposes optimized architectures for the most critical tasks of H.264/AVC: motion estimation and context-adaptive binary arithmetic coding. Post-synthesis results on sub-micron CMOS standard-cell technologies show that the proposed architectures can process 720 × 480 video sequences at 30 frames/s in real time and sustain more than 50 Mbit/s. The achieved circuit complexity and power consumption budgets make the architectures suitable for integration in complex VLSI multimedia systems based either on an AHB bus-centric on-chip communication system or on novel Network-on-Chip (NoC) infrastructures for Multi-Processor Systems-on-Chip (MPSoC).
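
    The paper's hardware architectures are not reproduced here, but as a point of reference for the motion-estimation task being accelerated, the following software sketch shows a plain full-search block-matching kernel (function name and parameters are illustrative): for one block of the current frame, it finds the displacement that minimizes the sum of absolute differences (SAD) over a search window in the reference frame.

        # Sketch: full-search block matching with a SAD cost.
        def full_search(cur, ref, bx, by, block=16, search=8):
            """cur, ref: 2-D lists of luma samples; (bx, by): top-left corner of the block."""
            h, w = len(ref), len(ref[0])
            best_mv, best_sad = None, float("inf")
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ry, rx = by + dy, bx + dx
                    if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                        continue  # candidate falls outside the reference frame
                    sad = sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
                              for j in range(block) for i in range(block))
                    if sad < best_sad:
                        best_mv, best_sad = (dx, dy), sad
            return best_mv, best_sad

    In hardware, it is the many candidate positions and per-pixel absolute differences of such a kernel that are unrolled into parallel datapaths.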

    XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference

    Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully digital configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid SRAM / standard cell memory. The XNE is able to fully compute convolutional and dense layers in autonomy or in cooperation with the core in the MCU to realize more complex behaviors. We show post-synthesis results in 65 nm and 22 nm technology for the XNE IP and post-layout results in 22 nm for the full MCU, indicating that this system can drop the energy cost per binary operation to 21.6 fJ per operation at 0.4 V, while remaining flexible and performant enough to execute state-of-the-art BNN topologies such as ResNet-34 in less than 2.2 mJ per frame at 8.9 fps. Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issue.
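
    As a back-of-the-envelope illustration of why an XNOR engine is so cheap (a software sketch only, not the XNE datapath; the helper name is hypothetical): once weights and activations are constrained to {-1, +1} and packed as bits, a dot product reduces to an XNOR followed by a popcount.

        # Sketch: a binary dot product computed with XNOR + popcount.
        def binary_dot(a_bits, w_bits, n):
            """a_bits, w_bits: n-bit integers where bit = 1 encodes +1 and bit = 0 encodes -1."""
            xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # 1 wherever the two signs agree
            matches = bin(xnor).count("1")              # popcount
            return 2 * matches - n                      # equals the +/-1 dot product

        # Example: a = [+1, -1, +1, +1], w = [+1, +1, -1, +1]  ->  dot product 0
        print(binary_dot(0b1011, 0b1101, 4))  # -> 0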

    A Parallel Implementation of the Thresholding Problem by Using Tissue-Like P Systems

    In this paper we present a parallel algorithm to solve the thresholding problem by using Membrane Computing techniques. This bio-inspired algorithm has been implemented on a novel device architecture called CUDAℱ (Compute Unified Device Architecture). We present some examples, compare the execution times obtained, and present some research lines for the future. Ministerio de Ciencia e Innovación TIN2008-04487-E; Ministerio de Ciencia e Innovación TIN-2009-13192; Junta de Andalucía P08-TIC-04200; Ministerio de Educación y Ciencia MTM2009-12716; Junta de Andalucía PO6-TIC-02268; Universidad del País Vasco EHU09/0
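
    For reference, the thresholding problem itself amounts to the following per-pixel test (a plain sequential sketch, not the tissue-like P system or its CUDA implementation); because every pixel is handled independently, the problem maps naturally onto massively parallel devices.

        # Sketch: binarize a grey-level image against a threshold t.
        def threshold(image, t):
            """image: 2-D list of grey levels; returns the binarized image."""
            return [[1 if pixel >= t else 0 for pixel in row] for row in image]

        print(threshold([[12, 200], [90, 30]], t=100))  # -> [[0, 1], [0, 0]]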

    Smart CMOS image sensor for 3D measurement

    3D measurements are concerned with extracting visual information from the geometry of visible surfaces and interpreting the 3D coordinate data thus obtained, to detect or track the position or reconstruct the profile of an object, often in real time. These systems necessitate image sensors with high accuracy of position estimation and a high frame rate of data processing for handling large volumes of data. A standard imager cannot address the requirements of fast image acquisition and processing, which are the two figures of merit for 3D measurements. Hence, dedicated VLSI imager architectures are indispensable for designing these high-performance sensors. CMOS imaging technology provides the potential to integrate image-processing algorithms on the focal plane of the device, resulting in smart image sensors capable of achieving better processing features in handling massive image data. The objective of this thesis is to present a new architecture of smart CMOS image sensor for real-time 3D measurement using sheet-beam projection methods based on active triangulation. By organizing the vision sensor as an ensemble of linear sensor arrays, all working in parallel and processing the entire image in slices, the complexity of the image-processing task shifts from O(N²) to O(N). Inherent in the design is also the high level of parallelism needed to achieve massive parallel processing at the high frame rates required in 3D computation problems. This work demonstrates a prototype of the smart linear sensor incorporating full testability features to test and debug at both device and system levels. The salient features of this work are the asynchronous position-to-pulse-stream conversion, multiple-image binarization, high parallelism and a modular architecture, resulting in a frame rate and sub-pixel resolution suitable for real-time 3D measurements.
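
    As an illustration of the per-row computation behind sheet-beam active triangulation (a software sketch under the assumption of a single bright stripe per row; not the sensor's circuitry, and the function name is hypothetical): each row is processed independently by its own linear array, the pixels above a binarization threshold are selected, and the stripe position is estimated with sub-pixel resolution as their intensity-weighted centroid.

        # Sketch: sub-pixel stripe localization in one image row.
        def stripe_position(row, threshold):
            """row: one line of pixel intensities; returns the sub-pixel stripe column."""
            hits = [(i, v) for i, v in enumerate(row) if v >= threshold]
            if not hits:
                return None                              # no stripe detected in this row
            total = sum(v for _, v in hits)
            return sum(i * v for i, v in hits) / total   # intensity-weighted centroid

        print(stripe_position([5, 8, 40, 90, 35, 6], threshold=30))  # -> ~2.97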

    Algebraic decoder specification: coupling formal-language theory and statistical machine translation

    The specification of a decoder, i.e., a program that translates sentences from one natural language into another, is an intricate process, driven by the application and lacking a canonical methodology. The practical nature of decoder development inhibits the transfer of knowledge between theory and application, which is unfortunate because many contemporary decoders are in fact related to formal-language theory. This thesis proposes an algebraic framework in which a decoder is specified by an expression built from a fixed set of operations. As it stands, this framework accommodates contemporary syntax-based decoders, spans two levels of abstraction and, above all, encourages mutual stimulation between the theory of weighted tree automata and the application.
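
    The following is only a loose software analogy, not the thesis' formalism: it illustrates the idea of specifying a decoder as an expression over a fixed set of operations, each mapping a weighted set of hypotheses to another, so that a concrete decoder is a particular composition of these operations (all names are hypothetical).

        # Sketch: a decoder "specification" as a composition of weighted operations.
        from functools import reduce

        def apply_model(hyps, model):   # rescore a weighted hypothesis set with a model
            return {h: w * model.get(h, 0.0) for h, w in hyps.items()}

        def prune(hyps, k):             # keep only the k best-scoring hypotheses
            return dict(sorted(hyps.items(), key=lambda x: -x[1])[:k])

        def best(hyps):                 # read off the highest-scoring hypothesis
            return max(hyps, key=hyps.get)

        def decoder(*ops):              # an "expression" is a pipeline of operations
            return lambda hyps: reduce(lambda h, op: op(h), ops, hyps)

        spec = decoder(lambda h: apply_model(h, {"a": 0.9, "b": 0.4}),
                       lambda h: prune(h, 2),
                       best)
        print(spec({"a": 1.0, "b": 1.0, "c": 1.0}))  # -> 'a'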

    FPGA Implementation of Blob Recognition

    Real-time embedded vision systems can be used in a wide range of applications, and demand for them has therefore been increasing. In this thesis, an FPGA-based embedded vision system capable of recognizing objects in real time is presented. The proposed system architecture consists of multiple Intellectual Properties (IPs), which are used as a set of complex instructions by an integrated 32-bit MicroBlaze CPU. Each IP is tailored specifically to meet the needs of the application and, at the same time, to consume the minimum of FPGA logic resources. Integrating both hardware and software on a single FPGA chip, this system achieves the real-time performance of full VGA video processing at 32 frames per second (fps). In addition, this work introduces a new method, called Dual Connected Component Labelling (DCCL), suitable for FPGA implementation.
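
    For background, the classical two-pass connected-component labelling that blob recognition builds on is sketched below (the thesis' Dual Connected Component Labelling method is not reproduced here): the first pass assigns provisional labels and records label equivalences, and the second pass resolves them with a union-find structure.

        # Sketch: classical two-pass connected-component labelling, 4-connectivity.
        def label(binary):
            h, w = len(binary), len(binary[0])
            labels = [[0] * w for _ in range(h)]
            parent = [0]                     # union-find; parent[0] unused
            def find(x):
                while parent[x] != x:
                    x = parent[x]
                return x
            next_label = 1
            for y in range(h):               # first pass: provisional labels
                for x in range(w):
                    if not binary[y][x]:
                        continue
                    up = labels[y - 1][x] if y else 0
                    left = labels[y][x - 1] if x else 0
                    if up == 0 and left == 0:
                        parent.append(next_label)
                        labels[y][x] = next_label
                        next_label += 1
                    else:
                        candidates = [l for l in (up, left) if l]
                        labels[y][x] = min(candidates)
                        if up and left and up != left:          # record equivalence
                            parent[find(max(up, left))] = find(min(up, left))
            for y in range(h):               # second pass: resolve equivalences
                for x in range(w):
                    if labels[y][x]:
                        labels[y][x] = find(labels[y][x])
            return labels

        print(label([[1, 1, 0],
                     [0, 1, 0],
                     [1, 0, 1]]))
        # -> [[1, 1, 0], [0, 1, 0], [2, 0, 3]]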

    Syntactic and semantic features for statistical and neural machine translation

    Machine Translation (MT) for language pairs with long-distance dependencies and word reordering, such as German–English, is prone to producing output that is lexically or syntactically incoherent. Statistical MT (SMT) models used explicit or latent syntax to improve reordering; however, they failed to capture other long-distance dependencies. This thesis explores how explicit sentence-level syntactic information can improve translation for such complex linguistic phenomena. In particular, we work at the level of the syntactic-semantic interface, with representations conveying predicate-argument structures. These are essential to preserving semantics in translation, and SMT systems have long struggled to model them. String-to-tree SMT systems use explicit target syntax to handle long-distance reordering, but make strong independence assumptions which lead to inconsistent lexical choices. To address this, we propose a Selectional Preferences feature which models the semantic affinities between target predicates and their argument fillers using the target dependency relations available in the decoder. We found that our feature is not effective in a string-to-tree system for German→English and that the conditioning context is often wrong because of mistranslated verbs. To improve verb translation, we proposed a Neural Verb Lexicon Model (NVLM) incorporating sentence-level syntactic context from the source, which carries relevant semantic information for verb disambiguation. When used as an extra feature for re-ranking the output of a German→English string-to-tree system, the NVLM improved verb translation precision by up to 2.7% and recall by up to 7.4%. While the NVLM improved some aspects of translation, other syntactic and lexical inconsistencies are not addressed by a linear combination of independent models. In contrast to SMT, neural machine translation (NMT) avoids strong independence assumptions, thus generating more fluent translations and capturing some long-distance dependencies. Still, incorporating additional linguistic information can improve translation quality. We proposed a method for tightly coupling target words and syntax in the NMT decoder. To represent syntax explicitly, we used CCG supertags, which encode subcategorization information, capturing long-distance dependencies and attachments. Our method improved translation quality on several difficult linguistic constructs, including prepositional phrases, which are the most frequent type of predicate argument. These improvements over a strong baseline NMT system were consistent across two language pairs: 0.9 BLEU for German→English and 1.2 BLEU for Romanian→English.
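
    As one concrete way to picture the coupling of target words and syntax (a sketch under the assumption that the coupling takes the form of interleaving a CCG supertag before each target word in the decoder's output sequence; the thesis' exact scheme may differ), the snippet below builds such an interleaved target sequence, from which the decoder would predict tags and words alternately over a single output vocabulary.

        # Sketch: interleave CCG supertags with target words.
        def interleave(words, supertags):
            """Build the target sequence the decoder is trained to produce."""
            assert len(words) == len(supertags)
            seq = []
            for tag, word in zip(supertags, words):
                seq.extend([tag, word])   # syntax first, then the word it licenses
            return seq

        words = ["Peter", "reads", "a", "book"]
        tags  = ["NP", "(S\\NP)/NP", "NP/N", "N"]
        print(interleave(words, tags))
        # -> ['NP', 'Peter', '(S\\NP)/NP', 'reads', 'NP/N', 'a', 'N', 'book']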

    Object tracking using a many-core embedded system

    Object localization and tracking is essential for many practical applications, such as man-computer interaction, security and surveillance, robot competitions, and Industry 4.0. Because of the large amount of data present in an image, and the algorithmic complexity involved, this task can be computationally demanding, mainly for traditional embedded systems, due to their processing and storage limitations. This calls for investigation and experimentation with new approaches, such as emergent heterogeneous embedded systems, which promise higher performance without compromising energy efficiency. This work explores several real-time color-based object tracking techniques, applied to images supplied by an RGB-D sensor attached to different embedded platforms. The main motivation was to explore a heterogeneous Parallella board with a 16-core Epiphany co-processor to reduce image processing time. Another goal was to confront this platform with more conventional embedded systems, namely the popular Raspberry Pi family. In this regard, several processing options were pursued, from low-level implementations specially tailored to the Parallella, to higher-level multi-platform approaches. The results achieved allow us to conclude that the programming effort required to efficiently use the Epiphany co-processor is considerable. Also, for the selected case study, the performance attained was below the one offered by simpler approaches running on quad-core Raspberry Pi boards.
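
    As a platform-independent sketch of the color-based tracking step that such work maps onto the Parallella and Raspberry Pi boards (illustrative only; the function name and color bounds are hypothetical): pixels falling inside a target color range are segmented, the object is tracked as the centroid of that mask, and it is this per-pixel test that gets distributed across cores.

        # Sketch: color segmentation followed by centroid tracking.
        def track(frame, lo, hi):
            """frame: 2-D list of (r, g, b) tuples; lo/hi: inclusive channel bounds."""
            xs, ys = [], []
            for y, row in enumerate(frame):
                for x, (r, g, b) in enumerate(row):
                    if lo[0] <= r <= hi[0] and lo[1] <= g <= hi[1] and lo[2] <= b <= hi[2]:
                        xs.append(x)
                        ys.append(y)
            if not xs:
                return None                                    # object not visible
            return (sum(xs) / len(xs), sum(ys) / len(ys))      # object centroid (x, y)

        frame = [[(250, 10, 10), (0, 0, 0)],
                 [(240, 20, 5), (0, 0, 0)]]
        print(track(frame, lo=(200, 0, 0), hi=(255, 60, 60)))  # -> (0.0, 0.5)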