Document Modeling with Graph Attention Networks for Multi-grained Machine Reading Comprehension
Natural Questions is a challenging new machine reading comprehension benchmark with answers at two granularities: a long answer (typically a paragraph) and a short answer (one or more entities inside the long answer).
Despite the effectiveness of existing methods on this benchmark, they treat
these two sub-tasks individually during training while ignoring their
dependencies. To address this issue, we present a novel multi-grained machine reading comprehension framework that models the hierarchical nature of documents at different levels of granularity: documents, paragraphs, sentences, and tokens. We utilize graph attention networks to
obtain different levels of representations so that they can be learned
simultaneously. The long and short answers can then be extracted from the paragraph-level and token-level representations, respectively. In this way, we model the dependencies between the two answer granularities so that each provides evidence for the other. We jointly train the two sub-tasks, and our
experiments show that our approach significantly outperforms previous systems
on both the long and short answer criteria.
Comment: ACL 2020
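The hierarchy-aware modeling rests on graph attention: each node (token, sentence, paragraph, or document) updates its representation by attending over its neighbors in the document graph. Below is a minimal single-head graph attention layer in PyTorch illustrating the general mechanism; this is a sketch of standard GAT, not the authors' implementation, and the hierarchy-specific graph construction is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head GAT layer; adj is a 0/1 adjacency mask with self-loops."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, in_dim) node features, one node per document-graph element
        z = self.proj(h)                                   # (N, out_dim)
        n = z.size(0)
        # Attention logits e_ij = LeakyReLU(a^T [z_i || z_j]) for all pairs
        zi = z.unsqueeze(1).expand(n, n, -1)
        zj = z.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.attn(torch.cat([zi, zj], dim=-1))).squeeze(-1)
        # Restrict attention to graph edges, normalize per neighborhood
        alpha = torch.softmax(e.masked_fill(adj == 0, float("-inf")), dim=-1)
        return torch.relu(alpha @ z)                       # (N, out_dim)

# Toy usage: 5 nodes in a chain, plus self-loops
adj = torch.eye(5) + torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
out = GraphAttentionLayer(16, 8)(torch.randn(5, 16), adj)
```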
IRC: Cross-layer design exploration of Intermittent Robust Computation units for IoTs
Energy-harvesting-powered computing offers intriguing and vast opportunities
to dramatically transform the landscape of the Internet of Things (IoT) devices
by utilizing ambient sources of energy to achieve battery-free computing. In
order to operate within the restricted energy capacity and intermittency profile, we propose the Intermittent Robust Computation (IRC) Unit, a new duty-cycle-variable computing approach that leverages the non-volatility inherent in spin-based switching devices. The foundations of IRC will be
advanced from the device-level upwards, by extending a Spin Hall Effect
Magnetic Tunnel Junction (SHE-MTJ) device. The device will then be used to
realize SHE-MTJ Majority/Polymorphic Gate (MG/PG) logic approaches and
libraries. Then a Logic-Embedded Flip-Flop (LE-FF) is developed to realize
rudimentary Boolean logic functions along with an inherent state-holding
capability within a compact footprint. Finally, the NV-Clustering synthesis
procedure and corresponding tool module are proposed to instantiate the LE-FF
library cells within conventional Register Transfer Language (RTL)
specifications. This selectively clusters together logic and NV state-holding
functionality, based on energy and area minimization criteria. It also realizes
middleware-coherent, intermittent computation without checkpointing,
micro-tasking, or software bloat and energy overheads vital to IoT. Simulation
results for various benchmark circuits, including ISCAS'89, validate functionality and demonstrate power dissipation, area, and delay benefits.
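For intuition, the Boolean behavior of the majority/polymorphic gate (MG/PG) logic described above can be modeled in a few lines. This is a behavioral sketch only; the SHE-MTJ device physics and non-volatile state-holding are abstracted away entirely.

```python
def maj3(a: int, b: int, c: int) -> int:
    """3-input majority: output is 1 when at least two inputs are 1."""
    return 1 if (a + b + c) >= 2 else 0

def polymorphic_gate(a: int, b: int, ctrl: int) -> int:
    """Acts as AND when ctrl=0 and as OR when ctrl=1."""
    return maj3(a, b, ctrl)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            assert polymorphic_gate(a, b, 0) == (a & b)   # AND mode
            assert polymorphic_gate(a, b, 1) == (a | b)   # OR mode
```

Tying one majority-gate input to a control line is what makes the gate polymorphic: the same hardware computes AND or OR depending on the control value.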
Full-Network Embedding in a Multimodal Embedding Pipeline
The current state-of-the-art for image annotation and image retrieval tasks
is obtained through deep neural networks, which combine an image representation
and a text representation into a shared embedding space. In this paper we
evaluate the impact of using the Full-Network embedding in this setting,
replacing the original image representation in a competitive multimodal
embedding generation scheme. Unlike the one-layer image embeddings typically
used by most approaches, the Full-Network embedding provides a multi-scale
representation of images, which results in richer characterizations. To measure
the influence of the Full-Network embedding, we evaluate its performance on
three different datasets, and compare the results with the original multimodal
embedding generation scheme when using a one-layer image embedding, and with
the rest of the state-of-the-art. Results for image annotation and image
retrieval tasks indicate that the Full-Network embedding is consistently
superior to the one-layer embedding. These results motivate the integration of
the Full-Network embedding on any multimodal embedding generation scheme,
something feasible thanks to the flexibility of the approach.Comment: In 2nd Workshop on Semantic Deep Learning (SemDeep-2) at the 12th
International Conference on Computational Semantics (IWCS) 201
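As a rough sketch of what a multi-layer image embedding looks like in practice, one can hook several stages of a pretrained CNN and concatenate their pooled activations. The real Full-Network embedding also standardizes and discretizes the per-feature values, which is omitted here; the layer names below assume a torchvision ResNet-18 as a stand-in backbone.

```python
import torch
import torchvision.models as models

def full_network_embedding(model, image, layer_names):
    """Concatenate globally pooled activations from several conv stages."""
    feats, hooks = [], []

    def save(module, inputs, output):
        feats.append(output.mean(dim=(2, 3)))  # global average pool -> (B, C)

    for name, module in model.named_modules():
        if name in layer_names:
            hooks.append(module.register_forward_hook(save))
    with torch.no_grad():
        model(image)
    for h in hooks:
        h.remove()
    return torch.cat(feats, dim=1)  # one multi-scale vector per image

# weights=None keeps the example download-free; in practice use pretrained weights
cnn = models.resnet18(weights=None).eval()
emb = full_network_embedding(cnn, torch.randn(1, 3, 224, 224),
                             {"layer2", "layer3", "layer4"})
print(emb.shape)  # torch.Size([1, 896]) = 128 + 256 + 512 channels
```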
Measuring Human Perception to Improve Handwritten Document Transcription
The subtleties of human perception, as measured by vision scientists through
the use of psychophysics, are important clues to the internal workings of
visual recognition. For instance, measured reaction time can indicate whether a
visual stimulus is easy or hard for a subject to recognize. In
this paper, we consider how to incorporate psychophysical measurements of
visual perception into the loss function of a deep neural network being trained
for a recognition task, under the assumption that such information can enforce
consistency with human behavior. As a case study to assess the viability of
this approach, we look at the problem of handwritten document transcription.
While good progress has been made towards automatically transcribing modern
handwriting, significant challenges remain in transcribing historical
documents. Here we describe a general enhancement strategy, underpinned by the
new loss formulation, which can be applied to the training regime of any deep
learning-based document transcription system. Through experimentation, reliable
performance improvement is demonstrated for the standard IAM and RIMES datasets
for three different network architectures. Further, we go on to show
feasibility for our approach on a new dataset of digitized Latin manuscripts,
originally produced by scribes in the Cloister of St. Gall in the 9th century.
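One plausible form of such a loss is sketched below, under the assumption that each training sample carries a normalized human reaction-time measurement; the paper's exact formulation and weighting direction may differ.

```python
import torch
import torch.nn.functional as F

def psychophysical_loss(logits, targets, reaction_times):
    """Cross entropy reweighted by per-sample human reaction times."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    # Map reaction times to weights in (0, 1]: slower response = harder item.
    # Down-weighting hard items is one choice; the opposite is also plausible.
    weights = 1.0 / (1.0 + reaction_times)
    return (weights * per_sample).mean()

# Toy batch: 4 samples, 10 classes, reaction times in seconds
loss = psychophysical_loss(torch.randn(4, 10),
                           torch.tensor([1, 3, 0, 7]),
                           torch.tensor([0.3, 1.2, 0.5, 2.0]))
```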
OneNet: Joint Domain, Intent, Slot Prediction for Spoken Language Understanding
In practice, most spoken language understanding systems process user input in a pipelined manner: first the domain is predicted, then the intent and semantic slots are inferred according to the semantic frames of the predicted domain. The
pipeline approach, however, has some disadvantages: error propagation and lack
of information sharing. To address these issues, we present a unified neural
network that jointly performs domain, intent, and slot predictions. Our
approach adopts a principled architecture for multitask learning to fold in the
state-of-the-art models for each task. With a few more ingredients, e.g.
orthography-sensitive input encoding and curriculum training, our model
delivered significant improvements in all three tasks across all domains over
strong baselines, including one using oracle prediction for domain detection,
on real user data of a commercial personal assistant.
Comment: 5-page conference paper accepted to IEEE ASRU 2017. Will be published in December 2017
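The joint-prediction idea can be sketched as a shared encoder feeding three task heads trained with one combined objective, so the tasks share information rather than being chained in a pipeline. Dimensions and head designs below are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class OneNetSketch(nn.Module):
    def __init__(self, vocab_size, dim, n_domains, n_intents, n_slot_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.domain_head = nn.Linear(2 * dim, n_domains)   # utterance-level
        self.intent_head = nn.Linear(2 * dim, n_intents)   # utterance-level
        self.slot_head = nn.Linear(2 * dim, n_slot_tags)   # per-token tags

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        h, _ = self.encoder(self.embed(tokens))            # (B, T, 2*dim)
        pooled = h.mean(dim=1)                 # shared utterance vector
        return (self.domain_head(pooled),
                self.intent_head(pooled),
                self.slot_head(h))

# Joint training: sum the three task losses so gradients flow through the
# shared encoder from every task (exact loss weighting is not reproduced).
```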
jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
We introduce jiant, an open source toolkit for conducting multitask and
transfer learning experiments on English NLU tasks. jiant enables modular and
configuration-driven experimentation with state-of-the-art models and
implements a broad set of tasks for probing, transfer learning, and multitask
training experiments. jiant implements over 50 NLU tasks, including all GLUE
and SuperGLUE benchmark tasks. We demonstrate that jiant reproduces published
performance on a variety of tasks and models, including BERT and RoBERTa. jiant
is available at https://jiant.info.
PyText: A Seamless Path from NLP research to production
We introduce PyText - a deep learning based NLP modeling framework built on
PyTorch. PyText addresses the often-conflicting requirements of enabling rapid
experimentation and of serving models at scale. It achieves this by providing
simple and extensible interfaces for model components, and by using PyTorch's
capabilities of exporting models for inference via the optimized Caffe2
execution engine. We report our own experience of migrating experimentation and
production workflows to PyText, which enabled us to iterate faster on novel
modeling ideas and then seamlessly ship them at industrial scale.
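The research-to-production step PyText automates can be illustrated generically with PyTorch's own ONNX exporter; PyText's actual export wrappers and the Caffe2 serving side are not shown, and the toy model below is purely illustrative.

```python
import torch
import torch.nn as nn

class TinyTextClassifier(nn.Module):
    """Stand-in for a trained research model."""
    def __init__(self, vocab_size=1000, dim=64, n_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.fc = nn.Linear(dim, n_classes)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        return self.fc(self.embed(tokens).mean(dim=1))

model = TinyTextClassifier().eval()
example_input = torch.randint(0, 1000, (1, 16))
# Export a serving-ready graph; an optimized engine then runs it in production
torch.onnx.export(model, example_input, "classifier.onnx",
                  input_names=["tokens"], output_names=["logits"])
```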
An Online Attention-based Model for Speech Recognition
Attention-based end-to-end models such as Listen, Attend and Spell (LAS) simplify the whole pipeline of traditional automatic speech recognition (ASR) systems and have become popular in the field of speech recognition. In previous work, researchers have shown that such architectures can achieve results comparable to state-of-the-art ASR systems, especially when using a bidirectional encoder and a global soft attention (GSA) mechanism. However, the bidirectional encoder and GSA are two obstacles to real-time speech recognition. In this work, we aim to make the LAS baseline streamable by removing these two obstacles. On the
encoder side, we use a latency-controlled (LC) bidirectional structure to
reduce the delay of forward computation. Meanwhile, an adaptive monotonic
chunk-wise attention (AMoChA) mechanism is proposed to replace GSA for the
calculation of attention weight distribution. Furthermore, we propose two
methods to alleviate the huge performance degradation when combining LC and
AMoChA. Finally, we obtain an online LAS model, LC-AMoChA, which shows only a 3.5% relative performance reduction from the LAS baseline on our internal Mandarin corpus.
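The chunk-wise half of the idea can be sketched as follows: at each decoder step, soft attention is computed only over a fixed-size window of encoder states ending at a monotonically advancing boundary, so the model never waits on future frames the way global soft attention does. The adaptive chunk-size prediction that gives AMoChA its name is omitted from this sketch.

```python
import torch

def chunkwise_attention(query, encoder_states, t, chunk_size=4):
    """Soft attention over a fixed window ending at boundary index t.

    query: (dim,) decoder state; encoder_states: (T, dim); 0 <= t < T.
    """
    lo = max(0, t - chunk_size + 1)
    chunk = encoder_states[lo : t + 1]        # only past/current frames
    scores = chunk @ query                    # dot-product energies
    weights = torch.softmax(scores, dim=0)
    return weights @ chunk                    # context vector, (dim,)

ctx = chunkwise_attention(torch.randn(8), torch.randn(20, 8), t=10)
```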
Detecting Work Zones in SHRP 2 NDS Videos Using Deep Learning Based Computer Vision
Naturalistic driving studies seek to observe human driver behavior across the variety of environmental conditions necessary to analyze, understand, and predict that behavior using statistical and physical models. The
second Strategic Highway Research Program (SHRP 2) funds a number of
transportation safety-related projects including its primary effort, the
Naturalistic Driving Study (NDS), and an effort supplementary to the NDS, the
Roadway Information Database (RID). This work seeks to expand the range of
answerable research questions that researchers might pose to the NDS and RID
databases. Specifically, we present the SHRP 2 NDS Video Analytics (SNVA)
software application, which extracts information from NDS-instrumented
vehicles' forward-facing camera footage and efficiently integrates that
information into the RID, tying the video content to geolocations and other
trip attributes. Of particular interest to researchers and other stakeholders
is the integration of work zone, traffic signal state and weather information.
The version of SNVA introduced in this paper focuses on work zone detection,
the highest priority. The ability to automate the discovery and cataloging of
this information, and to do so quickly, is especially important given the two
petabyte (2PB) size of the NDS video data set.
Comment: IEEE 17th International Conference on Machine Learning and Applications (ICMLA 2018), 3 figures, 1 table, 2 algorithms
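The shape of such a video-to-database pipeline can be sketched simply; this is illustrative, not SNVA's code. Frame-level classifier outputs are collapsed into time-stamped events that can then be joined to trip geolocation records in the RID.

```python
from itertools import groupby

def detect_work_zone_events(frame_predictions, fps=30.0):
    """Collapse per-frame 0/1 predictions into (start_s, end_s) events."""
    events, idx = [], 0
    for label, run in groupby(frame_predictions):
        length = sum(1 for _ in run)
        if label == 1:
            events.append((idx / fps, (idx + length) / fps))
        idx += length
    return events

# Frames 2-4 and 6-7 flagged as work zone at 30 fps
print(detect_work_zone_events([0, 0, 1, 1, 1, 0, 1, 1, 0]))
# -> [(0.0667, 0.1667), (0.2, 0.2667)] approximately
```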
Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks
We present a novel approach to generating photo-realistic images of a face
with accurate lip sync, given an audio input. Using a recurrent neural network, we obtain mouth landmarks from audio features. We then exploit the power of conditional generative adversarial networks to produce a highly realistic face conditioned on a set of landmarks. These two networks
together are capable of producing a sequence of natural faces in sync with an
input audio track.
Comment: Submitted for ECCV 2018
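Structurally, the described pipeline is an RNN from audio features to mouth landmarks followed by a conditional generator from landmarks to face frames. The sketch below uses illustrative shapes and layers; the discriminator and adversarial losses are omitted.

```python
import torch
import torch.nn as nn

class AudioToLandmarks(nn.Module):
    """RNN mapping per-frame audio features to mouth landmark coordinates."""
    def __init__(self, audio_dim=26, hidden=128, n_landmarks=20):
        super().__init__()
        self.rnn = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_landmarks * 2)   # (x, y) per landmark

    def forward(self, audio_feats):                     # (B, T, audio_dim)
        h, _ = self.rnn(audio_feats)
        return self.out(h)                              # (B, T, 2*n_landmarks)

class LandmarksToFace(nn.Module):
    """Conditional generator: landmark vector (+ noise) -> 64x64 RGB frame."""
    def __init__(self, n_landmarks=20, noise_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_landmarks * 2 + noise_dim, 256), nn.ReLU(),
            nn.Linear(256, 64 * 64 * 3), nn.Tanh())

    def forward(self, landmarks, z):        # applied frame by frame
        x = self.net(torch.cat([landmarks, z], dim=-1))
        return x.view(-1, 3, 64, 64)
```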