
    Neural Speed Reading with Structural-Jump-LSTM

    Recurrent neural networks (RNNs) can model natural language by sequentially 'reading' input tokens and outputting a distributed representation of each token. Due to the sequential nature of RNNs, inference time grows linearly with input length, and all inputs are read regardless of their importance. Efforts to speed up this inference, known as 'neural speed reading', either ignore or skim over part of the input. We present Structural-Jump-LSTM: the first neural speed reading model that both skips and jumps over text during inference. The model consists of a standard LSTM and two agents: one capable of skipping single words while reading, and one capable of exploiting punctuation structure (sub-sentence separators (,:), sentence end symbols (.!?), or end-of-text markers) to jump ahead after reading a word. A comprehensive experimental evaluation against all five state-of-the-art neural speed reading models shows that Structural-Jump-LSTM achieves the best overall reduction in floating point operations (FLOPs), and is hence the fastest, while keeping the same accuracy as, or even improving on, a vanilla LSTM that reads the whole text. Comment: 10 pages
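    As a concrete illustration of the skip/jump control flow described in the abstract above, here is a minimal sketch in PyTorch. It is not the authors' implementation: the two agents here are untrained linear layers followed by argmax (the paper trains its agents rather than using them off the shelf), and all names, dimensions, and the greedy decision rule are illustrative assumptions.

```python
# Minimal sketch of a skip/jump reading loop (illustrative, not the authors' code).
import torch
import torch.nn as nn

def next_index(tokens, start, marks):
    """Index just after the next structural marker, or the end of the text."""
    for j in range(start + 1, len(tokens)):
        if tokens[j] in marks:
            return j + 1
    return len(tokens)

class StructuralJumpReader(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.LSTMCell(emb_dim, hidden_dim)
        self.skip_agent = nn.Linear(hidden_dim, 2)   # 0 = read word, 1 = skip word
        self.jump_agent = nn.Linear(hidden_dim, 4)   # 0 = none, 1 = next ',:', 2 = next '.!?', 3 = end

    def forward(self, token_ids, tokens):
        # token_ids: 1-D LongTensor, tokens: list of the same tokens as strings
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros_like(h)
        i = 0
        while i < len(tokens):
            if self.skip_agent(h).argmax(dim=-1).item() == 0:     # read this word
                x = self.embed(token_ids[i].unsqueeze(0))
                h, c = self.cell(x, (h, c))
                jump = self.jump_agent(h).argmax(dim=-1).item()   # decide where to go next
                if jump == 1:
                    i = next_index(tokens, i, {',', ':'})
                elif jump == 2:
                    i = next_index(tokens, i, {'.', '!', '?'})
                elif jump == 3:
                    break                                         # jump to end of text
                else:
                    i += 1
            else:                                                 # skip: state unchanged
                i += 1
        return h                                                  # final text representation
```

    Because stretches of text between the current word and the next separator are never fed through the LSTM cell, the number of cell updates, and hence the FLOP count, drops relative to a vanilla LSTM that reads every token.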

    Semantic Similarity Loss for Neural Source Code Summarization

    This paper presents an improved loss function for neural source code summarization. Code summarization is the task of writing natural language descriptions of source code. Neural code summarization refers to automated techniques for generating these descriptions using neural networks. Almost all current approaches involve neural networks, either as standalone models or as part of pretrained large language models, e.g., GPT, Codex, LLaMA. Yet almost all also use a categorical cross-entropy (CCE) loss function for network optimization. Two problems with CCE are that 1) it computes loss over each word prediction one at a time, rather than evaluating a whole sentence, and 2) it requires a perfect prediction, leaving no room for partial credit for synonyms. We propose and evaluate a loss function to alleviate these problems. In essence, we propose to use a semantic similarity metric to calculate loss over the whole output sentence prediction per training batch, rather than just the loss for each word. We also propose to combine our loss with traditional CCE for each word, which streamlines the training process compared to baselines. We evaluate our approach over several baselines and report an improvement in the vast majority of conditions. Comment: 20 pages + 8 figures + 5 references. Preprint in review, Aug. 202
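    A hedged sketch of the general idea, combining per-word CCE with a sentence-level semantic similarity term, follows. It is not the paper's exact formulation: the similarity here is a cosine over mean-pooled soft token embeddings (so the term stays differentiable), and the weight alpha is an assumed value.

```python
# Illustrative combination of per-word cross-entropy with a sentence-level
# semantic similarity penalty (not the paper's exact loss).
import torch
import torch.nn.functional as F

def combined_summary_loss(logits, target_ids, embedding_matrix, alpha=0.5):
    """
    logits:           [batch, seq_len, vocab]  decoder outputs
    target_ids:       [batch, seq_len]         reference summary tokens
    embedding_matrix: [vocab, dim]             e.g., the decoder's input embeddings
    alpha:            weight of the sentence-level term (assumed value)
    """
    batch, seq_len, vocab = logits.shape

    # 1) Standard categorical cross-entropy over each word prediction.
    cce = F.cross_entropy(logits.reshape(-1, vocab), target_ids.reshape(-1))

    # 2) Sentence-level term: embed the predicted distribution and the reference,
    #    mean-pool over the sequence, and penalize low cosine similarity.
    probs = logits.softmax(dim=-1)                 # [batch, seq_len, vocab]
    pred_emb = probs @ embedding_matrix            # expected token embeddings
    ref_emb = embedding_matrix[target_ids]         # [batch, seq_len, dim]
    sim = F.cosine_similarity(pred_emb.mean(dim=1), ref_emb.mean(dim=1), dim=-1)
    semantic = (1.0 - sim).mean()

    return cce + alpha * semantic
```

    The first term keeps the usual word-by-word supervision; the second rewards predictions whose overall meaning is close to the reference even when individual words differ, which is the partial-credit-for-synonyms behavior the abstract motivates.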

    Self-Adaptive Hierarchical Sentence Model

    The ability to accurately model a sentence at varying stages (e.g., word-phrase-sentence) plays a central role in natural language processing. As an effort towards this goal, we propose a self-adaptive hierarchical sentence model (AdaSent). AdaSent effectively forms a hierarchy of representations from words to phrases and then to sentences through recursive gated local composition of adjacent segments. We design a competitive mechanism (through gating networks) to allow the representations of the same sentence to be engaged in a particular learning task (e.g., classification), thereby effectively mitigating the gradient vanishing problem persistent in other recursive models. Both qualitative and quantitative analyses show that AdaSent can automatically form and select the representations suitable for the task at hand during training, yielding superior classification performance over competitor models on 5 benchmark data sets. Comment: 8 pages, 7 figures, accepted as a full paper at IJCAI 201
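    The gated composition of adjacent segments and the level-selection gating described above can be sketched as follows. This is an illustrative PyTorch pyramid, not the authors' exact architecture; the layer names, pooling choice, and dimensions are assumptions.

```python
# Illustrative gated pyramid over adjacent segments, in the spirit of the
# word -> phrase -> sentence hierarchy described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedPyramid(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)     # composition of adjacent segments
        self.gate = nn.Linear(2 * dim, 3)          # mix weights: left / right / composed
        self.level_scorer = nn.Linear(dim, 1)      # gating network over pyramid levels

    def forward(self, x):                          # x: [seq_len, dim] word embeddings
        levels = [x]
        while levels[-1].size(0) > 1:
            cur = levels[-1]
            left, right = cur[:-1], cur[1:]                      # adjacent segments
            pair = torch.cat([left, right], dim=-1)
            comp = torch.tanh(self.compose(pair))
            w = F.softmax(self.gate(pair), dim=-1)               # gated local composition
            nxt = w[:, 0:1] * left + w[:, 1:2] * right + w[:, 2:3] * comp
            levels.append(nxt)

        # Pool each level, then let a gating network weight the levels so the model
        # can draw on word-, phrase-, or sentence-level granularity as the task needs.
        pooled = torch.stack([lvl.mean(dim=0) for lvl in levels])   # [n_levels, dim]
        level_w = F.softmax(self.level_scorer(pooled).squeeze(-1), dim=0)
        return (level_w.unsqueeze(-1) * pooled).sum(dim=0)          # final sentence vector
```

    Each level combines adjacent positions of the level below, so the top of the pyramid summarizes the whole sentence while lower levels retain word- and phrase-level detail; the level gate lets the downstream classifier weight whichever granularity helps the task.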

    Person Search with Natural Language Description

    Searching for persons in large-scale image databases with natural language description queries has important applications in video surveillance. Existing methods mainly focus on searching for persons with image-based or attribute-based queries, which have major limitations for practical usage. In this paper, we study the problem of person search with natural language description. Given the textual description of a person, the person search algorithm is required to rank all the samples in the person database and then retrieve the most relevant sample corresponding to the queried description. Since no person dataset or benchmark with textual descriptions is available, we collect a large-scale person description dataset with detailed natural language annotations and person samples from various sources, termed the CUHK Person Description Dataset (CUHK-PEDES). A wide range of possible models and baselines have been evaluated and compared on the person search benchmark. A Recurrent Neural Network with Gated Neural Attention mechanism (GNA-RNN) is proposed to establish the state-of-the-art performance on person search.
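    To make the rank-then-retrieve protocol concrete, here is a generic sketch of ranking a person gallery by a textual query. It is not the GNA-RNN architecture proposed in the paper; the encoders, feature dimensions, and cosine-similarity scoring below are illustrative assumptions.

```python
# Generic sketch of ranking a person gallery by a natural-language query
# (illustrative only, not the paper's GNA-RNN model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextQueryRanker(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256,
                 img_feat_dim=2048, joint_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.text_proj = nn.Linear(hidden_dim, joint_dim)
        self.img_proj = nn.Linear(img_feat_dim, joint_dim)   # e.g., precomputed CNN features

    def rank(self, query_ids, gallery_feats):
        """
        query_ids:     [1, query_len]         token ids of the description
        gallery_feats: [num_images, feat_dim] precomputed image features
        Returns gallery indices sorted from most to least relevant.
        """
        _, h = self.rnn(self.embed(query_ids))               # h: [1, 1, hidden_dim]
        q = F.normalize(self.text_proj(h[-1]), dim=-1)       # [1, joint_dim]
        g = F.normalize(self.img_proj(gallery_feats), dim=-1)
        scores = (g @ q.t()).squeeze(-1)                      # cosine affinity per image
        return scores.argsort(descending=True)
```

    Given a description, the model scores every gallery image and returns them sorted by affinity, matching the ranking-then-retrieval setup described in the abstract.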