10,652 research outputs found

    Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks

    Full text link
    Recent papers have shown that neural networks obtain state-of-the-art performance on several different sequence tagging tasks. One appealing property of such systems is their generality, as excellent performance can be achieved with a unified architecture and without task-specific feature engineering. However, it is unclear if such systems can be used for tasks without large amounts of training data. In this paper we explore the problem of transfer learning for neural sequence taggers, where a source task with plentiful annotations (e.g., POS tagging on Penn Treebank) is used to improve performance on a target task with fewer available annotations (e.g., POS tagging for microblogs). We examine the effects of transfer learning for deep hierarchical recurrent networks across domains, applications, and languages, and show that significant improvement can often be obtained. These improvements lead to improvements over the current state-of-the-art on several well-studied tasks.Comment: Accepted as a conference paper at ICLR 2017. This is an extended version of the original paper (https://arxiv.org/abs/1603.06270). The original paper proposes a new architecture, while this version focuses on transfer learning for a general model clas

    Combining Discrete and Neural Features for Sequence Labeling

    Full text link
    Neural network models have recently received heated research attention in the natural language processing community. Compared with traditional models with discrete features, neural models have two main advantages. First, they take low-dimensional, real-valued embedding vectors as inputs, which can be trained over large raw data, thereby addressing the issue of feature sparsity in discrete models. Second, deep neural networks can be used to automatically combine input features, and including non-local features that capture semantic patterns that cannot be expressed using discrete indicator features. As a result, neural network models have achieved competitive accuracies compared with the best discrete models for a range of NLP tasks. On the other hand, manual feature templates have been carefully investigated for most NLP tasks over decades and typically cover the most useful indicator pattern for solving the problems. Such information can be complementary the features automatically induced from neural networks, and therefore combining discrete and neural features can potentially lead to better accuracy compared with models that leverage discrete or neural features only. In this paper, we systematically investigate the effect of discrete and neural feature combination for a range of fundamental NLP tasks based on sequence labeling, including word segmentation, POS tagging and named entity recognition for Chinese and English, respectively. Our results on standard benchmarks show that state-of-the-art neural models can give accuracies comparable to the best discrete models in the literature for most tasks and combing discrete and neural features unanimously yield better results.Comment: Accepted by International Conference on Computational Linguistics and Intelligent Text Processing (CICLing) 2016, Apri

    Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

    Full text link
    We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, resulting in a 7x speedup over our previous system. Because of this efficiency, experiments that previously took weeks now run in days. This enables us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale

    Text Understanding from Scratch

    Full text link
    This article demontrates that we can apply deep learning to text understanding from character-level inputs all the way up to abstract text concepts, using temporal convolutional networks (ConvNets). We apply ConvNets to various large-scale datasets, including ontology classification, sentiment analysis, and text categorization. We show that temporal ConvNets can achieve astonishing performance without the knowledge of words, phrases, sentences and any other syntactic or semantic structures with regards to a human language. Evidence shows that our models can work for both English and Chinese.Comment: This technical report is superseded by a paper entitled "Character-level Convolutional Networks for Text Classification", arXiv:1509.01626. It has considerably more experimental results and a rewritten introductio

    Learning Task-specific Representation for Novel Words in Sequence Labeling

    Full text link
    Word representation is a key component in neural-network-based sequence labeling systems. However, representations of unseen or rare words trained on the end task are usually poor for appreciable performance. This is commonly referred to as the out-of-vocabulary (OOV) problem. In this work, we address the OOV problem in sequence labeling using only training data of the task. To this end, we propose a novel method to predict representations for OOV words from their surface-forms (e.g., character sequence) and contexts. The method is specifically designed to avoid the error propagation problem suffered by existing approaches in the same paradigm. To evaluate its effectiveness, we performed extensive empirical studies on four part-of-speech tagging (POS) tasks and four named entity recognition (NER) tasks. Experimental results show that the proposed method can achieve better or competitive performance on the OOV problem compared with existing state-of-the-art methods.Comment: This work has been accepted by IJCAI 201

    Recent Progresses in Deep Learning based Acoustic Models (Updated)

    Full text link
    In this paper, we summarize recent progresses made in deep learning based acoustic models and the motivation and insights behind the surveyed techniques. We first discuss acoustic models that can effectively exploit variable-length contextual information, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and their various combination with other models. We then describe acoustic models that are optimized end-to-end with emphasis on feature representations learned jointly with rest of the system, the connectionist temporal classification (CTC) criterion, and the attention-based sequence-to-sequence model. We further illustrate robustness issues in speech recognition systems, and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies. We also cover modeling techniques that lead to more efficient decoding and discuss possible future directions in acoustic model research.Comment: This is an updated version with latest literature until ICASSP2018 of the paper: Dong Yu and Jinyu Li, "Recent Progresses in Deep Learning based Acoustic Models," vol.4, no.3, IEEE/CAA Journal of Automatica Sinica, 201

    Fast and Accurate Neural Word Segmentation for Chinese

    Full text link
    Neural models with minimal feature engineering have achieved competitive performance against traditional methods for the task of Chinese word segmentation. However, both training and working procedures of the current neural models are computationally inefficient. This paper presents a greedy neural word segmenter with balanced word and character embedding inputs to alleviate the existing drawbacks. Our segmenter is truly end-to-end, capable of performing segmentation much faster and even more accurate than state-of-the-art neural models on Chinese benchmark datasets.Comment: To appear in ACL201

    Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation

    Full text link
    Character-based sequence labeling framework is flexible and efficient for Chinese word segmentation (CWS). Recently, many character-based neural models have been applied to CWS. While they obtain good performance, they have two obvious weaknesses. The first is that they heavily rely on manually designed bigram feature, i.e. they are not good at capturing n-gram features automatically. The second is that they make no use of full word information. For the first weakness, we propose a convolutional neural model, which is able to capture rich n-gram features without any feature engineering. For the second one, we propose an effective approach to integrate the proposed model with word embeddings. We evaluate the model on two benchmark datasets: PKU and MSR. Without any feature engineering, the model obtains competitive performance -- 95.7% on PKU and 97.3% on MSR. Armed with word embeddings, the model achieves state-of-the-art performance on both datasets -- 96.5% on PKU and 98.0% on MSR, without using any external labeled resource.Comment: will be published by IJCNLP201

    Adaptive text mining: Inferring structure from sequences

    Get PDF
    Text mining is about inferring structure from sequences representing natural language text, and may be defined as the process of analyzing text to extract information that is useful for particular purposes. Although hand-crafted heuristics are a common practical approach for extracting information from text, a general, and generalizable, approach requires adaptive techniques. This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining. It develops several examples: extraction of hierarchical phrase structures from text, identification of keyphrases in documents, locating proper names and quantities of interest in a piece of text, text categorization, word segmentation, acronym extraction, and structure recognition. We conclude that compression forms a sound unifying principle that allows many text mining problems to be tacked adaptively

    Learning to Associate Words and Images Using a Large-scale Graph

    Full text link
    We develop an approach for unsupervised learning of associations between co-occurring perceptual events using a large graph. We applied this approach to successfully solve the image captcha of China's railroad system. The approach is based on the principle of suspicious coincidence. In this particular problem, a user is presented with a deformed picture of a Chinese phrase and eight low-resolution images. They must quickly select the relevant images in order to purchase their train tickets. This problem presents several challenges: (1) the teaching labels for both the Chinese phrases and the images were not available for supervised learning, (2) no pre-trained deep convolutional neural networks are available for recognizing these Chinese phrases or the presented images, and (3) each captcha must be solved within a few seconds. We collected 2.6 million captchas, with 2.6 million deformed Chinese phrases and over 21 million images. From these data, we constructed an association graph, composed of over 6 million vertices, and linked these vertices based on co-occurrence information and feature similarity between pairs of images. We then trained a deep convolutional neural network to learn a projection of the Chinese phrases onto a 230-dimensional latent space. Using label propagation, we computed the likelihood of each of the eight images conditioned on the latent space projection of the deformed phrase for each captcha. The resulting system solved captchas with 77% accuracy in 2 seconds on average. Our work, in answering this practical challenge, illustrates the power of this class of unsupervised association learning techniques, which may be related to the brain's general strategy for associating language stimuli with visual objects on the principle of suspicious coincidence.Comment: 8 pages, 7 figures, 14th Conference on Computer and Robot Vision 201
    corecore