13,793 research outputs found

    A fine-grained approach to scene text script identification

    Full text link
    This paper focuses on the problem of script identification in unconstrained scenarios. Script identification is an important prerequisite to recognition, and an indispensable condition for automatic text understanding systems designed for multi-language environments. Although widely studied for document images and handwritten documents, it remains an almost unexplored territory for scene text images. We detail a novel method for script identification in natural images that combines convolutional features and the Naive-Bayes Nearest Neighbor classifier. The proposed framework efficiently exploits the discriminative power of small stroke-parts, in a fine-grained classification framework. In addition, we propose a new public benchmark dataset for the evaluation of joint text detection and script identification in natural scenes. Experiments done in this new dataset demonstrate that the proposed method yields state of the art results, while it generalizes well to different datasets and variable number of scripts. The evidence provided shows that multi-lingual scene text recognition in the wild is a viable proposition. Source code of the proposed method is made available online

    Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System

    Full text link
    In this paper, we explore the encoding/pooling layer and loss function in the end-to-end speaker and language recognition system. First, a unified and interpretable end-to-end system for both speaker and language recognition is developed. It accepts variable-length input and produces an utterance level result. In the end-to-end system, the encoding layer plays a role in aggregating the variable-length input sequence into an utterance level representation. Besides the basic temporal average pooling, we introduce a self-attentive pooling layer and a learnable dictionary encoding layer to get the utterance level representation. In terms of loss function for open-set speaker verification, to get more discriminative speaker embedding, center loss and angular softmax loss is introduced in the end-to-end system. Experimental results on Voxceleb and NIST LRE 07 datasets show that the performance of end-to-end learning system could be significantly improved by the proposed encoding layer and loss function.Comment: Accepted for Speaker Odyssey 201

    Dependency relations as source context in phrase-based SMT

    Get PDF
    The Phrase-Based Statistical Machine Translation (PB-SMT) model has recently begun to include source context modeling, under the assumption that the proper lexical choice of an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features such as words, parts-of-speech, and supertags have been explored as effective source context in SMT. In this paper, we show that position-independent syntactic dependency relations of the head of a source phrase can be modeled as useful source context to improve target phrase selection and thereby improve overall performance of PB-SMT. On a Dutch—English translation task, by combining dependency relations and syntactic contextual features (part-of-speech), we achieved a 1.0 BLEU (Papineni et al., 2002) point improvement (3.1% relative) over the baseline

    Domain Adaptation for Statistical Classifiers

    Full text link
    The most basic assumption used in statistical learning theory is that training data and test data are drawn from the same underlying distribution. Unfortunately, in many applications, the "in-domain" test data is drawn from a distribution that is related, but not identical, to the "out-of-domain" distribution of the training data. We consider the common case in which labeled out-of-domain data is plentiful, but labeled in-domain data is scarce. We introduce a statistical formulation of this problem in terms of a simple mixture model and present an instantiation of this framework to maximum entropy classifiers and their linear chain counterparts. We present efficient inference algorithms for this special case based on the technique of conditional expectation maximization. Our experimental results show that our approach leads to improved performance on three real world tasks on four different data sets from the natural language processing domain
    corecore