5,470 research outputs found

    Polyglot: Distributed Word Representations for Multilingual NLP

    Full text link
    Distributed word representations (word embeddings) have recently contributed to competitive performance in language modeling and several NLP tasks. In this work, we train word embeddings for more than 100 languages using their corresponding Wikipedias. We quantitatively demonstrate the utility of our word embeddings by using them as the sole features for training a part-of-speech tagger for a subset of these languages. We find their performance to be competitive with near-state-of-the-art methods in English, Danish, and Swedish. Moreover, we investigate the semantic features captured by these embeddings through the proximity of word groupings. We will release these embeddings publicly to help researchers in the development and enhancement of multilingual applications.
    Comment: 10 pages, 2 figures, Proceedings of the Conference on Computational Natural Language Learning (CoNLL 2013)
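    A minimal sketch of the evaluation idea above: feeding pretrained word vectors, and nothing else, to a simple classifier-based POS tagger. The embedding table, toy tagged corpus, and 64-dimension size below are illustrative assumptions, not the paper's actual data or pipeline.

```python
# Sketch: pretrained word vectors as the SOLE features of a POS tagger.
# The embedding table and toy tagged corpus are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]
embeddings = {w: rng.normal(size=64) for w in vocab}  # stand-in vectors
unk = np.zeros(64)  # fallback for out-of-vocabulary words

# Toy tagged corpus of (word, POS) pairs.
train = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("on", "ADP"),
         ("a", "DET"), ("mat", "NOUN"), ("dog", "NOUN"), ("ran", "VERB")]

X = np.array([embeddings.get(w, unk) for w, _ in train])
y = [tag for _, tag in train]

tagger = LogisticRegression(max_iter=1000).fit(X, y)
print(tagger.predict([embeddings["dog"]]))  # expected: ['NOUN']
```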

    Estimating the Market Demand for Value-Added Beef: Testing for BSE Announcement Effects Using a Nested PIGLOG Model Approach

    Get PDF
    This paper estimates an Almost Ideal Demand System (AIDS) model on retail meat data and corrects for first-order autocorrelation. We fail to reject the null hypothesis of no BSE announcement effects.
    Demand and Price Analysis
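    For reference, the budget-share equations of the AIDS specification named above (Deaton and Muellbauer's Almost Ideal Demand System, a member of the PIGLOG class) take the standard form below; how the BSE announcement effects enter these shares in the paper's nested variant is not reproduced here.

```latex
% AIDS budget-share equation for good i:
%   w_i : expenditure share of good i, p_j : prices, x : total expenditure
w_i = \alpha_i + \sum_j \gamma_{ij} \ln p_j + \beta_i \ln\!\left(\frac{x}{P}\right),
\quad\text{with}\quad
\ln P = \alpha_0 + \sum_k \alpha_k \ln p_k
      + \frac{1}{2} \sum_k \sum_j \gamma_{kj} \ln p_k \ln p_j .
```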

    The Expressive Power of Word Embeddings

    Full text link
    We seek to better understand the differences in quality among the several publicly released embeddings. We propose several tasks that help to distinguish the characteristics of different embeddings. Our evaluation of sentiment polarity and synonym/antonym relations shows that embeddings are able to capture surprisingly nuanced semantics even in the absence of sentence structure. Moreover, benchmarking the embeddings shows great variance in the quality and characteristics of the semantics they capture. Finally, we show the impact of varying the number of dimensions and the resolution of each dimension on the useful features effectively captured by the embedding space. Our contributions highlight the importance of embeddings for NLP tasks and the effect of their quality on the final results.
    Comment: submitted to ICML 2013, Deep Learning for Audio, Speech and Language Processing Workshop. 8 pages, 8 figures
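    A small sketch of the kind of proximity probe described above: comparing cosine similarities of word pairs to test whether synonyms sit closer than antonyms in the embedding space. The random vectors are placeholders; with real pretrained embeddings one would expect the synonym pair to score higher.

```python
# Sketch: probing synonym/antonym structure via cosine similarity.
# Random vectors stand in for real pretrained embeddings.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=100) for w in ["good", "great", "bad"]}

# With real embeddings, the synonym pair should score higher.
print("synonym pair:", cosine(emb["good"], emb["great"]))
print("antonym pair:", cosine(emb["good"], emb["bad"]))
```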

    Crew station research and development facility training for the light helicopter demonstration/validation program

    Get PDF
    The U.S. Army Crew Station Research and Development Branch (CSRDB) of the Aircraft Simulation Division (AVSCOM) was tasked by the Light Helicopter Program Manager (LH-PM) to provide training to Army personnel in advanced aircraft simulation technology. The purpose of this training was to prepare different groups of pilots to support and evaluate two contractor simulation efforts during the Demonstration/Validation (DEM/VAL) phase of the LH program. The personnel in the CSRDB developed mission-oriented training programs to accomplish these objectives, conducted the programs, and provided guidance to Army personnel and support personnel throughout the DEM/VAL phase.

    A Practical Incremental Learning Framework For Sparse Entity Extraction

    Get PDF
    This work addresses the challenges of extracting entities from textual data, including the high cost of data annotation, model accuracy, the selection of appropriate evaluation criteria, and the overall quality of annotation. We present a framework that integrates Entity Set Expansion (ESE) and Active Learning (AL) to reduce the annotation cost of sparse data and provide an online evaluation method as feedback. This incremental and interactive learning framework allows for rapid annotation and subsequent extraction of sparse data while maintaining high accuracy. We evaluate our framework on three publicly available datasets and show that it drastically reduces the cost of sparse entity annotation, by an average of 85% and 45% to reach 0.9 and 1.0 F-scores, respectively. Moreover, the method exhibited robust performance across all datasets.
    Comment: https://www.aclweb.org/anthology/C18-1059
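    A bare-bones sketch of the active-learning half of such a framework: repeatedly train a model, query the pool example it is least certain about, and have an oracle label it. The synthetic data, model choice, and annotation budget are placeholder assumptions; the paper's framework additionally couples this loop with Entity Set Expansion and online evaluation, which are not reproduced here.

```python
# Sketch: pool-based active learning with uncertainty sampling.
# Data, model, and annotation budget are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(200, 10))       # unlabeled pool
true_y = (X_pool[:, 0] > 0).astype(int)   # hidden labels (the "oracle")

# Seed set: two examples from each class.
labeled = list(np.where(true_y == 1)[0][:2]) + list(np.where(true_y == 0)[0][:2])

for _ in range(20):  # annotation budget
    model = LogisticRegression().fit(X_pool[labeled], true_y[labeled])
    probs = model.predict_proba(X_pool)[:, 1]
    uncertainty = -np.abs(probs - 0.5)    # closest to 0.5 = least certain
    uncertainty[labeled] = -np.inf        # never re-query labeled points
    labeled.append(int(np.argmax(uncertainty)))  # oracle labels the query

print("pool accuracy:", model.score(X_pool, true_y))
```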

    The perimeter of uniform and geometric words: a probabilistic analysis

    Get PDF
    Let a word be a sequence of n i.i.d. integer random variables. The perimeter P of the word is the number of edges of the word, seen as a polyomino. In this paper, we present a probabilistic approach to the computation of the moments of P. This is applied to uniform and geometric random variables. We also show that, asymptotically, the distribution of P is Gaussian and, seen as a stochastic process, the perimeter converges in distribution to a Brownian motion.
    Comment: 13 pages, 7 figures
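    To make the object concrete: under the usual bargraph convention for words (letter w_i drawn as a column of w_i unit cells, with w_i >= 1), the perimeter has the closed form below. That convention is an assumption here, since the abstract does not spell it out.

```latex
% Perimeter of the word w = w_1 w_2 \cdots w_n viewed as a bargraph
% (column i has height w_i \ge 1):
P(w) = 2n + w_1 + w_n + \sum_{i=1}^{n-1} \lvert w_{i+1} - w_i \rvert
```

    As a sanity check, the one-letter word w = (1) is a unit square with perimeter 4, and the formula gives 2 + 1 + 1 + 0 = 4.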