5,470 research outputs found
Polyglot: Distributed Word Representations for Multilingual NLP
Distributed word representations (word embeddings) have recently contributed
to competitive performance in language modeling and several NLP tasks. In this
work, we train word embeddings for more than 100 languages using their
corresponding Wikipedias. We quantitatively demonstrate the utility of our word
embeddings by using them as the sole features for training a part of speech
tagger for a subset of these languages. We find their performance to be
competitive with near state-of-art methods in English, Danish and Swedish.
Moreover, we investigate the semantic features captured by these embeddings
through the proximity of word groupings. We will release these embeddings
publicly to help researchers in the development and enhancement of multilingual
applications.
Comment: 10 pages, 2 figures, Proceedings of the Conference on Computational Natural Language Learning, CoNLL'201
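The abstract's core setup, using word embeddings as the sole features for a part-of-speech tagger, can be sketched with toy data. The embedding table, tags, and nearest-centroid classifier below are illustrative stand-ins, not the Polyglot vectors or the authors' actual tagger.

```python
# Toy sketch: POS tagging with word embeddings as the only features.
# EMB and TRAIN are made-up stand-ins for released embeddings and a
# labeled corpus; a real tagger would use a stronger classifier.

EMB = {
    "the": [1.0, 0.0], "a": [0.9, 0.1],    # determiner-like region
    "cat": [0.0, 1.0], "dog": [0.1, 0.9],  # noun-like region
}

TRAIN = [("the", "DET"), ("a", "DET"), ("cat", "NOUN"), ("dog", "NOUN")]

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def fit(train):
    # One centroid per tag: the simplest classifier over embedding space.
    by_tag = {}
    for word, tag in train:
        by_tag.setdefault(tag, []).append(EMB[word])
    return {tag: centroid(vs) for tag, vs in by_tag.items()}

def predict(model, word):
    v = EMB[word]
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(v, c))
    return min(model, key=lambda tag: dist(model[tag]))

model = fit(TRAIN)
print(predict(model, "cat"))  # NOUN
```

The point of the sketch is that the classifier sees only embedding coordinates, never the word identity or hand-built features, which mirrors the "sole features" claim in the abstract.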
Estimating the Market Demand for Value-Added Beef: Testing for BSE Announcement Effects Using a Nested PIGLOG Model Approach
This paper estimates an AIDS model and corrects for first-order autocorrelation using retail meat data. We fail to reject the null hypothesis of no BSE announcement effects.
Demand and Price Analysis
University-Retail Industry Research Partnerships as a Means to Analyze Consumer Response: The Case of Mad Cow Disease
Consumer/Household Economics
The Expressive Power of Word Embeddings
We seek to better understand the difference in quality of the several
publicly released embeddings. We propose several tasks that help to distinguish
the characteristics of different embeddings. Our evaluation of sentiment
polarity and synonym/antonym relations shows that embeddings are able to
capture surprisingly nuanced semantics even in the absence of sentence
structure. Moreover, benchmarking the embeddings shows great variance in
quality and characteristics of the semantics captured by the tested embeddings.
Finally, we show the impact of varying the number of dimensions and the
resolution of each dimension on the effective useful features captured by the
embedding space. Our contributions highlight the importance of embeddings for
NLP tasks and the effect of their quality on the final results.
Comment: submitted to ICML 2013, Deep Learning for Audio, Speech and Language Processing Workshop. 8 pages, 8 figures
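The synonym/antonym evaluation the abstract describes reduces to comparing similarities in embedding space. The vectors below are hypothetical toy values, not any of the released embeddings under study; the sketch only shows the shape of such a test.

```python
import math

# Hypothetical toy vectors standing in for publicly released embeddings.
VECS = {
    "good":  [0.9, 0.1, 0.2],
    "great": [0.85, 0.15, 0.25],
    "bad":   [-0.8, 0.1, 0.2],
}

def cosine(u, v):
    # Standard cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# An embedding that captures the relation should score the synonym pair
# higher than the antonym pair.
syn = cosine(VECS["good"], VECS["great"])
ant = cosine(VECS["good"], VECS["bad"])
print(syn > ant)
```

Benchmarking different embedding sets then amounts to counting how often each one orders such pairs correctly, which is one way the "great variance in quality" finding can be made concrete.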
Crew station research and development facility training for the light helicopter demonstration/validation program
The U.S. Army Crew Station Research and Development Branch (CSRDB) of the Aircraft Simulation Division (AVSCOM) was tasked by the Light Helicopter Program Manager (LH-PM) to provide training to Army personnel in advanced aircraft simulation technology. The purpose of this training was to prepare different groups of pilots to support and evaluate two contractor simulation efforts during the Demonstration/Validation (DEM/VAL) phase of the LH program. The personnel in the CSRDB developed mission-oriented training programs to accomplish the objectives, conducted the programs, and provided guidance to Army personnel and support personnel throughout the DEM/VAL phase.
A Practical Incremental Learning Framework For Sparse Entity Extraction
This work addresses challenges arising from extracting entities from textual
data, including the high cost of data annotation, model accuracy, selecting
appropriate evaluation criteria, and the overall quality of annotation. We
present a framework that integrates Entity Set Expansion (ESE) and Active
Learning (AL) to reduce the annotation cost of sparse data and provide an
online evaluation method as feedback. This incremental and interactive learning
framework allows for rapid annotation and subsequent extraction of sparse data
while maintaining high accuracy. We evaluate our framework on three publicly
available datasets and show that it drastically reduces the cost of sparse
entity annotation by an average of 85% and 45% to reach F-scores of 0.9 and 1.0,
respectively. Moreover, the method exhibited robust performance across all
datasets.
Comment: https://www.aclweb.org/anthology/C18-1059
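The annotate-then-retrain cycle the abstract describes can be sketched as a generic active-learning loop with uncertainty sampling. Everything here, the scorer, the oracle, and the data, is a toy stand-in, not the authors' ESE + AL implementation.

```python
# Minimal active-learning loop: repeatedly query the item the current
# "model" is least confident about, label it via the oracle, and grow the
# labeled set. The confidence function is a deliberately crude toy.

def confidence(labeled, item):
    # Toy scorer: how many labeled examples share the item's first character.
    return sum(1 for x, _ in labeled if x[0] == item[0])

def active_learning(pool, oracle, budget):
    labeled = []
    pool = list(pool)
    for _ in range(budget):
        if not pool:
            break
        # Uncertainty sampling: pick the least-confident candidate.
        query = min(pool, key=lambda it: confidence(labeled, it))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return labeled

# Toy usage: pretend entities starting with "a" are the sparse class.
oracle = lambda s: s.startswith("a")
out = active_learning(["apple", "ant", "bee", "bat"], oracle, budget=2)
print(out)
```

The cost savings reported in the abstract come from this kind of loop spending the annotation budget on informative examples instead of labeling the whole pool.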
The perimeter of uniform and geometric words: a probabilistic analysis
Let a word be a sequence of i.i.d. integer random variables. The
perimeter of the word is the number of edges of the word, seen as a
polyomino. In this paper, we present a probabilistic approach to the
computation of the moments of the perimeter. This is applied to uniform and
geometric random variables. We also show that, asymptotically, the distribution
of the perimeter is Gaussian and, seen as a stochastic process, the perimeter
converges in distribution to a Brownian motion.
Comment: 13 pages, 7 figures
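One concrete way to read "the perimeter of the word, seen as a polyomino" is the bargraph convention, where letter i is a column of w[i] unit squares; that reading is an assumption here, since the abstract does not spell out the model. Under it the perimeter has a simple closed form, which the sketch below computes and then estimates in expectation for uniform letters by Monte Carlo.

```python
import random

def perimeter(w):
    # Bargraph-polyomino perimeter (assumed convention): 2n horizontal
    # edges, plus the outer left/right sides, plus the height jumps
    # between adjacent columns.
    n = len(w)
    vertical = w[0] + w[-1] + sum(abs(b - a) for a, b in zip(w, w[1:]))
    return 2 * n + vertical

def mean_perimeter(n, m, trials=2000, seed=0):
    # Monte Carlo estimate of E[perimeter] for uniform letters in {1..m}.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        w = [rng.randint(1, m) for _ in range(n)]
        total += perimeter(w)
    return total / trials

print(perimeter([2, 1, 3]))  # 2*3 + (2 + 3) + (1 + 2) = 14
```

Such simulations are a useful sanity check on the exact moment formulas and on the asymptotic Gaussian behavior the paper derives.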