5,470 research outputs found
Polyglot: Distributed Word Representations for Multilingual NLP
Distributed word representations (word embeddings) have recently contributed
to competitive performance in language modeling and several NLP tasks. In this
work, we train word embeddings for more than 100 languages using their
corresponding Wikipedias. We quantitatively demonstrate the utility of our word
embeddings by using them as the sole features for training a part of speech
tagger for a subset of these languages. We find their performance to be
competitive with near state-of-art methods in English, Danish and Swedish.
Moreover, we investigate the semantic features captured by these embeddings
through the proximity of word groupings. We will release these embeddings
publicly to help researchers in the development and enhancement of multilingual
applications.
Comment: 10 pages, 2 figures, Proceedings of the Conference on Computational Natural Language Learning, CoNLL'201
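The abstract's core setup, using word embeddings as the sole features for a part-of-speech tagger, can be sketched with toy data. The embedding table, tags, and nearest-centroid classifier below are illustrative stand-ins, not the Polyglot vectors or the authors' actual tagger.

```python
# Toy sketch: POS tagging with word embeddings as the only features.
# EMB and TRAIN are made-up stand-ins for released embeddings and a
# labeled corpus; a real tagger would use a stronger classifier.

EMB = {
    "the": [1.0, 0.0], "a": [0.9, 0.1],    # determiner-like region
    "cat": [0.0, 1.0], "dog": [0.1, 0.9],  # noun-like region
}

TRAIN = [("the", "DET"), ("a", "DET"), ("cat", "NOUN"), ("dog", "NOUN")]

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def fit(train):
    # One centroid per tag: the simplest classifier over embedding space.
    by_tag = {}
    for word, tag in train:
        by_tag.setdefault(tag, []).append(EMB[word])
    return {tag: centroid(vs) for tag, vs in by_tag.items()}

def predict(model, word):
    v = EMB[word]
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(v, c))
    return min(model, key=lambda tag: dist(model[tag]))

model = fit(TRAIN)
print(predict(model, "cat"))  # NOUN
```

The point of the sketch is that the classifier sees only embedding coordinates, never the word identity or hand-built features, which mirrors the "sole features" claim in the abstract.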
Estimating the Market Demand for Value-Added Beef: Testing for BSE Announcement Effects Using a Nested PIGLOG Model Approach
This paper estimates an AIDS model and corrects for first-order autocorrelation using retail meat data. We fail to reject the null hypothesis of no BSE announcement effects.
Demand and Price Analysis
University-Retail Industry Research Partnerships as a Means to Analyze Consumer Response: The Case of Mad Cow Disease
Consumer/Household Economics
The Expressive Power of Word Embeddings
We seek to better understand the difference in quality of the several
publicly released embeddings. We propose several tasks that help to distinguish
the characteristics of different embeddings. Our evaluation of sentiment
polarity and synonym/antonym relations shows that embeddings are able to
capture surprisingly nuanced semantics even in the absence of sentence
structure. Moreover, benchmarking the embeddings shows great variance in
quality and characteristics of the semantics captured by the tested embeddings.
Finally, we show the impact of varying the number of dimensions and the
resolution of each dimension on the effective useful features captured by the
embedding space. Our contributions highlight the importance of embeddings for
NLP tasks and the effect of their quality on the final results.
Comment: submitted to ICML 2013, Deep Learning for Audio, Speech and Language Processing Workshop. 8 pages, 8 figures
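The synonym/antonym evaluation the abstract describes reduces to comparing similarities in embedding space. The vectors below are hypothetical toy values, not any of the released embeddings under study; the sketch only shows the shape of such a test.

```python
import math

# Hypothetical toy vectors standing in for publicly released embeddings.
VECS = {
    "good":  [0.9, 0.1, 0.2],
    "great": [0.85, 0.15, 0.25],
    "bad":   [-0.8, 0.1, 0.2],
}

def cosine(u, v):
    # Standard cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# An embedding that captures the relation should score the synonym pair
# higher than the antonym pair.
syn = cosine(VECS["good"], VECS["great"])
ant = cosine(VECS["good"], VECS["bad"])
print(syn > ant)
```

Benchmarking different embedding sets then amounts to counting how often each one orders such pairs correctly, which is one way the "great variance in quality" finding can be made concrete.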
Crew station research and development facility training for the light helicopter demonstration/validation program
The U.S. Army Crew Station Research and Development Branch (CSRDB) of the Aircraft Simulation Division (AVSCOM) was tasked by the Light Helicopter Program Manager (LH-PM) to provide training to Army personnel in advanced aircraft simulation technology. The purpose of this training was to prepare different groups of pilots to support and evaluate two contractor simulation efforts during the Demonstration/Validation (DEM/VAL) phase of the LH program. The personnel in the CSRDB developed mission-oriented training programs to accomplish the objectives, conducted the programs, and provided guidance to Army personnel and support personnel throughout the DEM/VAL phase.
A Practical Incremental Learning Framework For Sparse Entity Extraction
This work addresses challenges arising from extracting entities from textual
data, including the high cost of data annotation, model accuracy, selecting
appropriate evaluation criteria, and the overall quality of annotation. We
present a framework that integrates Entity Set Expansion (ESE) and Active
Learning (AL) to reduce the annotation cost of sparse data and provide an
online evaluation method as feedback. This incremental and interactive learning
framework allows for rapid annotation and subsequent extraction of sparse data
while maintaining high accuracy. We evaluate our framework on three publicly
available datasets and show that it drastically reduces the cost of sparse
entity annotation by an average of 85% and 45% to reach F-scores of 0.9 and 1.0,
respectively. Moreover, the method exhibited robust performance across all
datasets.
Comment: https://www.aclweb.org/anthology/C18-1059
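The annotate-then-retrain cycle the abstract describes can be sketched as a generic active-learning loop with uncertainty sampling. Everything here, the scorer, the oracle, and the data, is a toy stand-in, not the authors' ESE + AL implementation.

```python
# Minimal active-learning loop: repeatedly query the item the current
# "model" is least confident about, label it via the oracle, and grow the
# labeled set. The confidence function is a deliberately crude toy.

def confidence(labeled, item):
    # Toy scorer: how many labeled examples share the item's first character.
    return sum(1 for x, _ in labeled if x[0] == item[0])

def active_learning(pool, oracle, budget):
    labeled = []
    pool = list(pool)
    for _ in range(budget):
        if not pool:
            break
        # Uncertainty sampling: pick the least-confident candidate.
        query = min(pool, key=lambda it: confidence(labeled, it))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return labeled

# Toy usage: pretend entities starting with "a" are the sparse class.
oracle = lambda s: s.startswith("a")
out = active_learning(["apple", "ant", "bee", "bat"], oracle, budget=2)
print(out)
```

The cost savings reported in the abstract come from this kind of loop spending the annotation budget on informative examples instead of labeling the whole pool.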
The perimeter of uniform and geometric words: a probabilistic analysis
Let a word be a sequence of i.i.d. integer random variables. The
perimeter of the word is the number of edges of the word, seen as a
polyomino. In this paper, we present a probabilistic approach to the
computation of the moments of the perimeter. This is applied to uniform and
geometric random variables. We also show that, asymptotically, the distribution
of the perimeter is Gaussian and, seen as a stochastic process, the perimeter
converges in distribution to a Brownian motion.
Comment: 13 pages, 7 figures
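One concrete way to read "the perimeter of the word, seen as a polyomino" is the bargraph convention, where letter i is a column of w[i] unit squares; that reading is an assumption here, since the abstract does not spell out the model. Under it the perimeter has a simple closed form, which the sketch below computes and then estimates in expectation for uniform letters by Monte Carlo.

```python
import random

def perimeter(w):
    # Bargraph-polyomino perimeter (assumed convention): 2n horizontal
    # edges, plus the outer left/right sides, plus the height jumps
    # between adjacent columns.
    n = len(w)
    vertical = w[0] + w[-1] + sum(abs(b - a) for a, b in zip(w, w[1:]))
    return 2 * n + vertical

def mean_perimeter(n, m, trials=2000, seed=0):
    # Monte Carlo estimate of E[perimeter] for uniform letters in {1..m}.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        w = [rng.randint(1, m) for _ in range(n)]
        total += perimeter(w)
    return total / trials

print(perimeter([2, 1, 3]))  # 2*3 + (2 + 3) + (1 + 2) = 14
```

Such simulations are a useful sanity check on the exact moment formulas and on the asymptotic Gaussian behavior the paper derives.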