When do Words Matter? Understanding the Impact of Lexical Choice on Audience Perception using Individual Treatment Effect Estimation
Studies across many disciplines have shown that lexical choice can affect
audience perception. For example, how users describe themselves in a social
media profile can affect their perceived socio-economic status. However, we
lack general methods for estimating the causal effect of lexical choice on the
perception of a specific sentence. While randomized controlled trials may
provide good estimates, they do not scale to the potentially millions of
comparisons necessary to consider all lexical choices. Instead, in this paper,
we offer two classes of methods to estimate the effect on perception of
changing one word to another in a given sentence. The first class of algorithms
builds upon quasi-experimental designs to estimate individual treatment effects
from observational data. The second class treats treatment effect estimation as
a classification problem. We conduct experiments with three data sources (Yelp,
Twitter, and Airbnb), finding that the algorithmic estimates align well with
those produced by randomized controlled trials. Additionally, we find that it is
possible to transfer treatment effect classifiers across domains and still
maintain high accuracy. Comment: AAAI_201
Learning Topic-Sensitive Word Representations
Distributed word representations are widely used for modeling words in NLP
tasks. Most of the existing models generate one representation per word and do
not consider different meanings of a word. We present two approaches to learn
multiple topic-sensitive representations per word using a Hierarchical
Dirichlet Process. We observe that, by modeling topics and integrating topic
distributions for each document, we obtain representations that are able to
distinguish between different meanings of a given word. Our models yield
statistically significant improvements for the lexical substitution task,
indicating that commonly used single word representations, even when combined
with contextual information, are insufficient for this task. Comment: 5 pages, 1 figure, Accepted at ACL 201
Simulating the Noun-Verb Asymmetry in the Productivity of Children's Speech
Several authors propose that children may acquire syntactic categories on the basis of co-occurrence statistics of words in the input. This paper assesses the relative merits of two such accounts by assessing the type and amount of productive language that results from computing co-occurrence statistics over conjoint and independent preceding and following contexts. This is achieved through the implementation of these methods in MOSAIC, a computational model of syntax acquisition that produces utterances that can be directly compared to child speech, and has a developmental component (i.e., it produces increasingly long utterances). It is shown that the computation of co-occurrence statistics over conjoint contexts or frames results in a pattern of productive speech that more closely resembles that displayed by language-learning children. The simulation of the developmental patterning of children's productive speech furthermore suggests two refinements to this basic mechanism: inclusion of utterance boundaries, and the weighting of frames for their lexical content.
Deconstructing comprehensibility: identifying the linguistic influences on listeners' L2 comprehensibility ratings
Comprehensibility, a major concept in second language (L2) pronunciation research that denotes listeners' perceptions of how easily they understand L2 speech, is central to interlocutors' communicative success in real-world contexts. Although comprehensibility has been modeled in several L2 oral proficiency scales, for example the Test of English as a Foreign Language (TOEFL) or the International English Language Testing System (IELTS), shortcomings of existing scales (e.g., vague descriptors) reflect limited empirical evidence as to which linguistic aspects influence listeners' judgments of L2 comprehensibility at different ability levels. To address this gap, a mixed-methods approach was used in the present study to gain a deeper understanding of the linguistic aspects underlying listeners' L2 comprehensibility ratings. First, speech samples of 40 native French learners of English were analyzed using 19 quantitative speech measures, including segmental, suprasegmental, fluency, lexical, grammatical, and discourse-level variables. These measures were then correlated with 60 native English listeners' scalar judgments of the speakers' comprehensibility. Next, three English as a second language (ESL) teachers provided introspective reports on the linguistic aspects of speech that they attended to when judging L2 comprehensibility. Following data triangulation, five speech measures were identified that clearly distinguished between L2 learners at different comprehensibility levels. Lexical richness and fluency measures differentiated between low-level learners; grammatical and discourse-level measures differentiated between high-level learners; and word stress errors discriminated between learners of all levels.
Investigation into Human Preference between Common and Unambiguous Lexical Substitutions