When do Words Matter? Understanding the Impact of Lexical Choice on Audience Perception using Individual Treatment Effect Estimation
Studies across many disciplines have shown that lexical choice can affect
audience perception. For example, how users describe themselves in a social
media profile can affect their perceived socio-economic status. However, we
lack general methods for estimating the causal effect of lexical choice on the
perception of a specific sentence. While randomized controlled trials may
provide good estimates, they do not scale to the potentially millions of
comparisons necessary to consider all lexical choices. Instead, in this paper,
we first offer two classes of methods to estimate the effect on perception of
changing one word to another in a given sentence. The first class of algorithms
builds upon quasi-experimental designs to estimate individual treatment effects
from observational data. The second class treats treatment effect estimation as
a classification problem. We conduct experiments with three data sources (Yelp,
Twitter, and Airbnb), finding that the algorithmic estimates align well with
those produced by randomized-control trials. Additionally, we find that it is
possible to transfer treatment effect classifiers across domains and still
maintain high accuracy.
Comment: AAAI_201
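The quasi-experimental idea behind the first class of methods can be illustrated with a matched-pair estimator: compare outcomes for sentences that are identical except for the target word. The sketch below is a hypothetical simplification, not the paper's algorithm; the function name `matched_effect` and the exact-match pairing criterion are assumptions for illustration.

```python
def matched_effect(records, w_treat, w_ctrl):
    """Hypothetical matched-pair estimate of the effect of swapping
    w_ctrl -> w_treat on an outcome (e.g. a perception rating):
    average the outcome difference over sentence contexts that are
    identical except for the target word.
    records: list of (tokens, outcome) pairs."""
    def key(tokens):
        # The matching key is the sentence with the target word removed.
        return tuple(t for t in tokens if t not in (w_treat, w_ctrl))

    treated, control = {}, {}
    for tokens, y in records:
        if w_treat in tokens:
            treated.setdefault(key(tokens), []).append(y)
        elif w_ctrl in tokens:
            control.setdefault(key(tokens), []).append(y)

    diffs = [
        sum(treated[k]) / len(treated[k]) - sum(control[k]) / len(control[k])
        for k in treated.keys() & control.keys()
    ]
    return sum(diffs) / len(diffs) if diffs else None
```

In practice exact matching is far too strict; the paper's quasi-experimental designs relax it, and its second class of methods replaces estimation with classification altogether.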
Improving Lexical Choice in Neural Machine Translation
We explore two solutions to the problem of mistranslating rare words in
neural machine translation. First, we argue that the standard output layer,
which computes the inner product of a vector representing the context with all
possible output word embeddings, rewards frequent words disproportionately, and
we propose to fix the norms of both vectors to a constant value. Second, we
integrate a simple lexical module which is jointly trained with the rest of the
model. We evaluate our approaches on eight language pairs with data sizes
ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU,
surpassing phrase-based translation in nearly all settings.
Comment: Accepted at NAACL HLT 201
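The first fix can be sketched directly: with both the context vector and every word embedding rescaled to a constant norm r, the inner product becomes r² times a cosine similarity, so a frequent word can no longer win on norm alone. This is a minimal toy illustration of the fixed-norm idea, not the paper's implementation (which applies the constraint during training).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fix_norm(v, r=1.0):
    """Rescale v to have L2 norm r."""
    n = math.sqrt(dot(v, v))
    return [r * a / n for a in v]

def logits(context, embeddings, fixed=True, r=1.0):
    """Score each candidate output word against the context vector.
    With fixed=True both vectors are rescaled to norm r, so each score
    reduces to r**2 * cosine(context, embedding) and large-norm
    (typically frequent) words lose their built-in advantage."""
    c = fix_norm(context, r) if fixed else context
    scores = []
    for e in embeddings:
        e = fix_norm(e, r) if fixed else e
        scores.append(dot(c, e))
    return scores
```

For example, a frequent word whose embedding has a large norm but points away from the context can outscore a better-matching rare word under the plain inner product, while the fixed-norm scores reverse that ranking.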
Learning Correlations between Linguistic Indicators and Semantic Constraints: Reuse of Context-Dependent Descriptions of Entities
This paper presents the results of a study on the semantic constraints
imposed on lexical choice by certain contextual indicators. We show how such
indicators are computed and how correlations between them and the choice of a
noun phrase description of a named entity can be automatically established
using supervised learning. Based on this correlation, we have developed a
technique for automatic lexical choice of descriptions of entities in text
generation. We discuss the underlying relationship between the pragmatics of
choosing an appropriate description that serves a specific purpose in the
automatically generated text and the semantics of the description itself. We
present our work in the framework of the more general concept of reuse of
linguistic structures that are automatically extracted from large corpora. We
present a formal evaluation of our approach and we conclude with some thoughts
on potential applications of our method.
Comment: 7 pages, uses colacl.sty and acl.bst, uses epsfig. To appear in the Proceedings of the Joint 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics (COLING-ACL'98)
Floating constraints in lexical choice
Lexical choice is a computationally complex task, requiring a generation system to consider a potentially large number of mappings between concepts and words. Constraints that aid in determining which word is best come from a wide variety of sources, including syntax, semantics, pragmatics, the lexicon, and the underlying domain. Furthermore, in some situations, different constraints come into play early on, while in others, they apply much later. This makes it difficult to determine a systematic ordering in which to apply constraints. In this paper, we present a general approach to lexical choice that can handle multiple, interacting constraints. We focus on the problem of floating constraints: semantic or pragmatic constraints that float, appearing at a variety of different syntactic ranks, often merged with other semantic constraints. This means that multiple content units can be realized by a single surface element, and conversely, that a single content unit can be realized by a variety of surface elements. Our approach uses the Functional Unification Formalism (FUF) to represent a generation lexicon, allowing for declarative and compositional representation of individual constraints.
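The operation at the heart of a FUF-style lexicon is unification of feature structures: constraints from different sources merge when compatible and fail when they conflict. The following is a toy sketch of that operation over nested dicts, not the actual FUF implementation; the `FAIL` sentinel and dict representation are assumptions for illustration.

```python
FAIL = object()  # sentinel for a unification failure

def unify(a, b):
    """Unify two feature structures (nested dicts with atomic values).
    Returns the merged structure, or FAIL if any atomic values conflict.
    Compatible constraints compose declaratively, as in a FUF lexicon."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, bval in b.items():
            if key in out:
                merged = unify(out[key], bval)
                if merged is FAIL:
                    return FAIL
                out[key] = merged
            else:
                out[key] = bval
        return out
    return a if a == b else FAIL
```

For example, a syntactic constraint `{"cat": "np", "head": {"lex": "car"}}` unifies with a semantic one `{"head": {"number": "sing"}}` into a single merged description, while `{"cat": "np"}` and `{"cat": "vp"}` fail: this is how multiple, interacting constraints can be applied without a fixed ordering.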
Modeling lexical decision : the form of frequency and diversity effects
What is the root cause of word frequency effects on lexical decision times? W. S. Murray and K. I. Forster (2004) argued that such effects are linear in rank frequency, consistent with a serial search model of lexical access. In this article, the authors (a) describe a method of testing models of such effects that takes into account the possibility of parametric overfitting; (b) illustrate the effect of corpus choice on estimates of rank frequency; (c) give derivations of nine functional forms as predictions of models of lexical decision; (d) detail the assessment of these models and the rank model against existing data regarding the functional form of frequency effects; and (e) report further assessments using contextual diversity, a factor confounded with word frequency. The relationship between the occurrence distribution of words and lexical decision latencies to those words does not appear compatible with the rank hypothesis, undermining the case for serial search models of lexical access. Three transformations of contextual diversity based on extensions of instance models do, however, remain as plausible explanations of the effect.
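The two predictors at issue are easy to make concrete: a serial search model predicts decision times linear in a word's frequency *rank*, while competing functional forms use transformations of raw frequency such as its logarithm. The sketch below computes both predictors from raw counts; the function names are assumptions for illustration, not from the article.

```python
import math

def rank_frequency(counts):
    """Map each word to its frequency rank (1 = most frequent), the
    predictor a serial search model says decision times are linear in.
    Note the ranks depend entirely on which corpus the counts come from."""
    order = sorted(counts, key=counts.get, reverse=True)
    return {w: i + 1 for i, w in enumerate(order)}

def log_frequency(counts):
    """Map each word to log raw frequency, one competing functional form."""
    return {w: math.log(c) for w, c in counts.items()}
```

The corpus-dependence of the rank transform is exactly why point (b) above matters: two corpora with different coverage can assign very different ranks to the same mid-frequency word even when their log frequencies barely differ.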
Bootstrapping Lexical Choice via Multiple-Sequence Alignment
An important component of any generation system is the mapping dictionary, a
lexicon of elementary semantic expressions and corresponding natural language
realizations. Typically, labor-intensive knowledge-based methods are used to
construct the dictionary. We instead propose to acquire it automatically via a
novel multiple-pass algorithm employing multiple-sequence alignment, a
technique commonly used in bioinformatics. Crucially, our method leverages
latent information contained in multi-parallel corpora -- datasets that supply
several verbalizations of the corresponding semantics rather than just one.
We used our techniques to generate natural language versions of
computer-generated mathematical proofs, with good results on both a
per-component and overall-output basis. For example, in evaluations involving a
dozen human judges, our system produced output whose readability and
faithfulness to the semantic input rivaled that of a traditional generation
system.
Comment: 8 pages; to appear in the proceedings of EMNLP-200
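Multiple-sequence alignment generalizes pairwise global alignment to several verbalizations at once. As a hedged illustration of the building block only, here is a standard Needleman-Wunsch pairwise alignment over token sequences; the paper's multiple-pass, multi-sequence algorithm is more involved, and the scoring parameters here are arbitrary.

```python
def align(x, y, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two token sequences.
    Returns a list of aligned pairs, with None marking a gap.
    Multi-sequence alignment over a multi-parallel corpus generalizes
    this pairwise scheme."""
    n, m = len(x), len(y)
    # DP table: score[i][j] = best score aligning x[:i] with y[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and x[i - 1] == y[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((x[i - 1], y[j - 1])); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((x[i - 1], None)); i -= 1
        else:
            pairs.append((None, y[j - 1])); j -= 1
    return list(reversed(pairs))
```

Aligning two verbalizations of the same content, e.g. "x equals two" and "x is two", pairs "equals" with "is", which is precisely the kind of slot where alternative lexical realizations of one semantic expression surface.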
Parsing with parallelism : a spreading-activation model of inference processing during text understanding
The past decade of research in Natural Language Processing has universally recognized that, since natural language input is almost always ambiguous with respect to its pragmatic implications, its syntactic parse, and even its lexical analysis (i.e., choice of the correct word-sense for an ambiguous word), processing natural language input requires decisions about word meanings, syntactic structure, and pragmatic inferences. The lexical, syntactic, and pragmatic levels of inferencing are not as disparate as they have often been treated in both psychological and artificial intelligence research. In fact, these three levels of analysis interact to form a joint interpretation of text. ATLAST (A Three-level Language Analysis SysTem) is an implemented integration of human language understanding at the lexical, the syntactic, and the pragmatic levels. For psychological validity, ATLAST is based on results of experiments with human subjects. The ATLAST model uses a new architecture which was developed to incorporate three features: spreading activation memory, two-stage syntax, and parallel processing of syntax and semantics. It is also a new framework within which to interpret and tackle unsolved problems through implementation and experimentation.
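The spreading-activation memory can be sketched in a few lines: activation starts at the concepts evoked by the input and propagates along weighted links with decay, so related word-senses and inferences become active together. This is a generic toy model of spreading activation, not ATLAST's architecture; the graph representation and `decay` parameter are assumptions for illustration.

```python
def spread(graph, activation, decay=0.5, steps=2):
    """Spreading activation over a weighted concept graph.
    At each step every node passes decay * weight * its activation
    along each outgoing edge, on top of what nodes already have.
    graph: {node: [(neighbor, weight), ...]}."""
    act = dict(activation)
    for _ in range(steps):
        nxt = dict(act)
        for node, a in act.items():
            for nbr, w in graph.get(node, []):
                nxt[nbr] = nxt.get(nbr, 0.0) + decay * w * a
        act = nxt
    return act
```

In a model of this kind, an ambiguous word like "bank" would activate both of its senses, and context-driven activation arriving at one sense would let it win the competition, coupling lexical disambiguation to the syntactic and pragmatic levels.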
Lexical choice for complex noun phrases: Structure, modifiers, and determiners
This paper presents a lexical choice component for complex noun phrases. We first explain why lexical choice for NPs deserves special attention within the standard pipeline architecture for a generator. The task of the lexical chooser for NPs is more complex than for clauses because the syntax of NPs is less understood than that of clauses; therefore, syntactic realization components, while they accept a predicate-argument structure as input for clauses, require a purely syntactic tree as input for NPs. The task of mapping conceptual relations to different syntactic modifiers is therefore left to the lexical chooser for NPs. The paper focuses on the syntagmatic aspect of lexical choice, identifying a process called “NP planning”. It focuses on a set of communicative goals that NPs can satisfy and specifies an interface between the different components of the generator and the lexical chooser. The technique presented for NP planning encapsulates rich lexical knowledge and allows for the generation of a wide variety of syntactic constructions. It also allows for a large paraphrasing power because it dynamically maps conceptual information to various syntactic slots.
