Discovery of Linguistic Relations Using Lexical Attraction

Author: Yuret, Deniz
Publication date: 01/01/1998

This work has been motivated by two long-term goals: to understand how humans learn language and to build programs that can understand language. Using a representation that makes the relevant features explicit is a prerequisite for successful learning and understanding. Therefore, I chose to represent relations between individual words explicitly in my model. Lexical attraction is defined as the likelihood of such relations. I introduce a new class of probabilistic language models, named lexical attraction models, which can represent long-distance relations between words, and I formalize this new class of models using information theory. Within the framework of lexical attraction, I developed an unsupervised language acquisition program that learns to identify linguistic relations in a given sentence. The only explicitly represented linguistic knowledge in the program is lexical attraction. There is no initial grammar or lexicon built in, and the only input is raw text. Learning and processing are interdigitated. The processor uses the regularities detected by the learner to impose structure on the input. This structure enables the learner to detect higher-level regularities. Using this bootstrapping procedure, the program was trained on 100 million words of Associated Press material and was able to achieve 60% precision and 50% recall in finding relations between content words. Using knowledge of lexical attraction, the program can identify the correct relations in syntactically ambiguous sentences such as "I saw the Statue of Liberty flying over New York." Comment: dissertation, 56 pages

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Lexical typology : a programmatic sketch

Authors: Behrens, Leila; Sasse, Hans-Jürgen
Publication date: 01/01/1997

The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar.

Hochschulschriftenserver - Universität Frankfurt am Main

Confounds and Consequences in Geotagged Twitter Data

Authors: Eisenstein, Jacob; Pavalanathan, Umashanthi
Publication date: 01/01/2015

Twitter is often used in quantitative studies that identify geographically preferred topics, writing styles, and entities. These studies rely either on GPS coordinates attached to individual messages or on the user-supplied location field in each profile. In this paper, we compare these data acquisition techniques and quantify the biases that they introduce; we also measure their effects on linguistic analysis and text-based geolocation. GPS-tagging and self-reported locations yield measurably different corpora, and these linguistic differences are partially attributable to differences in dataset composition by age and gender. Using a latent variable model to induce age and gender, we show how these demographic variables interact with geography to affect language use. We also show that the accuracy of text-based geolocation varies with population demographics, giving the best results for men above the age of 40. Comment: final version for EMNLP 2015

arXiv.org e-Print Archive

CiteSeerX

Crossref

Re-discovery procedures and the lexicon

Author: Lipka, Leonhard
Publication date: 01/01/1975

Open Access LMU

The Unsupervised Acquisition of a Lexicon from Continuous Speech

Author: de Marcken, Carl
Publication date: 01/01/1995

We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency. Comment: 27 page technical report

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Subject-object asymmetry in the acquisition of the definite article in Modern Greek

Author: Marinis, Theodore
Publication venue: Essex Research Reports in Linguistics
Publication date: 01/01/2002

University of Essex Research Repository

UCL Discovery

Digital Creativity Support for Original Journalism

Authors: Apostolou, D.; Brown, A.; Holm, B.; Maiden, N.; Nyre, L.; Tonheim, A.; van der Beld, A.; Zachos, K.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/07/2020

The decline in circulations and revenues resulting from the digitalization of news production and consumption has led to a crisis in journalism. Journalists have less time to research, investigate and write original stories, leading to problems for our democratic processes and for holding the powerful to account. This paper reports the architecture, features and rationale for a new digital creativity support tool designed to help journalists discover more original angles on stories. It also summarises the evaluation of the tool’s use in three newsrooms.

City Research Online