Discovery of Linguistic Relations Using Lexical Attraction
This work has been motivated by two long-term goals: to understand how humans
learn language and to build programs that can understand language. Using a
representation that makes the relevant features explicit is a prerequisite for
successful learning and understanding. Therefore, I chose to represent
relations between individual words explicitly in my model. Lexical attraction
is defined as the likelihood of such relations. I introduce a new class of
probabilistic language models named lexical attraction models which can
represent long-distance relations between words, and I formalize this new class
of models using information theory.
Within the framework of lexical attraction, I developed an unsupervised
language acquisition program that learns to identify linguistic relations in a
given sentence. The only explicitly represented linguistic knowledge in the
program is lexical attraction. There is no initial grammar or lexicon built in
and the only input is raw text. Learning and processing are interdigitated. The
processor uses the regularities detected by the learner to impose structure on
the input. This structure enables the learner to detect higher level
regularities. Using this bootstrapping procedure, the program was trained on
100 million words of Associated Press material and was able to achieve 60%
precision and 50% recall in finding relations between content-words. Using
knowledge of lexical attraction, the program can identify the correct relations
in syntactically ambiguous sentences such as ``I saw the Statue of Liberty
flying over New York.''

Comment: dissertation, 56 pages
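The abstract defines lexical attraction as the likelihood of a relation between two words. One standard way to estimate such an affinity from raw text is pointwise mutual information over co-occurrence counts; the sketch below is illustrative only, and the toy corpus, whitespace tokenization, and sentence-level co-occurrence are assumptions of this example, not the dissertation's actual procedure:

```python
import math
from collections import Counter
from itertools import combinations

def lexical_attraction(corpus_sentences):
    """Score word pairs by pointwise mutual information over
    sentence-level co-occurrence (an illustrative proxy for
    lexical attraction, not the thesis's exact model)."""
    word_counts = Counter()
    pair_counts = Counter()
    n_tokens = 0
    for sent in corpus_sentences:
        words = sent.lower().split()
        word_counts.update(words)
        n_tokens += len(words)
        # Count each unordered pair of distinct words once per sentence.
        for a, b in combinations(sorted(set(words)), 2):
            pair_counts[(a, b)] += 1
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), c in pair_counts.items():
        p_ab = c / total_pairs
        p_a = word_counts[a] / n_tokens
        p_b = word_counts[b] / n_tokens
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores

corpus = [
    "the statue of liberty stands in new york",
    "i saw the statue of liberty",
    "new york is a big city",
]
scores = lexical_attraction(corpus)
# pairs that reliably co-occur, like ("new", "york"), score above chance pairs
```

Positive scores mark pairs that co-occur more often than their individual frequencies predict, which is the kind of regularity the learner described above can bootstrap from.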
Lexically specific knowledge and individual differences in adult native speakers' processing of the English passive
This article provides experimental evidence for the role of lexically specific representations in the processing of passive sentences and for considerable education-related differences in comprehension of the passive construction. The experiment measured response time and decision accuracy of participants with high and low academic attainment using an online task that compared processing and comprehension of active and passive sentences containing verbs strongly associated with the passive and active constructions, as determined by collostructional analysis. As predicted by usage-based accounts, participants' performance was influenced by frequency (both groups processed actives faster than passives; the low academic attainment participants also made significantly more errors on passive sentences) and lexical specificity (i.e., processing of passives was slower with verbs strongly associated with the active). Contrary to proposals made by Dąbrowska and Street (2006), the results suggest that all participants have verb-specific as well as verb-general representations, but that the latter are not as entrenched in the participants with low academic attainment, resulting in less reliable performance. The results also show no evidence of a speed–accuracy trade-off, making alternative accounts of the results (e.g., those of two-stage processing models, such as Townsend & Bever, 2001) problematic.
Statistical keyword detection in literary corpora
Understanding the complexity of human language requires an appropriate
analysis of the statistical distribution of words in texts. We consider the
information retrieval problem of detecting and ranking the relevant words of a
text by means of statistical information referring to the "spatial" use of the
words. Shannon's entropy of information is used as a tool for automatic keyword
extraction. By using The Origin of Species by Charles Darwin as a
representative text sample, we show the performance of our detector and compare
it with other proposals in the literature. Randomly shuffled text receives
special attention as a tool for calibrating the ranking indices.

Comment: Published version. 11 pages, 7 figures. SVJour for LaTeX2e
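The core idea, scoring a word by how unevenly its occurrences cluster across the text, can be sketched with Shannon entropy over equal-sized partitions. This is a simplified toy version; the partition count, the normalization, and the scoring function are assumptions of this sketch, not the paper's exact estimator:

```python
import math
from collections import Counter

def keyword_scores(text, n_parts=4):
    """Score words by how unevenly they are spread across n_parts
    equal partitions of the text, via normalized Shannon entropy.
    Score 1 = fully clustered (keyword-like), 0 = perfectly uniform."""
    words = text.lower().split()
    size = max(1, len(words) // n_parts)
    # Trailing remainder words are dropped for simplicity in this sketch.
    parts = [Counter(words[i * size:(i + 1) * size]) for i in range(n_parts)]
    totals = Counter(words)
    scores = {}
    for w, total in totals.items():
        if total < 2:
            continue  # too rare to judge its spatial distribution
        probs = [part[w] / total for part in parts]
        h = -sum(p * math.log2(p) for p in probs if p > 0)
        scores[w] = 1 - h / math.log2(n_parts)
    return scores

text = "alpha alpha alpha alpha the x the y the z the w the q the r"
scores = keyword_scores(text, n_parts=4)
# "alpha" is confined to one partition and outscores the evenly spread "the"
```

A randomly shuffled version of the same text would drive every word's score toward 0, which is exactly why the paper uses shuffled text to calibrate the ranking.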
Formulaicity in an agglutinating language: the case of Turkish
publication-status: Accepted
types: Article
This published version of the article replaces the accepted version, which is available in ORE at: http://hdl.handle.net/10871/9615
This study examines the extent to which complex inflectional patterns found in Turkish, a language with a rich agglutinating morphology, can be described as formulaic. It is found that many prototypically formulaic phenomena previously attested at the multi-word level in English – frequent co-occurrence of specific elements, fixed 'bundles' of elements, and associations between lexis and grammar – also play an important role at the morphological level in Turkish. It is argued that current psycholinguistic models of agglutinative morphology need to be complexified to incorporate such patterns. Conclusions are also drawn for the practice of Turkish as a Foreign Language teaching and for the methodology of Turkish corpus linguistics.
Alternative Complementation in Partially Schematic Constructions: a Quantitative Corpus-based Examination of COME to V2 and GET to V2
This paper examines two English polyverbal constructions, COME to V2 and GET to V2, as exemplified in Examples 1 and 2, respectively. (1) The senator came to know thousands of his constituents. (2) Little Johnny got to eat ice cream after every little league game. Previous studies considered these types of constructions (though come and get as used here have not been sufficiently studied) as belonging to a special class of complement constructions, in which the infinitive is regarded as instantiating a separate, subordinate predication from that of the 'matrix' or leftward finite verb. These constructions, however, exhibit systematic deviation from the various criteria proposed in previous research. This study uses the American National Corpus to investigate the statistical propensities of the target phenomena via lexico-syntactic (collostructional analysis) and morpho-syntactic (binary logistic regression) features, as captured through the lens of construction grammar.
Eight Dimensions for the Emotions
The author proposes a dimensional model of our emotion concepts that is intended to be largely independent of one's theory of emotions and applicable to the different ways in which emotions are measured. He outlines some conditions for selecting the dimensions based on these motivations and general conceptual grounds. Given these conditions, he then advances an 8-dimensional model that is shown to effectively differentiate emotion labels both within and across cultures, as well as more obscure expressive language. The 8 dimensions are: (1) attracted–repulsed, (2) powerful–weak, (3) free–constrained, (4) certain–uncertain, (5) generalized–focused, (6) future directed–past directed, (7) enduring–sudden, (8) socially connected–disconnected.
Text Segmentation Using Exponential Models
This paper introduces a new statistical approach to partitioning text
automatically into coherent segments. Our approach enlists both short-range and
long-range language models to help it sniff out likely sites of topic changes
in text. To aid its search, the system consults a set of simple lexical hints
it has learned to associate with the presence of boundaries through inspection
of a large corpus of annotated data. We also propose a new probabilistically
motivated error metric for use by the natural language processing and
information retrieval communities, intended to supersede precision and recall
for appraising segmentation algorithms. Qualitative assessment of our algorithm
as well as evaluation using this new metric demonstrate the effectiveness of
our approach in two very different domains, Wall Street Journal articles and
the TDT Corpus, a collection of newswire articles and broadcast news
transcripts.

Comment: 12 pages, LaTeX source and postscript figures for EMNLP-2 paper
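An exponential (log-linear) boundary model of the kind described combines weighted lexical-hint features into a probability of a topic change. The sketch below is a heavily simplified stand-in: the cue words, weights, and bias are hypothetical hand-set values, whereas the paper learns its features from a large annotated corpus:

```python
import math

# Hypothetical cue words and weights -- hand-set for illustration,
# not learned from annotated data as in the actual system.
CUE_WEIGHTS = {"meanwhile": 1.5, "finally": 1.2, "however": 0.4}
BIAS = -2.0  # boundaries are rare, so the prior log-odds are negative

def boundary_probability(sentence):
    """Log-linear score for a topic boundary just before this sentence:
    p = sigmoid(bias + sum of weights for active cue-word features)."""
    score = BIAS
    for word in sentence.lower().split():
        score += CUE_WEIGHTS.get(word, 0.0)
    return 1 / (1 + math.exp(-score))
```

Thresholding these probabilities over a document yields candidate segment boundaries; the short-range and long-range language models mentioned above supply further evidence in the full system.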
- …