Discovery of Linguistic Relations Using Lexical Attraction
This work has been motivated by two long-term goals: to understand how humans
learn language and to build programs that can understand language. Using a
representation that makes the relevant features explicit is a prerequisite for
successful learning and understanding. Therefore, I chose to represent
relations between individual words explicitly in my model. Lexical attraction
is defined as the likelihood of such relations. I introduce a new class of
probabilistic language models named lexical attraction models which can
represent long-distance relations between words, and I formalize this new class
of models using information theory.
Within the framework of lexical attraction, I developed an unsupervised
language acquisition program that learns to identify linguistic relations in a
given sentence. The only explicitly represented linguistic knowledge in the
program is lexical attraction. There is no initial grammar or lexicon built in
and the only input is raw text. Learning and processing are interdigitated. The
processor uses the regularities detected by the learner to impose structure on
the input. This structure enables the learner to detect higher level
regularities. Using this bootstrapping procedure, the program was trained on
100 million words of Associated Press material and was able to achieve 60%
precision and 50% recall in finding relations between content-words. Using
knowledge of lexical attraction, the program can identify the correct relations
in syntactically ambiguous sentences such as ``I saw the Statue of Liberty
flying over New York.''

Comment: dissertation, 56 pages
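The abstract defines lexical attraction as the likelihood of a relation between two words. One standard way to estimate such an affinity from raw text is pointwise mutual information over co-occurrence counts; the sketch below is illustrative only, and the toy corpus, whitespace tokenization, and sentence-level co-occurrence are assumptions of this example, not the dissertation's actual procedure:

```python
import math
from collections import Counter
from itertools import combinations

def lexical_attraction(corpus_sentences):
    """Score word pairs by pointwise mutual information over
    sentence-level co-occurrence (an illustrative proxy for
    lexical attraction, not the thesis's exact model)."""
    word_counts = Counter()
    pair_counts = Counter()
    n_tokens = 0
    for sent in corpus_sentences:
        words = sent.lower().split()
        word_counts.update(words)
        n_tokens += len(words)
        # Count each unordered pair of distinct words once per sentence.
        for a, b in combinations(sorted(set(words)), 2):
            pair_counts[(a, b)] += 1
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), c in pair_counts.items():
        p_ab = c / total_pairs
        p_a = word_counts[a] / n_tokens
        p_b = word_counts[b] / n_tokens
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores

corpus = [
    "the statue of liberty stands in new york",
    "i saw the statue of liberty",
    "new york is a big city",
]
scores = lexical_attraction(corpus)
# pairs that reliably co-occur, like ("new", "york"), score above chance pairs
```

Positive scores mark pairs that co-occur more often than their individual frequencies predict, which is the kind of regularity the learner described above can bootstrap from.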
Lexically specific knowledge and individual differences in adult native speakers' processing of the English passive
This article provides experimental evidence for the role of lexically specific representations in the processing of passive sentences and for considerable education-related differences in comprehension of the passive construction. The experiment measured response time and decision accuracy of participants with high and low academic attainment using an online task that compared processing and comprehension of active and passive sentences containing verbs strongly associated with the passive and active constructions, as determined by collostructional analysis. As predicted by usage-based accounts, participants' performance was influenced by frequency (both groups processed actives faster than passives; the low academic attainment participants also made significantly more errors on passive sentences) and lexical specificity (i.e., processing of passives was slower with verbs strongly associated with the active). Contrary to proposals made by Dąbrowska and Street (2006), the results suggest that all participants have verb-specific as well as verb-general representations, but that the latter are not as entrenched in the participants with low academic attainment, resulting in less reliable performance. The results also show no evidence of a speed–accuracy trade-off, making alternative accounts of the results (e.g., those of two-stage processing models, such as Townsend & Bever, 2001) problematic.
Statistical keyword detection in literary corpora
Understanding the complexity of human language requires an appropriate
analysis of the statistical distribution of words in texts. We consider the
information retrieval problem of detecting and ranking the relevant words of a
text by means of statistical information referring to the "spatial" use of the
words. Shannon's entropy of information is used as a tool for automatic keyword
extraction. By using The Origin of Species by Charles Darwin as a
representative text sample, we show the performance of our detector and compare
it with other proposals in the literature. Randomly shuffled text receives
special attention as a tool for calibrating the ranking indices.

Comment: Published version. 11 pages, 7 figures. SVJour for LaTeX2e
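The core idea, scoring a word by how unevenly its occurrences cluster across the text, can be sketched with Shannon entropy over equal-sized partitions. This is a simplified toy version; the partition count, the normalization, and the scoring function are assumptions of this sketch, not the paper's exact estimator:

```python
import math
from collections import Counter

def keyword_scores(text, n_parts=4):
    """Score words by how unevenly they are spread across n_parts
    equal partitions of the text, via normalized Shannon entropy.
    Score 1 = fully clustered (keyword-like), 0 = perfectly uniform."""
    words = text.lower().split()
    size = max(1, len(words) // n_parts)
    # Trailing remainder words are dropped for simplicity in this sketch.
    parts = [Counter(words[i * size:(i + 1) * size]) for i in range(n_parts)]
    totals = Counter(words)
    scores = {}
    for w, total in totals.items():
        if total < 2:
            continue  # too rare to judge its spatial distribution
        probs = [part[w] / total for part in parts]
        h = -sum(p * math.log2(p) for p in probs if p > 0)
        scores[w] = 1 - h / math.log2(n_parts)
    return scores

text = "alpha alpha alpha alpha the x the y the z the w the q the r"
scores = keyword_scores(text, n_parts=4)
# "alpha" is confined to one partition and outscores the evenly spread "the"
```

A randomly shuffled version of the same text would drive every word's score toward 0, which is exactly why the paper uses shuffled text to calibrate the ranking.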
Formulaicity in an agglutinating language: the case of Turkish
publication-status: Accepted
types: Article
This published version of the article replaces the accepted version, which is available in ORE at: http://hdl.handle.net/10871/9615
This study examines the extent to which complex inflectional patterns found in Turkish, a language with a rich agglutinating morphology, can be described as formulaic. It is found that many prototypically formulaic phenomena previously attested at the multi-word level in English – frequent co-occurrence of specific elements, fixed 'bundles' of elements, and associations between lexis and grammar – also play an important role at the morphological level in Turkish. It is argued that current psycholinguistic models of agglutinative morphology need to be complexified to incorporate such patterns. Conclusions are also drawn for the practice of Turkish as a Foreign Language teaching and for the methodology of Turkish corpus linguistics.
Alternative Complementation in Partially Schematic Constructions: a Quantitative Corpus-based Examination of COME to V2 and GET to V2
This paper examines two English polyverbal constructions, COME to V2 and GET to V2, as exemplified in Examples 1 and 2, respectively. (1) The senator came to know thousands of his constituents. (2) Little Johnny got to eat ice cream after every little league game. Previous studies considered these types of constructions (though come and get as used here have not been sufficiently studied) as belonging to a special class of complement constructions, in which the infinitive is regarded as instantiating a separate, subordinate predication from that of the 'matrix' or leftward finite verb. These constructions, however, exhibit systematic deviation from the various criteria proposed in previous research. This study uses the American National Corpus to investigate the statistical propensities of the target phenomena via lexico-syntactic (collostructional analysis) and morpho-syntactic (binary logistic regression) features, as captured through the lens of construction grammar.
Eight Dimensions for the Emotions
The author proposes a dimensional model of our emotion concepts that is intended to be largely independent of one's theory of emotions and applicable to the different ways in which emotions are measured. He outlines some conditions for selecting the dimensions based on these motivations and general conceptual grounds. Given these conditions, he then advances an 8-dimensional model that is shown to effectively differentiate emotion labels both within and across cultures, as well as more obscure expressive language. The 8 dimensions are: (1) attracted–repulsed, (2) powerful–weak, (3) free–constrained, (4) certain–uncertain, (5) generalized–focused, (6) future directed–past directed, (7) enduring–sudden, (8) socially connected–disconnected.
Text Segmentation Using Exponential Models
This paper introduces a new statistical approach to partitioning text
automatically into coherent segments. Our approach enlists both short-range and
long-range language models to help it sniff out likely sites of topic changes
in text. To aid its search, the system consults a set of simple lexical hints
it has learned to associate with the presence of boundaries through inspection
of a large corpus of annotated data. We also propose a new probabilistically
motivated error metric for use by the natural language processing and
information retrieval communities, intended to supersede precision and recall
for appraising segmentation algorithms. Qualitative assessment of our algorithm
as well as evaluation using this new metric demonstrate the effectiveness of
our approach in two very different domains, Wall Street Journal articles and
the TDT Corpus, a collection of newswire articles and broadcast news
transcripts.

Comment: 12 pages, LaTeX source and postscript figures for EMNLP-2 paper
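An exponential (log-linear) boundary model of the kind described combines weighted lexical-hint features into a probability of a topic change. The sketch below is a heavily simplified stand-in: the cue words, weights, and bias are hypothetical hand-set values, whereas the paper learns its features from a large annotated corpus:

```python
import math

# Hypothetical cue words and weights -- hand-set for illustration,
# not learned from annotated data as in the actual system.
CUE_WEIGHTS = {"meanwhile": 1.5, "finally": 1.2, "however": 0.4}
BIAS = -2.0  # boundaries are rare, so the prior log-odds are negative

def boundary_probability(sentence):
    """Log-linear score for a topic boundary just before this sentence:
    p = sigmoid(bias + sum of weights for active cue-word features)."""
    score = BIAS
    for word in sentence.lower().split():
        score += CUE_WEIGHTS.get(word, 0.0)
    return 1 / (1 + math.exp(-score))
```

Thresholding these probabilities over a document yields candidate segment boundaries; the short-range and long-range language models mentioned above supply further evidence in the full system.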
- …