Syntactic Topic Models

Authors: Blei, David M.; Boyd-Graber, Jordan
Publication date: 01/01/2008

The syntactic topic model (STM) is a Bayesian nonparametric model of language that discovers latent distributions of words (topics) that are both semantically and syntactically coherent. The STM models dependency-parsed corpora in which sentences are grouped into documents. It assumes that each word is drawn from a latent topic chosen by combining document-level features and the local syntactic context. Each document has a distribution over latent topics, as in topic models, which provides the semantic consistency. Each element in the dependency parse tree also has a distribution over the topics of its children, as in latent-state syntax models, which provides the syntactic consistency. These distributions are convolved so that the topic of each word is likely under both its document and its syntactic context. We derive a fast posterior inference algorithm based on variational methods. We report qualitative and quantitative studies on both synthetic data and hand-parsed documents. We show that the STM is a more predictive model of language than current models based only on syntax or only on topics.
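As a rough illustration of how the two distributions might be combined, the sketch below assumes that "convolved" means a pointwise product followed by renormalization; the abstract does not spell out the operation. The function name and all numeric values are hypothetical and are not taken from the paper.

```python
def combine_topic_distributions(doc_topics, parent_transitions):
    """Pointwise product of two topic distributions, renormalized to sum to 1.

    doc_topics: the document's distribution over topics (assumed input).
    parent_transitions: the parent node's distribution over its children's
        topics (assumed input).
    """
    if len(doc_topics) != len(parent_transitions):
        raise ValueError("both distributions must cover the same topics")
    unnormalized = [d * t for d, t in zip(doc_topics, parent_transitions)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("no topic has nonzero mass under both distributions")
    return [u / total for u in unnormalized]


# Hypothetical example with three topics.
theta_d = [0.7, 0.2, 0.1]    # document-level topic distribution
pi_parent = [0.1, 0.3, 0.6]  # syntactic transition distribution of the parent
combined = combine_topic_distributions(theta_d, pi_parent)
```

Under this reading, a topic receives high probability only when both the document and the syntactic context favor it, which matches the abstract's statement that the topic of each word should be likely under both.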

arXiv.org e-Print Archive

CiteSeerX

Redefining part-of-speech classes with distributional semantic models

Authors: Kutuzov, Andrey; Velldal, Erik; Øvrelid, Lilja
Publication date: 01/01/2016

This paper studies how word embeddings trained on the British National Corpus interact with part-of-speech boundaries. Our work targets the Universal PoS tag set, which is currently in active use for the annotation of a range of languages. We experiment with training classifiers to predict the PoS tags of words from their embeddings. The results show that the information about PoS affiliation contained in the distributional vectors allows us to discover groups of words whose distributional patterns differ from those of other words of the same part of speech. This data often reveals hidden inconsistencies in the annotation process or guidelines. At the same time, it supports the notion of 'soft' or 'graded' part-of-speech affiliations. Finally, we show that information about PoS is distributed among dozens of vector components and is not limited to only one or two features.
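The idea of predicting PoS tags from embeddings and then inspecting the disagreements can be sketched as follows. This is a toy illustration only: it assumes a nearest-centroid classifier with cosine similarity, whereas the abstract does not say which classifier the authors used, and the three-dimensional vectors, words, and function names are invented for the example.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def train(tagged_embeddings):
    """Build one centroid per PoS tag from (vector, tag) pairs."""
    by_tag = {}
    for vector, tag in tagged_embeddings:
        by_tag.setdefault(tag, []).append(vector)
    return {tag: centroid(vectors) for tag, vectors in by_tag.items()}


def predict(centroids, vector):
    """Return the tag whose centroid is most similar to the vector."""
    return max(centroids, key=lambda tag: cosine(centroids[tag], vector))


# Invented toy embeddings; real embeddings have hundreds of dimensions.
training_data = [
    ([0.9, 0.1, 0.0], "NOUN"),
    ([0.8, 0.2, 0.1], "NOUN"),
    ([0.1, 0.9, 0.0], "VERB"),
    ([0.2, 0.8, 0.1], "VERB"),
]
centroids = train(training_data)

# Words whose predicted tag disagrees with their annotated tag are
# candidates for annotation inconsistencies or graded PoS membership.
annotated = [
    ("table", [0.85, 0.15, 0.05], "NOUN"),
    ("running", [0.15, 0.85, 0.05], "NOUN"),
]
outliers = [word for word, vector, tag in annotated
            if predict(centroids, vector) != tag]
```

Here the invented word "running" is annotated as a noun but sits closest to the verb centroid, so it is flagged as an outlier of the kind the abstract describes.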

arXiv.org e-Print Archive

Crossref

NORA - Norwegian Open Research Archives

Secondary predication in Russian

Authors: Demjjanow, Assinja; Strigin, Anatoli
Publication date: 01/01/2001

The paper makes two contributions to the semantic typology of secondary predicates. First, it explains why Russian has no resultative secondary predicates and relates this explanation to the interpretation of secondary predicates in English. Second, it relates depictive secondary predicates in Russian, which usually occur in the instrumental case, to other uses of the instrumental case in Russian, establishing here, too, a difference from English in the scope of the secondary predication phenomenon.

Hochschulschriftenserver - Universität Frankfurt am Main