64 research outputs found
Probabilistic Topic Modeling of the Russian Text Corpus on Musicology
The paper describes the results of experiments on the development of a statistical model of the Russian text corpus on musicology. We construct a topic model based on Latent Dirichlet Allocation and process corpus data with the help of the GenSim statistical toolkit. Results achieved in the course of the experiments allow us to distinguish general and special topics which describe the conceptual structure of the corpus in question and to analyze paradigmatic and syntagmatic relations between lemmata within topics. The research discussed in the paper is supported by the grant of St.-Petersburg State University № 30.38.305.2014 «Quantitative linguistic parameters for defining stylistic characteristics and subject area of texts».
A paradox of syntactic priming: why response tendencies show priming for passives, and response latencies show priming for actives
Speakers tend to repeat syntactic structures across sentences, a phenomenon called syntactic priming. Although it has been suggested that repeating syntactic structures should result in speeded responses, previous research has focused on effects in response tendencies. We investigated syntactic priming effects simultaneously in response tendencies and response latencies for active and passive transitive sentences in a picture description task. In Experiment 1, there were priming effects in response tendencies for passives and in response latencies for actives. However, when participants' pre-existing preference for actives was altered in Experiment 2, syntactic priming occurred for both actives and passives in response tendencies as well as in response latencies. This is the first investigation of the effects of structure frequency on both response tendencies and latencies in syntactic priming. We discuss the implications of these data for current theories of syntactic processing.
Analyzing co-occurrence data
This is the author accepted manuscript. The final version is available from Springer via the DOI in this record. In this chapter, we provide an overview of quantitative as well as qualitative approaches to co-occurrence data. We begin with a brief terminological overview of different types of co-occurrence that are prominent in corpus-linguistic studies and then discuss the computation of some widely used measures of association used to quantify co-occurrence. We present two representative case studies, one exploring lexical collocation and learner proficiency, the other creative uses of verbs in/with argument structure constructions. In addition, we highlight how most widely used measures actually all fall out from viewing corpus-linguistic association as an instance of regression modeling and discuss newer developments and potential improvements of association measure research.
Multi-word expressions: A novel computational approach to their bottom-up statistical extraction
In this paper, we introduce and validate a new bottom-up approach to the identification/extraction of multi-word expressions in corpora. This approach, called Multi-word Expressions from the Recursive Grouping of Elements (MERGE), is based on the successive combination of bigrams to form word sequences of various lengths. The selection of bigrams to be "merged" is based on the use of a lexical association measure, log likelihood (Dunning, Computational Linguistics 19:61-74, 1993). We apply the algorithm to two corpora and test its performance both on its own merits and against a competing algorithm from the literature, the adjusted frequency list (O’Donnell, ICAME Journal 35:135-169, 2011). Performance of the algorithms is evaluated via human ratings of the multi-word expression candidates that they generate. Ultimately, MERGE is shown to offer a very competitive approach to MWE extraction.
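As a rough illustration of the association measure named in the abstract above, the sketch below computes Dunning's log-likelihood statistic (G²) from a 2x2 contingency table and uses it to pick the strongest adjacent word pair in a token list. It is not the authors' MERGE implementation: the function names `log_likelihood` and `best_bigram`, the toy token list, and the choice to score only adjacent pairs in a single pass are all assumptions made for this example.

```python
import math
from collections import Counter


def log_likelihood(o11, o12, o21, o22):
    """Dunning's G^2 for a 2x2 table of observed counts.

    o11: w1 followed by w2      o12: w1 followed by another word
    o21: another word before w2 o22: neither w1 first nor w2 second
    """
    n = o11 + o12 + o21 + o22
    row1, row2 = o11 + o12, o21 + o22
    col1, col2 = o11 + o21, o12 + o22
    total = 0.0
    cells = ((o11, row1, col1), (o12, row1, col2),
             (o21, row2, col1), (o22, row2, col2))
    for observed, row, col in cells:
        if observed > 0:  # a zero cell contributes nothing to the sum
            expected = row * col / n
            total += observed * math.log(observed / expected)
    return 2.0 * total


def best_bigram(tokens):
    """Score every adjacent pair and return the highest-scoring one.

    Hypothetical helper: a single selection step only, not the
    recursive grouping described for MERGE. Note that G^2 is
    two-sided, so a real system would also check that the pair
    occurs more often than expected.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # counts of words in first position
    second = Counter(tokens[1:])   # counts of words in second position
    n = len(tokens) - 1            # number of bigram positions
    scores = {}
    for (w1, w2), o11 in bigrams.items():
        o12 = first[w1] - o11
        o21 = second[w2] - o11
        o22 = n - o11 - o12 - o21
        scores[(w1, w2)] = log_likelihood(o11, o12, o21, o22)
    return max(scores, key=scores.get), scores


# Toy data (invented for this example): "a b" recurs three times.
tokens = ["a", "b", "c", "a", "b", "d", "a", "b", "e"]
pair, scores = best_bigram(tokens)
print(pair, round(scores[pair], 3))
```

In a bottom-up procedure of the kind the abstract describes, the winning pair would presumably be fused into a single unit and the scoring repeated, so that longer sequences can emerge from successive merges; that loop is left out here.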