635 research outputs found

    Quantifying the morphosyntactic content of Brown Clusters


    Borders and boundaries in Bosnian, Croatian, Montenegrin and Serbian: Twitter data to the rescue

    In this paper we deal with the spatial distribution of 16 linguistic features known to vary between Bosnian, Croatian, Montenegrin, and Serbian. We perform our analyses on a dataset of geo-encoded Twitter status messages collected in the period from mid-2013 to the end of 2016. We perform two types of analyses. The first one finds boundaries in the spatial distribution of the linguistic variable levels through the kernel density estimation smoothing technique. These boundaries are then plotted over the state borders for a visual comparison. The second analysis deals with linguistic distance between the states. The groupings of linguistic variables and countries are calculated given the state borders and the Jensen-Shannon divergence between distributions of the 16 variables within each state. This analysis is completed with a measure of variable consistency for each country. These analyses are intended to show the extent to which current state borders correspond to linguistic boundaries. They suggest that Croatia and Serbia still represent the two extremes, reflecting a history of normative divergences, while Bosnia-Herzegovina and Montenegro, depending on the variable, lean to one or the other side.
    Nikola Ljubešić; Maja Miličević Petrović; Tanja Samardžić
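    The Jensen-Shannon divergence mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the counts are made-up placeholders, the names `state_a`, `state_b`, and `state_c` are hypothetical, and only a single two-level variable is compared here, whereas the paper works with 16 variables.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def normalize(counts):
    """Turn raw counts of variable levels into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical counts of two levels of one linguistic variable per state
# (illustrative numbers only, not taken from the paper).
state_a = normalize([90, 10])
state_b = normalize([15, 85])
state_c = normalize([60, 40])

print(js_divergence(state_a, state_b))
print(js_divergence(state_a, state_c))
```

    With base-2 logarithms the divergence is symmetric and bounded between 0 (identical distributions) and 1 (distributions with no shared support), which makes it convenient as a pairwise distance for grouping.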

    Lingualyzer: A computational linguistic tool for multilingual and multidimensional text analysis

    Most natural language models and tools are restricted to one language, typically English. For researchers in the behavioral sciences investigating languages other than English, and for those researchers who would like to make cross-linguistic comparisons, hardly any computational linguistic tools exist, particularly none for those researchers who lack deep computational linguistic knowledge or programming skills. Yet, for interdisciplinary researchers in a variety of fields, ranging from psycholinguistics, social psychology, cognitive psychology, and education to literary studies, there certainly is a need for such a cross-linguistic tool. In the current paper, we present Lingualyzer (https://lingualyzer.com), an easily accessible tool that analyzes text at three different text levels (sentence, paragraph, document), which includes 351 multidimensional linguistic measures that are available in 41 different languages. This paper gives an overview of Lingualyzer, categorizes its hundreds of measures, demonstrates how it distinguishes itself from other text quantification tools, explains how it can be used, and provides validations. Lingualyzer is freely accessible for scientific purposes using an intuitive and easy-to-use interface.

    Electrophysiological methods


    Does child-directed speech facilitate language development in all domains? A study space analysis of the existing evidence

    Because child-directed speech (CDS) is ubiquitous in some cultures and because positive associations between certain features of the language input and certain learning outcomes have been attested, it has often been claimed that the function of CDS is to aid children’s language development in general. We argue that for this claim to be generalisable, superior learning from CDS compared to non-CDS, such as adult-directed speech (ADS), must be demonstrated across multiple input domains and learning outcomes. To determine the availability of such evidence we performed a study space analysis of the research literature on CDS. A total of 942 relevant papers were coded with respect to (i) CDS features under consideration, (ii) learning outcomes and (iii) whether a comparison between CDS and ADS was reported. The results show that only 16.2% of peer-reviewed studies in this field compared learning outcomes between CDS and ADS, almost half of which focussed on the ability to discriminate between the two registers. Crucially, we found only 20 studies comparing learning outcomes between CDS and ADS for morphosyntactic and lexico-semantic features and none for pragmatic and extra-linguistic features. Although these 20 studies provided preliminary evidence for a facilitative effect of some specific morphosyntactic and lexico-semantic features, overall CDS-ADS comparison studies are very unevenly distributed across the space of CDS features and outcome measures. The disproportional emphasis on prosodic, phonetic, and phonological input features, and register discrimination as the outcome invites caution with respect to the generalisability of the claim that CDS facilitates language development across the breadth of input domains and learning outcomes.
    Future research ought to resolve the discrepancy between sweeping claims about the function of CDS as facilitating language development on the one hand and the narrow evidence base for such a claim on the other by conducting CDS-ADS comparisons across a wider range of input features and outcome measures.

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages publishes 17 papers that were presented at the conference organised in Dubrovnik, Croatia, 4-6 October 2010.

    Learning Functional Prepositions

    In first language acquisition, what does it mean for a grammatical category to have been acquired, and what are the mechanisms by which children learn functional categories in general? In the context of prepositions (Ps), if the lexical/functional divide cuts through the P category, as has been suggested in the theoretical literature, then constructivist accounts of language acquisition would predict that children develop adult-like competence with the more abstract units, functional Ps, at a slower rate compared to their acquisition of lexical Ps. Nativists instead assume that the features of functional P are made available by Universal Grammar (UG), and are mapped as quickly, if not faster, than the semantic features of their lexical counterparts. Conversely, if Ps are either all lexical or all functional, on both accounts of acquisition we should observe few differences in learning. Three empirical studies of the development of P were conducted via computer analysis of the English and Spanish sub-corpora of the CHILDES database. Study 1 analyzed errors in child usage of Ps, finding almost no errors of commission in either language, but that the English learners lag in their production of functional Ps relative to lexical Ps. That no such delay was found in the Spanish data suggests that the English pattern is not universal. Studies 2 and 3 applied novel measures of phrasal (P head + nominal complement) productivity to the data. Study 2 examined prepositional phrases (PPs) whose head-complement pairs appeared in both child and adult speech, while Study 3 considered PPs produced by children that never occurred in adult speech. In both studies the productivity of functional Ps for English children developed faster than that of lexical Ps. In Spanish there were few differences, suggesting that children had already mastered both orders of Ps early in acquisition.
    These empirical results suggest that at least in English P is indeed a split category, and that children acquire the syntax of the functional subset very quickly, committing almost no errors. The UG position is thus supported. Next, the dissertation investigates a 'soft nativist' acquisition strategy that composes the distributional analysis of input, minimal a priori knowledge of the possible co-occurrence of morphosyntactic features associated with functional elements, and linguistic knowledge that is presumably acquired via the experience of pragmatic, communicative situations. The output of the analysis consists in a mapping of morphemes to the feature bundles of nominative pronouns for English and Spanish, plus specific claims about the sort of knowledge required from experience. The acquisition model is then extended to adpositions, to examine what, if anything, distributional analysis can tell us about the functional sequences of PPs. The results confirm the theoretical position according to which spatiotemporal Ps are lexical in character, rooting their own extended projections, and that functional Ps express an aspectual sequence in the functional superstructure of the PP.

    Measuring and assessing indeterminacy and variation in the morphology-syntax distinction (advance online)

    We provide a discussion of some of the challenges in using statistical methods to investigate the morphology-syntax distinction cross-linguistically. The paper is structured around three problems related to the morphology-syntax distinction: (i) the boundary strength problem; (ii) the composition problem; (iii) the architectural problem. The boundary strength problem refers to the possibility that languages vary in terms of how distinct morphology and syntax are or the degree to which morphology is autonomous. The composition problem refers to the possibility that languages vary in terms of how they distinguish morphology and syntax: what types of properties distinguish the two systems. The architectural problem refers to the possibility that languages vary in terms of whether a global distinction between morphology and syntax is motivated at all and the possibility that languages might partition phenomena in different ways. This paper is concerned with providing an overarching review of the methodological problems involved in addressing these three issues. We illustrate the problems using three statistical methods: correlation matrices, random forests with different choices for the dependent variable, and hierarchical clustering with validation techniques.