
    International Comparable Corpus : Challenges in building multilingual spoken and written comparable corpora

    This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc), which will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered to be the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC is committed to re-using existing multilingual resources as much as possible, and its design is modelled, with various adjustments, on the International Corpus of English (ICE). As such, ICC will contain approximately the same balance of 40 percent written language and 60 percent spoken language, distributed across 27 different text types and contexts. A number of issues encountered by the project teams are discussed, ranging from copyright and data sustainability to technical advances in data distribution. Peer reviewed.

    Languages, nations and identities

    This article reviews a range of ways in which issues of national identity have been shown to be linked with the topic of language. We suggest that there is scope for development both of the theoretical underpinning to claims made about the nature of these links and, in consequence, of the methodological approaches appropriate to empirical investigations of them. Here, we explore the ways in which aspects of the social world such as those summarised above are understood theoretically. The first part of the paper argues that debates about the relationship of languages to forms of social identity, particularly those associated with nationalisms, often make a number of assumptions: about languages, about collectivities and about social agency. The second part interrogates these assumptions and proposes the utility of realist theory in evaluating claims in this area. In the final part of the paper, we outline the methodological implications of our argument.

    Crossings as a side effect of dependency lengths

    The syntactic structure of sentences exhibits a striking regularity: dependencies tend not to cross when drawn above the sentence. We investigate two competing explanations. The traditional hypothesis is that this trend arises from an independent principle of syntax that reduces crossings practically to zero. An alternative to this view is the hypothesis that crossings are a side effect of dependency lengths, i.e. sentences with shorter dependency lengths should tend to have fewer crossings. We are able to reject the traditional view in the majority of languages considered. The alternative hypothesis can lead to a more parsimonious theory of language. Comment: the discussion section has been expanded significantly; in press in Complexity (Wiley).
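
    The two quantities this abstract relates, edge crossings and dependency lengths, can be illustrated with a minimal Python sketch. It is not the authors' method or code: the head-array encoding (`heads[i]` is the 1-based position of the head of word `i + 1`, with 0 marking the root) and the function names are assumptions chosen for illustration only.

```python
from itertools import combinations


def dependency_edges(heads):
    """Return each dependency as a (left, right) pair of 1-based word positions.

    Assumed encoding: heads[i] is the head position of word i + 1; 0 marks the root.
    """
    return [(min(i, h), max(i, h)) for i, h in enumerate(heads, start=1) if h != 0]


def total_dependency_length(heads):
    """Sum of linear distances between each word and its head."""
    return sum(right - left for left, right in dependency_edges(heads))


def count_crossings(heads):
    """Count pairs of edges that cross when drawn as arcs above the sentence.

    Two arcs cross when exactly one endpoint of one arc lies strictly
    inside the span of the other; arcs sharing an endpoint do not cross.
    """
    return sum(
        1
        for (a, b), (c, d) in combinations(dependency_edges(heads), 2)
        if a < c < b < d or c < a < d < b
    )


# Hypothetical three-word sentence: words 1 and 3 both depend on word 2.
projective = [2, 0, 2]
# Hypothetical four-word sentence in which arcs (1, 3) and (2, 4) cross.
crossing = [3, 4, 0, 3]
```

    In this toy data the sentence with the crossing also has the larger total dependency length, which is the kind of association the alternative hypothesis predicts; two hand-made sentences are of course not evidence for it.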

    Cryomorphological topographies in the study of ice caves

    The current interest in ice caves requires that their varied manifestations be known as accurately as possible, in view of their responses to global change and their great potential as paleoenvironmental witnesses. The phenomenon has long been known but is still scarcely studied from the point of view of its cryological values and the evolution and distribution of many of its morphologies. To this end, the development of cryomorphological topographies, from traditional techniques to geodetic surveys with different tools, including terrestrial laser scanning, is one of the most current ways to characterize and quantify this type of cryospheric phenomenon. It represents a new kind of periglacial cartography whose use is feasible in spite of the difficulties these environments present. Funding: Ministerio de Economía, Industria y Competitividad (project CGL2015-68144-R).