2,743 research outputs found

    Testing the Limits of Anaphoric Distance in Classical Arabic: a Corpus-Based Study

    One of the central aims in research on anaphora is to discover the factors that determine the choice of referential expressions in discourse. Ariel (1988; 2001) offers an Accessibility Scale in which referential expressions, including demonstratives, are categorized according to the anaphoric (i.e. textual) distance between each expression and its antecedent. The aim of this paper is to test Ariel's (1988; 1990; 2001) claim that the choice between proximal and distal anaphors is mainly determined by anaphoric distance. This claim is investigated for singular demonstratives in a corpus of Classical Arabic (CA) prose texts, using word count to measure anaphoric distance. The results indicate that anaphoric distance is not a consistent or reliable determinant of how anaphors are used in CA, so Ariel's claim is not supported by this study. This also calls into question the universality of anaphoric distance as a criterion of accessibility.

    On the relation between linguistic typology and (limitations of) multilingual language modeling

    A key challenge in cross-lingual NLP is developing general, language-independent architectures that are equally applicable to any language. However, this ambition is largely hampered by the variation in structural and semantic properties, i.e. the typological profiles of the world's languages. In this work, we analyse the implications of this variation for the language modeling (LM) task. We present a large-scale study of state-of-the-art n-gram based and neural language models on 50 typologically diverse languages covering a wide variety of morphological systems. Operating in the full-vocabulary LM setup focused on word-level prediction, we demonstrate that a coarse typology of morphological systems is predictive of absolute LM performance. Moreover, fine-grained typological features such as exponence, flexivity, fusion, and inflectional synthesis are shown to be responsible for the proliferation of low-frequency phenomena, which are inherently difficult to model with statistical architectures, or for the meaning ambiguity of character n-grams. Our study strongly suggests that these features have to be taken into consideration when constructing next-level language-agnostic LM architectures capable of handling morphologically complex languages such as Tamil or Korean.
    ERC grant Lexica

    Proceedings

    Proceedings of the NODALIDA 2011 Workshop Visibility and Availability of LT Resources. Editors: Sjur Nørstebø Moshagen and Per Langgård. NEALT Proceedings Series, Vol. 13 (2011), vi+32 pp. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/1697

    Linguistic diversity and complexity

    Peer reviewed

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    International audience
    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work on improving the robustness of speech output.

    Proceedings of the Workshop Semantic Content Acquisition and Representation (SCAR) 2007

    These are the proceedings of the Workshop on Semantic Content Acquisition and Representation, held in conjunction with NODALIDA 2007 on 24 May 2007 in Tartu, Estonia.