523 research outputs found

    Introduction to the special issue on annotated corpora

    Annotated corpora are increasingly important for linguistic scholarship, science and technology. This special issue briefly surveys the development of the field and points to challenges within the current framework of annotation using analytical categories, as well as challenges to the framework itself. It presents three articles: one concerning the evaluation of annotation quality, and two concerning French treebanks. Of these two, the first deals with the oldest treebank project for French, the French Treebank, and the second with the conversion of French corpora into the cross-lingual framework of Universal Dependencies, thus offering an illustration of the history of treebank development worldwide.

    Style of translation: An exploration of stylistic patterns in the translations of Margaret Jull Costa and Peter Bush

    The aim of this study is to identify and explore typical stylistic traits in the work of two translators, using a corpus-based, data-driven methodology. Following Halliday (1971), Leech and Short (1981) and Baker (2000), the translator's style is seen here as involving a consistent pattern of choices that distinguishes the work of one translator from that of others. In the present study such patterns emerge from a data-driven analysis of a purpose-built parallel corpus containing works of Spanish and Portuguese fiction and their translations into English by Margaret Jull Costa and Peter Bush. Comparative data are drawn from COMPARA, a bi-directional parallel corpus of English and Portuguese narrative. The quantitative analysis shows that Margaret Jull Costa makes greater use of italics for emphasis than does Peter Bush, or than would be expected on the basis of norms for translations from Portuguese. Peter Bush's translations, on the other hand, are characterized by a comparatively high use of source language words. The qualitative analysis focuses on the communicative function of emphatic italics and source language words in context, drawing on the Hallidayan (1967) notion of information focus, on Hermans' (1996) treatment of self-referentiality and Aixelá's (1996) treatment of culture-specificity in translation. I argue that Margaret Jull Costa emphasises readability in her translations, which leads to a discussion of explicitation (Blum-Kulka 1986/2001, Klaudy and Károly 2005, House 2004), and to a further study, modelled on Olohan and Baker (2000), that compares patterns of omission and inclusion of the connective 'that' after reporting verbs SAY and TELL. The findings are discussed in the light of the translators' backgrounds and ideologies, as evidenced from their writings on translation and from interviews carried out by the researcher.
I conclude that one of the motivating factors behind the translators' strategies is how they see their role as translators in relation to their audiences.

    CLARIN. The infrastructure for language resources

    CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU).

    CLARIN

    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure (CLARIN) for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium.

    Determinants of Mobility Disability in Older Adults: Evidence from Population-Based Epidemiologic Studies

    Gait and mobility are essential for maintaining autonomy and independence in daily life, including for older persons. Changes in these functions may be critical in the transition towards disability and loss of autonomy during the aging process. The aim of the present work, which collects three years of research conducted between Italy and the United States, was to assess some of the main risk factors for the progressive impairment of mobility and gait in older adults living in the community. According to our results, even subtle abnormalities in the nervous and cardiovascular systems are associated with a subsequent worsening of these functions. These data might help to better understand the progressive development of disability in the elderly, and in the future might also have practical implications for prevention.

    The gastrointestinal tract: From healthy mucosa to colorectal cancer


    Understanding the structure and meaning of Finnish texts: From corpus creation to deep language modelling

    Natural Language Processing (NLP) is a cross-disciplinary field combining elements of computer science, artificial intelligence, and linguistics, with the objective of developing means for computational analysis, understanding or generation of human language. The primary aim of this thesis is to advance natural language processing in Finnish by providing more resources and investigating the most effective machine learning based practices for their use. The thesis focuses on NLP topics related to understanding the structure and meaning of written language, mainly concentrating on structural analysis (syntactic parsing) as well as exploring the semantic equivalence of statements that vary in their surface realization (paraphrase modelling). While the new resources presented in the thesis are developed for Finnish, most of the methodological contributions are language-agnostic, and the accompanying papers demonstrate the application and evaluation of these methods across multiple languages. The first set of contributions of this thesis revolves around the development of a state-of-the-art Finnish dependency parsing pipeline. Firstly, the necessary Finnish training data was converted to the Universal Dependencies scheme, integrating Finnish into this important treebank collection and establishing the foundations for Finnish UD parsing. Secondly, a novel word lemmatization method based on deep neural networks is introduced and assessed across a diverse set of over 50 languages. Finally, the overall dependency parsing pipeline is evaluated on a large number of languages, securing top ranks in two competitive shared tasks focused on multilingual dependency parsing. The overall outcome of this line of research is a parsing pipeline reaching state-of-the-art accuracy in Finnish dependency parsing, with the parsing numbers obtained with the latest pre-trained language models approaching, or at least coming near, human-level performance.
The achievement of large language models in the area of dependency parsing, as well as in many other structured prediction tasks, raises the hope that large pre-trained language models genuinely comprehend language, rather than merely relying on simple surface cues. However, datasets designed to measure semantic comprehension in Finnish have been non-existent, or very scarce at best. To address this limitation, and to reflect the general change of emphasis in the field towards tasks more semantic in nature, the second part of the thesis shifts its focus to language understanding through an exploration of paraphrase modelling. The second contribution of the thesis is the creation of a novel, large-scale, manually annotated corpus of Finnish paraphrases. A unique aspect of this corpus is that its examples have been manually extracted from two related text documents, with the objective of obtaining non-trivial paraphrase pairs valuable for training and evaluating various language understanding models on paraphrasing. We show that manual paraphrase extraction can yield a corpus featuring pairs that are both notably longer and less lexically overlapping than those produced through automated candidate selection, the current prevailing practice in paraphrase corpus construction. Another distinctive feature of the corpus is that the paraphrases are identified and distributed within their document context, allowing for richer modelling and novel tasks to be defined.
