    EXMARaLDA - Creating, Analysing and Sharing Spoken Language Corpora for Pragmatic Research

    This paper presents EXMARaLDA, a system for the computer-assisted creation and analysis of spoken language corpora. The first part contains some general observations about technological and methodological requirements for doing corpus-based pragmatics. The second part explains the system's architecture and gives an overview of its most important software components: a transcription editor, a corpus management tool and a corpus query tool. The last part presents some corpora which have been or are currently being compiled with the help of EXMARaLDA.

    A simple architecture for the fine-grained documentation of endangered languages: the LACITO multimedia archive

    To appear in: Proceedings of Oriental-COCOSDA 2011. Presented at: 2011 International Conference on Speech Database and Assessments (Oriental COCOSDA 2011), 2011-10-26 to 2011-10-28, Taiwan. The LACITO multimedia archive provides free access to documents of connected, spontaneous speech, mostly in "rare" or endangered languages, recorded in their cultural context and transcribed in consultation with native speakers. Its goal is to contribute to the documentation and study of a precious human heritage: the world's languages. It has a special strength in languages of Asia and the Pacific. The LACITO archive was built with little personnel and less funding. It has been devised, developed and maintained over two decades by two researchers assisted by one engineer. Its simple architecture is based on current standards: Unicode character coding and XML markup, and Dublin Core/Open Language Archives Community recommendations for metadata. The data can be consulted online with any standard browser. The technical simplicity of the tools developed at LACITO makes them suitable for the creation of similar databases at other institutions. (For instance, tools from this archive were successfully adapted in the creation of the Formosan Languages archive.)

    Applying Domain Knowledge from Structured Citation Formats to Text and Data Mining: Examples Using the CITE Architecture

    Domain knowledge expressed in structured citation formats can be exploited in data mining. We propose four structural properties of canonically cited texts, then look at two classic problems in the study of the scholia, or ancient scholarly commentary, found in the manuscripts of the Iliad. We cluster citations of scholia to analyze their distribution in different manuscripts; this leads to a revised view of how the manuscripts' scribes drew on their source material. Correlated frequencies of named entities suggest that one group of manuscripts had access to material more closely based on the work of the greatest Hellenistic editor of Homer, Aristarchus of Samothrace.

    Putting the Text back into Context: A Codicological Approach to Manuscript Transcription

    Textual scholars have tended to produce editions which present the text without its manuscript context. Even though digital editions now often present single-witness editions with facsimiles of the manuscripts, the text itself is still transcribed and represented as a linguistic object rather than a physical one. Indeed, this is explicitly stated as the theoretical basis for the de facto standard of markup for digital texts: the Guidelines of the Text Encoding Initiative (TEI). These explicitly treat texts as semantic units such as paragraphs, sentences, verses and so on, rather than physical elements such as pages, openings, or surfaces, and some scholars have argued that this is the only viable model for representing texts. In contrast, this chapter presents arguments for considering the document as a physical object in the markup of texts. The theoretical arguments of what constitutes a text are first reviewed, with emphasis on those used by the TEI and other theoreticians of digital markup. A series of cases is then given in which a document-centric approach may be desirable, with both modern and medieval examples. Finally, a step forward in this direction is presented, namely the results of the Genetic Edition Working Group in the Manuscript Special Interest Group of the TEI: this includes a proposed standard for documentary markup, whereby aspects of codicology and mise en page can be included in digital editions, putting the text back into its manuscript context.

    English for Study and Work, Vol. 4: Professional Writing in a Foreign Language

    Presents all types of student activities for learning English, aimed at developing the language behaviour needed for effective communication in academic and professional environments. Contains tasks and exercises typical of various academic and professional fields and situations. The content is organised in modules, each covering particular language skills according to the language behaviour involved. This module aims to develop students' written communication skills related to their future profession, along with the basics of mediation and written translation, which are aimed at developing the ability to write texts of various types and genres, such as résumés, letters, abstracts and so on. The resources for self-study (Part II) contain tasks and exercises for building vocabulary and widening the range of functional exponents needed to perform particular functions, as well as tasks aimed at organising students' independent work. Using the diagnostic tools (Part III), students can check their own mastery of the material and assess their achievements. Grammar topics and exercises for practising them are given in Volume 5. Intended for students of technical universities specialising in mining. It can be used for teaching elective English courses, and for self-study of English by teachers, specialists and researchers in various engineering fields.

    Some thoughts on the papyrological edition

    The papyrological edition has not stopped evolving since the publication of the first papyrus in the 18th century. Despite the fixing of editorial norms with the Leiden system (1931), it continues to change. It improves as it adapts to the ever-changing requirements of the historical and philological sciences, and also of the academic context. In this sense, it is a reflection of science and its organization. It is therefore legitimate to ask whether certain developments are beneficial and whether certain adjustments would be worthwhile.

    Investigating Multilingual, Multi-script Support in Lucene/Solr Library Applications

    Over many years, Yale has developed a highly structured, high-quality multilingual catalog of bibliographic data. Almost 50% of the collection represents non-English materials in over 650 languages, and includes many different non-Roman scripts. Faculty, students, researchers, and staff would like to make full use of this original script content for resource discovery. While the underlying textual data are in place, effective indexing, retrieval and display functionality for the non-Roman script content is not available within our bibliographic discovery applications, Orbis and Yufind. Opportunities now exist in the Unicode, Lucene/Solr computing environment to bridge the functionality gap and achieve internationalization of the Yale Library catalog. While most parts of this study focus on the Yale environment, in the absence of other such studies it is hoped that the findings will be of interest to a much larger community. Arcadia Foundatio