60 research outputs found

    Innovative technologies for under-resourced language documentation: The BULB Project

    No full text
    International audienceThe project Breaking the Unwritten Language Barrier (BULB), which brings together linguists and computer scientists, aims at supporting linguists in documenting unwritten languages. In order to achieve this we will develop tools tailored to the needs of documentary linguists by building upon technology and expertise from the area of natural language processing, most prominently automatic speech recognition and machine translation. As a development and test bed for this we have chosen three less-resourced African languages from the Bantu family: Basaa, Myene and Embosi. Work within the project is divided into three main steps: 1) Collection of a large corpus of speech (100h per language) at a reasonable cost. After initial recording, the data is re-spoken by a reference speaker to enhance the signal quality and orally translated into French. 2) Automatic transcription of the Bantu languages at phoneme level and the French translation at word level. The recognized Bantu phonemes and French words will then be automatically aligned. 3) Tool development. In close cooperation and discussion with the linguists, the speech and language technologists will design and implement tools that will support the linguists in their work, taking into account the linguists' needs and technology's capabilities. The data collection has begun for the three languages. For this we use standard mobile devices and a dedicated software—LIG-AIKUMA, which proposes a range of different speech collection modes (recording, respeaking, translation and elicitation). LIG-AIKUMA 's improved features include a smart generation and handling of speaker metadata as well as respeaking and parallel audio data mapping

    Innovative technologies for under-resourced language documentation: The BULB Project

    Get PDF
    International audienceThe project Breaking the Unwritten Language Barrier (BULB), which brings together linguists and computer scientists, aims at supporting linguists in documenting unwritten languages. In order to achieve this we will develop tools tailored to the needs of documentary linguists by building upon technology and expertise from the area of natural language processing, most prominently automatic speech recognition and machine translation. As a development and test bed for this we have chosen three less-resourced African languages from the Bantu family: Basaa, Myene and Embosi. Work within the project is divided into three main steps: 1) Collection of a large corpus of speech (100h per language) at a reasonable cost. After initial recording, the data is re-spoken by a reference speaker to enhance the signal quality and orally translated into French. 2) Automatic transcription of the Bantu languages at phoneme level and the French translation at word level. The recognized Bantu phonemes and French words will then be automatically aligned. 3) Tool development. In close cooperation and discussion with the linguists, the speech and language technologists will design and implement tools that will support the linguists in their work, taking into account the linguists' needs and technology's capabilities. The data collection has begun for the three languages. For this we use standard mobile devices and a dedicated software—LIG-AIKUMA, which proposes a range of different speech collection modes (recording, respeaking, translation and elicitation). LIG-AIKUMA 's improved features include a smart generation and handling of speaker metadata as well as respeaking and parallel audio data mapping

    Clause-final negative markers in southeastern Bamana dialects: a contact-induced evolution

    No full text
    International audienceA compact group of southeastern Bamana dialects, viz. southern dialects of the Baninko area and dialects of the neighboring Gwandugu, Shendugu and Ganadugu areas, use an innovative clause-final negative marker, in addition to a negative marker earlier in the clause (immediately following the subject), which they share with the other Bamana dialects. Although both the form and the negative polarity semantics of this clause-final marker can be offered a language-internal diachronic account, I argue that its innovative pattern of use and ongoing grammaticalization in these dialects are best analyzed as an instance of contact-induced evolution modeled on the neighboring Senufo languages. The initial transfer of the pattern occurred when Senufo speakers shifted to Bamana

    Mande tonal reconstruction: making sense of tonal asymmetries between word classes and word shapes

    No full text
    International audienc
    corecore