21 research outputs found

    CLARIN

    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure (CLARIN) for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium

    Archival Phonetics & Prosodic Typology in Sixteen Australian Languages

    In naturalistic speech, the phonetic instantiation of phonological categories is often highly variable. Speakers have been observed to converge on patterns of phonetic variation that are consistent within languages but variable cross-linguistically for the same phonological phenomenon. Speakers are evidently sensitive to these sorts of patterns and learn the phonetic variation in a consistent way. Furthermore, the systematicity of this variation suggests that these patterns should change over time systematically as well. Most Australian languages assign lexical stress consistently on the first syllable of the word, raising the question of how the phonetics of stress varies across languages with this phonologically stable pattern. This dissertation presents an investigation into structured variation of the acoustic correlates of stress and prosody in sixteen Indigenous languages of Australia that all have consistent initial stress placement, with a focus on the source(s) of variation in these factors cross-linguistically. Acoustic correlates of stress, despite the phonological uniformity present among these languages, show significant cross-linguistic variation, both in the presence or absence of a particular cue to stress, as well as the size of these effects. The phonological uniformity of stress assignment allows for a more controlled comparison of the acoustic correlates of stress across these languages, since the placement of stress marking remains constant. Acoustic correlates investigated are vowel duration, pre-tonic and post-tonic consonant duration, intensity, f0 (maximum and range), and vowel peripherality. These cues are identified using a series of mixed effects linear regression models. To identify the source(s) of variation in acoustic correlates to stress, the population genetics tool Analysis of Molecular Variance (AMOVA) is used. 
This is a statistical tool created for analysis of genetic variance that has been applied to cultural evolution topics such as music and folktales. This model finds significant variation across languages, as well as substantial intra-speaker variation, similarly to the findings for both biological and cultural evolution, but no significant intra-language variation across speakers. These results are also supported by the investigation of inter- and intra-language variation using regression modeling. Another population genetics measure, fixation index, is used to create a network model of language relationships based on the phonetic correlates of lexical stress. This network shows clear relationships between the Pama-Nyungan languages in this sample, as well as some Gunwinyguan languages, supporting the claim that the phonetic cues to stress are stable within language families and change according to the principles of diachronic language change. Smaller groupings in this network also indicate some contact-induced change or areal effects in these phonetic markers. Phrasal prosody is also investigated in this dissertation, using a toolkit for automated phrasal contour clustering. For each language, f0 is measured at regular intervals across the word, which is used as input to a complete-linkage clustering algorithm to identify major categories of phrasal contours. Results of this sort of automatic clustering provide testable hypotheses about phrasal types in each language, while avoiding some common pitfalls of impressionistic analyses of prosodic phrases. As with the investigation into lexical stress, this sort of automated typological work serves as a crucial complement to more detailed language-specific studies for the creation of well-rounded and well-supported theories. The data used in this dissertation are narrative speech recordings sourced from language archives, collected in varying field settings. 
In processing these data I have created a large corpus of these recordings force aligned at the segment level and have worked out post-hoc methods for controlling noise and variation in field-collected audio to create a comparable set of language data. I include in the dissertation a lengthy discussion of these methods, with the aim of providing a practical toolkit for the use of archival materials to address novel phonetic questions, as well as to aid in the creation of language revitalization resources
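The contour-clustering step described in this abstract (f0 sampled at regular intervals, then complete-linkage clustering) can be illustrated with a minimal sketch. This is not the dissertation's toolkit: the function name `complete_linkage`, the Euclidean distance measure, the target number of clusters, and the f0 values are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist


def complete_linkage(contours, n_clusters):
    """Agglomerative clustering with complete linkage.

    contours: list of equal-length f0 vectors (one per word or phrase).
    Returns a sorted list of clusters, each a sorted list of indices.
    """
    clusters = [[i] for i in range(len(contours))]

    def linkage(a, b):
        # Complete linkage: the distance between two clusters is the
        # largest pairwise distance between their members.
        return max(dist(contours[i], contours[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Merge the two clusters whose complete-linkage distance is smallest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical f0 contours (Hz), each sampled at five evenly spaced points.
contours = [
    [120, 130, 140, 150, 160],  # rising
    [122, 131, 142, 149, 158],  # rising
    [160, 150, 140, 130, 120],  # falling
    [158, 149, 141, 131, 122],  # falling
    [140, 140, 141, 140, 139],  # flat
]
clusters = complete_linkage(contours, 3)
print(clusters)
```

With these made-up contours the two rising shapes, the two falling shapes, and the flat shape end up in separate clusters. Real archival data would likely need speaker normalization of f0 before clustering, which this sketch omits.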

    CLARIN. The infrastructure for language resources

    CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)

    VARIATIONist Linguistics meets CONTACT Linguistics

    The current volume is dedicated to the inherently heterogeneous nature of language(s) as seen from the perspective of variationist linguistics and contact linguistics, which became established and internationally recognized sub-disciplines of (socio)linguistics during the latter half of the 20th century. Over the last few years, each paradigm has considerably broadened the spectrum of topics under investigation, but there has not yet been an extensive and satisfactory exchange between the two fields. The present volume aims to give an insight into the complex synergy between existing linguistic contact constellations, on the one hand, and variation in language use, on the other

    Prosody and Intonation in Formosan Languages

    The Formosan languages are the languages of the Aboriginal peoples of Taiwan. These languages are part of the Austronesian language family, and represent all but one primary branch of this family of 1,200+ languages. The Formosan languages are endangered, some critically so. While these languages have seen attention in the literature for their syntactic and phonological systems, little work has been done on their prosodic structure or intonation. This dissertation analyzes the prosodic structure and intonational phonology of Mantauran Rukai, Budai Rukai, Tsou, Kanakanavu, Hla’alua, Sandimen Paiwan, Piuma Paiwan, Kavalan, Amis, Bunun, Tgdaya Seediq, Truku Seediq, and Pazeh, based on original fieldwork. In addition, archival materials are incorporated into analyses of Tsou, Truku Seediq, Tgdaya Seediq, and Puyuma. This study finds that the Formosan languages show rich tonal phonologies in their intonational systems, and have complex interactions between stress assignment and morphology. Some examples include the following: Mantauran Rukai, previously described as an initial-stress language, actually has a complex stress assignment system with an alternation between first- and third-syllable stress, which as a system is unique in descriptions of stress assignment in the world’s languages. Hla’alua (Saaroa), previously described as having free variation between antepenultimate and penultimate stress, actually has an accent system in which some lexical items are consistently produced without an accented syllable, while others are. Hla’alua also has a rich tonal phonology assigned at two higher levels of the prosodic hierarchy. Kavalan has a unique rule that causes spreading tones to shift to the opposite domain edge when a certain number of tonal elements are aligned to the same boundary. Elements of the intonational phonology in Amis and Kavalan include glottal stops in addition to tonal elements. Bunun has distinct pitch accent melodies for words vs. clitics. 
In addition to the unique features found in individual Formosan languages, this dissertation’s comparative study finds at least two geographic areas within Taiwan in which features of prosody and intonation cluster. One is southwestern Taiwan, including Tsou, Kanakanavu, Hla’alua, and Rukai, which share features including a lack of glide-vowel contrasts and variability of initial H vs. L elements in certain prosodic domains. The other is eastern Taiwan, including Amis, Kavalan, and Puyuma, which share features including suppression of non-IP-final pitch accents, alternations between ultimate and pre-ultimate F0 peaks across intonational contours, and interactions between glottal stop epenthesis and intonational phonology

    On the hunt for lateral phonological cross-linguistic influence in third or additional language acquisition

    To date, the existence of lateral phonological transfer (LPT), i.e. the influence of one non-native phonological system on another such system in a multilingual learner, has not been conclusively proven. An empirical study examines LPT based on segmental (vowel reduction and coda consonant cluster production) and suprasegmental (speech rhythm) features in the three languages of 18 learners (L1 Mandarin, L2 English and L3/Ln German). Results show that LPT exists on both the segmental and the suprasegmental level; however, it is a highly complex phenomenon that should only be investigated with individual learner profiles taken into account, along with factors that probably promote LPT to varying degrees. In addition to unambiguous LPT, the participants' target-language productions also exhibit LPT in the more subtle form of combined cross-linguistic influence from the L1 and L2, as well as transfer of L2 hybrid forms into the target language that were already influenced by the L1 during L2 acquisition. The measured L1 influence, together with individual idiosyncratic and target-like productions, further adds to the heterogeneity of the results

    Towards Multilingual Coreference Resolution

    The current work investigates the problems that occur when coreference resolution is considered as a multilingual task. We assess the issues that arise when a framework based on the mention-pair coreference resolution model, with memory-based learning for the resolution process, is used. Along the way, we revise three essential subtasks of coreference resolution: mention detection, mention head detection and feature selection. For each of these aspects we propose various multilingual solutions, including heuristic, rule-based and machine learning methods. We carry out a detailed analysis that includes eight different languages (Arabic, Catalan, Chinese, Dutch, English, German, Italian and Spanish) for which datasets were provided by the only two multilingual shared tasks on coreference resolution held so far: SemEval-2 and CoNLL-2012. Our investigation shows that, although complex, the coreference resolution task can be targeted in a multilingual and even language-independent way. We proposed machine learning methods for each of the subtasks affected by the transition, evaluated them, and compared their performance to that of rule-based and heuristic approaches. Our results confirmed that machine learning provides the needed flexibility for the multilingual task and that the minimal requirement for a language-independent system is a part-of-speech annotation layer provided for each of the approached languages. We also showed that the performance of the system can be improved by introducing other layers of linguistic annotation, such as syntactic parses (in the form of either constituency or dependency parses), named entity information, predicate argument structure, etc. Additionally, we discuss the problems occurring in the proposed approaches and suggest possibilities for their improvement

    Multimodal interaction with mobile devices : fusing a broad spectrum of modality combinations

    This dissertation presents a multimodal architecture for use in mobile scenarios such as shopping and navigation. It also analyses a wide range of feasible modality input combinations for these contexts. For this purpose, two interlinked demonstrators were designed for stand-alone use on mobile devices. Of particular importance was the design and implementation of a modality fusion module capable of combining input from a range of communication modes like speech, handwriting, and gesture. The implementation is able to account for confidence value biases arising within and between modalities and also provides a method for resolving semantically overlapped input. Tangible interaction with real-world objects and symmetric multimodality are two further themes addressed in this work. The work concludes with the results from two usability field studies that provide insight on user preference and modality intuition for different modality combinations, as well as user acceptance for anthropomorphized objects
