Acoustic Modelling for Under-Resourced Languages
Automatic speech recognition systems have so far been developed only for very few languages out of the 4,000-7,000 existing ones.
In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages, in a time- and cost-effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.
Natural Language Processing: Emerging Neural Approaches and Applications
This Special Issue highlights the most recent research in the NLP field and discusses related open issues, with a particular focus both on emerging approaches for language learning, understanding, production, and grounding, interactively or autonomously from data in cognitive and neural systems, and on their potential or real applications in different domains.
“Russians are very sweet and nice”: a corpus-assisted multimodal discourse analysis of the representation of people in online travel reviews about Moscow
The paper explores how guests and hosts are represented in online travel reviews about Moscow. Tourism provides an opportunity to get acquainted with the sociocultural background of other nations and potentially to improve international relations. Moscow, the capital of Russia, is sometimes viewed as an unfriendly or unsafe destination and the Russian Government aims to increase the popularity of the city. However, there are concerns that modern tourism discourse contributes to the maintenance of asymmetrical guest-host power relations. Guests are often accused of consumerism while hosts are frequently backgrounded or represented as servants or cultural markers. Such representation can lead to client-servant attitude and even cause discrimination against hosts. While online travel reviews are considered an important genre of tourism discourse, most studies analyse the representation of people in promotional or media discourse. Considering that multimodality is an integral feature of tourism discourse and that the analysis of discourse patterns allows exploring the meanings widely shared by the society, the study utilizes a corpus-assisted multimodal approach by analysing the representation of people in headlines, texts, images and image captions of a corpus of online travel reviews. The analysis corroborates previous conclusions that guests tend to be represented as consumers enjoying themselves while hosts are perceived as friendly servants. However, the study provides evidence that tourists can background not only hosts but also themselves or other tourists. Moreover, the results reveal that in contrast to promotional and media discourse, guests can also portray themselves as active, solving problems while sometimes representing guests as rude or unwelcoming. The results also show that the representation of people can vary across the modes of the same document. 
The study concludes that user-generated tourism discourse reveals a complex picture and can express resistance to the dominant institutional imagery.
Frame of reference in Iwaidja: towards a culturally responsive early years mathematics program
Most Indigenous Australian language speaking students in remote Northern Territory locations are taught in English by non-Indigenous teachers. Their first languages are inadequately accounted for in mathematics curricula and assessments. Hypothesizing that better understanding the conceptual and linguistic framework of their students would enable teachers to teach a more culturally responsive mathematics program, this thesis considers mathematical implications of the way Australian languages encode spatial concepts. The study focussed on understanding linguistic and cognitive elements of the students’ culture as a necessary precursor to responding. It used a socio-constructivist perspective of education and the theory of linguistic relativity. Differences in preferred uses and acquisition of spatial frames of reference between Indo-European and Australian languages show a discord between the sequencing of location in Early Years mathematics curricula and the understandings of Indigenous students. Phase I was a linguistic investigation of spatial frames of reference in Iwaidja, an endangered Australian language spoken on Croker Island, using tools from the Max Planck Institute for Psycholinguistics. Paired speech tasks were conducted with senior adults, adults and their children or grandchildren, and with children. The findings confirmed cross-linguistic variation in the everyday language of spatial location. The study found Iwaidja uses all three frames of reference: absolute, relative and intrinsic. Adult-to-peer speakers used a range of absolute terminologies including a sunset-sunrise axis, wind directions and an ocean-land axis. Iwaidja has a relative ‘left’ and ‘right’ and a strongly intrinsic ‘front’ and ‘back’ that can contradict the relative frame of reference in both lateral and transverse axes. 
It has a focus on verbal processes rather than nominal objects, which calls into question the perceived necessity of nominalisation of mathematical abstraction for speakers of verb-focussed languages. Adult-to-child use showed less use of the absolute frame of reference and greater use of the relative. Australian languages such as Iwaidja and Kunwinjku appeared to have influenced the intrinsic frame of reference in the dialect of English spoken by the children. Phase II was an ethnographic case study of Early Years mathematics teaching, including teacher perceptions, at Mamaruni School, Croker Island. Interviews and observations showed that the language difference between teachers and their students was a major issue in mathematics teaching. With little or no training in English as a Second Language (ESL) methodologies, most of the teachers felt challenged to teach mathematics in this context. The school’s focus on teaching literacy and Standard Australian English sometimes appeared to be at the expense of mathematics. System pressures on teachers to teach Indigenous language speaking students at an “age-appropriate” curriculum level can lead teachers to implement ineffective mathematics programs. With time and training, the teachers became more responsive to the linguistic needs of their students.
Phonetics and phonology of the three-way laryngeal contrast in Madurese
Madurese, a Western Malayo-Polynesian language spoken on the Indonesian island
of Madura, exhibits a three-way laryngeal contrast distinguishing between voiced,
voiceless unaspirated and voiceless aspirated stops and an unusual consonant-vowel
(CV) co-occurrence restriction. The CV co-occurrence restriction is of phonological
interest given the patterning of voiceless aspirated stops with voiced stops rather than
with voiceless unaspirated stops, raising the question of what phonological feature
they may share. Two features have been linked with the CV co-occurrence
restriction: Advanced Tongue Root [ATR] and Lowered Larynx [LL]. However, as
no evidence of voicing during closure for aspirated stops is observed and no other
acoustic measures except voice onset time (VOT), fundamental frequency (F0),
frequencies of the first (F1) and the second (F2) formants and closure duration
relating to the proposed features have been conducted, it remains an open question
which acoustic properties are shared by voiced and aspirated stops.
Three main questions are addressed in the thesis. The first question is what acoustic
properties voiced and voiceless aspirated stops share to the exclusion of voiceless
unaspirated stops. The second question is whether [ATR] or [LL] accounts for the
patterning together of voiceless aspirated stops with voiced stops. The third question
is what the implications of the results are for a transparent phonetics-phonology
mapping that expects phonological features to have phonetic correlates associated
with them. In order to answer the questions, we looked into VOT, closure duration,
F0, F1, F2 and a number of spectral measures, i.e. H1*-A1*, H1*-A2*, H1*-A3*,
H1*-H2*, H2*-H4* and CPP. We recorded fifteen speakers of Madurese (8 females,
7 males) reading 188 disyllabic Madurese words embedded in a sentence frame.
The results show that the three-way voicing categories in Madurese have different
VOT values. The difference in VOT is robust between voiced stops on the one hand
and voiceless unaspirated and voiceless aspirated stops on the other. Albeit
statistically significant, the difference in VOT values between voiceless unaspirated
and voiceless aspirated stops is relatively small. With regard to closure duration, we
found that there is a difference between voiced stops on the one hand and voiceless
unaspirated and aspirated stops on the other. We also found that female speakers
distinguish F0 for the three categories while male speakers distinguish between F0
for voiced stops on the one hand and voiceless unaspirated and voiceless aspirated
stops on the other. The results for spectral measures show that there are no
significant differences in H1*-A1*, H1*-A3*, H1*-H2*, H2*-H4* and CPP between
vowels adjacent to voiced and voiceless aspirated stops. In contrast, there are
significant differences in these measures between vowels adjacent to voiced and
voiceless unaspirated stops and between vowels adjacent to voiceless aspirated and
voiceless unaspirated stops.
Regarding the question whether voiced and voiceless aspirated stops share certain
acoustic properties, our findings show that they do. The acoustic properties they
share are H1*-A1* for both genders, H1*-H2* for females, H1*-A3* and H2*-H4*
for males, and CPP for females at vowel onset and for males at vowel midpoint.
However, they do not share such acoustic properties as VOT, closure duration and
F0. Voiceless unaspirated and voiceless aspirated stops can be distinguished by
VOT, F0 and spectral measures, i.e. H1*-A1*, H1*-A3*, H1*-H2*, H2*-H4* and
CPP. However, these two voiceless stop categories have similar closure durations.
As regards the question whether [+ATR] or [+LL] might be responsible for the patterning
together of voiceless aspirated stops with voiced stops, our findings suggest that
either feature appears to be plausible. Acoustic evidence that lends support to the
feature [+ATR] includes lower F1 and greater spectral tilt measures, i.e. H1*-A1*,
H1*-A3*, H1*-H2* and H2*-H4*, and lower CPP values. Acoustic evidence that
supports the feature [+LL] includes lower F1 and greater spectral tilt measures, i.e.
H1*-A1*, H1*-A3*, H1*-H2* and H2*-H4*, and lower CPP values. However, the
fact that voiceless aspirated stops are voiceless during closure raises a problem for
the feature [+ATR] and the fact that F0 for voiceless aspirated stops is higher than
for voiced stops also presents a problem for the feature [+LL].
The fact that not all acoustic measures fit in well with either feature is problematic to
the idea that the relationship between phonetics and phonology is transparent in the
sense that phonological features can be directly transformed into their phonetic
correlates. Following the view that not all phonological features may be expected
to be phonetically grounded, for example, when they are related to historical sound
change, we hold the idea of a phonetics-phonology mapping which allows for other
non-phonetic factors to account for a phonological phenomenon. We also provide
historical and loanword evidence which could support that voiceless aspirated stops
in Madurese may have derived from earlier voiced stops, which probably retain their
historical laryngeal contrast through phonologisation.
Rapid Generation of Pronunciation Dictionaries for new Domains and Languages
This dissertation presents innovative strategies and methods for the rapid generation of pronunciation dictionaries for new domains and languages. Solutions are proposed and developed for a range of conditions, from the straightforward scenario in which the target language is present in written form on the Internet and the mapping between speech and written language is close, up to the difficult scenario in which no written form for the target language exists.
Iterated learning framework for unsupervised part-of-speech induction
Computational approaches to linguistic analysis have been used for more than half a century. The main tools come from the field of Natural Language Processing (NLP) and are based on rule-based or corpora-based (supervised) methods. Despite the undeniable success of supervised learning methods in NLP, they have two main drawbacks: on the practical side, it is expensive to produce the manual annotation (or the rules) required and it is not easy to find annotators for less common languages. A theoretical disadvantage is that the computational analysis produced is tied to a specific theory or annotation scheme. Unsupervised methods offer the possibility to expand our analyses into more resource-poor languages, and to move beyond the conventional linguistic theories. They are a way of observing patterns and regularities emerging directly from the data and can provide new linguistic insights. In this thesis I explore unsupervised methods for inducing parts of speech across languages. I discuss the challenges in evaluation of unsupervised learning and at the same time, by looking at the historical evolution of part-of-speech systems, I make the case that the compartmentalised, traditional pipeline approach of NLP is not ideal for the task. I present a generative Bayesian system that makes it easy to incorporate multiple diverse features, spanning different levels of linguistic structure, like morphology, lexical distribution, syntactic dependencies and word alignment information that allow for the examination of cross-linguistic patterns. I test the system using features provided by unsupervised systems in a pipeline mode (where the output of one system is the input to another) and show that the performance of the baseline (distributional) model increases significantly, reaching and in some cases surpassing the performance of state-of-the-art part-of-speech induction systems. 
I then turn to the unsupervised systems that provided these sources of information (morphology, dependencies, word alignment) and examine the way that part-of-speech information influences their inference. Having established a bi-directional relationship between each system and my part-of-speech inducer, I describe an iterated learning method, where each component system is trained using the output of the other system in each iteration. The iterated learning method improves the performance of both component systems in each task. Finally, using this iterated learning framework, and by using parts of speech as the central component, I produce chains of linguistic structure induction that combine all the component systems to offer a more holistic view of NLP. To show the potential of this multi-level system, I demonstrate its use “in the wild”. I describe the creation of a vastly multilingual parallel corpus based on 100 translations of the Bible in a diverse set of languages. Using the multi-level induction system, I induce cross-lingual clusters, and provide some qualitative results of my approach. I show that it is possible to discover similarities between languages that correspond to “hidden” morphological, syntactic or semantic elements.