
    FROM LANGUAGE TO LITERACY: STRUCTURAL FEATURES OF ACQUIRED LANGUAGES FACILITATING ENGLISH MORPHOLOGICAL AWARENESS

    Morphological awareness is a crucial metalinguistic skill, particularly for English Language Learners (ELLs). Since languages differ widely in degree of orthographic opacity, degree of morphological fusion, and degree of morphological synthesis, this thesis sought to evaluate the impact of the structural features of other languages upon ELLs' levels of English morphological awareness. Additionally, the study investigated the relationship between morphological awareness and perceived levels of literacy and oracy proficiency. Multilingual individuals responded to an online survey containing a morphological awareness task and a language history questionnaire. Each language represented in the sample was coded according to its structural features, and the relationship between the features and morphological awareness was then analyzed. Morphological awareness was affected by a confluence of all three structural features. Knowledge of languages with higher degrees of morphological synthesis or higher degrees of orthographic opacity was found to predict higher levels of morphological awareness. Additionally, perceived English literacy proficiency explained more of the variance in English morphological awareness than perceived English oracy proficiency, though both were statistically significant. The findings indicate that the acquisition of English may be affected by familiarity with other languages and by perceptions of English proficiency.

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Speech recognition systems and Russian pronunciation variation in the context of VoiceInteraction

    This thesis describes the work performed during an internship for the master's degree in Linguistics at VoiceInteraction, an international Artificial Intelligence (AI) company specializing in speech processing technologies. The goal of the internship was to study the phonetic characteristics of Russian through four main tasks: describing the phonetic-phonological inventory; validating transcriptions of broadcast news; validating a previously created lexicon composed of the ten thousand (10,000) most frequently observed words in a text corpus crawled from the websites of Russian reference newspapers; and integrating filled pauses into the Automatic Speech Recognizer (ASR). Initially, audio and text broadcast news media were collected from Russian-speaking regions (European Russia, Belarus, and the Caucasus region), featuring different varieties of Russian. The extracted data and the company's existing data were used to train the acoustic, pronunciation, and language models. The audio data was automatically processed in a proprietary platform and then revised by human annotators. Transcriptions produced automatically and reviewed by annotators were analyzed, and the most common errors were extracted to provide feedback to the community of annotators. The validation of transcriptions, along with the annotation of all disfluencies (which had previously been left out), decreased the Word Error Rate (WER) in most cases. In some cases (in European Russian transcriptions), WER increased because the models were not effective enough at identifying the correct, potentially problematic words. Audio with overlapped speech, disfluencies, and acoustic events can also affect the WER. Because the model used to recognize the other varieties of Russian was trained only on European Russian, WER was high for Belarus and the Caucasus region.
The Russian phonetic-phonological inventory was characterized, and pronunciation rules for internal and external sandhi phenomena were constructed, in order to validate the lexicon: the ten thousand most frequently observed words in a text corpus crawled from the websites of Russian reference newspapers were revised and modified to extract linguistic patterns for use in a statistical grapheme-to-phone (G2P) model. Two evaluations were conducted, one before and one after the modifications to the lexicon. Preliminary results, obtained without training the model, show no significant change: 19.85% WER before the modifications and 19.97% WER after, a difference of 0.12 percentage points. However, we observed a slight improvement for the most frequent words. In the future, we aim to extend the analysis to all 400,000 entries of the lexicon, analyze the types of errors produced, decrease the WER, and analyze the acoustic models as well. We also studied filled pauses, since we believe that research on filled pauses in Russian can improve VoiceInteraction's recognition system by reducing processing time and increasing quality. Filled pauses are marked in the transcriptions with “%”. In Russian, according to the literature (Ten, 2015; Harlamova, 2008; Bogdanova-Beglarian & Baeva, 2018), these are %a [a], %am [am], %@ [ə], %@m [əm], %e [e], %ɨ [ɨ], %m [m], and %n [n]. In the speech data, two more filled pauses were found, namely %na [na] and %mna [mna], which, as far as we know, have not yet been referenced in the literature. Finally, the work performed during the internship contributed to a European project, Artificial Intelligence and Advanced Data Analysis for Authority Agencies (AIDA).
The main goal of the AIDA project is to build a solution capable of automating the processing of the large amounts of data that Law Enforcement Agencies (LEAs) have to analyze in investigations of terrorism and cybercrime, using pioneering machine learning and artificial intelligence methods. VoiceInteraction's main contribution to the project was to apply ASR and validate the transcriptions of the Russian (religion-related) content. To that end, all the tasks performed during the thesis were highly relevant and were applied within the scope of the AIDA project. Transcription analysis results from the AIDA project showed a high Out-of-Vocabulary (OOV) rate and a high substitution (SUBS) rate. Since the language model used in this project was adapted for broadcast content, the religion-related words were left out. In addition, function words were incorrectly recognized, in most cases because of coarticulation with the preceding or following word.
This thesis describes the work carried out during an internship in computational linguistics at VoiceInteraction, a speech processing technology company. Since the start of its activity, the company has been dedicated to developing its own technology in several areas of computational speech processing, among them speech synthesis, natural language processing, and automatic speech recognition, the last of which is the company's main business area. VoiceInteraction's automatic speech recognition technology uses hybrid models in combination with deep neural networks (DNNs), which, according to Lüscher et al. (2019), perform better than end-to-end models alone.
The main goal of the internship was the study of Russian phonetics, addressing four tasks: creation of the phonetic-phonological inventory; validation of broadcast news transcriptions; validation of the previously created lexicon; and integration of filled pauses into the system. Initially, material was collected from the main media outlets (audio and text) representing different varieties of Russian, namely those of European Russia, Belarus, and the Central Caucasus. In European Russia, Russian is the official language; in Belarus, Russian is one of the country's official languages; and in the Central Caucasus region, Russian is used as a lingua franca, since it was spoken in the Soviet Union and is still spoken today in the post-Soviet regions. The aim was to achieve the widest possible coverage of the Russian language, and so far it has only been possible to collect data for the varieties mentioned. The data extracted so far, together with the data already held by the company, were used to train the acoustic, pronunciation, and language models. To process the audio data, they were loaded into the company's proprietary platform, Calligraphus, which, besides providing a transcription interface for human annotators, also suggests an automatic transcription of the same content in order to reduce the annotators' effort. The transcriptions were then analyzed to ensure that the annotation scheme created by VoiceInteraction had been followed, marking all speech disfluencies (phenomena characteristic of speech editing), such as prolongations, filled pauses, and repetitions, among others, and transcribing the speech as closely to reality as possible.
Subsequently, the systematic errors were analyzed and exported in order to provide guidance and suggestions for improvement to the human annotators and, in turn, to improve the performance of the recognition system. After the validation of the transcriptions, together with the annotation of all disfluencies (which had previously been left out), we observed a decrease in WER in most cases, as expected. In some cases, however, we observed an increase in WER. Despite the corrections made to the analyzed files, the models were not effective enough at recognizing the correct, potentially problematic words. The high WER in audio containing political debates is related to a higher frequency of overlapped speech and disfluencies (e.g., filled pauses, prolongations). The model used to recognize all varieties was trained only on European Russian, and a high WER was therefore also observed for the varieties of Belarus and the Caucasus region. Drawing on data collected by the company, a characterization and description of the Russian phonetic-phonological inventory was also carried out, along with the construction of pronunciation rules for internal and external sandhi phenomena (Shcherba, 1957; Litnevskaya, 2006; Lekant, 2007; Popov, 2014). The company already used, through a statistical G2P specific to Russian, a phonetic inventory for Russian corresponding to the literature cited above, but it had not yet been validated. It was possible to verify and correct it based on the characterization of the phones in the Russian lexicon and on ecological data obtained from Russian speakers in diverse communicative situations. The validation of the phonetic-phonological inventory in turn allowed the validation of the Russian lexicon.
The lexicon was built on the basis of a set of features (e.g., the grapheme in unstressed position is pronounced as the phone [I] and in stressed position as [i]; the grapheme in word-final position is pronounced as [-voiced], [f]; among other features) and was organized by frequency of use. In total, the ten thousand (10,000) most frequent Russian words were verified, based on statistics from the analysis of content in a repository of news articles previously collected from Russian-language reference newspapers. The recognition system was evaluated before and after the modification of the ten thousand most frequent words in the lexicon: 19.85% WER before the modifications and 19.97% WER after, a difference of 0.12 percentage points. The preliminary results, obtained without training the model, show no significant change; however, we observed a slight improvement in the recognition of the most frequent words, such as function words, acronyms, verbs, and nouns, among others. Based on these results and on the rules derived from correcting the ten thousand words, we intend in the future to extend them to the whole lexicon, which consists of four hundred thousand (400,000) entries. After the validation of the transcriptions and the lexicon, and drawing on the literature, it was also possible to analyze Russian filled pauses for integration into the recognition system. The interest in including filled pauses in the automatic recognizer was due mainly to the fact that they are difficult to identify automatically and may be substituted or may affect adjacent sequences. According to the company's annotation scheme, filled pauses are marked in the transcription with the percent sign, %.
The Russian filled pauses found in the literature were %a [a], %am [am] (Rose, 1998; Ten, 2015), %@ [ə], %@m [əm] (Bogdanova-Beglarian & Baeva, 2018), %e [e], %ɨ [ɨ], %m [m], and %n [n] (Harlamova, 2008). In the audio data available on the platform, two further filled pauses were found in addition to those mentioned, namely %na [na] and %mna [mna], which, to the best of our knowledge, have not yet been described in the literature. All of the filled pauses mentioned are now part of the automatic speech recognition models for Russian. The work carried out during the internship, that is, the validation of the company's existing data, was applied to the European project AIDA (Artificial Intelligence and Advanced Data Analysis for Authority Agencies). The main goal of this project is to create a solution capable of detecting possible cybercrime and terrorism offences using machine learning methods. VoiceInteraction's main contribution to the project was the application of ASR and the validation of the Russian transcriptions (religion-related content). To that end, all the tasks performed during the thesis were highly relevant and were applied within the AIDA project. The results of validating the project's transcriptions showed a high Out-of-Vocabulary (OOV) rate and a high substitution (SUBS) rate. Since the language model used in this project was adapted to news content, it did not contain the religion-related words. In addition, function words were incorrectly recognized, in most cases because of coarticulation with the preceding or following word.
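
The abstract reports Word Error Rate (WER) figures without saying how they were computed. As a rough illustration only, the sketch below uses the standard definition of WER: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the thesis or from VoiceInteraction's tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch of the standard WER definition; the scoring
    tool actually used in the thesis is not stated in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four words.
print(word_error_rate("the cat sat down", "the dog sat"))  # 0.5

# The reported change in WER is a difference in percentage points.
print(round(19.97 - 19.85, 2))  # 0.12
```

Under this definition, a move from 19.85% to 19.97% means about one additional word error per thousand reference words.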

    Max Planck Institute for Psycholinguistics: Annual report 1996


    A survey on perceived speaker traits: personality, likability, pathology, and the first challenge

    The INTERSPEECH 2012 Speaker Trait Challenge aimed at a unified test-bed for perceived speaker traits, the first challenge of this kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state of the art in these three fields of research and describe the three sub-challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which has been used for computing the baselines and which has been provided to the participants. Furthermore, we summarise the approaches and the results presented by the participants to show the various techniques that are currently applied to solve these classification tasks.

    A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each to that of the other(s). Implementation of this behavior in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology of monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame which circumvents strict attribution of speaker-turns, by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turn-taking” behaviour.
Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
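
The abstract describes TAMA only in outline: a moving average over sequential frames for each speaker, with the frames time-aligned across speakers. The sketch below is one possible reading of that description, not the thesis's implementation. The function name `tama_frames`, the 20-second frame length, the 10-second step, and the sample pitch values are all assumptions made for illustration.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# One measurement: (time in seconds, feature value such as pitch in Hz).
Sample = Tuple[float, float]


def tama_frames(
    samples: Sequence[Sample],
    duration: float,
    frame_len: float = 20.0,  # assumed frame length, not from the abstract
    step: float = 10.0,       # assumed step size, not from the abstract
) -> List[Optional[float]]:
    """Average one speaker's feature values over overlapping frames.

    Frame boundaries depend only on `duration`, `frame_len` and `step`,
    so calling this once per speaker with the same settings yields
    series that are time-aligned across speakers. A frame in which the
    speaker produced no samples is reported as None.
    """
    frames: List[Optional[float]] = []
    start = 0.0
    while start < duration:
        end = start + frame_len
        values = [value for time, value in samples if start <= time < end]
        frames.append(mean(values) if values else None)
        start += step
    return frames


# Hypothetical pitch samples for two speakers in a 30-second dialogue.
speaker_a = [(1.0, 100.0), (12.0, 120.0), (25.0, 140.0)]
speaker_b = [(5.0, 200.0), (15.0, 210.0), (28.0, 230.0)]

frames_a = tama_frames(speaker_a, duration=30.0)
frames_b = tama_frames(speaker_b, duration=30.0)
print(frames_a)  # [110.0, 130.0, 140.0]
print(frames_b)  # [205.0, 220.0, 230.0]
```

Because both series share the same frame boundaries, they can be compared frame by frame, for example with a correlation or time-series model, which is the kind of analysis the abstract says TAMA enables.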

    Gesture and Speech in Interaction - 4th edition (GESPIN 4)

    The fourth edition of Gesture and Speech in Interaction (GESPIN) was held in Nantes, France. With more than 40 papers, these proceedings show just what a flourishing field of enquiry gesture studies continues to be. The keynote speeches of the conference addressed three different aspects of multimodal interaction: gesture and grammar, gesture acquisition, and gesture and social interaction. In a talk entitled "Qualities of event construal in speech and gesture: Aspect and tense", Alan Cienki presented an ongoing research project on narratives in French, German and Russian, a project that focuses especially on the verbal and gestural expression of grammatical tense and aspect in narratives in the three languages. Jean-Marc Colletta's talk, entitled "Gesture and Language Development: towards a unified theoretical framework", described the joint acquisition and development of speech and early conventional and representational gestures. In "Grammar, deixis, and multimodality between code-manifestation and code-integration or why Kendon's Continuum should be transformed into a gestural circle", Ellen Fricke proposed a revisited grammar of noun phrases that integrates gestures as part of the semiotic and typological codes of individual languages. From a pragmatic and cognitive perspective, Judith Holler explored the use of gaze and hand gestures as means of organizing turns at talk as well as establishing common ground in a presentation entitled "On the pragmatics of multi-modal face-to-face communication: Gesture, speech and gaze in the coordination of mental states and social interaction". Among the talks and posters presented at the conference, the vast majority of topics related, quite naturally, to gesture and speech in interaction, understood both in terms of mapping of units in different semiotic modes and of the use of gesture and speech in social interaction.
Several presentations explored the effects of impairments (such as diseases or the natural ageing process) on gesture and speech. The communicative relevance of gesture and speech and audience-design in natural interactions, as well as in more controlled settings like television debates and reports, was another topic addressed during the conference. Some participants also presented research on first and second language learning, while others discussed the relationship between gesture and intonation. While most participants presented research on gesture and speech from an observer's perspective, be it in semiotics or pragmatics, some nevertheless focused on another important aspect: the cognitive processes involved in language production and perception. Last but not least, participants also presented talks and posters on the computational analysis of gestures, whether involving external devices (e.g. mocap, Kinect) or concerning the use of specially-designed computer software for the post-treatment of gestural data. Importantly, new links were made between semiotics and mocap data.

    Zeichensprache und erfolgreiche bilinguale Entwicklung bei sprachbehinderten Kindern

    This paper reviews research on language development of deaf children, comparing those who have early access to natural sign language with those who do not. Early learning of sign language does not create concerns for the child's development of other languages, speech, reading, or other cognitive skills. In fact, it can contribute directly to establishment of more of the high-level skills needed for successful bilingual development. The global benefit of learning a sign language as a first language is that in the resulting bilingual communicative setting, teachers and learners can take advantage of one language to assist in acquiring the other and in the transfer of general knowledge. As part of this discussion, English and ASL are compared as representatives of spoken and signed natural languages to provide explicit examples of their similarities and differences.
The paper presents research on the language development of deaf children, comparing those who begin communicating in signs early with those who do not. Early learning of sign language does not cause the child difficulties in mastering other languages, speech, reading, or other cognitive skills. On the contrary, it can contribute directly to building a greater number of the developed skills needed for successful bilingual development. The general benefit of learning sign language as a first language is that, in the resulting bilingual communicative environment, teachers and learners can use one language to help in acquiring the other and to encourage the transfer of general knowledge. As part of this discussion, the author compares English and ASL (American Sign Language) as representatives of spoken and signed natural languages in order to give clear examples of their similarities and differences.
This article presents a study of the development of speech-impaired children. Specifically, it compares children who become familiar with the use of sign language early with those who communicate in other ways. The early acquisition of sign language causes the child no difficulties in acquiring other languages, in speaking, reading, or other cognitive abilities. On the contrary, mastery of sign language can contribute directly to the development of a greater number of the abilities that are prerequisites for the child's successful bilingual development. The general benefit of acquiring sign language as a first language is that, in the resulting bilingual communicative environment, teachers and pupils can use the first language as a learning aid in acquiring the second and, in addition, stimulate the transfer of general knowledge. The author of the article compares English and American Sign Language (ASL), which represent a spoken language and a natural sign language respectively, and gives clear examples to illustrate their similarities and differences.