Search CORE

15,749 research outputs found

Two Automatic Approaches for Analyzing Connected Speech Processes in Dutch

Author: Kessens Judith M
Strik Helmer
Wester Mirjam
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/1998

This paper describes two automatic approaches used to study connected speech processes (CSPs) in Dutch. The first approach, the top-down method, starts from a linguistic point of view and can be used to verify hypotheses about CSPs. The second approach, the bottom-up method, uses a constrained phone recognizer to generate phone transcriptions. The two resulting transcriptions were aligned with a reference transcription. A comparison between the two methods showed 68% agreement on the CSPs. Although phone accuracy is only 63%, the bottom-up approach is useful for studying CSPs: the data it generates indicate which CSPs are present in the material. These indications can be used to formulate hypotheses, which can then be tested using the top-down method.
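The abstract reports phone accuracy after aligning recognizer output with a reference transcription. As a minimal sketch, assuming the common definition accuracy = (N - S - D - I) / N over a minimum-edit-distance alignment (the paper's exact scoring procedure is not given here), the Python below counts substitutions (S), deletions (D) and insertions (I) against a reference of N phones. The function names `align_counts` and `phone_accuracy` and the SAMPA-like phone symbols are hypothetical.

```python
from typing import List, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-edit-distance alignment."""
    n, m = len(ref), len(hyp)
    # cell[i][j] holds (cost, S, D, I) for aligning ref[:i] with hyp[:j];
    # tuples compare by cost first, so min() keeps the cheapest path.
    cell: List[List[Tuple[int, int, int, int]]] = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        c, s, d, ins = cell[i - 1][0]
        cell[i][0] = (c + 1, s, d + 1, ins)  # reference phone deleted
    for j in range(1, m + 1):
        c, s, d, ins = cell[0][j - 1]
        cell[0][j] = (c + 1, s, d, ins + 1)  # extra phone inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            miss = 0 if ref[i - 1] == hyp[j - 1] else 1
            c, s, d, ins = cell[i - 1][j - 1]
            diagonal = (c + miss, s + miss, d, ins)  # match or substitution
            c, s, d, ins = cell[i - 1][j]
            deletion = (c + 1, s, d + 1, ins)
            c, s, d, ins = cell[i][j - 1]
            insertion = (c + 1, s, d, ins + 1)
            cell[i][j] = min(diagonal, deletion, insertion)
    _, s, d, ins = cell[n][m]
    return s, d, ins


def phone_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Accuracy = (N - S - D - I) / N, with N the number of reference phones."""
    if not ref:
        raise ValueError("reference transcription is empty")
    s, d, ins = align_counts(ref, hyp)
    return (len(ref) - s - d - ins) / len(ref)


# Hypothetical example: a word-final /n/ is missing from the recognizer output.
print(phone_accuracy(["l", "o:", "p", "@", "n"], ["l", "o:", "p", "@"]))  # 0.8
```

Because insertions are subtracted, this measure can fall below zero when a recognizer inserts many spurious phones, which distinguishes it from a plain percent-correct score.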

Edinburgh Research Archive

Edinburgh Research Explorer

Vocabulary size influences spontaneous speech in native language users: Validating the use of automatic speech recognition in individual differences research

Author: Hintz F.
Jongman S.
Khoe Y.
Publication venue: 'SAGE Publications'
Publication date: 30/03/2020

Previous research has shown that vocabulary size affects performance on laboratory word production tasks. Individuals who know many words show faster lexical access and retrieve more words belonging to pre-specified categories than individuals who know fewer words. The present study examined the relationship between receptive vocabulary size and speaking skills as assessed in a natural sentence production task. We asked whether measures derived from spontaneous responses to everyday questions correlate with the size of participants' vocabulary. Moreover, we assessed the suitability of automatic speech recognition for analyzing participants' responses in complex language production data. We found that vocabulary size predicted indices of spontaneous speech: individuals with a larger vocabulary produced more words and had a higher speech-silence ratio than individuals with a smaller vocabulary. Importantly, these relationships were reliably identified using both manual and automated transcription methods. Taken together, our results suggest that spontaneous speech elicitation is a useful method for investigating natural language production and that automatic speech recognition can alleviate the burden of labor-intensive speech transcription.
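The abstract does not define the speech-silence ratio. Assuming it is the total duration of speech divided by the total duration of silence within one response (an assumption, not confirmed by the source), it can be computed from speech-segment timestamps such as those a voice-activity detector or forced aligner might produce. The function name and the (start, end) input format below are hypothetical.

```python
from typing import List, Tuple


def speech_silence_ratio(segments: List[Tuple[float, float]], total_duration: float) -> float:
    """Ratio of speech time to silence time in one response.

    Assumed definition (hypothetical): summed speech-segment durations divided by
    the remaining non-speech time. `segments` are (start, end) pairs in seconds,
    sorted, non-overlapping, and within [0, total_duration].
    """
    previous_end = 0.0
    for start, end in segments:
        # Reject segments that are reversed, overlap, or extend past the response.
        if start < previous_end or end < start or end > total_duration:
            raise ValueError(f"invalid segment ({start}, {end})")
        previous_end = end
    speech = sum(end - start for start, end in segments)
    silence = total_duration - speech
    if silence <= 0:
        raise ValueError("response contains no silence; ratio is undefined")
    return speech / silence


# Two 1.5 s stretches of speech in a 5 s response: 3.0 s speech, 2.0 s silence.
print(speech_silence_ratio([(0.5, 2.0), (2.5, 4.0)], 5.0))  # 1.5
```

Validating the segments first keeps overlapping or out-of-range timestamps from silently inflating the speech total.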

MPG.PuRe

Computational Sociolinguistics: A Survey

Author: de Jong Franciska
Doğruöz A. Seza
Nguyen Dong
Rosé Carolyn P.
Publication date: 01/01/2016

Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

EUR Research Repository

University of Twente Research Information

DARIAH and the Benelux

Author: Backes Marianne
Chambers Sally
Hoogerwerf Maarten
Van der West Jan
Publication venue: Department of Applied Linguistics, Translators and Interpreters, University of Antwerp
Publication date: 01/01/2015

Ghent University Academic Bibliography

Dutch parallel corpus : a multilingual annotated corpus

Author: Desmet Piet
Macken Lieve
Paulussen Hans
Rura Lidia
Trushkina Julia
Vandeweghe Willy
Publication date: 01/01/2007

Ghent University Academic Bibliography

Comparing different methods for analyzing ERP signals

Author: Boves L.
Mulder K.
Ten Bosch L.
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2016

MPG.PuRe

Comparing Grounded Theory and Topic Modeling: Extreme Divergence or Unlikely Convergence?

Author: Agosto
Armstrong
Babchuk
Backstrom
Baumer
Blei
Burford
Charmaz
Clarke
Collins
Corbin
Deerwester
Dourish
Durkheim
Ellison
Elsweiler
Epstein
Foucault
Freeman
Geertz
Gershon
Glaser
Glaser
Glaser
Glaser
Goffman
Goggins
Goldstone
Griffiths
Grimmer
Grimmer
Haraway
Hu
Jockers
Jockers
Leskovec
Li
Lind
Lofland
Ma
Marwick
Marx
Mead
Mohr
Muller
Newell
Newman
Orlikowski
Pang
Pinch
Portwood-Stacer
Ramsay
Ramsay
Rhody
Ritzer
Roberts
Roberts
Rost
Satchell
Shankman
Skinner
Song
Star
Suominen
Tangherlini
Tukey
Underwood
Weber
Wilbur
Wyatt
Publication venue: e-Publications@Marquette
Publication date: 01/06/2017

Researchers in information science and related areas have developed various methods for analyzing textual data, such as survey responses. This article describes the application of analysis methods from two distinct fields, one method from interpretive social science and one method from statistical machine learning, to the same survey data. The results show that the two analyses produce some similar and some complementary insights about the phenomenon of interest, in this case, nonuse of social media. We compare both the processes of conducting these analyses and the results they produce to derive insights about each method's unique advantages and drawbacks, as well as the broader roles that these methods play in the respective fields where they are often used. These insights allow us to make more informed decisions about the tradeoffs in choosing different methods for analyzing textual data. Furthermore, this comparison suggests ways that such methods might be combined in novel and compelling ways.

epublications@Marquette

Crossref

Error analysis in automatic speech recognition and machine translation

Author: Loomans Nicolaas Dirk Petrus
Publication date: 13/09/2021

Automatic speech recognition and machine translation are well-known terms in the translation world today. Systems that carry out these processes are increasingly taking over work previously done by humans, mainly because of the speed at which they perform tasks and their cost. However, the quality of these systems is debatable: they cannot yet match the performance of human transcribers or translators. A lack of creativity, of the ability to interpret texts, and of a feel for language is often cited as the reason why machines do not yet perform at the level of human translators or transcribers. Even so, some companies use these systems in their production pipelines. Unbabel, an online translation platform powered by artificial intelligence, is one of them: by combining human translators and machines, Unbabel aims to provide its customers with good-quality translations. This internship report was written to obtain an overview of the performance of these systems and the errors they produce, and to outline possible error patterns in both. To that end, it presents an extensive analysis of the errors made by automatic speech recognition and machine translation systems after automatically transcribing 10 English videos and translating them into Dutch. The videos were deliberately chosen to differ from one another, in order to see whether error patterns varied significantly between them. The data and results generated in this work are intended to suggest ways to improve the quality of the services mentioned above.

Universidade de Lisboa: Repositório.UL