7,005 research outputs found

    Multimedia information technology and the annotation of video

    The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, the overload of data will cause a lack of annotation capacity; on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning.

    Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces

    Computer-assisted transcription promises high-quality speech transcription at reduced costs. This is achieved by limiting human effort to transcribing parts for which automatic transcription quality is insufficient. Our goal is to improve the human transcription quality via appropriate user interface design. We focus on iterative interfaces that allow humans to solve tasks based on an initially given suggestion, in this case an automatic transcription. We conduct a user study that reveals considerable quality gains for three variations of iterative interfaces over a non-iterative from-scratch transcription interface. Our iterative interfaces included post-editing, confidence-enhanced post-editing, and a novel retyping interface. All three yielded similar quality on average, but we found that the proposed retyping interface was less sensitive to the difficulty of the segment, and superior when the automatic transcription of the segment contained relatively many errors. An analysis using mixed-effects models allows us to quantify these and other factors and draw conclusions about which interface design should be chosen in which circumstances.
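
    For readers unfamiliar with this kind of analysis, the sketch below shows how a mixed-effects model relating transcription quality to interface type and segment difficulty, with a per-participant random intercept, might be fitted in Python with statsmodels. The column names and synthetic data are illustrative assumptions, not the paper's variables or results.

```python
# Minimal sketch of a mixed-effects analysis of transcription quality.
# The data are synthetic and the column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 120
interface = rng.choice(["post_edit", "conf_post_edit", "retype"], size=n)
difficulty = rng.uniform(0.0, 0.5, size=n)          # e.g. ASR error rate of the segment
participant = rng.choice([f"p{i}" for i in range(10)], size=n)
quality = 0.9 - 0.3 * difficulty + rng.normal(0, 0.05, size=n)

data = pd.DataFrame({"quality": quality, "interface": interface,
                     "difficulty": difficulty, "participant": participant})

# Fixed effects: interface and its interaction with segment difficulty;
# random intercept per participant.
model = smf.mixedlm("quality ~ interface * difficulty", data,
                    groups=data["participant"])
print(model.fit().summary())
```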

    A Comparison of Norm-Referenced, Traditional, and Computer-Assisted Language Assessments

    Current literature in the field of communication disorders suggests that traditional norm-referenced tests may yield erroneous or misleading information regarding a child's level of language acquisition. Additional research suggests that the most valid and reliable technique for determining a client's level of linguistic expertise is language sampling and analysis. Language sampling and analysis has traditionally been rejected as a means of evaluation, especially for the school-age child, due to the length of time necessary to complete such analyses. In recent years, language sampling and analysis techniques have been redesigned as computer software application programs. Computer software application programs may significantly reduce the time required to complete language sampling and analysis and increase the application of this validated method of language assessment. Implementation of language sampling and analysis procedures through software applications would reduce the reliance on traditional norm-referenced tests, thereby increasing the reliability and validity of language assessments. The purpose of this research was to compare both the time required and the time-to-data ratio in three assessment paradigms: the traditional norm-referenced assessment, the traditional by-hand language sampling and analysis procedure, and the computer-assisted language sampling and analysis procedure. Significant differences among assessment times suggested that computer-assisted language analysis took significantly less time than manual language sample analysis. Analysis of the time-to-data ratio indicated that computer-assisted analysis provided the most information per unit of time. These results support the use of computer-assisted software programs by speech and language service providers.

    Efficiency and usability study of innovative computer-aided transcription strategies for video lecture repositories

    [EN] Video lectures are widely used in education to support and complement face-to-face lectures. However, the utility of these audiovisual assets could be further improved by adding subtitles that can be exploited to incorporate added-value functionalities such as searchability, accessibility, translatability, note-taking, and discovery of content-related videos, among others. Today, automatic subtitles are prone to error and need to be reviewed and post-edited in order to ensure that what students see on screen is of an acceptable quality. This work investigates different user interface design strategies for this post-editing task to discover the best way to incorporate automatic transcription technologies into large educational video repositories. Our three-phase study involved lecturers from the Universitat Politècnica de València (UPV) with videos available on the poliMedia video lecture repository, which currently holds over 10,000 video objects. Simply by post-editing automatic transcriptions in the conventional way, users nearly halved the time that would be required to generate the transcription from scratch. As expected, this study revealed that the time spent by lecturers reviewing automatic transcriptions correlated directly with the accuracy of said transcriptions. However, it is also shown that the average time required to perform each individual editing operation can be precisely derived and applied in the definition of a user model. In addition, the second phase of this study presents a transcription review strategy based on confidence measures (CM) and compares it to the conventional post-editing strategy. Finally, a third strategy, resulting from the combination of the CM-based strategy with massive adaptation techniques for automatic speech recognition (ASR), improved transcription review efficiency in comparison with the two aforementioned strategies.
    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under Grant agreement no. 287755 (transLectures), from the ICT Policy Support Programme (ICT PSP/2007-2013) as part of the Competitiveness and Innovation Framework Programme (CIP) under Grant agreement no. 621030 (EMMA), and from the Spanish MINECO Active2Trans (TIN2012-31723) research project.
    Valor Miró, J. D.; Silvestre Cerdà, J. A.; Civera Saiz, J.; Turró Ribalta, C.; Juan Císcar, A. (2015). Efficiency and usability study of innovative computer-aided transcription strategies for video lecture repositories. Speech Communication, 74, 65-75. https://doi.org/10.1016/j.specom.2015.09.006
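
    As an illustration only (not the strategies evaluated in the paper), the following sketch shows the two ingredients the abstract mentions: selecting low-confidence words of an automatic transcription for human review, and predicting review effort from a simple linear user model. The threshold, timing constants, and field names are hypothetical.

```python
# Illustrative sketch: confidence-based selection of words to review, plus a
# toy user model for review time. All numbers and field names are invented.

def words_to_review(words, threshold=0.6):
    """Return the words whose ASR confidence falls below the threshold."""
    return [w for w in words if w["confidence"] < threshold]

def predicted_review_time(n_edit_ops, n_words,
                          sec_per_edit=4.0, sec_per_word_read=0.4):
    """Toy user model: reading time for every word plus a fixed cost per edit."""
    return n_words * sec_per_word_read + n_edit_ops * sec_per_edit

segment = [
    {"word": "the",           "confidence": 0.98},
    {"word": "transcripshun", "confidence": 0.41},   # likely ASR error
    {"word": "is",            "confidence": 0.95},
    {"word": "reviewed",      "confidence": 0.72},
]

flagged = words_to_review(segment)
print([w["word"] for w in flagged])                  # ['transcripshun']
print(predicted_review_time(n_edit_ops=len(flagged),
                            n_words=len(segment)))   # 5.6 seconds
```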

    Application of Automatic Speech Recognition Technology for Dysphonic Speech Assessment

    Dysphonia is a communication disorder secondary to a problem with voice production. Speakers with dysphonia often report decreased intelligibility, particularly in a noisy communication environment. Intelligibility is the primary measure of a speaker's communicative ability; however, it is not routinely assessed in clinical settings today. This lack of intelligibility assessment can be partly attributed to the time-consuming, labor-intensive nature of manually transcribing a speaker's utterance. Recent advances in automatic speech recognition technology have significantly increased the ease and accuracy of speech-to-text transcription, and incorporation of this technology may dramatically increase efficiency in clinical intelligibility assessment. Therefore, this project examined the feasibility of an automatic speech-to-text transcription program for describing speech production abnormalities among speakers with dysphonia. Audio recordings of the Rainbow Passage from 30 adult female speakers with normal voice and 23 adult female speakers with dysphonic voice were transcribed using the IBM Watson speech-to-text transcription service. Differences between the groups were evaluated based on three measures: 1) error rate in transcribed words, 2) confidence level of transcribed words, and 3) number of possible alternatives for transcribed words. The results indicated that the confidence level was significantly lower, and the number of possible alternatives was significantly higher, in the dysphonic group. Interestingly, there was no significant between-group difference in the error rate. Clinical implications of these findings and future directions will be discussed.
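
    The first of the three measures, the error rate of a transcript against the Rainbow Passage reference, can be computed with a standard word-level edit distance. The sketch below is a generic implementation, not the study's code; the reference snippet is the opening of the Rainbow Passage and the hypothesis is an invented ASR output.

```python
# Minimal word error rate (Levenshtein distance over words).

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

ref = "when the sunlight strikes raindrops in the air"
hyp = "when the sunlight strikes rain drops in air"
print(word_error_rate(ref, hyp))  # 0.375 (3 word errors over 8 reference words)
```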

    Perception in L2 in a Classroom Environment with L2 Portuguese Chinese Students

    The purpose of this study is to contribute to the knowledge on the impact of common European Portuguese (EP) phonetic-phonological processes on second language (L2) learners. It is well established that L2 listening is a complex process, and that the most common difficulties among L2 learners are related to speech segmentation and word recognition. Due to the occurrence of connected speech processes, sounds are altered and word boundaries can be hard to determine. Vowel reduction within and across word boundaries is usually described as a very frequent process in EP. The reduction of vowels is even more evident in spontaneous speech, e.g. the word 'telefone' ([tɨlɨ'fɔnɨ] in the citation form, 'telephone') can be produced as [t'fon]. The interplay between these processes can be particularly impactful for L2 learners in word recognition. Furthermore, this correlation is scarcely studied in a classroom setting. The present study explores the impact of vowel reduction and connected speech processes on word recognition tasks, from isolated words to continuous speech. Furthermore, it aims to understand the main difficulties that L2 learners at the intermediate level B1 experience when dealing with these phenomena. Lastly, it will contribute to understanding not only the acquisition of vowel reduction and connected speech processes but also whether L2 learners can cope with them. Therefore, a set of perception experiments involving these phenomena in increasing degrees of difficulty was designed: single word identification without (i) and with vowel reduction (ii); word identification with simple (iii) and complex connected speech processes (iv). The experiments were conducted in the ecological setting of an intensive Portuguese course at the intermediate level B1, at the University of Lisbon. A control group of EP native speakers also performed the experiments. The overall scores revealed a decreasing tendency: (i) 94%; (ii) 65%; (iii) 31%; (iv) 16%. The results reveal that word recognition is compromised by connected speech processes. Vowel reduction and the consequent deletion of segments also affect the recognition of isolated spoken words, even in read speech. The didactic outcomes of the experiments are relevant to the design of a proposal for a set of listening exercises focused on the practice of these phonetic-phonological phenomena. The sequence is based on the use of Computer-Assisted Language Learning (CALL) technology, and includes two games and a set of perception exercises in which songs are also used as input.
    This study aims to contribute to the knowledge of the impact of the phonetic-phonological processes of European Portuguese (EP) on learners of EP as a second language (L2). From our experience as native speakers, we know that listening in one's own native language (L1) is a natural and intuitive, though complex, process. However, perhaps also from personal experience, we know that when learning a foreign language one of the greatest challenges is related to listening comprehension. The auditory mechanisms involved in listening comprehension of the native language are the same as those involved in listening comprehension of the L2. Listening to and understanding a foreign language is a very complex process. When we listen, it is necessary to segment the speech stream and recognize the words in the continuous signal. However, due to the occurrence of phonetic processes characteristic of oral speech, sounds are altered and coarticulated.
When processing their L1, native speakers have no difficulties at all. In contrast, L2 learners can face several difficulties when it comes to recognizing words in the continuous speech stream. Given the occurrence of various coarticulation phenomena, word boundaries cease to be perceptible and evident. In EP, vowel reduction is well described in the literature and is referred to as a very productive process. One of its characteristics is to promote the reduction and deletion of unstressed vowels. As a result, phonetic consonant sequences frequently occur in oral productions. It is also worth noting that vowel reduction not only occurs in spontaneous speech but is also frequent in semi-spontaneous and non-spontaneous speech. EP is also rich in coarticulation processes, and many of them are directly related to the reduction and deletion of unstressed vowels. Connected speech processes are very well studied and described. Among the coarticulation processes are vowel sequences, sandhi phenomena, voicing assimilation and the insertion of the yod [j]. These processes cause various changes in the sounds of words, especially those at word boundaries. Moreover, they also promote the realization and insertion of other sounds that have no graphic representation. Others are responsible for restructuring the syllabic structure, causing the linking of sounds across neighbouring words. As far as is known from the literature, the interaction between vowel reduction and external sandhi phenomena, coarticulation, assimilation, etc., and its impact on the listening comprehension of L2 learners has not yet been studied. Furthermore, there are still no studies carried out in an ecological classroom context that address this correlation. As far as we know, the teaching of these phonetic-phonological processes is also not widely explored in the various frameworks and syllabuses for teaching Portuguese as a Foreign Language (PLE). As such, there is a need to make students aware of the occurrence of these processes in oral speech and of how they affect their listening comprehension. This also has a direct bearing on students' learning and on the development of their proficiency. Therefore, the present study aims to explore the impact of the occurrence of phonetic-phonological processes, such as vowel reduction and connected speech processes, on word recognition and identification tasks. These tasks were carried out both in isolated-word contexts and in continuous speech contexts. In addition, this study seeks to understand and describe the greatest difficulties of L2 learners at the intermediate level B1 in the presence of these phonetic processes in oral speech. Finally, this study will contribute not only to a better knowledge of the acquisition of vowel reduction and coarticulation phenomena, but also to understanding at which point in the learning process students need to practise these structures. Therefore, a sequence of perception tests was created that included the occurrence of these phenomena at different levels of complexity: isolated word recognition tasks without (i) and with vowel reduction (ii); word recognition tasks with connected speech processes in simple (iii) and complex (iv) contexts.
The perception tests were administered in the ecological classroom setting of a year-long intensive course of Portuguese for Foreigners (PLE) at the intermediate level (B1) at the University of Lisbon. All participants were native speakers of Mandarin Chinese. The tests were also administered to a control group composed of native speakers of EP. In the results obtained there was, in general, a decreasing tendency in the percentages of correct answers as the complexity of the linguistic structures increased: (i) 94%; (ii) 65%; (iii) 31%; (iv) 16%. These results indicate that the occurrence of connected speech phenomena compromised the word recognition task in the various contexts. However, this impact was particularly evident in the contexts that tested (semi-)spontaneous speech. Moreover, vowel reduction, and the consequent deletion of unstressed vowels, also affected the recognition of isolated words. To complement these results, an analysis of the students' orthographic transcriptions was also carried out, which allowed the most frequent spelling errors to be described. It also contributed to understanding the interaction between speech and writing in their learning of EP as an L2. The transcriptions further revealed that the presence of vowel reduction increased the production of phonetic consonant clusters, something that became evident from the fact that, frequently, only parts of the target words (or only consonants) were transcribed. It is also important to point out that, when performing the word identification tasks in (semi-)spontaneous speech contexts (with the presence of coarticulation phenomena), several students transcribed sequences of linked words, including sounds linked and altered by assimilation processes, for example. From a didactic point of view, the results obtained were relevant for the construction of a proposal for a didactic sequence with exercises for practising and training the listening component. This proposal places special emphasis on the teaching of the phonetic-phonological processes of EP. The sequence is based on existing Computer-Assisted Language Learning (CALL) technology. Accordingly, two games were designed, a version of the classic 'Jogo da Forca' (Hangman) and the game 'Palavras Bomba-Relógio' (Time-Bomb Words). Finally, the sequence also includes a series of listening perception exercises, entitled 'À Procura dos Limites' (Searching for the Boundaries), intended to practise the recognition of word boundaries in different coarticulation contexts. In addition, the series includes a playful exercise section in which Portuguese songs are used as auditory input.

    Computer-assisted transcription and analysis of speech

    The two papers included in this volume have developed from work with the CHILDES tools and the Media Editor in two research projects: "Second language acquisition of German by Russian learners", sponsored by the Max Planck Institute for Psycholinguistics, Nijmegen, from 1998 to 1999 (directed by Ursula Stephany, University of Cologne, and Wolfgang Klein, Max Planck Institute for Psycholinguistics, Nijmegen), and "The age factor in the acquisition of German as a second language", sponsored by the German Science Foundation (DFG), Bonn, since 2000 (directed by Ursula Stephany, University of Cologne, and Christine Dimroth, Max Planck Institute for Psycholinguistics, Nijmegen). The CHILDES Project has been developed and is being continuously improved at Carnegie Mellon University, Pittsburgh, under the supervision of Brian MacWhinney. Having used the CHILDES tools for more than ten years for transcribing and analyzing Greek child data, there was no question that I would also use them for research into the acquisition of German as a second language and analyze the large amount of spontaneous speech gathered from two Russian girls with the help of the CLAN programs. In the spring of 1997, Steven Gillis from the University of Antwerp (in collaboration with Gert Durieux) developed a lexicon-based automatic coding system based on the CLAN program MOR, suitable for coding languages with richer morphologies than English, such as Modern Greek. Coding huge amounts of data then became much quicker and more comfortable, so I decided to adopt this system for German as well. The paper "Working with the CHILDES Tools" is based on two earlier manuscripts which have grown out of my research on Greek child language and the many CHILDES workshops taught in Germany, Greece, Portugal, and Brazil over the years. Its contents have now been adapted to the requirements of research into the acquisition of German as a second language and for use on Windows.
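
    To give a concrete flavour of the data format involved, the sketch below parses a toy CHAT-style transcript (the format used by the CHILDES/CLAN tools) and computes a word-based mean length of utterance in plain Python. The utterances are invented, and CLAN's own analyses are considerably richer (for instance, they can draw on the %mor morphology tier produced by MOR); this is only a minimal illustration.

```python
# Toy sketch: read CHAT-style speaker tiers and compute a word-based mean
# length of utterance (MLU) for the target child. Transcript lines invented.

transcript = """\
@Begin
@Participants: CHI Target_Child, INV Investigator
*INV: was hast du heute gemacht ?
*CHI: ich habe gespielt .
*CHI: mit ball .
@End
"""

def mlu_in_words(chat_text: str, speaker: str = "CHI") -> float:
    utterance_lengths = []
    for line in chat_text.splitlines():
        if line.startswith(f"*{speaker}:"):
            tokens = line.split(":", 1)[1].split()
            words = [t for t in tokens if t not in {".", "?", "!"}]
            utterance_lengths.append(len(words))
    return sum(utterance_lengths) / len(utterance_lengths)

print(mlu_in_words(transcript))  # (3 + 2) / 2 = 2.5
```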

    Evaluation of innovative computer-assisted transcription and translation strategies for video lecture repositories

    Nowadays, the technology-enhanced learning area has experienced strong growth, with many new learning approaches such as blended learning, flip teaching, massive open online courses, and open educational resources to complement face-to-face lectures. Specifically, video lectures are fast becoming an everyday educational resource in higher education for all of these new learning approaches, and they are being incorporated into existing university curricula around the world. Transcriptions and translations can improve the utility of these audiovisual assets, but they are rarely present due to a lack of cost-effective solutions for producing them. Lecture searchability, accessibility for people with impairments, translatability for foreign students, plagiarism detection, content recommendation, note-taking, and discovery of content-related videos are examples of the advantages of the presence of transcriptions. For this reason, the aim of this thesis is to test, in real-life case studies, ways to obtain multilingual captions for video lectures in a cost-effective way by using state-of-the-art automatic speech recognition and machine translation techniques. We also explore interaction protocols to review these automatic transcriptions and translations, because unfortunately automatic subtitles are not error-free. In addition, we take a step further into multilingualism by extending our findings and evaluation to several languages. Finally, the outcomes of this thesis have been applied to thousands of video lectures in European universities and institutions.
    Valor Miró, J. D. (2017). Evaluation of innovative computer-assisted transcription and translation strategies for video lecture repositories [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90496
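
    As a small, self-contained illustration of the final packaging step such a captioning pipeline needs (not code from the thesis), the following sketch writes reviewed transcription segments to the common SubRip (.srt) subtitle format; the segment texts and timings are invented.

```python
# Sketch: write (start, end, text) transcription segments to SubRip (.srt),
# a widely supported subtitle format for video lecture platforms.

def to_srt_timestamp(seconds: float) -> str:
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def write_srt(segments, path):
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, text) in enumerate(segments, start=1):
            f.write(f"{i}\n{to_srt_timestamp(start)} --> "
                    f"{to_srt_timestamp(end)}\n{text}\n\n")

segments = [
    (0.0, 3.2, "Welcome to this lecture on speech recognition."),
    (3.2, 7.8, "Today we review how automatic transcriptions are post-edited."),
]
write_srt(segments, "lecture_captions.srt")
```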

    An Investigation Into the Feasibility of Streamlining Language Sample Analysis Through Computer-Automated Transcription and Scoring

    The purpose of the study was to investigate the feasibility of streamlining the transcription and scoring portion of language sample analysis (LSA) through computer-automation. LSA is a gold-standard procedure for examining children's language abilities that is underutilized by speech-language pathologists due to its time-consuming nature. To decrease the time associated with the process, the accuracy of transcripts produced automatically with Google Cloud Speech and the accuracy of scores generated by a hard-coded scoring function called the Literate Language Use in Narrative Analysis (LLUNA) were evaluated. A collection of narrative transcripts and audio recordings of narrative samples were selected to evaluate the accuracy of these automated systems. Samples were previously elicited from school-age children between the ages of 6;0-11;11 who were either typically developing (TD), at risk for language-related learning disabilities (AR), or had developmental language disorder (DLD). Transcription error of Google Cloud Speech transcripts was evaluated with a weighted word-error rate (WERw). Score accuracy was evaluated with a quadratic weighted kappa (Kqw). Results indicated an average WERw of 48% across all language sample recordings, with a median WERw of 40%. Several recording characteristics of samples were associated with transcription error, including the codec used to record the audio sample and the presence of background noise. Transcription error was lower on average for samples collected using a lossless codec and containing no background noise. Scoring accuracy of LLUNA was high across all six measures of literate language when generated from traditionally produced transcripts, regardless of age or language ability (TD, DLD, AR). Adverbs were most variable in their score accuracy. Scoring accuracy dropped when LLUNA generated scores from transcripts produced by Google Cloud Speech; however, LLUNA was more likely to generate accurate scores when transcripts had low to moderate levels of transcription error. This work provides additional support for the use of automated transcription under the right recording conditions and automated scoring of literate language indices. It also provides preliminary support for streamlining the entire LSA process by automating both transcription and scoring, when high-quality recordings of language samples are utilized.
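
    The agreement statistic used above, quadratic weighted kappa, penalizes disagreements between ordinal scores by the square of their distance. A minimal sketch with scikit-learn, on invented scores rather than study data:

```python
# Quadratic weighted kappa between manually assigned and automatically
# generated ordinal scores (example values are invented, not study data).
from sklearn.metrics import cohen_kappa_score

manual_scores    = [3, 1, 2, 4, 0, 2, 3, 1]
automated_scores = [3, 1, 1, 4, 0, 2, 2, 1]

kqw = cohen_kappa_score(manual_scores, automated_scores, weights="quadratic")
print(round(kqw, 3))
```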

    Design of a Controlled Language for Critical Infrastructures Protection

    We describe a project for the construction of a controlled language for critical infrastructures protection (CIP). This project originates from the need to coordinate and categorize communications on CIP at the European level. These communications can be physically represented by official documents, reports on incidents, informal communications and plain e-mail. We explore the application of traditional library science tools for the construction of controlled languages in order to achieve our goal. Our starting point is analogous work done during the sixties in the field of nuclear science, known as the Euratom Thesaurus.
    JRC.G.6 - Security technology assessment
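
    Purely as an illustration of the kind of structure such a controlled language typically takes (the terms below are invented and are not drawn from the Euratom Thesaurus or the CIP project), a thesaurus entry can be modelled as a preferred term with non-preferred synonyms and broader/narrower links, which lets free-text terms in reports and e-mails be normalized to a single controlled form:

```python
# Hypothetical thesaurus entries for a controlled vocabulary: each preferred
# term carries non-preferred synonyms and broader/narrower relations.
from dataclasses import dataclass, field

@dataclass
class ThesaurusEntry:
    preferred: str
    synonyms: set[str] = field(default_factory=set)
    broader: set[str] = field(default_factory=set)
    narrower: set[str] = field(default_factory=set)

entries = [
    ThesaurusEntry("power grid", {"electricity grid", "electrical network"},
                   broader={"energy infrastructure"}),
    ThesaurusEntry("energy infrastructure", narrower={"power grid"}),
]

# Map every synonym (and the preferred term itself) to the preferred term,
# so incident reports and e-mails can be indexed consistently.
index = {t.lower(): e.preferred for e in entries
         for t in e.synonyms | {e.preferred}}

print(index["electrical network"])   # -> 'power grid'
```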