
    Automatic Framework to Aid Therapists to Diagnose Children who Stutter


    A data-driven approach to spoken dialog segmentation

    In this paper, we present a statistical model for spoken dialog segmentation that decides the current phase of the dialog by means of an automatic classification process. We have applied our proposal to three practical conversational systems acting in different domains. The results of the evaluation show that it is possible to attain high accuracy rates in dialog segmentation when using different sources of information to represent the user input. Our results indicate how the proposed module can also improve dialog management by selecting better system answers. The statistical model developed with human-machine dialog corpora has been applied in one of our experiments to human-human conversations, and provides a good baseline as well as insights into the model's limitations.
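The abstract does not specify which classifier the segmentation module uses, so as a rough illustration only, a dialog-phase classifier of this kind can be sketched as a naive Bayes model over bag-of-words features of the user turn. The phase labels and training turns below are invented for the example:

```python
from collections import Counter, defaultdict
import math

class PhaseClassifier:
    """Naive Bayes classifier: user turn (text) -> dialog phase."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # phase -> word frequencies
        self.phase_counts = Counter()            # phase -> number of turns
        self.vocab = set()

    def train(self, turns):
        for text, phase in turns:
            words = text.lower().split()
            self.word_counts[phase].update(words)
            self.phase_counts[phase] += 1
            self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total = sum(self.phase_counts.values())
        best, best_lp = None, float("-inf")
        for phase, n in self.phase_counts.items():
            lp = math.log(n / total)  # log prior
            denom = sum(self.word_counts[phase].values()) + len(self.vocab)
            for w in words:           # add-one smoothed log likelihoods
                lp += math.log((self.word_counts[phase][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = phase, lp
        return best

clf = PhaseClassifier()
clf.train([
    ("hello i need some help", "opening"),
    ("good morning", "opening"),
    ("i want a ticket to boston", "task"),
    ("book me a flight tomorrow", "task"),
    ("thanks goodbye", "closing"),
    ("bye thank you", "closing"),
])
print(clf.classify("good morning i need help"))  # → opening
```

The described system combines several sources of information about the user input; the sketch uses only lexical features to keep the mechanism visible.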

    Factoid question answering for spoken documents

    In this dissertation, we present a factoid question answering system, specifically tailored for Question Answering (QA) on spoken documents. This work explores, for the first time, which techniques can be robustly adapted from the usual QA on written documents to the more difficult spoken-documents scenario. More specifically, we study new information retrieval (IR) techniques designed for speech, and utilize several levels of linguistic information for the speech-based QA task. These include named-entity detection with phonetic information, syntactic parsing applied to speech transcripts, and the use of coreference resolution. Our approach is largely based on supervised machine learning techniques, with special focus on the answer extraction step, and makes little use of handcrafted knowledge. Consequently, it should be easily adaptable to other domains and languages. As part of the work resulting from this thesis, we initiated and coordinated the creation of an evaluation framework for the task of QA on spoken documents. The framework, named QAst (Question Answering on Speech Transcripts), provides multi-lingual corpora, evaluation questions, and answer keys. These corpora were used in the QAst evaluations held at the CLEF workshop in 2007, 2008, and 2009, thus helping the development of state-of-the-art techniques for this particular topic. The presented QA system and all its modules are extensively evaluated on the English European Parliament Plenary Sessions corpus, composed of manual transcripts and automatic transcripts obtained by three different Automatic Speech Recognition (ASR) systems that exhibit significantly different word error rates. This data belongs to the CLEF 2009 track for QA on speech transcripts. The main results confirm that syntactic information is very useful for learning to rank answer candidates, improving results on both manual and automatic transcripts unless the ASR quality is very low.
Overall, the performance of our system is comparable or better than the state-of-the-art on this corpus, confirming the validity of our approach.
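As a loose illustration of the supervised answer-extraction step, a learned ranker can be approximated by a linear scoring function over candidate features. The feature names, weights, and candidates below are hypothetical and not those of the described system:

```python
# Hypothetical weights a supervised ranker might learn; the feature names
# (entity_type_match, keyword_overlap, syntactic_proximity) are illustrative.
WEIGHTS = {
    "entity_type_match": 2.0,    # candidate's NE type matches the expected answer type
    "keyword_overlap": 1.0,      # fraction of question keywords near the candidate
    "syntactic_proximity": 1.5,  # closeness to question terms in the parse tree
}

def score_candidate(features):
    """Linear score of one answer candidate from its feature vector."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

def rank_candidates(candidates):
    """Sort (answer, features) pairs by descending score."""
    return sorted(candidates, key=lambda c: score_candidate(c[1]), reverse=True)

ranked = rank_candidates([
    ("Strasbourg", {"entity_type_match": 1, "keyword_overlap": 0.6, "syntactic_proximity": 0.8}),
    ("1999",       {"entity_type_match": 0, "keyword_overlap": 0.4, "syntactic_proximity": 0.2}),
])
print(ranked[0][0])  # → Strasbourg
```

The point of the sketch is the shape of the decision: syntactic features enter the score alongside lexical and named-entity evidence, which is why they can shift the ranking of candidates.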

    Automatic Transcription of English and German Qualitative Interviews

    Recording and transcribing interviews in qualitative social research is a vital but time-consuming and resource-intensive task. To tackle this challenge, researchers have explored various alternative approaches; automatic transcription utilising speech recognition algorithms has emerged as a promising solution. The question of whether automated transcripts can match the quality of transcripts produced by humans remains unanswered. In this paper we systematically compare multiple automatic transcription tools: Amberscript, Dragon, F4x, Happy Scribe, NVivo, Sonix, Trint, Otter, and Whisper. We evaluate aspects of data protection, accuracy, time efficiency, and costs for an English and a German interview. Based on the analysis, we conclude that Whisper performs best overall and that similar local automatic transcription tools are likely to become more relevant. For any type of transcription, we recommend reviewing the text to ensure accuracy. We hope to shed light on the effectiveness of automatic transcription services and provide a comparative frame for others interested in automatic transcription.
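Transcription accuracy in comparisons like this one is conventionally measured as word error rate (WER): the word-level edit distance between a human reference transcript and the tool's output, divided by the reference length. A minimal implementation (not the scoring code of any of the cited tools):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed via Levenshtein distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six in the reference → WER of 1/6.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A WER of 0.0 means a perfect transcript; values above 1.0 are possible when the hypothesis contains many insertions.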

    Predicting Communication Rates: Efficacy of a Scanning Model

    Interaction with the surrounding environment is an essential element of everyday life. For individuals with severe motor and communicative disabilities, single-switch scanning is used as a method to control their environment and communicate. Despite being very slow, it is often the only option for individuals who cannot use other interfaces. The alteration of timing parameters and scanning system configurations impacts the communication rate of those using single-switch scanning. The ability to select and recommend an efficient configuration for an individual with a disability is essential. Predictive models could assist in the goal of achieving the best possible match between user and assistive technology device, but consideration of an individual's single-switch scanning tendencies has not been included in communication rate prediction models. Modeling software developed as part of this research study utilizes scan settings, switch settings, error tendencies, error correction strategies, and the matrix configuration to calculate and predict a communication rate. Five participants with disabilities who use single-switch scanning were recruited for this study. Participants were asked to transcribe sentences using an on-screen keyboard configured with the settings used on their own communication devices. Each participant's error types, frequencies, and correction methods were recorded, as well as their text entry rate (TER) during sentence transcription. These individual tendencies and system configurations were used as baseline input parameters to a scanning model application that calculated a TER based upon those parameters. The scanning model was then run with each participant's tendencies and at least three varied system configurations, and participants were asked to transcribe sentences with those three configurations. The predicted TERs of the model were compared to the actual TERs observed during sentence transcription for accuracy.
Results showed that predictions were 90% accurate on average. Model TER predictions were less than one character per minute different from the observed baseline TER for each participant. Average model predictions for configuration scenarios were likewise less than one character per minute different from the observed configuration TERs.
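The kind of prediction such a scanning model makes can be illustrated with a simplified row-column scanning calculation. The keyboard layout, scan delay, and the assumption of an error-free user are all invented for this sketch; the actual model described above additionally accounts for switch settings, error tendencies, and correction strategies:

```python
def selection_time(row, col, scan_delay, switch_time=0.0):
    """Time to select one cell with row-column scanning: the highlight
    steps through `row` rows, a switch press selects the row, the highlight
    steps through `col` columns, and a second press selects the cell."""
    return (row + col) * scan_delay + 2 * switch_time

def predicted_ter(layout, text, scan_delay):
    """Predicted text entry rate (characters per minute), error-free user."""
    total = sum(selection_time(*layout[ch], scan_delay) for ch in text)
    return len(text) * 60.0 / total

# Illustrative 3x3 matrix: each character maps to (row, col),
# both counted as 1-based scan steps from the top-left.
layout = {"e": (1, 1), "t": (1, 2), "a": (1, 3),
          "o": (2, 1), "i": (2, 2), "n": (2, 3),
          "s": (3, 1), "h": (3, 2), "r": (3, 3)}

# With a 0.5 s scan delay, this 8-character string takes 14.5 s.
print(round(predicted_ter(layout, "theisnot", 0.5), 1))  # → 33.1
```

Even this toy version shows why configuration matters: placing frequent characters in early rows and columns, or shortening the scan delay, directly raises the predicted TER.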