An exploratory study into automated précis grading
Automated writing evaluation (AWE) is a popular research field, but the main focus has been on evaluating argumentative essays. In this paper, we consider a different genre, namely précis texts. A précis is a written text that provides a coherent summary of the main points of a spoken or written text. We present a corpus of English précis texts which all received a grade assigned by a highly experienced English language teacher and were subsequently annotated following an exhaustive error typology. With this corpus we trained a machine learning model which relies on a number of linguistic, automatic summarization and AWE features. Our results reveal that this model is able to predict the grade of précis texts with only a moderate error margin.
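The abstract does not spell out the model's feature set, but AWE systems typically start from surface features of the text. A minimal sketch, with three illustrative stand-in features (the paper's actual linguistic, summarization and AWE features are not listed here):

```python
import re

def precis_features(text: str) -> dict:
    """Compute simple surface features of the kind AWE systems rely on.

    These three features are illustrative stand-ins, not the paper's
    actual feature set.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "n_words": len(words),
        # lexical diversity: distinct words / total words
        "type_token_ratio": len(set(words)) / max(len(words), 1),
        # a crude syntactic-complexity proxy
        "avg_sentence_len": len(words) / max(len(sentences), 1),
    }
```

Feature vectors like this one would then be fed to a regressor trained on the teacher-assigned grades.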
Prosody-Based Automatic Segmentation of Speech into Sentences and Topics
A crucial step in processing speech audio data for information extraction,
topic detection, or browsing/playback is to segment the input into sentence and
topic units. Speech segmentation is challenging, since the cues typically
present for segmenting text (headers, paragraphs, punctuation) are absent in
spoken language. We investigate the use of prosody (information gleaned from
the timing and melody of speech) for these tasks. Using decision tree and
hidden Markov modeling techniques, we combine prosodic cues with word-based
approaches, and evaluate performance on two speech corpora, Broadcast News and
Switchboard. Results show that the prosodic model alone performs on par with,
or better than, word-based statistical language models -- for both true and
automatically recognized words in news speech. The prosodic model achieves
comparable performance with significantly less training data, and requires no
hand-labeling of prosodic events. Across tasks and corpora, we obtain a
significant improvement over word-only models using a probabilistic combination
of prosodic and lexical information. Inspection reveals that the prosodic
models capture language-independent boundary indicators described in the
literature. Finally, cue usage is task and corpus dependent. For example, pause
and pitch features are highly informative for segmenting news speech, whereas
pause, duration and word-based cues dominate for natural conversation.
Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2),
Special Issue on Accessing Information in Spoken Audio, September 200
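The pause cue highlighted above is the easiest prosodic feature to extract from time-aligned words. A minimal sketch of that single cue (the paper combines many cues with decision trees and HMMs; the word tuples and the 0.3 s threshold here are illustrative assumptions, not values from the paper):

```python
def pause_boundaries(words, pause_threshold=0.3):
    """Flag candidate sentence boundaries from inter-word pauses.

    `words` is a list of (token, start_s, end_s) tuples, e.g. from a
    forced alignment. The 0.3 s threshold is illustrative only; a real
    system would learn it jointly with pitch and duration cues.
    """
    cues = []
    for (token, _, end), (_, next_start, _) in zip(words, words[1:]):
        pause = next_start - end  # silence between consecutive words
        cues.append({"after": token,
                     "pause": round(pause, 3),
                     "boundary": pause > pause_threshold})
    return cues
```

In the paper's setup, such prosodic cues are combined probabilistically with a lexical language model rather than thresholded in isolation.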
Mediated discourse at the European Parliament: Empirical investigations
The purpose of this book is to showcase a diverse set of directions in empirical research on mediated discourse, reflecting on the state-of-the-art and the increasing intersection between Corpus-based Interpreting Studies (CBIS) and Corpus-based Translation Studies (CBTS). Undeniably, data from the European Parliament (EP) offer a great opportunity for such research. Not only does the institution provide a sizeable sample of oral debates held at the EP together with their simultaneous interpretations into all languages of the European Union. It also makes available written verbatim reports of the original speeches, which used to be translated. From a methodological perspective, EP materials thus guarantee a great degree of homogeneity, which is particularly valuable in corpus studies, where data comparability is frequently a challenge.
In this volume, progress is visible in both CBIS and CBTS. In interpreting, it manifests itself notably in the availability of comprehensive transcription, annotation and alignment systems. In translation, datasets are becoming substantially richer in metadata, which allow for increasingly refined multi-factorial analysis. At the crossroads between the two fields, intermodal investigations bring to the fore what these mediation modes have in common and how they differ. The volume is thus aimed in particular at Interpreting and Translation scholars looking for new descriptive insights and methodological approaches in the investigation of mediated discourse, but it may also be of interest for (corpus) linguists analysing parliamentary discourse in general.
Using term clouds to represent segment-level semantic content of podcasts
Spoken audio, like any time-continuous medium, is notoriously difficult to browse or skim without support of an interface providing semantically annotated jump points to signal the user where to listen in. Creation of time-aligned metadata by human annotators is prohibitively expensive, motivating the investigation of representations of segment-level semantic content based on transcripts
generated by automatic speech recognition (ASR). This paper
examines the feasibility of using term clouds to provide users with a structured representation of the semantic content of podcast episodes. Podcast episodes are visualized as a series of sub-episode segments, each represented by a term cloud derived from a transcript
generated by automatic speech recognition (ASR). Quality of
segment-level term clouds is measured quantitatively and their utility is investigated in a small-scale user study based on human-labeled segment boundaries. Since the segment-level clouds generated from ASR transcripts prove useful, we examine an adaptation of text tiling techniques to speech in order to generate segments as part of a completely automated indexing and structuring system for browsing spoken audio. Results demonstrate that the generated segments are comparable with human-selected segment boundaries.
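A common way to derive a term cloud from a transcript segment is tf-idf weighting, which promotes words frequent in one segment but rare across the episode. A minimal sketch under that assumption (the paper's exact weighting scheme is not specified here):

```python
import math
from collections import Counter

def segment_term_clouds(segments, k=3):
    """Top-k tf-idf terms per segment of a (e.g. ASR) transcript.

    `segments` is a list of token lists, one per sub-episode segment.
    Terms appearing in every segment get idf = 0, so segment-specific
    vocabulary dominates each cloud.
    """
    df = Counter()                      # document frequency per term
    for seg in segments:
        df.update(set(seg))
    n = len(segments)
    clouds = []
    for seg in segments:
        tf = Counter(seg)
        scored = {t: tf[t] * math.log(n / df[t]) for t in tf}
        # sort by descending score, ties broken alphabetically
        top = sorted(scored.items(), key=lambda x: (-x[1], x[0]))[:k]
        clouds.append([t for t, _ in top])
    return clouds
```

In a full system, the segment boundaries themselves would come from the text-tiling step rather than being given.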
Error analysis in automatic speech recognition and machine translation
Automatic speech recognition and machine translation are well-known terms in
the translation world nowadays. Systems that carry out these processes are increasingly taking over
work previously done by humans, mainly because of the speed at which they perform these tasks and
their lower cost. However, the quality of these systems is debatable. They are not yet capable of
delivering the same performance as human transcribers or translators. A lack of creativity,
of the ability to interpret texts, and of a sense of language are often cited as reasons why the
performance of machines is not yet at the level of human translation or transcription work.
Despite this, there are companies that use these machines in their production pipelines.
Unbabel, an online translation platform powered by artificial intelligence, is one of these
companies. Through a combination of human translators and machines, Unbabel tries to
provide its customers with a translation of good quality. This internship report was written with
the aim of gaining an overview of the performance of these systems and the errors they produce.
Based on this work, we try to get a picture of possible error patterns produced by both systems.
The present work consists of an extensive analysis of errors produced by automatic speech
recognition and machine translation systems after automatically transcribing and translating 10
English videos into Dutch. Different videos were deliberately chosen to see if there were
significant differences in the error patterns between videos. The generated data and results from
this work aim to suggest possible ways to improve the quality of the services already
mentioned.
Quantified language connectedness in schizophrenia-spectrum disorders
Language abnormalities are a core symptom of schizophrenia-spectrum disorders and could serve as a potential diagnostic marker. Natural language processing enables quantification of language connectedness, which may be lower in schizophrenia-spectrum disorders. Here, we investigated connectedness of spontaneous speech in schizophrenia-spectrum patients and controls and determined its accuracy in classification. Using a semi-structured interview, speech of 50 patients with a schizophrenia-spectrum disorder and 50 controls was recorded. Language connectedness in a semantic word2vec model was calculated using consecutive word similarity in moving windows of increasing sizes (2-20 words). The mean, minimum and variance of similarity were calculated per window size and used in a random forest classifier to distinguish patients and healthy controls. Classification based on connectedness reached 85% cross-validated accuracy, with 84% specificity and 86% sensitivity. The features that best discriminated patients from controls were the variance of similarity at window sizes between 5 and 10. We show impaired connectedness in spontaneous speech of patients with schizophrenia-spectrum disorders, even in patients with low ratings of positive symptoms. Effects were most prominent at the level of sentence connectedness. The high sensitivity, specificity and tolerability of this method show that language analysis is an accurate and feasible digital assistant in diagnosing schizophrenia-spectrum disorders.
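The connectedness measure described above can be sketched directly: average the cosine similarity of consecutive word vectors inside each moving window, then summarize over windows. A minimal sketch using toy 2-d vectors in place of real word2vec embeddings (the embeddings, not the windowing logic, are the assumption here):

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def window_connectedness(vectors, window):
    """Summary stats of consecutive-word similarity in moving windows.

    `vectors` is the sequence of word embeddings for one transcript;
    `window` is the window size (the paper sweeps sizes 2-20). Returns
    the mean, min and variance used as classifier features.
    """
    per_window = []
    for i in range(len(vectors) - window + 1):
        sims = [cosine(vectors[j], vectors[j + 1])
                for j in range(i, i + window - 1)]
        per_window.append(sum(sims) / len(sims))
    mean = sum(per_window) / len(per_window)
    var = sum((s - mean) ** 2 for s in per_window) / len(per_window)
    return {"mean": mean, "min": min(per_window), "variance": var}
```

Computing these statistics for each window size from 2 to 20 yields the feature vector fed to the random forest classifier.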