6,559 research outputs found
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project
Recommended from our members
The role of HG in the analysis of temporal iteration and interaural correlation
Probabilistic Modelling of Morphologically Rich Languages
This thesis investigates how the sub-structure of words can be accounted for
in probabilistic models of language. Such models play an important role in
natural language processing tasks such as translation or speech recognition,
but often rely on the simplistic assumption that words are opaque symbols. This
assumption does not fit morphologically complex language well, where words can
have rich internal structure and sub-word elements are shared across distinct
word forms.
Our approach is to encode basic notions of morphology into the assumptions of
three different types of language models, with the intention that leveraging
shared sub-word structure can improve model performance and help overcome data
sparsity that arises from morphological processes.
In the context of n-gram language modelling, we formulate a new Bayesian
model that relies on the decomposition of compound words to attain better
smoothing, and we develop a new distributed language model that learns vector
representations of morphemes and leverages them to link together
morphologically related words. In both cases, we show that accounting for word
sub-structure improves the models' intrinsic performance and provides benefits
when applied to other tasks, including machine translation.
We then shift the focus beyond the modelling of word sequences and consider
models that automatically learn what the sub-word elements of a given language
are, given an unannotated list of words. We formulate a novel model that can
learn discontiguous morphemes in addition to the more conventional contiguous
morphemes that most previous models are limited to. This approach is
demonstrated on Semitic languages, and we find that modelling discontiguous
sub-word structures leads to improvements in the task of segmenting words into
their contiguous morphemes.Comment: DPhil thesis, University of Oxford, submitted and accepted 2014.
http://ora.ox.ac.uk/objects/uuid:8df7324f-d3b8-47a1-8b0b-3a6feb5f45c
- …