19,022 research outputs found
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project
Induction of Word and Phrase Alignments for Automatic Document Summarization
Current research in automatic single document summarization is dominated by
two effective, yet naive approaches: summarization by sentence extraction, and
headline generation via bag-of-words models. While successful in some tasks,
neither of these models is able to adequately capture the large set of
linguistic devices utilized by humans when they produce summaries. One possible
explanation for the widespread use of these models is that good techniques have
been developed to extract appropriate training data for them from existing
document/abstract and document/headline corpora. We believe that future
progress in automatic summarization will be driven both by the development of
more sophisticated, linguistically informed models, as well as a more effective
leveraging of document/abstract corpora. In order to open the doors to
simultaneously achieving both of these goals, we have developed techniques for
automatically producing word-to-word and phrase-to-phrase alignments between
documents and their human-written abstracts. These alignments make explicit the
correspondences that exist in such document/abstract pairs, and create a
potentially rich data source from which complex summarization algorithms may
learn. This paper describes experiments we have carried out to analyze the
ability of humans to perform such alignments, and based on these analyses, we
describe experiments for creating them automatically. Our model for the
alignment task is based on an extension of the standard hidden Markov model,
and learns to create alignments in a completely unsupervised fashion. We
describe our model in detail and present experimental results that show that
our model is able to learn to reliably identify word- and phrase-level
alignments in a corpus of pairs
AMaĻoSāAbstract Machine for Xcerpt
Web query languages promise convenient and efficient access
to Web data such as XML, RDF, or Topic Maps. Xcerpt is one such Web
query language with strong emphasis on novel high-level constructs for
effective and convenient query authoring, particularly tailored to versatile
access to data in different Web formats such as XML or RDF.
However, so far it lacks an efficient implementation to supplement the
convenient language features. AMaĻoS is an abstract machine implementation
for Xcerpt that aims at efficiency and ease of deployment. It
strictly separates compilation and execution of queries: Queries are compiled
once to abstract machine code that consists in (1) a code segment
with instructions for evaluating each rule and (2) a hint segment that
provides the abstract machine with optimization hints derived by the
query compilation. This article summarizes the motivation and principles
behind AMaĻoS and discusses how its current architecture realizes
these principles
- ā¦