Translating near-synonyms: Possibilities and preferences in the interlingua
This paper argues that an interlingual representation must explicitly
represent some parts of the meaning of a situation as possibilities (or
preferences), not as necessary or definite components of meaning (or
constraints). Possibilities enable the analysis and generation of nuance,
something required for faithful translation. Furthermore, the representation of
the meaning of words, especially of near-synonyms, is crucial, because it
specifies which nuances words can convey in which contexts.
Comment: 8 pages, LaTeX2e, 1 eps figure, uses colacl.sty, epsfig.sty, avm.sty, times.st
Induction of Word and Phrase Alignments for Automatic Document Summarization
Current research in automatic single document summarization is dominated by
two effective, yet naive approaches: summarization by sentence extraction, and
headline generation via bag-of-words models. While successful in some tasks,
neither of these models is able to adequately capture the large set of
linguistic devices utilized by humans when they produce summaries. One possible
explanation for the widespread use of these models is that good techniques have
been developed to extract appropriate training data for them from existing
document/abstract and document/headline corpora. We believe that future
progress in automatic summarization will be driven both by the development of
more sophisticated, linguistically informed models and by more effective
leveraging of document/abstract corpora. In order to open the doors to
simultaneously achieving both of these goals, we have developed techniques for
automatically producing word-to-word and phrase-to-phrase alignments between
documents and their human-written abstracts. These alignments make explicit the
correspondences that exist in such document/abstract pairs, and create a
potentially rich data source from which complex summarization algorithms may
learn. This paper describes experiments we have carried out to analyze the
ability of humans to perform such alignments, and based on these analyses, we
describe experiments for creating them automatically. Our model for the
alignment task is based on an extension of the standard hidden Markov model,
and learns to create alignments in a completely unsupervised fashion. We
describe our model in detail and present experimental results that show that
our model is able to learn to reliably identify word- and phrase-level
alignments in a corpus of document/abstract pairs.
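As an illustration of HMM-based alignment in general, the sketch below aligns each summary word to one document position with a plain first-order HMM decoded by Viterbi. It is not the extended model the abstract describes: the function name `viterbi_align`, the parameters `p_match`, `p_mismatch` and `jump_decay`, and the hand-set, unnormalised scores are all assumptions made for this example, and nothing here is learned from data.

```python
import math

def viterbi_align(doc, summ, p_match=0.9, p_mismatch=0.01, jump_decay=0.5):
    """Return, for each summary word, the index of its aligned document word.

    Hidden states are document positions. All scores are hand-set,
    unnormalised illustrations (hypothetical values, not trained).
    """
    if not doc or not summ:
        return []
    n = len(doc)

    def emit(j, word):
        # Emission score: high when the summary word equals the document word.
        return math.log(p_match if doc[j] == word else p_mismatch)

    def trans(i, j):
        # Transition score: moving to the next position (j == i + 1) is free;
        # every extra step of jump distance costs a factor of jump_decay.
        return abs(j - i - 1) * math.log(jump_decay)

    # Scores for aligning the first summary word to each document position.
    score = [emit(j, summ[0]) for j in range(n)]
    back = []  # back[t][j] = best previous position for summary word t + 1

    for word in summ[1:]:
        prev, score, pointers = score, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + trans(i, j))
            pointers.append(best_i)
            score.append(prev[best_i] + trans(best_i, j) + emit(j, word))
        back.append(pointers)

    # Trace the best path backwards from the best final position.
    j = max(range(n), key=lambda k: score[k])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path

doc = "the cat sat on the mat".split()
summ = "cat sat mat".split()
print(viterbi_align(doc, summ))
```

In an unsupervised setting the emission and transition scores would be estimated from the corpus (for example with EM) rather than fixed by hand as they are here.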
Unsupervised Controllable Text Formalization
We propose a novel framework for controllable natural language
transformation. Realizing that the requirement of a parallel corpus is
practically unsustainable for controllable generation tasks, an unsupervised
training scheme is introduced. The crux of the framework is a deep neural
encoder-decoder that is reinforced with text-transformation knowledge through
auxiliary modules (called scorers). The scorers, based on off-the-shelf
language processing tools, decide the learning scheme of the encoder-decoder
based on its actions. We apply this framework to the text-transformation task
of formalizing an input text by improving its readability grade; the degree of
required formalization can be controlled by the user at run-time. Experiments
on public datasets demonstrate the efficacy of our model towards: (a)
transforming a given text to a more formal style, and (b) introducing an
appropriate amount of formality in the output text pertaining to the input
control. Our code and datasets are released for academic use.
Comment: AAA
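The abstract does not say which readability grade its scorers use. As one possible example of such a score, the sketch below computes the Flesch-Kincaid grade level, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The helper names `count_syllables` and `fk_grade` are hypothetical, and the syllable counter is a rough vowel-group heuristic rather than a dictionary lookup.

```python
import re

def count_syllables(word):
    """Approximate syllables as runs of vowels (a crude heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(text):
    """Flesch-Kincaid grade level of a text; 0.0 if the text has no words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

informal = "We use good ways."
formal = "We utilize sophisticated methodologies."
print(fk_grade(informal), fk_grade(formal))
```

A higher grade indicates text that is harder to read, so a score of this kind could in principle serve as a signal of whether a rewrite moved toward a more formal register.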
Seeding statistical machine translation with translation memory output through tree-based structural alignment
With the steadily increasing demand for high-quality translation, the localisation industry is constantly searching for technologies that would increase translator
throughput, with the current focus on the use of high-quality Statistical Machine Translation (SMT) as a supplement to the established Translation Memory (TM)
technology. In this paper we present a novel modular approach that utilises state-of-the-art sub-tree alignment to pick out pre-translated segments from a TM match and uses them to seed an SMT system that produces the final translation. We show that the presented system can outperform pure SMT when a good TM match is found. It can also be used in a Computer-Aided Translation (CAT) environment to present almost perfect translations to the human user, with markup highlighting the segments of the translation that need to be checked manually for correctness.