Dependency relations as source context in phrase-based SMT
The Phrase-Based Statistical Machine Translation (PB-SMT) model has recently begun to include source context modeling, under the assumption that the proper lexical
choice of an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features such as words, parts-of-speech, and
supertags have been explored as effective source context in SMT. In this paper, we show that position-independent syntactic dependency relations of the head of a source phrase can be modeled as useful source context to improve target phrase selection and thereby improve overall performance of PB-SMT. On a Dutch-English translation task, by combining dependency relations and syntactic contextual features (part-of-speech), we achieved a 1.0 BLEU (Papineni et al., 2002) point improvement (3.1% relative) over the baseline.
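The abstract's core idea is to attach, to each source phrase, the dependency relation and POS of the phrase head as position-independent context features. A minimal sketch of that feature extraction, assuming a pre-parsed sentence (the token fields, labels, and helper name are illustrative, not the paper's actual implementation):

```python
def phrase_context_features(tokens, span):
    """Extract context features for a source phrase.

    tokens: list of dicts with 'word', 'pos', 'deprel', and 'head'
            (0-based index of the syntactic head, -1 for the root).
    span:   (start, end) phrase boundaries, end exclusive.

    The phrase head is taken to be the token whose syntactic head lies
    outside the span; its dependency relation and POS form the
    position-independent context feature.
    """
    start, end = span
    for i in range(start, end):
        h = tokens[i]['head']
        if h < start or h >= end:          # head attaches outside the phrase
            return {'deprel': tokens[i]['deprel'], 'pos': tokens[i]['pos']}
    return {}

# Example: the phrase "the cat" in "the cat sleeps"
tokens = [
    {'word': 'the',    'pos': 'DET',  'deprel': 'det',   'head': 1},
    {'word': 'cat',    'pos': 'NOUN', 'deprel': 'nsubj', 'head': 2},
    {'word': 'sleeps', 'pos': 'VERB', 'deprel': 'root',  'head': -1},
]
features = phrase_context_features(tokens, (0, 2))
```

Such features could then be fed to a context-aware phrase-selection model alongside the usual translation-model scores.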
Storage of Natural Language Sentences in a Hopfield Network
This paper looks at how the Hopfield neural network can be used to store and
recall patterns constructed from natural language sentences. As a pattern
recognition and storage tool, the Hopfield neural network has received much
attention. This attention, however, has come mainly from the field of statistical
physics, owing to the model's simple abstraction of spin-glass systems. We
discuss the differences, characterized as bias and correlation, between
natural language sentence patterns and the randomly generated ones used in
previous experiments. Results are given for numerical simulations which show
the auto-associative competence of the network when trained with natural
language patterns.
Comment: LaTeX, 10 pages with 2 TeX figures and a .bib file; uses nemlap.sty;
to appear in Proceedings of NeMLaP-
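The storage-and-recall mechanism the abstract describes can be sketched in a few lines: Hebbian outer-product storage of bipolar patterns, then iterated thresholded updates for auto-associative recall (a textbook Hopfield sketch, not the paper's specific setup or sentence encoding):

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian storage: W is the sum of outer products of bipolar (+1/-1)
    patterns, normalized by dimension, with zeroed self-connections."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, probe, max_steps=10):
    """Synchronous updates s <- sign(W s) until a fixed point is reached."""
    s = probe.copy()
    for _ in range(max_steps):
        s_new = np.where(W @ s >= 0, 1, -1)
        if np.array_equal(s_new, s):
            break
        s = s_new
    return s

# Store one pattern, corrupt one bit, and recover it.
p = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = train_hopfield(p.reshape(1, -1))
probe = p.copy()
probe[0] = -probe[0]
recovered = recall(W, probe)
```

The interesting question the paper raises is how this competence degrades when the stored patterns are biased and correlated, as natural-language-derived patterns are, rather than uniformly random.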
A Neural Attention Model for Abstractive Sentence Summarization
Summarization based on text extraction is inherently limited, but
generation-style abstractive methods have proven challenging to build. In this
work, we propose a fully data-driven approach to abstractive sentence
summarization. Our method utilizes a local attention-based model that generates
each word of the summary conditioned on the input sentence. While the model is
structurally simple, it can easily be trained end-to-end and scales to a large
amount of training data. The model shows significant performance gains on the
DUC-2004 shared task compared with several strong baselines.
Comment: Proceedings of EMNLP 201
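The core operation of such an attention-based model is to form, for each summary word being generated, a weighted average of the input representations, with weights conditioned on the decoder's current state. A minimal sketch using dot-product scoring (a simplification; the paper's attention weights are learned, and all names here are illustrative):

```python
import numpy as np

def attention_context(H, q):
    """H: (T, d) input-word representations; q: (d,) decoder query.

    Returns (context, weights): weights are a softmax over scores H @ q,
    and context is the weight-averaged combination of the rows of H.
    """
    scores = H @ q
    scores = scores - scores.max()          # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()
    context = weights @ H
    return context, weights

# A query aligned with the first input position attends mostly to it.
H = np.eye(3)
q = np.array([10.0, 0.0, 0.0])
context, weights = attention_context(H, q)
```

In the full model this context vector, together with the previously generated words, conditions the distribution over the next summary word.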
Linguistic Geometries for Unsupervised Dimensionality Reduction
Text documents are complex high dimensional objects. To effectively visualize
such data it is important to reduce its dimensionality and visualize the low
dimensional embedding as a 2-D or 3-D scatter plot. In this paper we explore
dimensionality reduction methods that draw upon domain knowledge in order to
achieve a better low dimensional embedding and visualization of documents. We
consider the use of geometries specified manually by an expert, geometries
derived automatically from corpus statistics, and geometries computed from
linguistic resources.
Comment: 13 pages, 15 figure
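The pipeline the abstract outlines, reducing documents to a 2-D embedding under a chosen geometry, can be sketched with PCA under a diagonal term metric, where the per-term weights stand in for domain knowledge (a generic baseline sketch, not the paper's geometries; the weighting scheme is a placeholder):

```python
import numpy as np

def embed_2d(X, term_weights=None):
    """Project a (docs x terms) matrix X to 2-D for scatter-plotting.

    If term_weights is given, each term axis is rescaled by sqrt(weight),
    i.e. PCA is performed under a diagonal metric encoding which terms
    the chosen geometry considers important.
    """
    X = np.asarray(X, dtype=float)
    if term_weights is not None:
        X = X * np.sqrt(np.asarray(term_weights, dtype=float))
    Xc = X - X.mean(axis=0)                      # center documents
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                         # top-2 principal axes

# Five documents over a ten-term vocabulary, embedded into the plane.
X = np.arange(50.0).reshape(5, 10)
Y = embed_2d(X)
```

Geometries derived from corpus statistics or linguistic resources would enter the same way: as a (not necessarily diagonal) metric applied before the projection.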