Lexical Features for Statistical Machine Translation
In modern phrasal and hierarchical statistical machine translation systems, two major features model translation: rule translation probabilities and lexical smoothing scores. The rule translation probabilities are computed as maximum likelihood estimates (MLEs) of an entire source (or target) phrase translating to a target (or source) phrase. The lexical smoothing scores are also a likelihood estimate of a source (target) phrase translating to a target (source) phrase, but they are computed using independent word-to-word translation probabilities. Intuitively, it would seem that the lexical smoothing score is a less powerful estimate of translation likelihood due to this independence assumption, but I present the somewhat surprising result that lexical smoothing is far more important to the quality of a state-of-the-art hierarchical SMT system than rule translation probabilities. I posit that this is due to a fundamental data sparsity problem: The average word-to-word translation is seen many more times than the average phrase-to-phrase translation, so the word-to-word translation probabilities (or lexical probabilities) are far better estimated.
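For reference, a common instantiation of such a lexical smoothing score is the lexical weighting of Koehn et al. (2003), sketched below; the feature used in this work may differ in detail.

```latex
% One standard form of lexical smoothing in phrase-based SMT
% (Koehn et al., 2003): \bar{f}, \bar{e} are the source and target
% phrases, a is their word alignment, and w(e_j \mid f_i) is the
% word-to-word (lexical) translation probability.
\mathrm{lex}(\bar{e} \mid \bar{f}, a) =
  \prod_{j=1}^{|\bar{e}|} \frac{1}{\left|\{\, i : (i,j) \in a \,\}\right|}
  \sum_{(i,j) \in a} w(e_j \mid f_i)
```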
Motivated by this result, I present a number of novel methods for modifying the lexical probabilities to improve the quality of our MT output. First, I examine two methods of lexical probability biasing, where for each test document a set of secondary lexical probabilities is extracted and interpolated with the primary lexical probability distribution. Biasing each document with the probabilities extracted from its own first-pass decoding output provides a small but consistent gain of about 0.4 BLEU.
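A minimal sketch of this document-level biasing, assuming lexical distributions stored as dictionaries keyed by (source, target) word pairs; the function name and interpolation weight `lam` are illustrative assumptions, not the author's implementation.

```python
# Document-level lexical probability biasing: linearly interpolate a
# per-document (secondary) distribution, estimated from that document's
# first-pass decoding output, with the global (primary) distribution.
def biased_lex_prob(primary, secondary, lam=0.1):
    """primary/secondary: dicts mapping (src, tgt) -> P(tgt | src)."""
    pairs = set(primary) | set(secondary)
    return {pair: lam * secondary.get(pair, 0.0)
                  + (1.0 - lam) * primary.get(pair, 0.0)
            for pair in pairs}
```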
Second, I contextualize the lexical probabilities by factoring in additional information such as the previous or next word. The key to the success of this context-dependent lexical smoothing is a backoff model, where our "trust" of a context-dependent probability estimation is directly proportional to how many times it was seen in the training. In this way, I avoid the estimation problem seen in translation rules, where the amount of context is high but the probability estimation is inaccurate. When using the surrounding words as context, this feature provides a gain of about 0.6 BLEU on Arabic and Chinese.
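A minimal sketch of such a count-based backoff, in which trust in the context-dependent estimate grows with the number of times the event was seen in training; the smoothing constant `k` and all names are illustrative assumptions.

```python
# Context-dependent lexical smoothing with count-based backoff: the
# context-specific MLE is trusted in proportion to its training count,
# backing off to the context-independent lexical probability otherwise.
def contextual_lex_prob(counts_ctx, totals_ctx, p_word, src, tgt, ctx, k=5.0):
    c_event = counts_ctx.get((src, tgt, ctx), 0)   # src -> tgt seen in ctx
    c_total = totals_ctx.get((src, ctx), 0)        # src seen in ctx at all
    p_ctx = c_event / c_total if c_total else 0.0  # context-dependent MLE
    trust = c_total / (c_total + k)                # -> 1 as counts grow
    return trust * p_ctx + (1.0 - trust) * p_word.get((src, tgt), 0.0)
```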
Finally, I describe several types of discriminatively trained lexical features, along with a new optimization procedure called Expected-BLEU optimization. This new optimization procedure is able to robustly estimate weights for thousands of decoding features, which can in effect discriminatively optimize a set of lexical probabilities to maximize BLEU. I also describe two other discriminative feature types, one of which is the part-of-speech analogue to lexical probabilities, and the other of which estimates training corpus weights based on lexical translations. The discriminative features produce a gain of 0.8 BLEU on Arabic and 0.4 BLEU on Chinese.
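A common form of an expected-BLEU objective over an n-best list is sketched below; the actual procedure used in this work may differ in detail.

```latex
% Expected BLEU over an n-best list, under a log-linear model with
% feature functions h_k and weights \lambda_k (a common formulation;
% details of the actual procedure may differ).
\mathcal{O}(\lambda)
  = \sum_{e \in \mathrm{nbest}(f)} P_\lambda(e \mid f)\,\mathrm{BLEU}(e),
\qquad
P_\lambda(e \mid f)
  = \frac{\exp\!\big(\textstyle\sum_k \lambda_k h_k(e, f)\big)}
         {\sum_{e' \in \mathrm{nbest}(f)} \exp\!\big(\textstyle\sum_k \lambda_k h_k(e', f)\big)}
```

Because this objective is smooth in the weights, it can be optimized with gradient methods, which scale to thousands of features where line-search procedures such as MERT do not.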
Complementary Roles of Inference and Language Models in QA
Answering open-domain questions through unsupervised methods poses challenges for both machine-reading (MR) and language model (LM)-based approaches. The MR-based approach suffers from sparsity issues in extracted knowledge graphs (KGs), while the performance of the LM-based approach significantly depends on the quality of the retrieved context for questions. In this paper, we compare these approaches and propose a novel methodology that leverages directional predicate entailment (inference) to address these limitations. We use entailment graphs (EGs), with natural language predicates as nodes and entailment as edges, to enhance parsed KGs by inferring unseen assertions, effectively mitigating the sparsity problem in the MR-based approach. We also show EGs improve context retrieval for the LM-based approach. Additionally, we present a Boolean QA task, demonstrating that EGs exhibit comparable directional inference capabilities to large language models (LLMs). Our results highlight the importance of inference in open-domain QA and the improvements brought by leveraging EGs.
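A minimal sketch of how directional entailment edges can densify an extracted KG; the triple format, `expand_kg`, and the toy data are illustrative assumptions, not the paper's interface.

```python
# Densify an extracted KG with an entailment graph (EG): for each
# asserted triple, also assert every predicate its predicate entails.
# Entailment is directional, so reverse edges are NOT added.
def expand_kg(kg_triples, eg):
    expanded = set(kg_triples)
    for subj, pred, obj in kg_triples:
        for entailed in eg.get(pred, ()):
            expanded.add((subj, entailed, obj))
    return expanded

kg = {("Google", "acquired", "DeepMind")}
eg = {"acquired": ["owns"]}        # "acquired" entails "owns", not vice versa
print(expand_kg(kg, eg))           # adds ("Google", "owns", "DeepMind")
```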
PARSING AND TAGGING OF BILINGUAL DICTIONARIES
Bilingual dictionaries hold great potential as a source of lexical resources for training and testing automated systems for optical character recognition, machine translation, and cross-language information retrieval. In this paper, we describe a system for extracting term lexicons from printed bilingual dictionaries. Our work was divided into three phases: dictionary segmentation, entry tagging, and generation. In segmentation, pages are divided into logical entries based on structural features learned from selected examples. The extracted entries are associated with functional labels and passed to a tagging module, which associates linguistic labels with each word or phrase in the entry. The output of the system is a structure that represents the entries from the dictionary. We have used this approach to parse a variety of dictionaries with both Latin and non-Latin alphabets, and we demonstrate the results of term lexicon generation for retrieval from a collection of French news stories using English queries.
(LAMP-TR-106)
(CAR-TR-991)
(UMIACS-TR-2003-97)
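A minimal sketch of the kind of structured entry such a pipeline might emit after segmentation and tagging, and of how generation could read a term lexicon off it; all field names and labels are hypothetical, not the system's actual schema.

```python
# One segmented dictionary entry after tagging: the entry is a logical
# unit cut from the page, and each word or phrase inside it carries a
# linguistic label (hypothetical labels for a French-English entry).
entry = {
    "tokens": [
        {"text": "chien", "label": "HEADWORD"},
        {"text": "n.m.",  "label": "POS"},
        {"text": "dog",   "label": "TRANSLATION"},
    ],
}

# Generation then reads term pairs off such structures:
lexicon = [(h["text"], t["text"])
           for h in entry["tokens"] if h["label"] == "HEADWORD"
           for t in entry["tokens"] if t["label"] == "TRANSLATION"]
print(lexicon)  # [('chien', 'dog')]
```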
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them
Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models lack the accuracy of retrieve-and-read systems, as substantially less knowledge is covered by the available QA-pairs relative to text corpora like Wikipedia. To facilitate improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very large resource of 65M automatically generated QA-pairs. We introduce a new QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models whilst being significantly faster. Using PAQ, we train CBQA models which outperform comparable baselines by 5% but trail RePAQ by over 15%, indicating the effectiveness of explicit retrieval. RePAQ can be configured for size (under 500MB) or speed (over 1K questions per second) whilst retaining high accuracy. Lastly, we demonstrate RePAQ's strength at selective QA, abstaining from answering when it is likely to be incorrect. This enables RePAQ to "back-off" to a more expensive state-of-the-art model, leading to a combined system which is both more accurate and 2x faster than the state-of-the-art model alone.
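A minimal sketch of this selective back-off, assuming the retriever exposes a match score that can be thresholded; all names and the threshold are illustrative assumptions, not the paper's interface.

```python
# Selective QA with back-off: answer from the cached QA pair when the
# RePAQ-style retriever is confident, otherwise defer to a slower but
# more accurate retrieve-and-read model.
def answer(question, retrieve_qa_pair, expensive_model, threshold=0.7):
    matched_q, cached_answer, score = retrieve_qa_pair(question)
    if score >= threshold:
        return cached_answer           # fast path: retrieved QA pair
    return expensive_model(question)   # back-off: accurate but slower
```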
- …