1,140 research outputs found
Monte Carlo inference and maximization for phrase-based translation
Recent advances in statistical machine translation have used beam search for approximate NP-complete inference within probabilistic translation models. We present an alternative approach of sampling from the posterior distribution defined by a translation model. We define a novel Gibbs sampler for sampling translations given a source sentence and show that it effectively explores this posterior distribution. In doing so we overcome the limitations of heuristic beam search and obtain theoretically sound solutions to inference problems such as finding the maximum probability translation and minimum expected risk training and decoding.
Maximum Fidelity
The most fundamental problem in statistics is the inference of an unknown
probability distribution from a finite number of samples. For a specific
observed data set, answers to the following questions would be desirable: (1)
Estimation: Which candidate distribution provides the best fit to the observed
data?, (2) Goodness-of-fit: How concordant is this distribution with the
observed data?, and (3) Uncertainty: How concordant are other candidate
distributions with the observed data? A simple unified approach for univariate
data that addresses these traditionally distinct statistical notions is
presented called "maximum fidelity". Maximum fidelity is a strict frequentist
approach that is fundamentally based on model concordance with the observed
data. The fidelity statistic is a general information measure based on the
coordinate-independent cumulative distribution and critical yet previously
neglected symmetry considerations. An approximation for the null distribution
of the fidelity allows its direct conversion to absolute model concordance (p
value). Fidelity maximization allows identification of the most concordant
model distribution, generating a method for parameter estimation, with
neighboring, less concordant distributions providing the "uncertainty" in this
estimate. Maximum fidelity provides an optimal approach for parameter
estimation (superior to maximum likelihood) and a generally optimal approach
for goodness-of-fit assessment of arbitrary models applied to univariate data.
Extensions to binary data, binned data, multidimensional data, and classical
parametric and nonparametric statistical tests are described. Maximum fidelity
provides a philosophically consistent, robust, and seemingly optimal foundation
for statistical inference. All findings are presented in an elementary way to
be immediately accessible to all researchers utilizing statistical analysis.Comment: 66 pages, 32 figures, 7 tables, submitte
- …