Exploiting source similarity for SMT using context-informed features
In this paper, we introduce context-informed features in a log-linear phrase-based SMT framework; these features enable us to exploit source similarity in addition to the target similarity modeled by the language model. We
present a memory-based classification framework that enables the estimation of these features while avoiding
sparseness problems. We evaluate the performance of our approach on Italian-to-English and Chinese-to-English translation tasks using a state-of-the-art phrase-based SMT
system, and report significant improvements in both BLEU and NIST scores when adding the context-informed features.
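The memory-based estimation described above can be sketched as a k-nearest-neighbour lookup over stored source phrases and their contexts. The phrase pairs and the overlap metric below are invented for illustration and are not the authors' actual feature set:

```python
from collections import Counter

# Toy memory of (source phrase, left context, right context, target phrase).
# These Italian-English phrase pairs are invented for illustration.
memory = [
    (("la", "casa"), "in", "bianca", "the house"),
    (("la", "casa"), "verso", "di", "the home"),
    (("la", "casa"), "in", "di", "the house"),
]

def context_feature(src, left, right, k=3):
    """Estimate P(target | source phrase, context) by k-NN over the memory,
    scoring neighbours by simple context overlap (an illustrative metric)."""
    scored = []
    for m_src, m_left, m_right, target in memory:
        if m_src != src:
            continue
        overlap = (m_left == left) + (m_right == right)
        scored.append((overlap, target))
    scored.sort(reverse=True)
    votes = Counter(t for _, t in scored[:k])
    total = sum(votes.values())
    # The resulting distribution can feed a log-linear model as a feature.
    return {t: c / total for t, c in votes.items()}

probs = context_feature(("la", "casa"), "in", "bianca")
```

Because estimation is a lookup over stored examples rather than a parametric fit, rare contexts still yield a usable distribution, which is one way a memory-based classifier sidesteps sparseness.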
Predictive User Modeling with Actionable Attributes
Different machine learning techniques have been proposed and used for
modeling individual and group user needs, interests and preferences. In the
traditional predictive modeling setting, instances are described by observable
variables, called attributes. The goal is to learn a model for predicting the
target variable for unseen instances. For example, for marketing purposes a
company may consider profiling a new user based on her observed web browsing
behavior, referral keywords or other relevant information. In many real world
applications the values of some attributes are not only observable, but can be
actively decided by a decision maker. Furthermore, in some of such applications
the decision maker is interested not only in generating accurate predictions, but
also in maximizing the probability of the desired outcome. For example, a direct
marketing manager can choose which type of special offer to send to a client
(actionable attribute), hoping that the right choice will result in a positive
response with a higher probability. We study how to learn to choose the value
of an actionable attribute in order to maximize the probability of a desired
outcome in predictive modeling. We emphasize that not all instances are equally
sensitive to changes in actions. Accurate choice of an action is critical for
those instances that are on the borderline (e.g., users who do not have a
strong opinion one way or the other). We formulate three supervised learning
approaches for learning to select the value of an actionable attribute at an
instance level. We also introduce a focused training procedure which puts more
emphasis on the situations where varying the action is most likely to take
effect. The proof-of-concept experimental validation on two real-world case
studies in web analytics and e-learning domains highlights the potential of the
proposed approaches.
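Selecting the value of an actionable attribute to maximize the probability of the desired outcome can be sketched with a fitted response model: score each candidate action and take the argmax. The logistic model, its weights, and the action names below are all hypothetical placeholders, not the paper's trained models:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted model: P(response = 1 | user features, action).
# Weights are invented for illustration; in practice they come from training.
W_USER = [0.8, -0.5]           # weights on observable user attributes
W_ACTION = {"discount": 0.6,   # per-value weights of the actionable attribute
            "free_trial": 0.2,
            "newsletter": -0.3}

def p_response(user, action):
    z = sum(w * x for w, x in zip(W_USER, user)) + W_ACTION[action]
    return sigmoid(z)

def best_action(user):
    """Choose the action value that maximizes the predicted probability
    of the desired outcome for this particular instance."""
    return max(W_ACTION, key=lambda a: p_response(user, a))

# A borderline user (score near 0.5) is most sensitive to the action choice,
# which is what the focused training procedure emphasizes.
user = [0.1, 0.1]
choice = best_action(user)
```

For users whose baseline score is far from the decision boundary, every action yields nearly the same outcome probability, which is why accurate action choice matters most on the borderline.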
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error.
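The HMM formulation above, with dialogue acts as hidden states, the act n-gram as transition model, and lexical cues as observations, can be sketched with a log-space Viterbi decoder. All probabilities and the cue words below are invented for illustration; the actual model combines word n-grams, decision trees, and neural networks:

```python
import math

# Toy dialogue-act HMM: acts are hidden states, the act bigram gives the
# transitions, and per-act word likelihoods stand in for the lexical and
# prosodic models. All numbers are illustrative, not from the paper.
ACTS = ["Statement", "Question", "Backchannel"]
TRANS = {  # P(next act | previous act): the dialogue act bigram
    "Statement":   {"Statement": 0.5, "Question": 0.3, "Backchannel": 0.2},
    "Question":    {"Statement": 0.6, "Question": 0.1, "Backchannel": 0.3},
    "Backchannel": {"Statement": 0.7, "Question": 0.2, "Backchannel": 0.1},
}
START = {"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1}
EMIT = {   # P(observed cue | act): a crude stand-in for the word model
    "Statement":   {"i": 0.4, "think": 0.4, "right": 0.1, "uhhuh": 0.1},
    "Question":    {"i": 0.1, "think": 0.1, "right": 0.7, "uhhuh": 0.1},
    "Backchannel": {"i": 0.05, "think": 0.05, "right": 0.1, "uhhuh": 0.8},
}

def viterbi(obs):
    """Most likely dialogue act sequence under the HMM (log-space Viterbi)."""
    v = [{a: (math.log(START[a]) + math.log(EMIT[a][obs[0]]), [a])
          for a in ACTS}]
    for o in obs[1:]:
        layer = {}
        for a in ACTS:
            score, path = max(
                (v[-1][p][0] + math.log(TRANS[p][a]) + math.log(EMIT[a][o]),
                 v[-1][p][1] + [a]) for p in ACTS)
            layer[a] = (score, path)
        v.append(layer)
    return max(v[-1].values())[1]

acts = viterbi(["i", "think", "right", "uhhuh"])
```

The discourse constraint does real work here: "right" alone is ambiguous between Question and Backchannel, and the act bigram resolves it from the neighbouring acts.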
Linear and Order Statistics Combiners for Pattern Classification
Several researchers have experimentally shown that substantial improvements
can be obtained in difficult pattern recognition problems by combining or
integrating the outputs of multiple classifiers. This chapter provides an
analytical framework to quantify the improvements in classification results due
to combining. The results apply to both linear combiners and order statistics
combiners. We first show that, to a first-order approximation, the error rate
obtained over and above the Bayes error rate is directly proportional to the
variance of the actual decision boundaries around the Bayes optimum boundary.
Combining classifiers in output space reduces this variance, and hence reduces
the "added" error. If N unbiased classifiers are combined by simple averaging,
the added error rate can be reduced by a factor of N if the individual errors
in approximating the decision boundaries are uncorrelated. Expressions are then
derived for linear combiners which are biased or correlated, and the effect of
output correlations on ensemble performance is quantified. For order statistics
based non-linear combiners, we derive expressions that indicate how much the
median, the maximum and in general the ith order statistic can improve
classifier performance. The analysis presented here facilitates the
understanding of the relationships among error rates, classifier boundary
distributions, and combining in output space. Experimental results on several
public domain data sets are provided to illustrate the benefits of combining
and to support the analytical results.
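The factor-of-N variance reduction for averaging N unbiased, uncorrelated classifiers can be checked numerically. The toy setup below, where each "classifier" estimates a boundary position with independent Gaussian noise, is an illustrative simulation, not the chapter's experiments:

```python
import random
import statistics

random.seed(0)

# Each "classifier" estimates the same boundary position (true value 0.0)
# with independent zero-mean unit-variance noise. Averaging N of them
# should shrink the variance of the estimate by roughly a factor of N.
N = 8
TRIALS = 20000

single = [random.gauss(0.0, 1.0) for _ in range(TRIALS)]
averaged = [statistics.fmean(random.gauss(0.0, 1.0) for _ in range(N))
            for _ in range(TRIALS)]

ratio = statistics.variance(single) / statistics.variance(averaged)
# ratio should come out close to N = 8
```

When the individual errors are correlated or the classifiers are biased, the reduction falls short of 1/N, which is exactly the regime the chapter's later expressions quantify.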
An analysis of spending behaviour under liquidity constraints with an application to financial hedging
Bayesian Learning for a Class of Priors with Prescribed Marginals
We present Bayesian updating of an imprecise probability measure, represented by a class of precise multidimensional probability measures. The choice and analysis of our class are motivated by expert interviews that we conducted with modelers in the context of climatic change. From the interviews we deduce that, generically, experts hold a much more informed opinion on the marginals of uncertain parameters than on their correlations. Accordingly, we specify the class by prescribing precise measures for the marginals while leaving the correlation structure subject to complete ignorance. For the sake of transparency, our discussion focuses on the tutorial example of a linear two-dimensional Gaussian model. We operationalize Bayesian learning for that class by various updating rules, starting with (a modified version of) the generalized Bayes' rule and the maximum likelihood update rule (after Gilboa and Schmeidler). Over a large range of potential observations, the generalized Bayes' rule provides non-informative results. We restrict this counter-intuitive and unnecessary growth of uncertainty by two means, the discussion of which applies to any kind of imprecise model, not only to our class. First, we find our class of priors too inclusive and hence require certain additional properties of prior measures in terms of smoothness of probability density functions. Second, we argue that both updating rules are unsatisfactory, the generalized Bayes' rule being too conservative, i.e., too inclusive, and the maximum likelihood rule being too exclusive. Instead, we introduce two new ways of Bayesian updating of imprecise probabilities: a "weighted maximum likelihood method" and a "semi-classical method." The former bases Bayesian updating on the whole set of priors, but with weighted influence of its members. By referring to the whole set, the weighted maximum likelihood method allows for more robust inferences than the standard maximum likelihood method and is hence easier to justify than the latter. Furthermore, the semi-classical method is more objective than the weighted maximum likelihood method, as it does not require the subjective definition of a weighting function. Both new methods yield much more informative results than the generalized Bayes' rule, which we demonstrate for the example of a stylized insurance model.
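The core idea of the weighted maximum likelihood method, retaining every prior in the set but weighting each by how well it explains the data, can be sketched on a deliberately simplified setup. The one-dimensional Gaussian model, the candidate priors, and the noise level below are illustrative assumptions standing in for the paper's two-dimensional Gaussian class:

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Imprecise prior: a finite set of candidate Gaussian priors over a
# parameter theta (a 1-D stand-in for the 2-D Gaussian class; all
# numbers are invented for illustration).
priors = [(0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]  # (mean, sd) of each prior

def weighted_ml_weights(obs, noise_sd=0.5):
    """Weight every prior by the marginal likelihood of the observation,
    instead of keeping only the maximizer (maximum likelihood rule) or
    all priors with equal standing (generalized Bayes' rule)."""
    # For a Gaussian prior N(mu, sd^2) and Gaussian observation noise,
    # the marginal likelihood of obs is N(obs; mu, sd^2 + noise_sd^2).
    likes = [gauss_pdf(obs, mu, math.sqrt(sd ** 2 + noise_sd ** 2))
             for mu, sd in priors]
    total = sum(likes)
    return [l / total for l in likes]

weights = weighted_ml_weights(1.0)
```

Priors far from the observation keep a small but nonzero influence, which is what makes the resulting inferences more robust than discarding all but the single best-fitting prior.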