Inducing Features of Random Fields
We present a technique for constructing random fields from a set of training
samples. The learning paradigm builds increasingly complex fields by allowing
potential functions, or features, that are supported by increasingly large
subgraphs. Each feature has a weight that is trained by minimizing the
Kullback-Leibler divergence between the model and the empirical distribution of
the training data. A greedy algorithm determines how features are incrementally
added to the field and an iterative scaling algorithm is used to estimate the
optimal values of the weights.
The statistical modeling techniques introduced in this paper differ from
those common to much of the natural language processing literature since there
is no probabilistic finite state or push-down automaton on which the model is
built. Our approach also differs from the techniques common to the computer
vision literature in that the underlying random fields are non-Markovian and
have a large number of parameters that must be estimated. Relations to other
learning approaches including decision trees and Boltzmann machines are given.
As a demonstration of the method, we describe its application to the problem of
automatic word classification in natural language processing.
Key words: random field, Kullback-Leibler divergence, iterative scaling,
divergence geometry, maximum entropy, EM algorithm, statistical learning,
clustering, word morphology, natural language processing
Comment: 34 pages, compressed postscript
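The weight-estimation step described in this abstract can be illustrated with a small sketch. The abstract does not spell out its iterative scaling algorithm, so the code below swaps in classical generalized iterative scaling (GIS) on a toy four-string sample space. The function names (`model_distribution`, `expectations`, `fit_gis`), the two features, and the empirical distribution are all hypothetical and chosen only for illustration. The greedy feature-induction step is not shown.

```python
import math

def model_distribution(weights, feature_rows):
    """Gibbs distribution p(x) proportional to exp(sum_j w_j * f_j(x))."""
    scores = [math.exp(sum(w * v for w, v in zip(weights, row)))
              for row in feature_rows]
    total = sum(scores)
    return [s / total for s in scores]

def expectations(probs, feature_rows):
    """Expected value of every feature under the distribution `probs`."""
    n = len(feature_rows[0])
    return [sum(p * row[j] for p, row in zip(probs, feature_rows))
            for j in range(n)]

def fit_gis(feature_rows, empirical, iterations=500):
    """Generalized iterative scaling (an assumed stand-in for the paper's method).

    A slack feature is appended so that every row sums to the same bound,
    which classical GIS requires.
    """
    bound = max(sum(row) for row in feature_rows)
    rows = [list(row) + [bound - sum(row)] for row in feature_rows]
    target = expectations(empirical, rows)
    weights = [0.0] * len(rows[0])
    for _ in range(iterations):
        current = expectations(model_distribution(weights, rows), rows)
        for j, (t, c) in enumerate(zip(target, current)):
            # Simplification: features with zero expectation are skipped.
            if t > 0 and c > 0:
                weights[j] += math.log(t / c) / bound
    return weights, model_distribution(weights, rows)

# Toy sample space of two-letter strings with two binary features:
# "first letter is a" and "second letter is b" (both hypothetical).
strings = ["aa", "ab", "ba", "bb"]
feature_rows = [[float(s[0] == "a"), float(s[1] == "b")] for s in strings]
empirical = [0.4, 0.3, 0.2, 0.1]

weights, model = fit_gis(feature_rows, empirical)
for s, p in zip(strings, model):
    print(s, round(p, 4))
```

At convergence the fitted model matches the empirical feature expectations, which is equivalent to minimizing the Kullback-Leibler divergence from the empirical distribution within this exponential family.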
Infrared spectra of van der Waals complexes of importance in planetary atmospheres
It has been suggested that (CO2)2 and Ar-CO2 are important constituents of the planetary atmospheres of Venus and Mars. Recent results on the laboratory spectroscopy of CO2-containing van der Waals complexes which may be of use in the modeling of the spectra of planetary atmospheres are presented. Sub-Doppler infrared spectra were obtained for (CO2)2, (CO2)3, and rare-gas-CO2 complexes in the vicinity of the CO2 Fermi diad at 2.7 micrometers using a color-center-laser optothermal spectrometer. From the spectroscopic constants the geometries of the complexes have been determined and van der Waals vibrational frequencies have been estimated. The equilibrium configurations are C2h, C3h, and C2v, for (CO2)2, (CO2)3, and the rare-gas-CO2 complexes, respectively. Most of the homogeneous linewidths for the rovibrational transitions range from 0.5 to 22 MHz, indicating that predissociation is as much as four orders of magnitude faster than radiative processes for vibrational relaxation in these complexes.
Analytical study of hydrogen turbopump cycles for advanced nuclear rockets Progress report, Sep. 15, 1964 - Sep. 15, 1965
Hydrogen turbopump cycles for obtaining high engine inlet pressures in advanced nuclear rockets, and data on gaseous nuclear reactors and heavy gas containment
Parametric Fokker-Planck equation
We derive the Fokker-Planck equation on the parametric space. It is the
Wasserstein gradient flow of relative entropy on the statistical manifold. We
pull back the PDE to a finite dimensional ODE on parameter space. Some
An analytical example and several numerical examples are presented.
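For orientation, the construction this abstract describes can be written out under standard assumptions. The symbols below (potential $V$, diffusion coefficient $\beta$, parametric density $\rho_\theta$, and pulled-back Wasserstein metric tensor $G(\theta)$) are not given in the abstract and are introduced here only as an illustrative sketch.

```latex
% Relative entropy functional (assumed form; V and \beta are not in the abstract)
\mathcal{H}(\rho) = \int V(x)\,\rho(x)\,dx + \beta \int \rho(x)\log\rho(x)\,dx

% Fokker-Planck equation as the Wasserstein gradient flow of \mathcal{H}
\partial_t \rho = \nabla\cdot(\rho\,\nabla V) + \beta\,\Delta\rho

% Pullback to a finite-dimensional ODE on parameter space, with
% H(\theta) = \mathcal{H}(\rho_\theta) and G(\theta) the metric tensor
% induced on parameter space by the Wasserstein metric
\dot{\theta} = -\,G(\theta)^{-1}\,\nabla_\theta H(\theta)
```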
"How May I Help You?": Modeling Twitter Customer Service Conversations Using Fine-Grained Dialogue Acts
Given the increasing popularity of customer service dialogue on Twitter,
analysis of conversation data is essential to understand trends in customer and
agent behavior for the purpose of automating customer service interactions. In
this work, we develop a novel taxonomy of fine-grained "dialogue acts"
frequently observed in customer service, showcasing acts that are more suited
to the domain than the more generic existing taxonomies. Using a sequential
SVM-HMM model, we model conversation flow, predicting the dialogue act of a
given turn in real-time. We characterize differences between customer and agent
behavior in Twitter customer service conversations, and investigate the effect
of testing our system on different customer service industries. Finally, we use
a data-driven approach to predict important conversation outcomes: customer
satisfaction, customer frustration, and overall problem resolution. We show
that the type and location of certain dialogue acts in a conversation have a
significant effect on the probability of desirable and undesirable outcomes,
and present actionable rules based on our findings. The patterns and rules we
derive can be used as guidelines for outcome-driven automated customer service
platforms.
Comment: 13 pages, 6 figures, IUI 201
Statistical modelling of the rheological and mucoadhesive properties of aqueous poly(methylvinylether-co-maleic acid) networks: Redefining biomedical applications and the relationship between viscoelasticity and mucoadhesion
Indication for scattering in collisions at 200 GeV
A mass shift of about -40 MeV/ was measured in
collisions at 200 GeV at RHIC. Previous mass shifts have
been observed at CERN-LEBC-EHS and CERN-LEP. We will show that phase space does
not account for the mass shift measured at RHIC, CERN-LEBC-EHS
and CERN-LEP and conclude that there are significant scattering interactions in
collisions.
Comment: 11 pages and 7 figures
Probabilistic models of information retrieval based on measuring the divergence from randomness
We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose-Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document-query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
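The weighting scheme outlined in this abstract can be sketched as follows. The abstract gives no formulas, so the code assumes one common instantiation from the divergence-from-randomness literature: a binomial randomness model, a Laplace-style information gain of 1 / (tf + 1), and a logarithmic document-length normalization. All function and parameter names are hypothetical.

```python
import math

def binomial_information(tf, total_freq, num_docs):
    """Bits of information, -log2 P(tf), where P is the binomial probability
    of tf occurrences out of total_freq trials with success rate 1/num_docs.
    lgamma is used so that a non-integer (normalized) tf is accepted.
    Assumes num_docs > 1 and tf <= total_freq."""
    p = 1.0 / num_docs
    log_prob = (math.lgamma(total_freq + 1)
                - math.lgamma(tf + 1)
                - math.lgamma(total_freq - tf + 1)
                + tf * math.log(p)
                + (total_freq - tf) * math.log1p(-p))
    return -log_prob / math.log(2)

def normalized_tf(tf, doc_len, avg_doc_len):
    """Length normalization (assumed form): rescale tf to an average-length document."""
    return tf * math.log2(1.0 + avg_doc_len / doc_len)

def dfr_weight(tf, doc_len, avg_doc_len, total_freq, num_docs):
    """Term weight = information gain * divergence from randomness."""
    tfn = normalized_tf(tf, doc_len, avg_doc_len)
    gain = 1.0 / (tfn + 1.0)  # Laplace-style gain (assumed form)
    return gain * binomial_information(tfn, total_freq, num_docs)

# Hypothetical collection: 1000 documents, a term occurring 10 times overall.
print(dfr_weight(tf=1, doc_len=100, avg_doc_len=100, total_freq=10, num_docs=1000))
print(dfr_weight(tf=5, doc_len=100, avg_doc_len=100, total_freq=10, num_docs=1000))
```

A term that is concentrated in one document is unlikely under the random model, so it carries more information and receives the larger weight.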