
    Inducing Features of Random Fields

    We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The statistical modeling techniques introduced in this paper differ from those common to much of the natural language processing literature since there is no probabilistic finite state or push-down automaton on which the model is built. Our approach also differs from the techniques common to the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches including decision trees and Boltzmann machines are given. As a demonstration of the method, we describe its application to the problem of automatic word classification in natural language processing. Key words: random field, Kullback-Leibler divergence, iterative scaling, divergence geometry, maximum entropy, EM algorithm, statistical learning, clustering, word morphology, natural language processing. Comment: 34 pages, compressed postscript
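
The weight-fitting step described in this abstract can be illustrated with a minimal sketch. It uses generalized iterative scaling (GIS), a simpler relative of iterative scaling that is swapped in here for illustration and is not necessarily the paper's algorithm. The function names, the slack feature, and the tiny four-element sample space are all assumptions made for this example.

```python
import math

def model_distribution(weights, features, space):
    """Exponential-family model: q(x) proportional to exp(sum_i w_i * f_i(x))."""
    scores = [math.exp(sum(w * f(x) for w, f in zip(weights, features)))
              for x in space]
    z = sum(scores)
    return [s / z for s in scores]

def kl_divergence(p, q):
    """D(p || q) in nats; terms with p(x) = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fit_gis(empirical, features, space, iterations=200):
    """Fit weights with generalized iterative scaling (illustrative only).

    GIS needs the features to sum to the same constant c at every point,
    so a hypothetical slack feature is appended to make that true.
    """
    c = max(sum(f(x) for f in features) for x in space)

    def slack(x):
        return c - sum(f(x) for f in features)

    all_features = list(features) + [slack]
    weights = [0.0] * len(all_features)
    # Feature expectations under the empirical distribution (the targets).
    target = [sum(p * f(x) for p, x in zip(empirical, space))
              for f in all_features]
    for _ in range(iterations):
        q = model_distribution(weights, all_features, space)
        expected = [sum(qx * f(x) for qx, x in zip(q, space))
                    for f in all_features]
        for i in range(len(weights)):
            # Simplification: skip features whose expectation is zero.
            if target[i] > 0 and expected[i] > 0:
                weights[i] += math.log(target[i] / expected[i]) / c
    return weights, all_features
```

Under these assumptions, each update moves the model's feature expectations toward the empirical ones, which is equivalent to reducing the Kullback-Leibler divergence from the empirical distribution to the model. The greedy feature-induction step from the abstract is not covered by this sketch.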

    Infrared spectra of van der Waals complexes of importance in planetary atmospheres

    It has been suggested that (CO2)2 and Ar-CO2 are important constituents of the planetary atmospheres of Venus and Mars. Recent results on the laboratory spectroscopy of CO2-containing van der Waals complexes, which may be of use in modeling the spectra of planetary atmospheres, are presented. Sub-Doppler infrared spectra were obtained for (CO2)2, (CO2)3, and rare-gas-CO2 complexes in the vicinity of the CO2 Fermi diad at 2.7 micrometers using a color-center-laser optothermal spectrometer. From the spectroscopic constants the geometries of the complexes have been determined and van der Waals vibrational frequencies have been estimated. The equilibrium configurations are C2h, C3h, and C2v, for (CO2)2, (CO2)3, and the rare-gas-CO2 complexes, respectively. Most of the homogeneous linewidths for the rovibrational transitions range from 0.5 to 22 MHz, indicating that predissociation is as much as four orders of magnitude faster than radiative processes for vibrational relaxation in these complexes.

    Analytical study of hydrogen turbopump cycles for advanced nuclear rockets Progress report, Sep. 15, 1964 - Sep. 15, 1965

    Hydrogen turbopump cycles for obtaining high engine inlet pressures in advanced nuclear rockets, and data on gaseous nuclear reactors and heavy gas containment

    Parametric Fokker-Planck equation

    We derive the Fokker-Planck equation on the parametric space. It is the Wasserstein gradient flow of relative entropy on the statistical manifold. We pull back the PDE to a finite-dimensional ODE on parameter space. An analytical example and several numerical examples are presented.
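
The relationship sketched in this abstract can be written out in a standard form. The notation below is an assumption for illustration, not taken from the abstract: a potential $V$, a diffusion constant $D$, a parametric density $\rho_\theta$, and a metric tensor $G(\theta)$ obtained by pulling the Wasserstein metric back to parameter space.

```latex
% Fokker-Planck equation for a density \rho(x, t), assumed potential V and diffusion constant D:
\partial_t \rho = \nabla \cdot (\rho \nabla V) + D \, \Delta \rho

% Relative-entropy (free-energy) functional whose Wasserstein gradient flow gives the PDE above:
\mathcal{H}(\rho) = \int V(x) \, \rho(x) \, dx + D \int \rho(x) \log \rho(x) \, dx,
\qquad
\partial_t \rho = \nabla \cdot \Big( \rho \, \nabla \frac{\delta \mathcal{H}}{\delta \rho} \Big)

% Pulled back to a parametric family \rho_\theta, the flow becomes a finite-dimensional ODE:
\dot{\theta} = - G(\theta)^{-1} \, \nabla_\theta \, \mathcal{H}(\rho_\theta)
```

Substituting $\delta \mathcal{H} / \delta \rho = V + D(\log \rho + 1)$ into the second line recovers the first, which is the sense in which the equation is a gradient flow.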

    "How May I Help You?": Modeling Twitter Customer Service Conversations Using Fine-Grained Dialogue Acts

    Given the increasing popularity of customer service dialogue on Twitter, analysis of conversation data is essential to understand trends in customer and agent behavior for the purpose of automating customer service interactions. In this work, we develop a novel taxonomy of fine-grained "dialogue acts" frequently observed in customer service, showcasing acts that are more suited to the domain than the more generic existing taxonomies. Using a sequential SVM-HMM model, we model conversation flow, predicting the dialogue act of a given turn in real-time. We characterize differences between customer and agent behavior in Twitter customer service conversations, and investigate the effect of testing our system on different customer service industries. Finally, we use a data-driven approach to predict important conversation outcomes: customer satisfaction, customer frustration, and overall problem resolution. We show that the type and location of certain dialogue acts in a conversation have a significant effect on the probability of desirable and undesirable outcomes, and present actionable rules based on our findings. The patterns and rules we derive can be used as guidelines for outcome-driven automated customer service platforms. Comment: 13 pages, 6 figures, IUI 201

    Indication for $\pi^+ \pi^-$ scattering in $p+p$ collisions at $\sqrt{s_{NN}} = 200$ GeV

    A $\rho(770)^0$ mass shift of about $-40$ MeV/$c^2$ was measured in $p+p$ collisions at $\sqrt{s_{NN}} = 200$ GeV at RHIC. Previous mass shifts have been observed at CERN-LEBC-EHS and CERN-LEP. We will show that phase space does not account for the $\rho(770)^0$ mass shift measured at RHIC, CERN-LEBC-EHS and CERN-LEP and conclude that there are significant scattering interactions in $p+p$ collisions. Comment: 11 pages and 7 figures

    Probabilistic models of information retrieval based on measuring the divergence from randomness

    We introduce a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose--Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document--query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
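
One way to picture the term-weighting idea in this abstract is the sketch below. The binomial randomness model follows the abstract. The two normalization formulas are assumptions chosen for illustration: a Laplace-style gain of 1/(tfn + 1) and a logarithmic document-length correction. They may differ from the paper's exact definitions, and all function and parameter names are hypothetical.

```python
import math

def log2_binomial_coefficient(n, k):
    """log2 of C(n, k), via lgamma so that k may be a non-integer."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(2)

def binomial_information(tf, collection_tf, num_docs):
    """-log2 of the binomial probability of tf occurrences in one document.

    The random process places collection_tf term occurrences independently,
    each landing in a given document with probability 1 / num_docs.
    Requires num_docs > 1 and tf <= collection_tf.
    """
    p = 1.0 / num_docs
    log2_prob = (log2_binomial_coefficient(collection_tf, tf)
                 + tf * math.log2(p)
                 + (collection_tf - tf) * math.log2(1.0 - p))
    return -log2_prob

def length_normalized_tf(tf, doc_len, avg_doc_len):
    """Assumed length normalization: rescale tf relative to the average length."""
    return tf * math.log2(1.0 + avg_doc_len / doc_len)

def dfr_weight(tf, doc_len, avg_doc_len, collection_tf, num_docs):
    """Combine both normalizations with the binomial model (illustrative)."""
    tfn = length_normalized_tf(tf, doc_len, avg_doc_len)
    gain = 1.0 / (tfn + 1.0)  # assumed Laplace-style information gain
    return gain * binomial_information(tfn, collection_tf, num_docs)
```

In this sketch a term that occurs far more often in a document than the random process predicts carries more information and so receives a larger weight, while the gain factor damps the growth as the term frequency increases.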