115 research outputs found

    Low- and high-resource opinion summarization

    Customer reviews play a vital role in the online purchasing decisions we make. The reviews express user opinions that are useful for setting realistic expectations and uncovering important details about products. However, some products receive hundreds or even thousands of reviews, making them time-consuming to read. Moreover, many reviews contain uninformative content, such as irrelevant personal experiences. Automatic summarization offers an alternative: short text summaries capturing the essential information expressed in reviews. Automatically produced summaries can reflect overall or particular opinions and be tailored to user preferences. Besides being presented on major e-commerce platforms, they can also be vocalized by home assistants. This approach can improve user satisfaction by helping users make faster and better decisions. Modern summarization approaches are based on neural networks, often requiring thousands of annotated samples for training. However, human-written summaries for products are expensive to produce because annotators need to read many reviews. This has led to annotated data scarcity, with only a few datasets available. Data scarcity is the central theme of our work, and we propose a number of approaches to alleviate the problem. The thesis consists of two parts, covering low- and high-resource data settings. In the first part, we propose self-supervised learning methods applied to customer reviews and few-shot methods for learning from small annotated datasets. Customer reviews without summaries are available in large quantities, contain a breadth of in-domain specifics, and provide a powerful training signal. We show that reviews can be used for learning summarizers via a self-supervised objective. Further, we address two main challenges associated with learning from small annotated datasets. First, large models rapidly overfit on small datasets, leading to poor generalization. Second, it is not possible to learn a wide range of in-domain specifics (e.g., product aspects and usage) from a handful of gold samples, which leads to subtle semantic mistakes in generated summaries, such as ‘great dead on arrival battery.’ We address the first challenge by explicitly modeling summary properties (e.g., content coverage and sentiment alignment). Furthermore, we leverage small modules, adapters, that are more robust to overfitting. As we show, despite their size, these modules can store in-domain knowledge that reduces semantic mistakes. Lastly, we propose a simple method for learning personalized summarizers based on aspects, such as ‘price,’ ‘battery life,’ and ‘resolution.’ This task is harder to learn, and we present a few-shot method for training a query-based summarizer on small annotated datasets. In the second part, we focus on the high-resource setting and present a large dataset with summaries collected from various online resources. The dataset has more than 33,000 human-written summaries, each linked to up to thousands of reviews. This, however, makes it challenging to apply an ‘expensive’ deep encoder due to memory and computational costs. To address this problem, we propose selecting small subsets of informative reviews; only these subsets are encoded by the deep encoder and subsequently summarized. We show that the selector and summarizer can be trained end-to-end via amortized inference and policy gradient methods.
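
    As a hedged illustration of how unannotated reviews can yield a self-supervised training signal, the sketch below builds leave-one-out pairs, where each review serves as a pseudo-summary of the other reviews of the same product. This is one common instantiation of such an objective, not necessarily the exact one used in the thesis; the record format and field names are hypothetical, and a sequence-to-sequence summarizer would then be trained on the resulting pairs.

```python
# Minimal sketch: leave-one-out self-supervised pairs from raw reviews.
# The (product_id, review_text) records below are toy, hypothetical data.
from collections import defaultdict

reviews = [
    ("prod_1", "Battery lasts two days, screen is sharp."),
    ("prod_1", "Great battery life but the case feels cheap."),
    ("prod_1", "Sharp display, battery easily gets me through a weekend."),
    ("prod_2", "Arrived broken, support was unhelpful."),
    ("prod_2", "Dead on arrival, had to return it."),
]

by_product = defaultdict(list)
for product_id, text in reviews:
    by_product[product_id].append(text)

training_pairs = []
for product_id, texts in by_product.items():
    if len(texts) < 2:
        continue  # need at least one source review and one pseudo-summary
    for i, pseudo_summary in enumerate(texts):
        source = [t for j, t in enumerate(texts) if j != i]
        training_pairs.append({"source_reviews": source, "target": pseudo_summary})

print(len(training_pairs), "self-supervised (source reviews -> pseudo-summary) pairs")
```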

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)


    Learning Directed Graphical Models with Optimal Transport

    Estimating the parameters of a probabilistic directed graphical model from incomplete data remains a long-standing challenge. This is because, in the presence of latent variables, both the likelihood function and the posterior distribution are intractable without further assumptions about structural dependencies or model classes. While existing learning methods are fundamentally based on likelihood maximization, here we offer a new view of the parameter learning problem through the lens of optimal transport. This perspective gives rise to a general framework that operates on any directed graph without making unrealistic assumptions on the posterior over the latent variables or resorting to black-box variational approximations. We develop a theoretical framework and support it with extensive empirical evidence demonstrating the flexibility and versatility of our approach. Across experiments, we show that our method not only recovers the ground-truth parameters but also performs comparably or better on downstream applications, notably the non-trivial task of discrete representation learning.
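
    As a generic, hedged illustration of what parameter learning "through the lens of optimal transport" can look like, the sketch below fits a toy directed model z -> x by minimizing an entropy-regularized Sinkhorn cost between model samples and data. This is not the framework proposed in the work above; the model, regularization strength, and iteration counts are all assumptions made only for the example.

```python
# Hedged sketch: OT-based parameter learning on a toy linear-Gaussian model.
import math
import torch

def sinkhorn_cost(x, y, eps=1.0, n_iters=100):
    """Entropy-regularized OT cost between two empirical samples (log-domain Sinkhorn)."""
    C = torch.cdist(x, y) ** 2                     # squared-Euclidean cost matrix
    n, m = C.shape
    log_a = torch.full((n,), -math.log(n))         # uniform weights on samples
    log_b = torch.full((m,), -math.log(m))
    f, g = torch.zeros(n), torch.zeros(m)
    for _ in range(n_iters):                       # dual potential updates
        f = -eps * torch.logsumexp((g - C) / eps + log_b, dim=1)
        g = -eps * torch.logsumexp((f.unsqueeze(1) - C) / eps + log_a.unsqueeze(1), dim=0)
    P = torch.exp((f.unsqueeze(1) + g - C) / eps + log_a.unsqueeze(1) + log_b)
    return (P * C).sum()                           # transport cost under plan P

# Toy directed model  z -> x :  z ~ N(0, I),  x = z W^T + b + 0.1 * noise.
torch.manual_seed(0)
true_W, true_b = torch.tensor([[1.5], [-0.5]]), torch.tensor([0.3, -0.7])
z = torch.randn(512, 1)
data = z @ true_W.T + true_b + 0.1 * torch.randn(512, 2)

W = torch.zeros(2, 1, requires_grad=True)
b = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([W, b], lr=0.05)
for step in range(200):
    z = torch.randn(256, 1)
    x = z @ W.T + b + 0.1 * torch.randn(256, 2)    # reparameterized model samples
    loss = sinkhorn_cost(x, data[:256])
    opt.zero_grad()
    loss.backward()
    opt.step()
# b should approach true_b; W matches true_W only up to sign, since z ~ N(0, I) is symmetric.
print(W.detach().squeeze(), b.detach())
```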

    The Fifteenth Marcel Grossmann Meeting

    The three volumes of the proceedings of MG15 give a broad view of all aspects of gravitational physics and astrophysics, from mathematical issues to recent observations and experiments. The scientific program of the meeting included 40 morning plenary talks over 6 days, 5 evening popular talks and nearly 100 parallel sessions on 71 topics spread over 4 afternoons. These proceedings are a representative sample of the very many oral and poster presentations made at the meeting. Part A contains plenary and review articles and the contributions from some parallel sessions, while Parts B and C consist of those from the remaining parallel sessions. The contents range from the mathematical foundations of classical and quantum gravitational theories including recent developments in string theory, to precision tests of general relativity including progress towards the detection of gravitational waves, and from supernova cosmology to relativistic astrophysics, including topics such as gamma ray bursts, black hole physics both in our galaxy and in active galactic nuclei in other galaxies, and neutron star, pulsar and white dwarf astrophysics. Parallel sessions touch on dark matter, neutrinos, X-ray sources, astrophysical black holes, neutron stars, white dwarfs, binary systems, radiative transfer, accretion disks, quasars, gamma ray bursts, supernovas, alternative gravitational theories, perturbations of collapsed objects, analog models, black hole thermodynamics, numerical relativity, gravitational lensing, large scale structure, observational cosmology, early universe models and cosmic microwave background anisotropies, inhomogeneous cosmology, inflation, global structure, singularities, chaos, Einstein-Maxwell systems, wormholes, exact solutions of Einstein's equations, gravitational waves, gravitational wave detectors and data analysis, precision gravitational measurements, quantum gravity and loop quantum gravity, quantum cosmology, strings and branes, self-gravitating systems, gamma ray astronomy, cosmic rays and the history of general relativity.

    Robust and efficient inference and learning algorithms for generative models

    Generative modelling is a popular paradigm in machine learning due to its natural ability to describe uncertainty in data and models and for its applications, including data compression (Ho et al., 2020), missing data imputation (Valera et al., 2018), synthetic data generation (Lin et al., 2020), representation learning (Kingma and Welling, 2014), robust classification (Li et al., 2019b), and more. For generative models, the task of finding the distribution of unobserved variables conditioned on observed ones is referred to as inference. Finding the optimal model that makes the model distribution close to the data distribution according to some discrepancy measure is called learning. In practice, existing learning and inference methods can fall short on robustness and efficiency. A method that is more robust to its hyper-parameters or to different types of data can be more easily adapted to various real-world applications. How efficient a method is with regard to the size and the dimensionality of data determines at what scale the method can be applied. This thesis presents four pieces of my original work that improve these properties in generative models. First, I introduce two novel Bayesian inference algorithms. One is coupled multinomial Hamiltonian Monte Carlo (Xu et al., 2021a); it builds on Heng and Jacob (2019), a recent method in unbiased Markov chain Monte Carlo (MCMC; Jacob et al., 2019b) that has been found to be sensitive to hyper-parameters and less efficient than standard, biased MCMC. These issues are solved by establishing couplings to the widely used multinomial Hamiltonian Monte Carlo, leading to a statistically more efficient and robust method. The other is roulette-based variational expectation (RAVE; Xu et al., 2019), which applies amortised inference to Bayesian non-parametric models, a model family in which the number of parameters is allowed to grow unbounded as the data get more complex. Unlike previous sampling-based methods, which are slow, or variational methods, which rely on truncation, RAVE combines the advantages of both to achieve flexible inference that is also computationally efficient. Second, I introduce two novel learning methods. One is generative ratio-matching (Srivastava et al., 2019), a learning algorithm that makes deep generative models based on kernel methods applicable to high-dimensional data. The key innovation of this method is learning a projection of the data to a lower-dimensional space in which the density ratio is preserved, so that learning can be done in the lower-dimensional space where kernel methods are effective. The other is Bayesian symbolic physics, which combines Bayesian inference and symbolic regression in the context of naïve physics, the study of how humans understand and learn physics. Unlike classic generative models, for which the structure of the generative process is predefined, or deep generative models, where the process is represented by data-hungry neural networks, Bayesian-symbolic generative processes are defined by functions over a hypothesis space specified by a context-free grammar. This formulation allows these models to incorporate domain knowledge in learning, which greatly improves sample efficiency. For all four pieces of work, I provide theoretical analyses and/or empirical results to validate that the algorithmic advances lead to improvements in robustness and efficiency for generative models.
Lastly, I summarise my contributions to free and open-source software for generative modelling. These include a set of Julia packages that I contributed and that are currently used by the Turing probabilistic programming language (Ge et al., 2018). These packages, which are highly reusable components for building probabilistic programming languages, together form a probabilistic programming ecosystem in Julia. An important package primarily developed by me is AdvancedHMC.jl (Xu et al., 2020), which provides robust and efficient implementations of HMC methods and has been adopted as the backend of Turing. Importantly, the design of this package provides an intuitive abstraction for constructing HMC samplers that mirrors how they are defined mathematically. The promise of these open-source packages is to make generative modelling techniques more accessible to domain experts from various backgrounds and to make the relevant research more reproducible, helping to advance the field.
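
    Since Hamiltonian Monte Carlo is central to this abstract, a minimal, textbook HMC step is sketched below in Python (NumPy). It is plain Metropolis-corrected HMC with a fixed leapfrog integrator, not the coupled multinomial variant described above, and it only stands in for what AdvancedHMC.jl provides in Julia; the target density, step size, and trajectory length are assumptions for the example.

```python
# Hedged sketch: one Metropolis-corrected HMC transition with a leapfrog integrator.
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step_size=0.1, n_leapfrog=20, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    p = rng.standard_normal(q.shape)              # resample momentum
    q_new, p_new = q.copy(), p.copy()
    # Leapfrog integration of the Hamiltonian dynamics.
    p_new += 0.5 * step_size * grad_log_prob(q_new)
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new
        p_new += step_size * grad_log_prob(q_new)
    q_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(q_new)
    # Metropolis accept/reject on the Hamiltonian (negative log joint + kinetic energy).
    current_h = -log_prob(q) + 0.5 * np.sum(p ** 2)
    proposed_h = -log_prob(q_new) + 0.5 * np.sum(p_new ** 2)
    return q_new if np.log(rng.uniform()) < current_h - proposed_h else q

# Usage: sampling a 2-D standard normal target.
log_prob = lambda q: -0.5 * np.sum(q ** 2)
grad_log_prob = lambda q: -q
rng = np.random.default_rng(0)
q, samples = np.zeros(2), []
for _ in range(1000):
    q = hmc_step(q, log_prob, grad_log_prob, rng=rng)
    samples.append(q)
```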

    Machine Learning Methods for Generating High Dimensional Discrete Datasets

    The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z, and then such patterns are used to reconstruct a new dataset X' that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former is devised using inverse frequent itemset mining (IFM) techniques and consists of generating a dataset that satisfies given support constraints on the itemsets of an input set, typically the frequent ones. By contrast, for the latter approach, recent developments in probabilistic generative modeling (PGM) are explored that model the generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for the case of discrete data and discussing their pros and cons.
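
    As a toy, hedged illustration of the probabilistic-generative side of this comparison (not any specific method from the survey), the sketch below fits the simplest possible model, independent per-item Bernoulli probabilities, samples a synthetic dataset X', and checks itemset supports. It also makes visible why constraint-based IFM approaches target itemset supports directly: an independence model only matches the singleton supports in general.

```python
# Hedged sketch: simplest probabilistic generation of binary transaction data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 5))        # toy stand-in for a real binary dataset

item_probs = X.mean(axis=0)                   # per-item marginal frequencies
X_synth = (rng.random(X.shape) < item_probs).astype(int)   # synthetic dataset X'

def support(data, itemset):
    """Fraction of rows containing every item in `itemset` (column indices)."""
    return data[:, itemset].all(axis=1).mean()

# An independence model reproduces singleton supports but not, in general,
# higher-order itemset supports; IFM-style generation enforces those directly.
for itemset in [[0], [1, 2], [0, 3, 4]]:
    print(itemset, round(support(X, itemset), 3), round(support(X_synth, itemset), 3))
```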

    Statistical Machine Learning for Modeling and Control of Stochastic Structured Systems

    Machine learning and its various applications have driven innovation in robotics, synthetic perception, and data analytics. The last decade especially has seen an explosion of interest in the research and development of artificial intelligence, with successful adoption and deployment in some domains. A significant force behind these advances has been an abundance of data and the evolution of simple computational models and tools with a capacity to scale up to massive learning automata. Monolithic neural networks with billions of parameters that rely on automatic differentiation are a prime example of the significant role efficient computation has played in supercharging the ability of well-established representations to extract intelligent patterns from unstructured data. Nonetheless, despite the strides taken in the digital domains of vision and natural language processing, applications in optimal control and robotics trail significantly behind and have not been able to capitalize as much on the latest trends in machine learning. This discrepancy can be explained by the limited transferability of learning concepts that rely on full differentiability to heavily structured physical and human-interaction environments, not to mention the substantial cost of data generation on real physical systems. These factors severely limit the application scope of loosely structured, over-parameterized data-crunching machines in the mechanical realm of robot learning and control. This thesis investigates modeling paradigms of hierarchical and switching systems to tackle some of the previously highlighted issues. This research direction is motivated by insights into universal function approximation via local cooperating units and the promise of inherently regularized representations through explicit structural design. Moreover, we explore ideas from robust optimization that address model-mismatch issues in statistical models and outline how related methods may be used to improve the tractability of state filtering in stochastic hybrid systems. In Chapter 2, we consider hierarchical modeling for general regression problems. The presented approach is a generative probabilistic interpretation of local regression techniques that approximate nonlinear functions through a set of local linear or polynomial units. The number of available units is crucial in such models, as it directly balances representational power against parametric complexity. We address this ambiguity by using principles from Bayesian nonparametrics to formulate flexible models that adapt their complexity to the data and can potentially encompass an infinite number of components. To learn these representations, we present two efficient variational inference techniques that scale well with data, and we highlight the advantages of hierarchical infinite local regression models, such as dealing with non-smooth functions, mitigating catastrophic forgetting, and enabling parameter sharing and fast predictions. Finally, we validate this approach on a set of large inverse dynamics datasets and test the learned models in real-world control scenarios. Chapter 3 addresses discrete-continuous hybrid modeling and control for stochastic dynamical systems, which entails dealing with time-series data.
In this scenario, we develop an automatic system identification technique that decomposes nonlinear systems into hybrid automata and leverages the resulting structure to learn switching feedback control via hierarchical reinforcement learning. In the process, we rely on an augmented closed-loop hidden Markov model architecture that captures time correlations over long horizons and provides a principled Bayesian inference framework for learning hybrid representations and filtering the hidden discrete states so that control can be applied accordingly. Finally, we embed this structure explicitly into a novel hybrid relative entropy policy search algorithm that optimizes a set of local polynomial feedback controllers and value functions. We validate the overall switching-system perspective by benchmarking the open-loop predictive performance against popular black-box representations. We also provide qualitative empirical results for hybrid reinforcement learning on common nonlinear control tasks. In Chapter 4, we attend to a general and fundamental problem in learning for control, namely robustness in data-driven stochastic optimization. The question of sensitivity takes on high priority given the rising popularity of embedding statistical models into stochastic control frameworks. However, data from dynamical, especially mechanical, systems is often scarce due to the high cost of extraction and limited coverage of the state-action space. The result is usually poor models with narrow validity and brittle control laws, particularly in ill-posed, over-parameterized learning settings. We propose to robustify stochastic control by finding the worst-case distribution over the dynamics and optimizing a corresponding robust policy that minimizes the probability of catastrophic failures. We achieve this goal by formulating a two-stage iterative minimax optimization problem that finds the most pessimistic adversary in a trust region around a nominal model and uses it to optimize a robust optimal controller. We test this approach on a set of linear and nonlinear stochastic systems and supply empirical evidence of its practicality. Finally, we provide an outlook on how similar multi-stage distributional optimization techniques can be applied in approximate filtering of stochastic switching systems in order to tackle the issue of exponential explosion in the number of state mixture components. In summary, the individual contributions of this thesis form a collection of interconnected principles for structured and robust learning for control. Although many challenges remain ahead, this research lays a foundation for future work on structured learning that strives to combine optimal control and statistical machine learning perspectives for the automatic decomposition and optimization of hierarchical models.
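
    As a hedged illustration of the discrete-state filtering referred to above, the sketch below runs the standard HMM forward recursion to track the belief over the hidden mode of a two-mode switching system with Gaussian emissions. It is a generic textbook recursion, not the augmented closed-loop architecture of the thesis; the transition matrix, emission parameters, and observations are toy assumptions.

```python
# Hedged sketch: forward filtering of the discrete mode in a switching system (HMM).
import numpy as np

A = np.array([[0.95, 0.05],                 # mode transition probabilities
              [0.10, 0.90]])
means, scale = np.array([-1.0, 1.0]), 0.5   # Gaussian emission mean per mode, shared std
alpha = np.array([0.5, 0.5])                # initial belief over the two modes

observations = [-0.9, -1.2, 0.3, 1.1, 0.8]
for y in observations:
    # p(y | mode) under each mode's Gaussian emission model
    likelihood = np.exp(-0.5 * ((y - means) / scale) ** 2) / (scale * np.sqrt(2 * np.pi))
    alpha = likelihood * (A.T @ alpha)      # predict with A, then weight by likelihood
    alpha /= alpha.sum()                    # normalize to a posterior mode belief
    print(f"y={y:+.1f}  p(mode | y_1:t) = {alpha.round(3)}")
```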

    Contribuciones al análisis multivariante de datos ponderados geográficamente

    Within spatial statistics there is a particular subarea known as geographically weighted models. These models are used in situations where spatial dependence and heterogeneity become the main focus of the research. The geographically weighted modelling paradigm is broad and has included a variety of models, among them Geographically Weighted Regression, Geographically Weighted Principal Component Analysis, Geographically Weighted Discriminant Analysis, and Geographically Weighted Cluster Analysis. In this work, an exhaustive literature review was carried out, covering both the statistical techniques that can be used to analyze geographically weighted data and their applications across different scientific areas. The software currently available for applying these methods was also reviewed, and a computational tool was developed that allows these techniques to be used in an easy, friendly, and flexible way. For the literature review, a novel methodology was proposed and implemented in an open-source application called LDAShiny, which uses machine learning and modelling tools in an interactive and easy-to-use way. The matrices resulting from the topic modelling were analyzed with multivariate analysis techniques, specifically non-metric Multidimensional Scaling and the HJ-Biplot. After reviewing the software packages that implement geographically weighted models, a new analysis tool, called GeoWeightedModel, was proposed. It is presented as a simple, intuitive interface in which analyses can be performed interactively ("point and click") in a web browser. The GeoWeightedModel application was used to analyze real data in order to explore and visualize the spatial heterogeneity of the relationships between several variables (namely, data on lung and bronchus cancer mortality and risk factors at the US county level, and data on US election results). From the results obtained, we conclude that Geographically Weighted Regression was the technique with the largest number of extensions and publications (namely, 3,183). In addition, the use of the proposed review methodology through the LDAShiny program made it possible to successfully identify 22 research topics that define the current state of research in the area of geographically weighted models. The results of the non-metric multidimensional scaling validated the labelling of the topics by showing coherent groupings and overlapping nodes, which indicates similar word distributions. The HJ-Biplot made it possible to analyze and visualize, in a simple way, the distribution of the identified topics by country. The analysis of real data with the proposed GeoWeightedModel program showed that it is a valuable tool for the analysis of geographically weighted data, one that does not require applied researchers, as users, to have extensive knowledge of programming and/or software handling.
    With the graphical interface developed for the GeoWeightedModel program, it was possible to show that all the actions required for the data analysis process can be made accessible to any user, and that the tool can be extended to any area of interest.
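
    As a hedged illustration of the core computation behind geographically weighted regression, the technique the review found most prominent, the sketch below fits a separate weighted least-squares model at each location, with weights from a Gaussian kernel on geographic distance. The data, bandwidth, and variable names are toy assumptions; tools such as the GeoWeightedModel interface described above wrap this kind of analysis behind a point-and-click front end.

```python
# Hedged sketch: local weighted least squares at each location (the core of GWR).
import numpy as np

rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))            # spatial locations (toy)
x = rng.normal(size=n)
beta_true = 0.5 + 0.2 * coords[:, 0]                # spatially varying effect of x
y = beta_true * x + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x])                # intercept + predictor
bandwidth = 2.0                                     # kernel bandwidth (assumed)
local_betas = np.empty((n, 2))
for i in range(n):
    d = np.linalg.norm(coords - coords[i], axis=1)  # distances to location i
    w = np.exp(-0.5 * (d / bandwidth) ** 2)         # Gaussian kernel weights
    W = np.diag(w)
    local_betas[i] = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# local_betas[:, 1] estimates how the effect of x varies across space.
print(local_betas[:5].round(3))
```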