Low- and high-resource opinion summarization
Customer reviews play a vital role in the online purchasing decisions we make. The reviews
express user opinions that are useful for setting realistic expectations and uncovering important
details about products. However, some products receive hundreds or even thousands of
reviews, making them time-consuming to read. Moreover, many reviews contain uninformative
content, such as irrelevant personal experiences. Automatic summarization offers an
alternative – short text summaries capturing the essential information expressed in reviews.
Automatically produced summaries can reflect overall or particular opinions and be tailored to
user preferences. Besides being presented on major e-commerce platforms, summaries can
also be vocalized by home assistants. This approach can improve user satisfaction by helping
users make faster and better decisions.
Modern summarization approaches are based on neural networks, often requiring thousands of
annotated samples for training. However, human-written summaries for products are expensive
to produce because annotators need to read many reviews. This has led to annotated data
scarcity, where only a few datasets are available. Data scarcity is the central theme of our
work, and we propose a number of approaches to alleviate the problem. The thesis consists
of two parts where we discuss low- and high-resource data settings.
In the first part, we propose self-supervised learning methods applied to customer reviews
and few-shot methods for learning from small annotated datasets. Customer reviews without
summaries are available in large quantities, contain a breadth of in-domain specifics, and
provide a powerful training signal. We show that reviews can be used for learning summarizers
via a self-supervised objective. Further, we address two main challenges associated with
learning from small annotated datasets. First, large models rapidly overfit on small datasets,
leading to poor generalization. Second, it is not possible to learn a wide range of in-domain
specifics (e.g., product aspects and usage) from a handful of gold samples. This leads to
subtle semantic mistakes in generated summaries, such as ‘great dead on arrival battery.’ We
address the first challenge by explicitly modeling summary properties (e.g., content coverage
and sentiment alignment). Furthermore, we leverage small modules – adapters – that are
more robust to overfitting. As we show, despite their size, these modules can be used to
store in-domain knowledge to reduce semantic mistakes. Lastly, we propose a simple method
for learning personalized summarizers based on aspects, such as ‘price,’ ‘battery life,’ and
‘resolution.’ This task is harder to learn, and we present a few-shot method for training a
query-based summarizer on small annotated datasets.
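A common way to realize such a self-supervised objective (the thesis's exact setup may differ) is a leave-one-out construction: each review in turn serves as a pseudo-summary target for the remaining reviews of the same product. A minimal sketch:

```python
def leave_one_out_pairs(reviews):
    """Build self-supervised (input reviews, pseudo-summary) training pairs
    by treating each review in turn as the target and the rest as input."""
    pairs = []
    for i, target in enumerate(reviews):
        source = reviews[:i] + reviews[i + 1:]  # all other reviews
        pairs.append((source, target))
    return pairs

reviews = ["Great battery life.", "Battery lasts all day.", "Screen is dim."]
pairs = leave_one_out_pairs(reviews)  # 3 pairs, each with 2 source reviews
```

The pairs can then be fed to any sequence-to-sequence summarizer in place of gold review-summary annotations.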
In the second part, we focus on the high-resource setting and present a large dataset with
summaries collected from various online resources. The dataset has more than 33,000 human-written
summaries, each linked to up to thousands of reviews. This, however, makes it
challenging to apply an ‘expensive’ deep encoder due to memory and computational costs. To
address this problem, we propose selecting small subsets of informative reviews. Only these
subsets are encoded by the deep encoder and subsequently summarized. We show that the
selector and summarizer can be trained end-to-end via amortized inference and policy gradient
methods.
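A generic way to train such a selector with policy gradients is the score-function (REINFORCE) estimator. The sketch below is purely illustrative, not the thesis's actual model: the per-review features, Bernoulli selection policy, and stand-in reward function are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_step(w, feats, reward_fn, lr=0.1):
    """One policy-gradient step for a review selector: sample an inclusion
    mask from independent Bernoullis over reviews, score the selected subset
    with a (stand-in) reward, and update the selector weights via REINFORCE."""
    logits = feats @ w                          # one logit per review
    probs = 1.0 / (1.0 + np.exp(-logits))       # inclusion probabilities
    mask = (rng.random(len(probs)) < probs).astype(float)
    reward = reward_fn(mask)                    # e.g. summary-quality proxy
    # gradient of sum_i log p(mask_i) w.r.t. logits is (mask - probs)
    grad_w = feats.T @ ((mask - probs) * reward)
    return w + lr * grad_w, reward

feats = rng.normal(size=(5, 3))                 # 5 reviews, 3 features each
w = np.zeros(3)
w, r = reinforce_step(w, feats, reward_fn=lambda m: m.sum() / len(m))
```

In the thesis the reward would come from the downstream summarizer, with amortized inference tying the two components together; here it is a placeholder.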
Learning Directed Graphical Models with Optimal Transport
Estimating the parameters of a probabilistic directed graphical model from
incomplete data remains a long-standing challenge. This is because, in the
presence of latent variables, both the likelihood function and posterior
distribution are intractable without further assumptions about structural
dependencies or model classes. While existing learning methods are
fundamentally based on likelihood maximization, here we offer a new view of the
parameter learning problem through the lens of optimal transport. This
perspective licenses a general framework that operates on any directed graphs
without making unrealistic assumptions on the posterior over the latent
variables or resorting to black-box variational approximations. We develop a
theoretical framework and support it with extensive empirical evidence
demonstrating the flexibility and versatility of our approach. Across
experiments, we show that not only can our method recover the ground-truth
parameters but it also performs comparably or better on downstream
applications, notably the non-trivial task of discrete representation learning.
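The workhorse of many optimal-transport methods is the entropy-regularized Sinkhorn iteration. The thesis's framework is more general than this, but the core computation can be sketched as follows; the histograms and ground-cost matrix below are toy assumptions.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.1, n_iter=200):
    """Entropy-regularized optimal transport between histograms a and b:
    alternate scaling updates until the plan's marginals match a and b."""
    K = np.exp(-cost / eps)          # Gibbs kernel from the ground cost
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return (plan * cost).sum()       # transport cost of the regularized plan

# toy example: two 3-bin histograms on a line, absolute-distance cost
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])
x = np.arange(3.0)
cost = np.abs(x[:, None] - x[None, :])
d = sinkhorn(a, b, cost)
```

In a parameter-learning setting, a quantity like `d` (or its gradient) would drive the model parameters, replacing the intractable likelihood.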
The Fifteenth Marcel Grossmann Meeting
The three volumes of the proceedings of MG15 give a broad view of all aspects of gravitational physics and astrophysics, from mathematical issues to recent observations and experiments. The scientific program of the meeting included 40 morning plenary talks over 6 days, 5 evening popular talks and nearly 100 parallel sessions on 71 topics spread over 4 afternoons. These proceedings are a representative sample of the very many oral and poster presentations made at the meeting. Part A contains plenary and review articles and the contributions from some parallel sessions, while Parts B and C consist of those from the remaining parallel sessions. The contents range from the mathematical foundations of classical and quantum gravitational theories including recent developments in string theory, to precision tests of general relativity including progress towards the detection of gravitational waves, and from supernova cosmology to relativistic astrophysics, including topics such as gamma ray bursts, black hole physics both in our galaxy and in active galactic nuclei in other galaxies, and neutron star, pulsar and white dwarf astrophysics.
Parallel sessions touch on dark matter, neutrinos, X-ray sources, astrophysical black holes, neutron stars, white dwarfs, binary systems, radiative transfer, accretion disks, quasars, gamma ray bursts, supernovas, alternative gravitational theories, perturbations of collapsed objects, analog models, black hole thermodynamics, numerical relativity, gravitational lensing, large scale structure, observational cosmology, early universe models and cosmic microwave background anisotropies, inhomogeneous cosmology, inflation, global structure, singularities, chaos, Einstein-Maxwell systems, wormholes, exact solutions of Einstein's equations, gravitational waves, gravitational wave detectors and data analysis, precision gravitational measurements, quantum gravity and loop quantum gravity, quantum cosmology, strings and branes, self-gravitating systems, gamma ray astronomy, cosmic rays and the history of general relativity.
Robust and efficient inference and learning algorithms for generative models
Generative modelling is a popular paradigm in machine learning due to its natural
ability to describe uncertainty in data and models and for its applications including data
compression (Ho et al., 2020), missing data imputation (Valera et al., 2018), synthetic
data generation (Lin et al., 2020), representation learning (Kingma and Welling, 2014),
robust classification (Li et al., 2019b), and more. For generative models, the task of
finding the distribution of unobserved variables conditioned on observed ones is referred
to as inference. Finding the optimal model that makes the model distribution close to the
data distribution according to some discrepancy measures is called learning. In practice,
existing learning and inference methods can fall short on robustness and efficiency. A
method that is more robust to its hyper-parameters or different types of data can be
more easily adapted to various real-world applications. How efficient a method is in
regard to the size and the dimensionality of data determines at what scale the method
can be applied. This thesis presents four pieces of my original work that improve these
properties in generative models.
First, I introduce two novel Bayesian inference algorithms. One is called coupled
multinomial Hamiltonian Monte Carlo (Xu et al., 2021a); it builds on Heng and Jacob
(2019), which is a recent work in unbiased Markov chain Monte Carlo (MCMC) (Jacob
et al., 2019b) and has been found to be sensitive to hyper-parameters and less efficient
compared to normal, biased MCMC. These issues are solved by establishing couplings
to the widely-used multinomial Hamiltonian Monte Carlo, leading to a statistically
more efficient and robust method. The other method is called roulette-based variational
expectation (RAVE; Xu et al., 2019), which applies amortised inference to a model family
called Bayesian non-parametric models, in which the number of parameters is allowed
to grow without bound as the data gets more complex. Unlike previous sampling-based
methods that are slow or variational inference methods that rely on truncation, RAVE
combines the advantages of both to achieve flexible inference that is also computationally
efficient. Second, I introduce two novel learning methods. One is called generative
ratio-matching (Srivastava et al., 2019) which is a learning algorithm that makes deep
generative models based on kernel methods applicable to high-dimensional data. The
key innovation of this method is learning a projection of the data to a lower-dimensional
space in which the density ratio is preserved such that learning can be done in the
lower-dimensional space where kernel methods are effective. The other method is called
Bayesian symbolic physics that combines Bayesian inference and symbolic regression
in the context of naïve physics—the study of how humans understand and learn physics.
Unlike classic generative models for which the structure of the generative process is
predefined or deep generative models where the process is represented by data-hungry
neural networks, Bayesian-symbolic generative processes are defined by functions over
a hypothesis space specified by a context-free grammar. This formulation allows these
models to incorporate domain knowledge in learning, which greatly improves
sample efficiency. For all four pieces of work, I provide theoretical analyses and/or
empirical results to validate that the algorithmic advances lead to improvements in
robustness and efficiency for generative models.
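A basic building block behind the coupled-MCMC work above is the maximal coupling of two distributions, which makes a pair of chains take the same value as often as possible. A discrete-case sketch, illustrative rather than the thesis's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def maximal_coupling(p, q):
    """Sample (X, Y) with X ~ p and Y ~ q such that X == Y with the maximal
    probability sum_i min(p_i, q_i) -- the mechanism that lets coupled MCMC
    chains meet, enabling unbiased estimators."""
    x = rng.choice(len(p), p=p)
    if rng.random() < min(p[x], q[x]) / p[x]:
        return x, x                              # the two samples meet
    # otherwise draw Y from the residual part of q
    resid = np.maximum(q - np.minimum(p, q), 0.0)
    resid /= resid.sum()
    return x, rng.choice(len(q), p=resid)

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])
draws = [maximal_coupling(p, q) for _ in range(5000)]
meet = np.mean([x == y for x, y in draws])
# meet should be close to sum(min(p, q)) = 0.6
```

Coupled HMC extends this idea to continuous, gradient-based kernels, which is where the robustness and efficiency questions addressed above arise.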
Lastly, I summarise my contributions to free and open-source software on generative
modelling. This includes a set of Julia packages that I contributed and that are currently
used by the Turing probabilistic programming language (Ge et al., 2018). These packages,
which are highly reusable components for building probabilistic programming
languages, together form a probabilistic programming ecosystem in Julia. An important
package that is primarily developed by me is called AdvancedHMC.jl (Xu et al.,
2020), which provides robust and efficient implementations of HMC methods and has
been adopted as the backend of Turing. Importantly, the design of this package allows
an intuitive abstraction to construct HMC samplers similarly to how they are mathematically
defined. The promise of these open-source packages is to make generative
modelling techniques more accessible to domain experts from various backgrounds and
to make relevant research more reproducible to help advance the field.
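At the core of any HMC implementation lies the leapfrog integrator. As a language-neutral illustration (in Python rather than the package's Julia), a single trajectory on a standard-normal target might look like this; step size and step count are arbitrary choices:

```python
import numpy as np

def leapfrog(q, p, grad_logp, step, n_steps):
    """Leapfrog integrator used by HMC: half momentum kick, alternating
    position drifts and full kicks, closing half kick. Approximately
    conserves the Hamiltonian H = -logp(q) + p^2/2."""
    p = p + 0.5 * step * grad_logp(q)
    for _ in range(n_steps - 1):
        q = q + step * p
        p = p + step * grad_logp(q)
    q = q + step * p
    p = p + 0.5 * step * grad_logp(q)
    return q, p

# standard normal target: logp(q) = -q^2/2, so grad_logp(q) = -q
q, p = leapfrog(np.array([1.0]), np.array([0.0]), lambda q: -q, 0.1, 10)
```

A full sampler wraps this in a momentum resample and a Metropolis accept/reject step; the abstraction noted above mirrors exactly this mathematical decomposition.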
Machine Learning Methods for Generating High Dimensional Discrete Datasets
The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect the patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z; then, such patterns are used to reconstruct a new dataset X′ that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former is devised using inverse frequent itemset mining (IFM) techniques and consists of generating a dataset satisfying given support constraints on the itemsets of an input set, which are typically the frequent ones. By contrast, for the latter approach, recent developments in probabilistic generative modeling (PGM) are explored that model generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for the case of discrete data and discussing their pros and cons.
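To make the support-constraint semantics concrete: a dataset satisfies IFM-style constraints when each given itemset is contained in exactly the required number of transactions. The toy checker below illustrates only the constraint check; actually generating such a dataset is the hard combinatorial part and is omitted.

```python
def satisfies(dataset, constraints):
    """Check that each constrained itemset occurs (as a subset of a
    transaction) in exactly the required number of transactions."""
    return all(
        sum(set(itemset) <= t for t in dataset) == support
        for itemset, support in constraints.items()
    )

# toy constraints: support({A}) = 3 and support({A, B}) = 2
constraints = {("A",): 3, ("A", "B"): 2}
dataset = [{"A", "B"}, {"A", "B"}, {"A"}, {"C"}]
ok = satisfies(dataset, constraints)  # True for this hand-built dataset
```

A PGM-based generator would instead fit a parametric distribution to X and sample transactions, trading exact support satisfaction for statistical fidelity.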
Statistical Machine Learning for Modeling and Control of Stochastic Structured Systems
Machine learning and its various applications have driven innovation in robotics, synthetic perception, and data analytics. The last decade especially has experienced an explosion in interest in the research and development of artificial intelligence with successful adoption and deployment in some domains. A significant force behind these advances has been an abundance of data and the evolution of simple computational models and tools with a capacity to scale up to massive learning automata. Monolithic neural networks with billions of parameters that rely on automatic differentiation are a prime example of the significant role efficient computation has had on supercharging the ability of well-established representations to extract intelligent patterns from unstructured data.
Nonetheless, despite the strides taken in the digital domains of vision and natural language processing, applications of optimal control and robotics significantly trail behind and have not been able to capitalize as much on the latest trends of machine learning. This discrepancy can be explained by the limited transferability of learning concepts that rely on full differentiability to the heavily structured physical and human interaction environments, not to mention the substantial cost of data generation on real physical systems. Therefore, these factors severely limit the application scope of loosely-structured over-parameterized data-crunching machines in the mechanical realm of robot learning and control.
This thesis investigates modeling paradigms of hierarchical and switching systems to tackle some of the previously highlighted issues. This research direction is motivated by insights into universal function approximation via local cooperating units and the promise of inherently regularized representations through explicit structural design. Moreover, we explore ideas from robust optimization that address model mismatch issues in statistical models and outline how related methods may be used to improve the tractability of state filtering in stochastic hybrid systems.
In Chapter 2, we consider hierarchical modeling for general regression problems. The presented approach is a generative probabilistic interpretation of local regression techniques that approximate nonlinear functions through a set of local linear or polynomial units. The number of available units is crucial in such models, as it directly balances representational power with the parametric complexity. This ambiguity is addressed by using principles from Bayesian nonparametrics to formulate flexible models that adapt their complexity to the data and can potentially encompass an infinite number of components. To learn these representations, we present two efficient variational inference techniques that scale well with data and highlight the advantages of hierarchical infinite local regression models, such as dealing with non-smooth functions, mitigating catastrophic forgetting, and enabling parameter sharing and fast predictions. Finally, we validate this approach on a set of large inverse dynamics datasets and test the learned models in real-world control scenarios.
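The local units underlying such models can be illustrated with locally weighted linear regression: fit a linear model around a query point with distance-decaying weights, then evaluate it there. A minimal sketch with toy data and bandwidth, not taken from the thesis:

```python
import numpy as np

def local_linear_predict(x0, X, y, bandwidth=0.2):
    """Locally weighted least squares: fit a line around x0 with Gaussian
    weights and evaluate the fit at x0."""
    w = np.exp(-0.5 * ((X - x0) / bandwidth) ** 2)   # locality weights
    A = np.stack([np.ones_like(X), X], axis=1)       # design matrix [1, x]
    W = np.diag(w)
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return beta[0] + beta[1] * x0

X = np.linspace(0, 2 * np.pi, 50)
y = np.sin(X)
pred = local_linear_predict(np.pi / 2, X, y)  # close to sin(pi/2) = 1
```

The Bayesian nonparametric models in the chapter effectively learn where to place such units and how many are needed, rather than fixing a bandwidth and evaluating at every query point.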
Chapter 3 addresses discrete-continuous hybrid modeling and control for stochastic dynamical systems, which implies dealing with time-series data. In this scenario, we develop an automatic system identification technique that decomposes nonlinear systems into hybrid automata and leverages the resulting structure to learn switching feedback control via hierarchical reinforcement learning. In the process, we rely on an augmented closed-loop hidden Markov model architecture that captures time correlations over long horizons and provides a principled Bayesian inference framework for learning hybrid representations and filtering the hidden discrete states to apply control accordingly. Finally, we embed this structure explicitly into a novel hybrid relative entropy policy search algorithm that optimizes a set of local polynomial feedback controllers and value functions. We validate the overall switching-system perspective by benchmarking the open-loop predictive performance against popular black-box representations. We also provide qualitative empirical results for hybrid reinforcement learning on common nonlinear control tasks.
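Filtering the hidden discrete mode in such a closed-loop hidden Markov model reduces, at its core, to the standard HMM forward recursion. A minimal sketch with hypothetical transition and observation-likelihood values:

```python
import numpy as np

def forward_filter(pi0, A, lik):
    """HMM forward algorithm: recursively compute the filtered posterior
    p(z_t | x_{1:t}) over discrete modes from per-step likelihoods."""
    belief = pi0 * lik[0]
    belief /= belief.sum()
    beliefs = [belief]
    for l in lik[1:]:
        belief = (A.T @ belief) * l   # predict with transitions, then update
        belief /= belief.sum()
        beliefs.append(belief)
    return np.stack(beliefs)

pi0 = np.array([0.5, 0.5])                            # prior over 2 modes
A = np.array([[0.9, 0.1], [0.1, 0.9]])                # sticky transitions
lik = np.array([[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]])  # p(x_t | z_t)
beliefs = forward_filter(pi0, A, lik)
```

In the hybrid control setting described above, the filtered mode belief determines which local feedback controller is applied at each step.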
In Chapter 4, we attend to a general and fundamental problem in learning for control, namely robustness in data-driven stochastic optimization. The question of sensitivity has a strong priority, given the rising popularity of embedding statistical models into stochastic control frameworks. However, data from dynamical, especially mechanical, systems is often scarce due to a high extraction cost and limited coverage of the state-action space. The result is usually poor models with narrow validity and brittle control laws, particularly in an ill-posed over-parameterized learning example. We propose to robustify stochastic control by finding the worst-case distribution over the dynamics and optimizing a corresponding robust policy that minimizes the probability of catastrophic failures. We achieve this goal by formulating a two-stage iterative minimax optimization problem that finds the most pessimistic adversary in a trust region around a nominal model and uses it to optimize a robust optimal controller. We test this approach on a set of linear and nonlinear stochastic systems and supply empirical evidence of its practicality. Finally, we provide an outlook on how similar multi-stage distributional optimization techniques can be applied in approximate filtering of stochastic switching systems in order to tackle the issue of exponential explosion in state mixture components.
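For a KL-regularized adversary, the worst-case distribution has a closed form: an exponential tilting of the nominal model toward high cost. The toy sketch below illustrates only this inner adversarial step; the thesis's trust-region formulation over dynamics models is more involved.

```python
import numpy as np

def worst_case_weights(costs, p0, tau):
    """Closed-form maximizer of E_p[cost] - tau * KL(p || p0): exponentially
    tilt the nominal distribution p0 toward high-cost outcomes. Smaller tau
    means a more pessimistic adversary."""
    w = p0 * np.exp(costs / tau)
    return w / w.sum()

p0 = np.full(4, 0.25)                     # nominal model over 4 outcomes
costs = np.array([0.0, 1.0, 2.0, 10.0])   # 10.0 = catastrophic failure
p_adv = worst_case_weights(costs, p0, tau=2.0)
robust_cost = p_adv @ costs               # pessimistic expected cost
```

The outer stage of the minimax problem would then optimize the controller against `p_adv`, alternating the two stages until convergence.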
In summation, the individual contributions of this thesis are a collection of interconnected principles for structured and robust learning for control. Although many challenges remain ahead, this research lays a foundation for reflecting on future structured learning questions that strive to combine optimal control and statistical machine learning perspectives for the automatic decomposition and optimization of hierarchical models.
Contributions to the multivariate analysis of geographically weighted data
Within spatial statistics there is a particular subarea known as geographically weighted models. These models are used in situations where spatial dependence and heterogeneity become the main focus of research. The geographically weighted modelling paradigm is broad and has included a variety of models, among them Geographically Weighted Regression, Geographically Weighted Principal Component Analysis, Geographically Weighted Discriminant Analysis, and Geographically Weighted Cluster Analysis. In this work, an exhaustive literature review has been carried out covering both the statistical techniques that can be used to analyze geographically weighted data and their applications across scientific areas. The software currently available for applying these methods has also been reviewed, and a computational tool has been developed that allows these techniques to be used in an easy, friendly, and flexible way. For the literature review, a novel methodology was proposed and implemented in an open-source application called LDAShiny, which uses machine learning and modelling tools in an interactive and easy-to-use manner. The matrices resulting from topic modelling were analyzed with multivariate techniques, specifically non-metric Multidimensional Scaling and the HJ-Biplot. After reviewing the software packages that implement geographically weighted models, a new analysis tool called GeoWeightedModel was proposed. It is presented as a simple, intuitive interface in which analyses can be performed interactively ("point and click") in a web browser.
The GeoWeightedModel application was used to analyze real datasets collected to explore and visualize the spatial heterogeneity of the relationships between several variables (namely, data on lung and bronchus cancer mortality and associated risk factors at the county level in the US, and data on US election results). From the results obtained, we conclude that Geographically Weighted Regression was the technique with the largest number of extensions and publications (namely, 3183). Moreover, applying the proposed review methodology through LDAShiny successfully identified 22 research topics that define the current state of research in the area of geographically weighted models. The non-metric multidimensional scaling results validated the topic labelling by showing coherent groupings and overlapping nodes, which indicates similar word distributions. The HJ-Biplot made it possible to analyze and visualize, in a simple way, the distribution of the identified topics by country. The analysis of real data with the proposed GeoWeightedModel program showed it to be a valuable tool for analyzing geographically weighted data that does not require applied researchers, as users, to have extensive programming and/or software skills. The graphical interface developed for GeoWeightedModel demonstrated that all the actions needed in the data-analysis process can be made accessible to any user and extended to any area of interest.
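The central computation of Geographically Weighted Regression, one of the techniques covered above, is a kernel-weighted least-squares fit at each site. A minimal sketch with synthetic data (the kernel, bandwidth, and data are illustrative choices, not from the dissertation):

```python
import numpy as np

def gwr_coefficients(coords, x, y, site, bandwidth):
    """GWR at a single site: ordinary least squares where each observation
    is weighted by a Gaussian kernel of its distance to the site, so the
    fitted coefficients vary over space."""
    d = np.linalg.norm(coords - site, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    A = np.column_stack([np.ones(len(x)), x])     # design: intercept + x
    W = np.diag(w)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ y)

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(100, 2))         # observation locations
x = rng.normal(size=100)
# the true slope of y on x increases from west to east
y = (1.0 + coords[:, 0]) * x + 0.1 * rng.normal(size=100)
west = gwr_coefficients(coords, x, y, np.array([0.1, 0.5]), bandwidth=0.2)
east = gwr_coefficients(coords, x, y, np.array([0.9, 0.5]), bandwidth=0.2)
```

Repeating the fit over a grid of sites yields the coefficient surfaces that tools like GeoWeightedModel map and visualize.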