Data-to-text generation with neural planning
In this thesis, we consider the task of data-to-text generation, which takes non-linguistic
structures as input and produces textual output. The inputs can take the form of
database tables, spreadsheets, charts, and so on. The main application of data-to-text
generation is to present information in a textual format which makes it accessible to
a layperson who may otherwise find it problematic to understand numerical figures.
The task can also automate routine document generation jobs, thus improving human
efficiency. We focus on generating long-form text, i.e., documents with multiple paragraphs. Recent approaches to data-to-text generation have adopted the very successful
encoder-decoder architecture or its variants. These models generate fluent (but often
imprecise) text and perform quite poorly at selecting appropriate content and ordering
it coherently. This thesis focuses on overcoming these issues by integrating content
planning with neural models. We hypothesize data-to-text generation will benefit from
explicit planning, which manifests itself in (a) micro planning, (b) latent entity planning, and (c) macro planning. Throughout this thesis, we assume the inputs to our
generator are tables (with records) in the sports domain, and the outputs are summaries
describing what happened in the game (e.g., who won/lost, ..., scored, etc.).
We first describe our work on integrating fine-grained or micro plans with data-to-text generation. As part of this, we generate a micro plan highlighting which records
should be mentioned and in which order, and then generate the document while taking
the micro plan into account.
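As a toy illustration of the plan-then-generate idea described above, the sketch below first builds a micro plan (which records to mention, and in which order) and then produces text from that plan. The `Record` fields, the salience rule, and the template realiser are all hypothetical stand-ins chosen for illustration; in the thesis both stages are learnt with neural models.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    entity: str  # e.g. a player or team name (illustrative field)
    rtype: str   # record type, e.g. "PTS" (illustrative field)
    value: int

def make_micro_plan(records, salience, max_len=3):
    """Select the most salient records and fix their order.

    `salience` is a hypothetical scoring function, not the learnt planner.
    """
    ranked = sorted(records, key=salience, reverse=True)
    return ranked[:max_len]

def realise(plan):
    """Stand-in generator: one templated sentence per planned record."""
    return " ".join(f"{r.entity} recorded {r.value} {r.rtype}." for r in plan)

records = [
    Record("Player A", "PTS", 30),
    Record("Player B", "AST", 4),
    Record("Player C", "REB", 12),
    Record("Player D", "PTS", 2),
]

# Stage 1: content selection and ordering. Stage 2: generation from the plan.
plan = make_micro_plan(records, salience=lambda r: r.value, max_len=2)
text = realise(plan)
print(text)
```

Because the plan is an explicit intermediate object, it can be inspected independently of the generated text, which is the sense in which planning aids interpretability.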
We then show how data-to-text generation can benefit from higher-level latent entity planning. Here, we make use of entity-specific representations which are dynamically updated. The text is generated conditioned on entity representations and the
records corresponding to the entities by using hierarchical attention at each time step.
We then combine planning with the high-level organization of entities, events, and
their interactions. Such coarse-grained macro plans are learnt from data and given
as input to the generator. Finally, we present work on making macro plans latent
while incrementally generating a document paragraph by paragraph. We infer latent
plans sequentially with a structured variational model while interleaving the steps of
planning and generation. Text is generated by conditioning on previous variational
decisions and previously generated text.
Overall, our results show that planning makes data-to-text generation more interpretable, improves the factuality and coherence of the generated documents, and reduces redundancy in the output document.
Reliable Decision-Making with Imprecise Models
The rapid growth in the deployment of autonomous systems across various sectors has generated considerable interest in how these systems can operate reliably in large, stochastic, and unstructured environments. Despite recent advances in artificial intelligence and machine learning, it is challenging to assure that autonomous systems will operate reliably in the open world. One of the causes of unreliable behavior is the impreciseness of the model used for decision-making. Due to the practical challenges in data collection and precise model specification, autonomous systems often operate based on models that do not represent all the details in the environment. Even if the system has access to a comprehensive decision-making model that accounts for all the details in the environment and all possible scenarios the agent may encounter, it may be intractable to solve this complex model optimally. Consequently, this complex, high fidelity model may be simplified to accelerate planning, introducing imprecision. Reasoning with such imprecise models affects the reliability of autonomous systems. A system's actions may sometimes produce unexpected, undesirable consequences, which are often identified after deployment. How can we design autonomous systems that can operate reliably in the presence of uncertainty and model imprecision?
This dissertation presents solutions to address three classes of model imprecision in a Markov decision process, along with an analysis of the conditions under which bounded performance can be guaranteed. First, an adaptive outcome selection approach is introduced to devise risk-aware reduced models of the environment that efficiently balance the trade-off between model simplicity and fidelity, to accelerate planning in resource-constrained settings. Second, a framework that extends the stochastic shortest path framework to problems with imperfect information about the goal state during planning is introduced, along with two solution approaches to solve this problem. Finally, two complementary solution approaches are presented to minimize the negative side effects of agent actions. The techniques presented in this dissertation enable an autonomous system to detect and mitigate undesirable behavior, without redesigning the model entirely.
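To make the idea of a reduced model concrete, here is a minimal sketch in which each state-action pair keeps only its most probable outcomes, with more outcomes retained in states flagged as risky. The transition table, the `risky` set, and the budget rule are assumptions made for illustration only; the dissertation's adaptive outcome selection is not specified in this abstract and may differ substantially.

```python
def reduce_outcomes(outcomes, k):
    """Keep the k most probable (prob, next_state) outcomes and renormalise."""
    kept = sorted(outcomes, key=lambda o: o[0], reverse=True)[:k]
    total = sum(p for p, _ in kept)
    return [(p / total, s) for p, s in kept]

def reduce_model(transitions, budget):
    """Build a reduced transition model.

    transitions: {(state, action): [(prob, next_state), ...]}
    budget(state, action): number of outcomes to keep (hypothetical rule).
    """
    return {
        sa: reduce_outcomes(outs, budget(*sa))
        for sa, outs in transitions.items()
    }

# Toy full model: "pit" is an undesirable outcome reachable from s0.
transitions = {
    ("s0", "move"): [(0.7, "s1"), (0.2, "s0"), (0.1, "pit")],
    ("s1", "move"): [(0.9, "goal"), (0.1, "s1")],
}

# Assumed risk annotation: keep full fidelity where mistakes are costly,
# and a single most-likely outcome elsewhere.
risky = {"s0"}
reduced = reduce_model(transitions, lambda s, a: 3 if s in risky else 1)
print(reduced)
```

The trade-off is visible even in this toy: the reduced model has fewer outcomes to plan over, but the rare transition into `pit` is preserved only because `s0` was flagged as risky.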
A GNN-based multi-task learning framework for personalized video search
Watching online videos has become increasingly popular, and users tend to watch videos based on their personal tastes and preferences. Providing a customized ranking list to maximize the user's satisfaction has become increasingly important for online video platforms. Existing personalized search methods (PSMs) train their models with user feedback information (e.g. clicks). However, we identified that such feedback signals may indicate attractiveness but not necessarily indicate relevance in video search. Besides, the click data and user historical information are usually too sparse to train a good PSM, which is different from conventional Web search, where rich user history is available. To address these concerns, in this paper we propose a multi-task graph neural network architecture for personalized video search (MGNN-PVS) that can jointly model users' click behaviour and the relevance between queries and videos. To relieve the sparsity problem and learn better representations for users, queries and videos, we develop an efficient and novel GNN architecture based on neighborhood sampling and a hierarchical aggregation strategy that leverages their different hops of neighbors in the user-query and query-document click graphs. Extensive experiments on a major commercial video search engine show that our model significantly outperforms state-of-the-art PSMs, which illustrates the effectiveness of our proposed framework.
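The following sketch shows one simple reading of neighborhood sampling with multi-hop aggregation on a click graph: a node's representation is the average of its own features and the mean of its sampled neighbors' (recursively aggregated) features. The graph, the feature vectors, and the use of plain mean aggregation are assumptions for illustration; the actual MGNN-PVS layers are learnt and are not described in this abstract.

```python
import random

def sample_neighbors(graph, node, k, rng):
    """Return at most k neighbors; sample uniformly when there are more."""
    nbrs = graph.get(node, [])
    return list(nbrs) if len(nbrs) <= k else rng.sample(nbrs, k)

def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def aggregate(graph, feats, node, hops, k, rng):
    """Combine a node's features with those of sampled neighbors, hop by hop."""
    if hops == 0:
        return feats[node]
    nbrs = sample_neighbors(graph, node, k, rng)
    if not nbrs:
        return feats[node]
    nbr_vecs = [aggregate(graph, feats, n, hops - 1, k, rng) for n in nbrs]
    return mean([feats[node], mean(nbr_vecs)])

# Toy click graph: user -> clicked queries -> clicked videos (hypothetical data).
graph = {"u1": ["q1", "q2"], "q1": ["v1"], "q2": ["v2"]}
feats = {
    "u1": [1.0, 0.0],
    "q1": [0.0, 1.0],
    "q2": [0.0, 3.0],
    "v1": [2.0, 2.0],
    "v2": [4.0, 0.0],
}

rng = random.Random(0)
user_repr = aggregate(graph, feats, "u1", hops=2, k=2, rng=rng)
print(user_repr)
```

Capping the number of neighbors at `k` bounds the cost per node regardless of degree, and pulling in second-hop neighbors gives sparse users some signal from videos they never clicked directly.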
Computational development of a phase-sensitive membrane raft probe
Derivatives of the widely used 1,6-diphenyl-1,3,5-hexatriene have been considered using spin-flip time-dependent density functional theory, classical molecular dynamics and hybrid quantum mechanics / molecular mechanics. We identify a potential probe of membrane phase (i.e. to preferentially detect liquid-ordered regions of lipid bilayers), which exhibits restricted access to a conical intersection in the liquid-ordered phase, whereas the intersection is freely accessible in less ordered molecular environments. The characteristics of this probe also mark it as a candidate for an aggregation-induced emission fluorophore.
Reforming the United Nations
The thesis deals with the financial crisis that the United Nations faced starting in 1985 when the US Congress decided to withhold a significant part of the US contribution to the UN regular budget in order to force a greater say for the major contributors on budgetary issues, budgetary restraint and greater efficiency. The UN responded by the adoption of resolution 41/213 of 19 December 1986 that was based on the recommendations of a Group of High-level Intergovernmental Experts ("G-18") set up a year earlier. A new system was introduced regarding the formulation of the regular budget of the United Nations Organisation and a broader process of reform was initiated including a restructuring of the Secretariat and of the intergovernmental machinery in the economic and social fields. After an introductory chapter (Chapter I), the thesis examines the UN problems at the budgetary/financial and administrative/structural levels, the solutions proposed from within and without the United Nations established framework and the actual attempts at reform (Chapters II and III). The realisation that the implementation of reforms is rather disjointed and often unsuccessful (e.g. the failure to restructure the intergovernmental machinery) prompts a search for the deeper causes of the UN problems at the political level and the attitudes of the main actors, namely the USA, the USSR, some up-and-coming states, notably Japan, the Third World states and, finally, of the UN Secretary-General and the Secretariat (Chapter IV). Although the financial crisis may have subsided since 1988 and the USA seem committed to paying up their dues, the deeper UN crisis of identity has not been resolved and is expected to resurface if no bold steps are taken. In that direction, some possible alternative courses for the UN in the future are discussed drawing upon theory and practice (Chapte
Privacy-aware Smart Home Interface Framework
Smart home user interfaces are pervasive and shared by multiple users who occupy the space. Therefore, they pose a risk to interpersonal privacy of occupants because an individual’s sensitive information can be leaked to other co-occupants (information privacy), or they can be disturbed by intrusions into their personal space (physical privacy) when the co-occupant interacts with the smart home user interfaces. This thesis hypothesises that interpersonal privacy violations can be mitigated by adapting the user interface layer and presents insights into how to achieve usable user interface adaptation to mitigate or minimise interpersonal privacy violations in smart homes.
The thesis reports two case studies and two user studies. The first case study identifies the key characteristics needed to model the rich context of interpersonal privacy violations scenarios. Then it presents knowledge representation models that are required to represent the identified characteristics and evaluates them for adequacy in modelling the context information of interpersonal privacy violation scenarios. The second case study presents a software architecture and a set of algorithms that can detect interpersonal privacy violations and generate usable user interface adaptations. Then it evaluates the architecture and the algorithms for adequacy in generating usable privacy-aware user interface adaptations. The first user study (N=15) evaluates the usability of the adaptive user interfaces generated from the framework where storyboards were used as the stimulant. Extending the findings from the usability study and expanding the coverage of example scenarios, the second user study (N=23) evaluates the overall user experience of the adaptive user interfaces, using video prototypes as the stimulant.
The research demonstrates that the characteristics identified and the respective knowledge representation models adequately captured the context of interpersonal privacy violation scenarios. Furthermore, the software architecture and the algorithms could detect possible interpersonal privacy violations and generate usable user interface adaptations to mitigate them. The two user studies demonstrate that the adaptive user interfaces, when used in appropriate situations, were a suitable solution for addressing interpersonal privacy violations while providing high usability and a positive user experience. The thesis concludes by providing recommendations for developing privacy-aware user interface adaptations and suggesting future work that can extend this research.
Identification of new regenerative therapies in reproductive medicine and their application as a future therapeutic approach for endometrial regeneration
The uterus is one of the main internal organs of the female reproductive system. It is composed of three different tissue layers: perimetrium, myometrium, and endometrium. This last layer covers the intrauterine cavity and is directly responsible for embryo implantation (for which it needs a certain minimum endometrial thickness).
Among the pathologies affecting the endometrium are endometrial atrophy (characterized by an insufficient endometrial thickness) and Asherman's syndrome (a rare disease characterized by the presence of intrauterine adhesions and fibrotic tissue), which form the common thread of this thesis, composed of four original manuscripts. In both cases, the endometrial tissue is degenerated, which hinders correct embryo implantation and thereby causes fertility problems.
To date, none of these pathologies has a totally effective cure. So far, one of the most promising therapeutic options is the injection of stem cells. Therefore, the first objective was to evaluate how the infusion of bone marrow-derived stem cells (isolated with the antigen CD133), which had proven effective in both a human and an animal model, was modifying the endometrium at the molecular level. Then, this work aimed to understand the paracrine mechanisms through which these cells were carrying out their therapeutic and regenerative action over the endometrial tissue. This first study revealed that these stem cells appeared to be promoting endometrial regeneration by creating an immunomodulatory scenario (down-regulation of the CXCL8 gene), which would give way to the over-expression of genes (SERPINE1, IL4, and JUN) involved in tissue regeneration.
Another treatment that has gained acceptance over the years is a blood derivative, platelet-rich plasma, which was the focus of the second manuscript. This work shows how this plasma, especially when derived from umbilical cord blood rather than adult peripheral blood, can promote cellular processes, such as migration and proliferation, in different types of endometrial cells (from primary culture and from stem cell lines). These plasmas also triggered the over-expression of certain proteins involved in regenerative events in a mouse model with induced endometrial damage.
Whatever the therapeutic approach of choice, it has been hypothesized that regeneration could arise from stimulation of the stem cell niche present in the endometrium. That is why objective three involved reviewing the published work, in both murine and human models, concerning this population of endometrial stem cells. This search concluded that there are still gaps in knowledge, either in the definition of specific endometrial stem cell markers or in the contribution of the bone marrow to this endogenous endometrial stem cell niche.
Finally, given the aforementioned current lack of definitive therapy for patients with endometrial atrophy or Asherman's syndrome, the last objective involved studying all those approaches that have been carried out in animal models that simulate this type of human pathology. This work concluded that although new therapies are emerging, such as those derived from bioengineering (e.g. use of decellularized scaffolds or hydrogels), there is still a need to perfect and standardize both animal and in vitro models to allow a better clinical translation of these therapies.
Machine learning and large scale cancer omic data: decoding the biological mechanisms underpinning cancer
Many of the mechanisms underpinning cancer risk and tumorigenesis are still not
fully understood. However, the next-generation sequencing revolution and the
rapid advances in big data analytics allow us to study cells
and complex phenotypes at unprecedented depth and breadth. While experimental
and clinical data are still fundamental to validate findings and confirm
hypotheses, computational biology is key for the analysis of system- and
population-level data for detection of hidden patterns and the generation of
testable hypotheses.
In this work, I tackle two main questions regarding cancer risk and tumorigenesis
that require novel computational methods for the analysis of system-level omic
data. First, I focused on how frequent, low-penetrance inherited variants modulate
cancer risk in the broader population. Genome-Wide Association Studies (GWAS)
have shown that Single Nucleotide Polymorphisms (SNP) contribute to cancer risk
with multiple subtle effects, but they are still failing to give further insight
into their synergistic effects. I developed a novel hierarchical Bayesian
regression model, BAGHERA, to estimate heritability at the gene-level from GWAS
summary statistics. I then used BAGHERA to analyse data from 38 malignancies in
the UK Biobank. I showed that genes with high heritable risk are involved in key
processes associated with cancer and are often localised in genes that are
somatically mutated drivers.
Heritability analysis, like many other omics analysis methods, studies the effects of DNA
variants on single genes in isolation. However, we know that most biological
processes require the interplay of multiple genes and we often lack a broad
perspective on them. For the second part of this thesis, I then worked on the
integration of Protein-Protein Interaction (PPI) graphs and omics data, which
bridges this gap and recapitulates these interactions at a system level. First,
I developed a modular and scalable Python package, PyGNA, that enables
robust statistical testing of genesets' topological properties. PyGNA complements
the literature with a tool that can be routinely introduced in bioinformatics
automated pipelines. With PyGNA I processed multiple genesets obtained from
genomics and transcriptomics data. However, topological properties alone have
proven to be insufficient to fully characterise complex phenotypes.
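The kind of statistical test described above can be illustrated with a generic permutation test: compute a topological statistic for the geneset, then compare it with the same statistic on random genesets of equal size drawn from the network. The sketch below uses the number of internal edges as the statistic on a made-up toy network. It does not use or reproduce PyGNA's API; the function names, the statistic, and the data are all assumptions for illustration.

```python
import random

def internal_edges(edges, geneset):
    """Count edges whose endpoints both belong to the geneset."""
    gs = set(geneset)
    return sum(1 for u, v in edges if u in gs and v in gs)

def permutation_pvalue(edges, nodes, geneset, n_perm=1000, seed=0):
    """Empirical p-value for 'the geneset is more connected than chance'.

    The null distribution comes from random node sets of the same size.
    """
    rng = random.Random(seed)
    observed = internal_edges(edges, geneset)
    hits = 0
    for _ in range(n_perm):
        null_set = rng.sample(nodes, len(geneset))
        if internal_edges(edges, null_set) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy interaction network: genes A, B, C form a triangle; the rest is sparse.
nodes = list("ABCDEFGHIJ")
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("F", "G"), ("H", "I")]

observed, p_value = permutation_pvalue(edges, nodes, ["A", "B", "C"])
print(observed, p_value)
```

A fixed seed keeps the estimate reproducible, which matters when a test like this runs inside an automated pipeline.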
Therefore, I focused on a model that allows combining topological and functional
data to detect multiple communities associated with a phenotype. Detecting
cancer-specific submodules is still an open problem, but it has the potential to
elucidate mechanisms detectable only by integrating multi-omics data. Building
on the recent advances in Graph Neural Networks (GNN), I present a supervised
geometric deep learning model that combines GNNs and Stochastic Block Models
(SBM). The model is able to learn multiple graph-aware representations, as
multiple joint SBMs, of the attributed network, accounting for nodes
participating in multiple processes. The simultaneous estimation of structure
and function provides an interpretable picture of how genes interact in specific
conditions, and it allows the detection of novel putative pathways associated with
cancer.