1,576 research outputs found
Network-based methods for biological data integration in precision medicine
The vast and continuously increasing volume of biomedical data produced during the last decades opens new opportunities for large-scale modeling of disease biology, facilitating a more comprehensive and integrative understanding of its processes. Nevertheless, this type of modeling requires highly efficient computational systems capable of dealing with such data volumes.
Computational approaches commonly used in machine learning and data analysis, namely dimensionality reduction and network-based methods, have been developed with the goal of effectively integrating biomedical data. Among these methods, network-based machine learning stands out due to its major advantage in terms of biomedical interpretability. These methodologies provide a highly intuitive framework for the integration and modelling of biological processes.
This PhD thesis aims to explore the potential of integrating complementary biomedical knowledge with patient-specific data to provide novel computational approaches for biomedical scenarios characterized by data scarcity. The primary focus is on studying how high-order graph analysis (i.e., community detection in multiplex and multilayer networks) may help elucidate the interplay of different types of data in contexts where statistical power is heavily impacted by small sample sizes, such as rare diseases and precision oncology.
The central focus of this thesis is to illustrate how network biology, among the several data integration approaches with the potential to achieve this task, can play a pivotal role in addressing this challenge given its advantages in molecular interpretability. Through its insights and methodologies, it shows how network biology, and in particular models based on multilayer networks, helps bring the vision of precision medicine to these complex scenarios, providing a natural approach for the discovery of new biomedical relationships that overcomes the difficulties of studying cohorts with limited sample sizes (data-scarce scenarios).
Delving into the potential of current artificial intelligence (AI) and network biology applications to address data granularity issues in the precision medicine field, this PhD thesis presents pivotal research works, based on multilayer networks, for the analysis of two rare disease scenarios with specific data granularities, effectively overcoming the classical constraints hindering rare disease and precision oncology research.
The first research article presents a personalized medicine study of the molecular determinants of severity in congenital myasthenic syndromes (CMS), a group of rare disorders of the neuromuscular junction (NMJ). The analysis of severity in rare diseases, despite its importance, is typically neglected due to limited data availability. In this study, modelling of biomedical knowledge via multilayer networks made it possible to understand the functional implications of individual mutations in the cohort under study, as well as their relationships with the causal mutations of the disease and the different levels of severity observed. Moreover, the study presents experimental evidence of the role of a previously unsuspected gene in NMJ activity, validating the hypothetical role predicted using the newly introduced methodologies.
The second research article focuses on the applicability of multilayer networks for gene prioritization. Building on concepts for the analysis of different data granularities first introduced in the previous article, the presented research provides a methodology based on the persistence of network community structures across a range of modularity resolutions, effectively providing a new framework for gene prioritization for patient stratification.
In summary, this PhD thesis presents major advances in the use of multilayer network-based approaches for the application of precision medicine to data-scarce scenarios, exploring the potential of integrating extensive available biomedical knowledge with patient-specific data
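The idea of community persistence across modularity resolutions can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that a community partition has already been computed at each resolution by some community detection method, and it only measures how often each pair of genes ends up in the same community. The gene names, the toy partitions, and the function name `co_membership_persistence` are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_membership_persistence(partitions):
    """Return, for each gene pair, the fraction of partitions in which
    both genes are assigned to the same community.

    `partitions` is a list of dicts mapping gene -> community label,
    one dict per modularity resolution (assumed to be precomputed).
    """
    counts = Counter()
    for partition in partitions:
        # Group genes by community label for this resolution.
        groups = {}
        for gene, label in partition.items():
            groups.setdefault(label, []).append(gene)
        # Count every pair of genes that shares a community.
        for members in groups.values():
            for a, b in combinations(sorted(members), 2):
                counts[(a, b)] += 1
    n = len(partitions)
    return {pair: c / n for pair, c in counts.items()}


# Hypothetical toy partitions, ordered from low to high resolution.
partitions = [
    {"geneA": 0, "geneB": 0, "geneC": 0, "geneD": 1},
    {"geneA": 0, "geneB": 0, "geneC": 1, "geneD": 1},
    {"geneA": 0, "geneB": 0, "geneC": 1, "geneD": 2},
]

persistence = co_membership_persistence(partitions)

# Pairs that stay together at every resolution are candidates for
# prioritization in this simplified picture.
stable_pairs = sorted(p for p, score in persistence.items() if score == 1.0)
print(stable_pairs)
```

In this toy example only `geneA` and `geneB` remain in the same community at every resolution, so a ranking by persistence would place that pair first.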
Mapping the Multiscale Proteomic Organization of Cellular and Disease Phenotypes.
While the primary sequences of human proteins have been cataloged for over a decade, determining how these are organized into a dynamic collection of multiprotein assemblies, with structures and functions spanning biological scales, is an ongoing venture. Systematic and data-driven analyses of these higher-order structures are emerging, facilitating the discovery and understanding of cellular phenotypes. At present, knowledge of protein localization and function has been primarily derived from manual annotation and curation in resources such as the Gene Ontology, which are biased toward richly annotated genes in the literature. Here, we envision a future powered by data-driven mapping of protein assemblies. These maps can capture and decode cellular functions through the integration of protein expression, localization, and interaction data across length scales and timescales. In this review, we focus on progress toward constructing integrated cell maps that accelerate the life sciences and translational research
Unraveling the transcriptional Cis-regulatory code
It is nowadays accepted that eukaryotic complexity is not dictated by the number of protein-coding genes of the genome, but rather achieved through the combinatorics of gene expression programs. Distinct aspects of the expression pattern of a gene are mediated by discrete regulatory sequences, known as cis-regulatory elements. The work described in this thesis was aimed at developing computational and statistical methods to guide the search and characterization of novel cis-regulatory elements
Exploring missing heritability in neurodevelopmental disorders: Learning from regulatory elements
In this thesis, I aimed to solve part of the missing heritability in neurodevelopmental disorders using computational approaches. Alongside the investigation of a novel epilepsy syndrome and of the regulation of the gene involved, I investigated and prioritized genomic sequences that have implications in gene regulation during the developmental stages of the human brain, with the goal of creating an atlas of high-confidence non-coding regulatory elements that future studies can assess for genetic variants in genetically unexplained individuals suffering from neurodevelopmental disorders of suspected genetic origin
A novel bioinformatic approach for comprehensive genome scale analysis identifies key regulators of macrophage activation.
The initiation of inflammatory cytokine transcription by bacterial ligands is a central mechanism by which the immune system activates its first line of defense. Macrophage activation by the Toll-like Receptor 4 (TLR4) pathway is initiated with receptor binding of lipopolysaccharides (LPS) and culminates in a large-scale transcriptional response of the inflammatory gene program. Advancements in genome-wide screening technologies have made it possible to interrogate the regulatory landscape of signaling pathways such as those activated by TLR4. Utilizing these high-throughput methods for the comprehensive characterization of pathway components, particularly for regulators that are involved in critical cellular processes such as transcription and translation, however, requires an approach that goes beyond the top scoring and previously characterized hits of genome-scale studies. To address this challenge, I developed the Throughput Ranking by Iterative Analysis of Genomic Enrichment (TRIAGE) method, a bioinformatic analysis model that facilitates the comprehensive identification of likely regulators by iterative sampling of pathway and network databases. I validated the TRIAGE approach by analyzing three previously published genome-wide studies of regulators of early HIV infection and viral transcription. Analysis by TRIAGE showed significantly increased overlap and identified shared novel targets across the three studies. I further developed the TRIAGE analysis method as a globally accessible web-based resource. Applying TRIAGE analysis to three genome-scale studies of LPS treatment in macrophages of mouse and human cell lines, I identified an enrichment for regulators relating to alternative splicing and protein degradation. 
Using short-read and long-read RNA-seq of ligand-stimulated macrophages, I further characterized the broad transcriptional variation induced by the LPS response and the novel and known transcript variants that define different macrophage activation states. These findings define an approach for comprehensive unbiased discovery of signaling pathway regulators from genome-scale datasets and suggest a model of macrophage activation involving proteasomal removal of negative regulators and remodeling of the macrophage state via a transcriptional shift in splice variant dynamics
RESPONSE AND MOLECULAR CONTROL OF CD8 T CELLS DURING INFECTION AND CANCER
CD8 T cells are potent immune effector cells capable of vast clonal expansion and clearance of infected or cancerous cells. After control of the pathogenic insult, CD8 T cells develop into quiescent, long-lived memory populations that are poised to mediate rapid protection upon reencounter with cognate antigen. These properties make control of CD8 T cell responses a highly desirable outcome of vaccine strategies and immunotherapy. Therefore, understanding how the effector function and memory differentiation of CD8 T cells are controlled at a molecular level is of great importance. In the context of infection with gammaherpesviruses (γHV), which form a latent infection that persists for the life span of the host, CD8 T cells play a vital role in control of γHV-associated lymphomagenesis. The following studies utilize murine gammaherpesvirus (MHV)-68 and a novel model of γHV-associated B cell lymphoma, EM61, to dissect the mechanisms of CD8 T cell-mediated control of γHV-associated lymphomagenesis. These studies indicate that γHV-specific CD8 T cells control EM61 through mechanisms that partially overlap with those used to control viral replication; however, we note important differences as well. We additionally describe γHV-specific, tissue-resident, memory CD8 T cells (TRM) that form after infection with MHV-68. In the absence of CD4 T cell help, which causes reactivation of γHV during latency, the γHV-specific TRM compartment exhibits changes that are distinct from those observed in the context of acute viral infection. Additional work focused on the molecular control of CD8 T cells by the BTB-ZF family transcription factor (TF), Zbtb20, which restricts CD8 T cell memory differentiation. Using single cell techniques, we identify programs of transcriptional and epigenetic regulation associated with memory CD8 T cell differentiation that underlie enhanced memory cell formation in the absence of Zbtb20.
Furthermore, using a sensitive technique to interrogate Zbtb20-DNA binding, we describe DNA motifs and genomic annotations from the direct genomic targets of Zbtb20 in CD8 T cells. Together, this work provides new knowledge relevant to the response and control of CD8 T cells to infection and cancer
Studies on genetic and epigenetic regulation of gene expression dynamics
The information required to build an organism is contained in its genome and the first
biochemical process that activates the genetic information stored in DNA is transcription.
Cell-type-specific gene expression shapes cellular functional diversity and dysregulation
of transcription is a central tenet of human disease. Therefore, understanding
transcriptional regulation is central to understanding biology in health and disease.
Transcription is a dynamic process, occurring in discrete bursts of activity that can be
characterized by two kinetic parameters: burst frequency, describing how often genes
burst, and burst size, describing how many transcripts are generated in each burst. Genes
are under strict regulatory control by distinct sequences in the genome as well as
epigenetic modifications. To properly study how genetic and epigenetic factors affect
transcription, it needs to be treated as the dynamic cellular process it is. In this thesis, I
present the development of methods that allow identification of newly induced gene
expression over short timescales, as well as inference of kinetic parameters describing
how frequently genes burst and how many transcripts each burst gives rise to. The work is
presented through four papers:
In paper I, I describe the development of a novel method for profiling newly transcribed
RNA molecules. We use this method to show that therapeutic compounds affecting
different epigenetic enzymes elicit distinct, compound-specific responses, mediated by
different sets of transcription factors, as early as one hour into treatment, and that
these responses can only be detected when measuring newly transcribed RNA.
The goal of paper II is to determine how genetic variation shapes transcriptional bursting.
To this end, we infer transcriptome-wide burst kinetics parameters from genetically
distinct donors and find variation that selectively affects burst sizes and frequencies.
Paper III describes a method for inferring transcriptional kinetics transcriptome-wide
using single-cell RNA-sequencing. We use this method to describe how the regulation of
transcriptional bursting is encoded in the genome. Our findings show that gene-specific
burst sizes are dependent on core promoter architecture and that enhancers affect burst
frequencies. Furthermore, cell-type-specific differential gene expression is regulated by
cell-type-specific burst frequencies.
Lastly, Paper IV shows how transcription shapes cell types. We collect data on cellular
morphology and electrophysiological characteristics and measure gene expression in the
same neurons collected from the mouse motor cortex. Our findings show that cells
belonging to the same, distinct transcriptomic families have distinct and non-overlapping
morpho-electric characteristics. Within families, there is continuous and correlated
variation in all modalities, challenging the notion of cell types as discrete entities
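The two kinetic parameters described in this abstract, burst frequency and burst size, can be related to simple count statistics. The sketch below is not the inference method of the thesis (which is not specified here); it swaps in a basic moment-based estimate under the standard bursty-transcription approximation, where transcript counts have mean f * b and Fano factor 1 + b, with burst size b and burst frequency f expressed relative to the mRNA degradation rate. Technical noise is ignored, and the counts and the function name `burst_moment_estimates` are hypothetical.

```python
def burst_moment_estimates(counts):
    """Moment-based estimates of (burst_frequency, burst_size).

    Assumes the bursty limit of the two-state transcription model:
        mean = f * b,    variance / mean = 1 + b
    where f is in units of the mRNA degradation rate.
    Returns None when the counts are not over-dispersed, because the
    estimate is then undefined under this model.
    """
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        return None
    # Population variance of the observed counts.
    variance = sum((c - mean) ** 2 for c in counts) / n
    fano = variance / mean
    if fano <= 1:
        return None
    burst_size = fano - 1
    burst_frequency = mean / burst_size
    return burst_frequency, burst_size


# Hypothetical transcript counts for one gene across ten cells.
counts = [0, 0, 1, 5, 0, 8, 2, 0, 0, 4]
estimate = burst_moment_estimates(counts)
print(estimate)
```

For these toy counts the mean is 2 and the variance is 7, so the sketch yields a burst size of 2.5 transcripts per burst and a burst frequency of 0.8 bursts per mRNA lifetime.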
- …