
    Design of new algorithms for gene network reconstruction applied to in silico modeling of biomedical data

    Doctoral Program in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Program code: DBI. Line code: 111. The root causes of disease are still poorly understood. The success of current therapies is limited because persistent diseases are frequently treated based on their symptoms rather than the underlying cause of the disease. Therefore, biomedical research is experiencing a technology-driven shift to data-driven holistic approaches to better characterize the molecular mechanisms causing disease. Using omics data as an input, emerging disciplines like network biology attempt to model the relationships between biomolecules. To this effect, gene co-expression networks arise as a promising tool for deciphering the relationships between genes in large transcriptomic datasets. However, because of their low specificity and high false positive rate, they demonstrate a limited capacity to retrieve the disrupted mechanisms that lead to disease onset, progression, and maintenance. Within the context of statistical modeling, we examined the reconstruction of gene co-expression networks in greater depth, with the specific goal of discovering disease-specific features directly from expression data. Using ensemble techniques, which combine the results of various metrics, we were able to more precisely capture biologically significant relationships between genes. We were able to find de novo potential disease-specific features with the help of prior biological knowledge and the development of new network inference techniques. Through our different approaches, we analyzed large gene sets across multiple samples and used gene expression as a surrogate marker for the inherent biological processes, reconstructing robust gene co-expression networks that are simple to explore.
By mining disease-specific gene co-expression networks, we provide a useful framework for identifying new omics-phenotype associations from conditional expression datasets. In this sense, understanding diseases from the perspective of biological network perturbations will improve personalized medicine, impacting rational biomarker discovery, patient stratification and drug design, and ultimately leading to more targeted therapies. Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e Informática
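    The ensemble idea mentioned in this abstract, combining the results of several association metrics before calling an edge, can be illustrated with a minimal sketch. The abstract does not name the metrics or the combination rule, so everything here is an assumption made for illustration: the choice of Pearson and Spearman correlation, the averaging of absolute scores, the 0.8 threshold, and the function names and toy expression values are all hypothetical and are not the thesis's method.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def ensemble_network(expression, threshold=0.8):
    """Keep a gene pair when the mean absolute score across metrics
    reaches the (assumed) threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        scores = [
            abs(pearson(expression[g1], expression[g2])),
            abs(spearman(expression[g1], expression[g2])),
        ]
        consensus = sum(scores) / len(scores)
        if consensus >= threshold:
            edges[(g1, g2)] = consensus
    return edges


# Toy expression values (hypothetical): one row per gene, one column per sample.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
network = ensemble_network(expression)
```

    In this toy input only the geneA/geneB pair passes the consensus threshold. Requiring agreement between metrics is one simple way to lower the false positive rate that the abstract attributes to single-metric co-expression networks.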

    Unveiling the frontiers of deep learning: innovations shaping diverse domains

    Deep learning (DL) enables the development of computer models that are capable of learning, visualizing, optimizing, refining, and predicting data. In recent years, DL has been applied in a range of fields, including audio-visual data processing, agriculture, transportation prediction, natural language, biomedicine, disaster management, bioinformatics, drug design, genomics, face recognition, and ecology. To explore the current state of deep learning, it is necessary to investigate the latest developments and applications of deep learning in these disciplines. However, the literature is lacking in exploring the applications of deep learning in all potential sectors. This paper thus extensively investigates the potential applications of deep learning across all major fields of study as well as the associated benefits and challenges. As evidenced in the literature, DL exhibits accuracy in prediction and analysis, which makes it a powerful computational tool, and has the ability to articulate itself and optimize, making it effective in processing data with no prior training. Given its independence from training data, deep learning necessitates massive amounts of data for effective analysis and processing, much like data volume. To handle the challenge of compiling huge amounts of medical, scientific, healthcare, and environmental data for use in deep learning, gated architectures like LSTMs and GRUs can be utilized. For multimodal learning, shared neurons in the neural network for all activities and specialized neurons for particular tasks are necessary. Comment: 64 pages, 3 figures, 3 tables

    Theoretical and computational tools to model multistable gene regulatory networks

    The last decade has witnessed a surge of theoretical and computational models to describe the dynamics of complex gene regulatory networks, and how these interactions can give rise to multistable and heterogeneous cell populations. As the use of theoretical modeling to describe genetic and biochemical circuits becomes more widespread, theoreticians with mathematics and physics backgrounds routinely apply concepts from statistical physics, non-linear dynamics, and network theory to biological systems. This review aims at providing a clear overview of the most important methodologies applied in the field while highlighting current and future challenges, and includes hands-on tutorials to solve and simulate some of the archetypical biological system models used in the field. Furthermore, we provide concrete examples from the existing literature for theoreticians who wish to explore this fast-developing field. Whenever possible, we highlight the similarities and differences between biochemical and regulatory networks and classical systems typically studied in non-equilibrium statistical and quantum mechanics. Comment: 73 pages, 12 figures

    The use of scRNA-seq to characterise the tumour microenvironment of high grade serous ovarian carcinoma (HGSOC)

    High Grade Serous Ovarian Carcinoma (HGSOC) is the most common type of ovarian cancer. Patients with this disease typically experience relapse following surgical debulking and initially effective chemotherapy. HGSOC has been intensely studied at the genomic and transcriptomic levels in efforts to advance knowledge of the biological mechanisms that drive the behaviour of this malignancy, and so that new treatment strategies may curb disease progression and relapse. This body of work contributes an optimised protocol for generating robust 10X scRNA-seq libraries from fresh and preserved HGSOC tissue, aiming to dissect the cellular heterogeneity of HGSOC's tumour microenvironment (TME). Through unsupervised clustering analysis, it uncovers distinct cellular communities, elucidates transcriptomic signatures across HGSOC tumours, and augments bulk RNA-seq datasets via computational deconvolution, enhancing understanding of HGSOC's cellular complexity across an expanded clinical cohort. The sequencing and analysis of these HGSOC patient tumours revealed 11 distinct cell types, including 2 that are novel in this tumour type, namely ciliated epithelial cells and metallothionein-expressing T-cells. These 11 distinct cell types can be broadly categorised into 3 TME components (Tumour, Stroma and Immune), as in previous tumour scRNA-seq studies. An additional analysis of these components examined the copy number variation (CNV) in the profiled cells and revealed HGSOC tumour cells to be mostly aneuploid while ciliated epithelial cells were diploid. A novel integrative subcluster analysis of HGSOC aneuploid tumour cells identified several apparently tumourigenic gene expression signatures. These include a KRT17+, protease inhibitory signature, an increased cellular metabolism signature, and an immune-reactive signature.
Additionally, a ciliated cluster re-emerged within the HGSOC tumour cells, even though the diploid ciliated epithelial cells were not included in the integrative analysis. Finally, the high granularity of HGSOC cellular composition revealed by scRNA-seq is utilised to perform deconvolution analyses to estimate cellular proportions and infer the TME of earlier bulk RNA-seq profiled HGSOC tumour samples. This investigation of earlier sequenced HGSOC samples revealed heterogeneity in the proportions of the TME compartments across the patient cohorts. Survival analysis using these inferred cellular proportions suggests that immune cell presence alone is not associated with survival, but metastatic fibroblast burden in tumour samples is significantly associated with worse overall survival in HGSOC patients. In conclusion, the laboratory protocol, the scRNA-seq datasets produced, and their analysis and application presented in this work expand the collective knowledge base of HGSOC, specifically by characterising the cells of the HGSOC tumour microenvironment and the nuances of expression signatures of the malignant cells. The deconvolution approach showcases how scRNA-seq data can expand the clinical utility of earlier RNA-seq HGSOC datasets in a way that is scalable.

    An optimal approach to the design of experiments for the automatic characterisation of biosystems

    The Design-Build-Test-Learn cycle is the main approach of synthetic biology to re-design and create new biological parts and systems, targeting solutions to complex and challenging problems. The applications of the novel designs range from biosensing and bioremediation of water pollutants (e.g. heavy metals) to drug discovery and delivery (e.g. cancer treatment) or biofuel production (e.g. butanol and ethanol), amongst others. Standardisation, predictability and automation are crucial elements for synthetic biology to efficiently attain these objectives. Mathematical modelling is a powerful tool that allows us to understand, predict, and control these systems, as shown in many other disciplines such as particle physics, chemical engineering, epidemiology and economics. Yet, the inherent difficulties of using mathematical models have substantially slowed their adoption by the synthetic biology community. Researchers might develop different competing model alternatives in the absence of in-depth knowledge of a system, consequently being left with the burden of having to find the best one. Models also come with unknown and difficult to measure parameters that need to be inferred from experimental data. Moreover, the varying informative content of different experiments hampers the solution of these model selection and parameter identification problems, adding to the scarcity and noisiness of laborious-to-obtain data. The difficulty of solving these non-linear optimisation problems has limited the widespread use of advantageous mathematical models in synthetic biology, broadening the gap between computational and experimental scientists. In this work, I present solutions to the problems of parameter identification, model selection and experimental design, validating them with in vivo data. First, I use Bayesian inference to estimate model parameters, relaxing the traditional noise assumptions associated with this problem.
I also apply information-theoretic approaches to evaluate the amount of information extracted from experiments (entropy gain). Next, I define methodologies to quantify the informative content of tentative experiments planned for model selection (distance between predictions of competing models) and parameter inference (model prediction uncertainty). Then, I use the two methods to define efficient platforms for optimal experimental design and use a synthetic gene circuit (the genetic toggle switch) to substantiate the results, computationally and experimentally. I also expand strategies to optimally design experiments for parameter identification to update parameter information and input designs during the execution of these (on-line optimal experimental design) using microfluidics. Finally, I developed an open-source and easy-to-use Julia package, BOMBs.jl, automating all the above functionalities to facilitate their dissemination and use amongst the synthetic biology community.
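    The genetic toggle switch used as the test circuit in this abstract can be illustrated with a small simulation. This is a minimal sketch, assuming the classic dimensionless two-repressor model du/dt = alpha1 / (1 + v^beta) - u and dv/dt = alpha2 / (1 + u^gamma) - v; the thesis may well use a more detailed model (for example with inducer inputs), and this is not the BOMBs.jl API. The parameter values, the forward Euler integrator and the function names are assumptions chosen for illustration.

```python
def toggle_switch_rhs(u, v, alpha1=10.0, alpha2=10.0, beta=2.0, gamma=2.0):
    """Right-hand side of the dimensionless toggle switch model.

    u and v are the concentrations of the two mutually repressing
    proteins; all parameter values are illustrative assumptions.
    """
    du = alpha1 / (1.0 + v ** beta) - u
    dv = alpha2 / (1.0 + u ** gamma) - v
    return du, dv


def simulate(u0, v0, t_end=50.0, dt=0.01, **params):
    """Integrate the model with forward Euler and return the final state."""
    u, v = u0, v0
    for _ in range(int(t_end / dt)):
        du, dv = toggle_switch_rhs(u, v, **params)
        u += dt * du
        v += dt * dv
    return u, v


# Two initial conditions on either side of the switching boundary.
high_u = simulate(5.0, 0.1)  # starts with repressor u dominant
high_v = simulate(0.1, 5.0)  # starts with repressor v dominant
```

    With these assumed parameters the two runs settle into different steady states, one with u high and v low and one with the reverse. This bistability is the behaviour that parameter inference and model selection experiments on the toggle switch have to capture.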

    Deriving a mathematical framework for data-driven analyses of immune cell dynamics

    Cellular decisions, such as the differentiation of T helper (Th) cells into specialized effector lineages, largely impact the direction of immune responses. Such population-level responses are the result of a complex interplay of individual cells which communicate via small signaling molecules called cytokines. The system's complexity, stemming not only from the number of components but also from their intricate and oftentimes non-linear interactions, makes it difficult to develop intuition for how cellular responses are actually generated.
Not surprisingly, the global effects of targeting individual cells or specific signaling pathways through perturbations are poorly understood. For instance, common treatments of autoimmune diseases often work for some patients, but not for others. Recently developed methods such as live-cell imaging, mass cytometry and single-cell sequencing now enable quantitative characterization of individual immune cells. This accumulating wealth of quantitative data has laid the basis to derive predictive, data-driven models of immune cell behavior, but in many cases, methods to integrate and annotate the data in a way suitable for model formulation are missing. In this thesis, quantitative workflows and methods are introduced that allow the formulation of data-driven models of immune cell decision-making, with a particular focus on lymphocyte proliferation, differentiation and death.

    Development and application of NMR methods to study biomolecular dynamics

    Structural biology has generated profound insights into biomolecular machines. The molecular basis of processes like binding, folding, catalysis and regulation, which underlie the inner workings of living organisms, would have largely remained unexplored without the thousands of structures that have been solved over the years. But these machines, formed by proteins and nucleic acids, are inherently dynamic, and information about this fourth dimension, the modulation of their structure with time, is often lacking. Nuclear magnetic resonance (NMR) is exquisitely suited to characterize dynamics over a wide range of timescales, from picoseconds, where amplitudes and correlation times can be extracted, to microseconds, milliseconds and seconds, where in favourable cases information about the kinetics, the thermodynamics and the structure of an excited state can be retrieved. With increasing size of the molecular system under consideration, however, this characterization becomes progressively more challenging for NMR, and the analysis often focuses on 13CH3 spin systems in a perdeuterated background. As an alternative approach, fluorine NMR has grown in popularity. The 19F isotope can be introduced site-specifically, it gives rise to background-free one-dimensional spectra and the technique bypasses the need for perdeuteration. In my dissertation, I expanded the existing toolkit of 19F NMR, applied 19F experiments that report on dynamics to high-molecular weight systems and combined their advantages with established methyl group NMR techniques.
Development of 19F relaxation dispersion experiments: To develop 19F relaxation dispersion (RD) experiments, I used a 7.5 kDa cold shock protein from the thermophilic organism Thermotoga maritima as a protein folding/unfolding model system. The global analysis of three RD experiments showed consistent results for the two-state exchange process. Our new rotating frame relaxation pulse sequences made it possible to extract the absolute chemical shift of the unfolded state and significantly extended the range of timescales that can be assessed experimentally. Employing a 360 kDa double heptameric complex, I validated the applicability of the experiments on a highly challenging assembly.
Conformational changes in the exoribonuclease Xrn2: The 5'-3' exoribonuclease Xrn2 operates in the nucleus in RNA processing and RNA turn-over pathways. Static structures of its cytoplasmic homologue Xrn1 in the presence of substrates imply that the enzymes undergo conformational changes to progress through the catalytic cycle. Here, I solved the X-ray structure of Xrn2 from the thermophilic organism Chaetomium thermophilum to 3 Å resolution and combined methyl group and fluorine relaxation dispersion to characterize the exchange in a 100 kDa apo protein core construct in solution. Upon binding of a substrate, the conformational equilibrium is substantially shifted towards the active state. Importantly, the 19F experiments made it possible to characterize dynamics in these unstable samples, and I could show that the exchange in the enzyme:substrate complex is largely suppressed.
Multi-site exchange in a neomycin-sensing riboswitch: The existence of multiple sparsely populated states complicates the characterization of an exchanging system. Using a synthetic neomycin-binding riboswitch bound to different aminoglycoside ligands, I demonstrated that fluorine NMR can be employed to study exchange topologies with up to four states. To this end, I took advantage of an additional off-resonance technique, 19F chemical exchange saturation transfer. Combined with 19F RD and longitudinal exchange experiments, the results support the notion of a modular impact of aminoglycoside functional groups on the riboswitch dynamics.
Taken together, these results expand and complement the NMR toolbox to study exchanging systems, with an emphasis on high-molecular weight systems and intricate exchange topologies involving more than two states. Furthermore, they elucidate the molecular dynamics in the 5'-3' exoribonuclease Xrn2 and provide a conceptual framework to study dynamics in related systems such as Xrn1.

    Gene dynamics of maturation in endogenous and pluripotent stem cell-derived cardiomyocytes

    A primary limitation in the clinical application of pluripotent stem cell-derived cardiomyocytes (PSC-CMs) is the failure of these cells to achieve full functional maturity. In vivo, cardiomyocytes undergo numerous adaptive changes during perinatal maturation. By contrast, PSC-CMs fail to fully undergo these developmental processes, instead remaining arrested at an embryonic stage of maturation. To date, however, the precise mechanisms by which directed differentiation differs from endogenous development, leading to consequent PSC-CM maturation arrest, are unknown. The advent of single cell RNA-sequencing (scRNA-seq) has offered great opportunities for studying CM maturation at single cell resolution. However, postnatal cardiac scRNA-seq has been limited owing to technical difficulties in the isolation of single CMs. Additionally, cross-study comparison is limited by dataset-specific batch effects. In this dissertation, I first established large particle fluorescence-activated cell sorting (LP-FACS) for isolation of viable single adult CMs. I then developed transcriptomic entropy as a robust, batch effect-resistant approach to quantifying CM maturation. With these and other computational tools, I investigated gene expression trends in endogenous and PSC-derived CMs. I first generated an scRNA-seq reference of mouse in vivo CM maturation with extensive sampling of perinatal time periods. I subsequently generated isogenic embryonic stem cells and created an in vitro scRNA-seq reference of PSC-CM directed differentiation. Through computational analysis, I identified a perinatal maturation program in endogenous CMs that is poorly recapitulated in vitro. By comparison of these trajectories with previously published human datasets, I identified a network of nine transcription factors (TFs) whose targets are consistently dysregulated in PSC-CMs across species.
Notably, I demonstrated that these TFs are only partially activated in common ex vivo approaches to engineer PSC-CM maturation. This dissertation represents the first direct comparison of CM maturation in vivo and in vitro at the single cell level. Moreover, the findings and tools developed here can be leveraged towards improving the clinical viability of PSC-CMs.
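    The transcriptomic entropy metric named in this abstract can be illustrated with a short sketch. The abstract does not define the metric, so this assumes it is the Shannon entropy of a single cell's gene counts after normalising them to proportions; the dissertation's actual preprocessing (gene filtering, depth correction and so on) is not described here and is not reproduced. The function name and the toy counts are hypothetical.

```python
import math


def transcriptomic_entropy(counts):
    """Shannon entropy (in bits) of one cell's gene count vector.

    Counts are normalised to proportions p_g = c_g / sum(c), and
    H = -sum(p_g * log2(p_g)) is taken over genes with non-zero counts.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("cell has no counts")
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


# Toy cells (hypothetical counts over four genes).
uniform_cell = [10, 10, 10, 10]    # expression spread evenly across genes
specialised_cell = [37, 1, 1, 1]   # expression dominated by one gene

uniform_entropy = transcriptomic_entropy(uniform_cell)
specialised_entropy = transcriptomic_entropy(specialised_cell)
```

    Because the counts are normalised within each cell, multiplying every count by a constant leaves the value unchanged. A profile dominated by a few genes scores lower than an evenly spread one, which is the mathematical property that lets a single number summarise how specialised a cell's transcriptome is.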