Cautionary Tales of Inapproximability
Modeling biology as classical problems in computer science allows researchers to leverage the wealth of theoretical advancements in this field. Despite countless studies presenting heuristics that report improvement on specific benchmarking data, there has been comparatively little focus on exploring the theoretical bounds on the performance of practical (polynomial-time) algorithms. Conversely, theoretical studies tend to overstate the generalizability of their conclusions to physical biological processes. In this article we provide a fresh perspective on the concepts of NP-hardness and inapproximability in the computational biology domain, using popular sequence assembly and alignment (mapping) algorithms as illustrative examples. These algorithms exemplify how computer science theory can both (a) lead to substantial improvement in practical performance and (b) highlight areas ripe for future innovation. Importantly, we discuss caveats that seemingly allow the performance of heuristics to exceed their provable bounds.
Tailoring bioinformatics strategies for the characterization of the human microbiome in health and disease
The human microbiome is a very active area of research due to its potential to explain
health and disease. Advances in high throughput DNA sequencing in the last decade have
catalyzed the growth of microbiome research; DNA sequencing allows for a cost-effective
method to characterize entire microbial communities directly, including unculturable
microbes which were previously difficult to study. 16S rRNA sequencing and shotgun
metagenomics, coupled with bioinformatics methods, have powered the characterization of
the human microbiome in different parts of the body. This has led to the discovery of novel
links between the microbiome and diseases such as allergies, cancer, and autoimmune
diseases.
This thesis focuses on the application of both 16S rRNA sequencing and shotgun
metagenomics for the characterization of the human microbiome and its relationship with
health and disease. We established two methodologies to address these questions. The first
methodology is a bench-to-bioinformatics pipeline to discover putative viral pathogens
involved in disease using shotgun metagenomics technology. In Paper I, we apply the
proposed pipeline to explore the hypothesis of viral infection as a putative cause of
childhood Acute Lymphoblastic Leukemia. In Paper II, we propose a complementary
method to the pipeline to improve the detection of unknown viruses, especially those with
little or no homology to currently known viruses. We applied this method to a collection of
viral-enriched libraries, which resulted in the characterization of a new viral-like genome.
The second methodology was developed to explore and generate hypotheses from a human
skin microbiome dataset of Psoriasis and Atopic Dermatitis patients. The results of the
analysis are presented in Paper III and Paper IV. Paper III is a purely data-driven exploration
of the dataset to discover different aspects of how the microbiome is linked to both
diseases. Paper IV follows up on the results of Paper III but focuses on characterizing
the skin site microbiome variability in Atopic Dermatitis.
Analysis of bronchoalveolar lavage transcriptome profiles of asthmatic horses by single-cell mRNA sequencing
Severe equine asthma (SEA) is a common respiratory condition of horses, whose underlying immune mechanisms remain to be elucidated. In this thesis project, we took advantage of the recently developed single-cell mRNA sequencing (scRNA-seq) technology to investigate the immunological landscape of equine bronchoalveolar lavage fluid (BALF) cells in both health and disease. Initially, we conducted a pilot experiment involving three horses to demonstrate the feasibility of scRNA-seq on cryopreserved equine BALF samples. Although the experiment was successful, the proportion of reads aligning to the annotated equine reference transcriptome was suboptimal. To address this, we generated a custom equine BALF transcriptome using long-read sequencing, aiming to improve the quality of 3'-UTR annotation and document BALF-specific isoforms. While we identified several novel isoforms, the read mapping percentage did not improve when aligning our scRNA-seq transcripts to the custom transcriptome. By extending the 3'-UTRs of the existing reference annotation, we achieved a satisfactory read mapping percentage, enabling subsequent qualitative downstream analysis. Our scRNA-seq dataset encompassed six major cell populations: monocytes-macrophages, neutrophils, T cells, B cells and dendritic cells. Within the monocyte-macrophage and T cell groups, we identified previously uncharacterized cell subtypes. Encouraged by these findings, we applied our optimized experimental protocol and analysis pipeline to study SEA. ScRNA-seq analysis of cryopreserved BALF cells from 6 asthmatic horses and 5 healthy controls revealed the same major cell populations as observed in the pilot study. In addition to T cells and monocytes-macrophages, we characterized several cell subtypes within the B cell, dendritic cell and neutrophil populations. Differential gene expression analysis revealed a strong T helper (Th)17 signature in SEA, primarily driven by monocytes-macrophages and T cells.
Notably, BALF from SEA horses was enriched in B cells, with a lower proportion of activated plasma cells. Neutrophils in the SEA group displayed increased migratory capacity and a heightened propensity to form neutrophil extracellular traps (NETs). An intriguing finding in both scRNA-seq experiments was the detection of a dual monocyte-lymphocyte population, potentially representing genuine cellular complexes engaged in an immunological synapse. In summary, this thesis project represents pioneering work employing scRNA-seq in the field of equine pulmonology. Our findings support a predominant Th17 immune pathway in SEA, necessitating further investigation to improve diagnostic tools and therapeutic management of severely asthmatic horses.
Role of the antagonistic histone methylation marks H3K4me3 and H3K27me3 in the cold stress response of Arabidopsis thaliana
As sessile organisms, plants need to adapt to their changing environment, including temperature fluctuations. As low temperatures can have major harmful consequences for their development and survival, plants need to establish proper defences in order to endure the stress. This requires a massive and very fast transcriptome reprogramming involving, among others, the induction of hundreds of cold-responsive (COR) genes. Following the immediate response to chilling stress, plants are also able to memorize cold spells, leading to improved survival during a second stress episode. This process is associated with a revised transcriptomic response, also called transcriptional memory. Overall, both the response to cold and the memory of this stress rely on the tight transcriptional regulation of the COR genes. While numerous transcription factors necessary for their induction have already been identified, the role of chromatin modifications in this process remains largely undiscovered. As the combination of chromatin modifications (the “chromatin state”) is a key determinant of gene expression, this study aimed at uncovering the potential role of histone modifications in the transcriptional regulation of COR genes before, during and after a cold episode. First, a comprehensive in silico analysis of the chromatin state of COR genes prior to any cold occurrence revealed that a majority of those genes carry both the activating mark H3K4me3 and the silencing mark H3K27me3, forming a specific chromatin state called bivalency. The in vivo characterization of bivalent genes revealed that this chromatin state decorates not only cold-inducible genes but also numerous reversibly silenced stress-responsive genes and might poise them for expression by maintaining them in an open chromatin conformation.
Furthermore, the putative bivalency reader DEK2 was shown to prevent the over-induction of bivalent COR genes during a cold episode, suggesting that bivalency can also participate in transcriptional regulation in trans through the action of specific readers. In a second stage, the dynamics of H3K4me3 and H3K27me3 during a cold stress were analysed using genome-wide approaches, revealing that both marks underwent intensive redistribution after as little as three hours of low temperature. Those changes partially correlated with expression changes: in particular, the induction of COR genes was associated with a loss of the repressive mark H3K27me3 or a gain of the activating mark H3K4me3. However, each mark displayed different targets and dynamics, suggesting that they hold distinct roles in the cold response: H3K4me3 was associated with immediate stress responses, while H3K27me3 correlated rather with longer-term adaptation. Upon return to ambient temperature, the cold-induced variations reverted at a different pace depending on the gene, and some changes were maintained for up to seven days. The maintenance of both H3K4me3 and H3K27me3 changes was linked to transcriptional memory: higher levels of H3K4me3 were associated with sustained induction, while lower levels of H3K27me3 were correlated with a faster re-induction during a second stress exposure. Finally, the H3K27me3 demethylase ELF6 was shown to be essential for cold stress memory. This led to the hypothesis that cold stress memory might rely on the maintained loss of H3K27me3 on specific COR genes, allowing a faster re-establishment of defences during a second stress episode. In conclusion, this study demonstrates that the antagonistic marks H3K4me3 and H3K27me3 jointly participate in the transcriptional regulation of COR genes and reveals a new role of bivalency in the plant cold stress response and memory.
Because plants are bound to their location, they must constantly adapt to changing environmental conditions. Cold can have harmful consequences for plant development and can even lead to plant death. Plants must therefore respond to this stress by building up suitable defences in order to survive. This requires substantial reprogramming of the transcriptome, including the induction of numerous cold-responsive (COR) genes. After the immediate stress response, plants are able to build up a memory of cold, which lets them better survive a second cold episode. This process is associated with massive changes at the level of gene expression, also referred to as “transcriptional memory”. Both the immediate response to cold and the formation of a longer-term memory of it depend on precise transcriptional regulation of COR genes. Although the chromatin state is a determining factor for gene expression, the role of chromatin modifications in the induction of COR genes is still largely unknown. The aim of this work was therefore to analyse the role of histone modifications in the transcriptional regulation of COR genes before, during and after cold stress. First, a comprehensive in silico analysis of the chromatin state of COR genes prior to a cold event revealed that the majority of these genes carry both the activating modification H3K4me3 and the repressive modification H3K27me3. This chromatin state is also called bivalent. The in vivo characterization of bivalent genes showed that silenced, inducible genes in particular are marked by a bivalent chromatin state. These genes might thereby be primed for eventual expression, because these gene regions remain in an open chromatin conformation. The putative bivalency reader DEK2 prevented the over-induction of bivalent genes during a cold episode.
This indicates that bivalency can also take part in transcriptional regulation in trans through the action of specific reader proteins. Analysis of H3K4me3 and H3K27me3 dynamics using genome-wide methods showed that an intensive redistribution of both modifications takes place at low temperatures, which partially correlated with expression changes. In particular, the induction of COR genes was associated with a loss of the repressive modification H3K27me3 or a gain of the activating modification H3K4me3. The modifications, however, have distinct roles in the cold response. H3K4me3 was associated with the immediate stress response, whereas H3K27me3 correlated rather with long-term adaptation. After the return to ambient temperature, the chromatin reverted to its initial state at a pace that differed from gene to gene, with some changes being maintained for up to seven days. The maintenance of both H3K4me3 and H3K27me3 changes was associated with transcriptional memory: higher H3K4me3 levels correlated with sustained induction, and lower H3K27me3 levels were associated with faster re-induction during a second cold episode. Finally, the H3K27me3 demethylase ELF6 was shown to be indispensable for cold stress memory. The persistent loss of H3K27me3 on specific genes could therefore be the molecular basis of cold stress memory, by enabling a faster re-establishment of defences during a second stress episode. Overall, this study shows that the antagonistic modifications H3K4me3 and H3K27me3 jointly take part in the transcriptional regulation of COR genes, and it reveals a new role for bivalency in the cold stress response of plants.
Investigating the spatial regulation of meiotic recombination in S. cerevisiae
In order for a species to engage in and reap the evolutionary benefits of sexual reproduction, a subset of cells in each individual must undergo a complex ordeal known as meiosis, a specialised cell division. By halving the genome content and “shuffling the deck”, meiosis generates genetically diverse haploid gametes (eggs, sperm) or spores from diploid cells. Such a monumental task is by no means easy or risk free: during the meiotic programme, cells intentionally damage their own genomes through widespread induction of DNA double-strand breaks (DSBs) in order to initiate homologous recombination (a DNA-repair process) and subsequent crossover (CO) formation. The success of meiosis is, however, not left up to chance. Rather, a complicated web of regulation acts at multiple stages to ensure this dangerous tradeoff pays dividends. Notably, the spatial pattern of meiotic recombination across the genome is complex and non-random. Whilst ultimately stochastic in nature, recombination events within any given meiotic cell display relatively even distributions along each chromosome, a phenomenon mediated by processes of “interference” acting at two key stages in meiosis: DSB and CO formation. Despite wide-ranging historical observation, relatively little is known about how either form of interference is accomplished. Genome-wide mapping of recombination within S. cerevisiae has, however, provided a unique opportunity to investigate the underlying mechanisms. By computationally and mathematically analysing genome-wide data, work presented throughout this thesis seeks to: (i) investigate CO distribution and CO interference within various DNA damage response and DNA repair mutants (Tel1ATM, Mec1ATR, Rad24, Msh2) (Chapter 2); (ii) develop novel approaches to DSB mapping (Chapter 3); (iii) characterise the hyperlocal regulation of DSB formation (Chapter 3); and (iv) examine the mechanics of DSB interference (Chapter 4).
Moreover, widely applicable simulation platforms for investigating DSB and CO formation have been developed (Chapters 2 and 4). Collectively, this thesis further elucidates the mechanisms that underpin the spatial regulation of meiotic recombination in S. cerevisiae.
Desarrollo de técnicas bioinformáticas para el análisis de datos de secuenciación masiva en sistemática y genómica evolutiva: Aplicación en el análisis del sistema quimiosensorial en artrópodos
[spa] Next-generation sequencing (NGS) technologies provide powerful data for investigating fundamental biological and evolutionary questions, such as studies of the evolutionary genomics of adaptation and of phylogenetics. It is currently possible to carry out complex genomic projects that analyse complete genomes and/or transcriptomes, even in non-model organisms.
In this thesis, we carried out two complementary studies using NGS data. First, we analysed the transcriptome (RNAseq) of the main chemosensory organs of the chelicerate Macrothele calpeiana, Walckenaer, 1805, the only protected spider in Europe, to investigate the origin and evolution of the chemosensory system (CS) in arthropods. The CS is a physiological process essential for the survival of organisms and is involved in vital biological processes such as the detection of food, mates or predators and oviposition sites. This system is relatively well characterized in hexapods, but few studies exist in other arthropod lineages. The analysis of our transcriptome allowed us to detect some genes expressed in the putative chemosensory organs of chelicerates, such as five NPC2s and two IRs. In addition, we detected 29 further transcripts after including in the HMM profiles new CS members from recently available arthropod genomes, such as some genes of the SNMP, ENaC, TRP and GR families and one OBP-like gene. Unfortunately, many of them were partial fragments.
Second, we also developed some bioinformatics tools to analyse RNAseq data and to develop molecular markers. Researchers interested in the biological application of NGS data may lack the bioinformatics expertise required to handle the large amount of data generated. In this context, what is mainly needed is the development of easy-to-use tools that carry out all the steps of basic NGS data processing and that integrate utilities for downstream analyses. In this thesis, we developed two bioinformatics tools with a graphical interface that make it possible to run all the common steps of NGS data processing and some of the main downstream analyses: i) TRUFA (TRanscriptome User-Friendly Analysis), which allows the analysis of RNAseq data from non-model organisms, including functional annotation and differential gene expression analysis; and ii) DOMINO (Development Of Molecular markers In Non-model Organisms), which allows the identification and selection of molecular markers appropriate for evolutionary biology analyses. These tools have been validated using computer simulations and experimental data, mainly from spiders.
[eng] Next Generation Sequencing (NGS) technologies are providing powerful data to investigate fundamental biological and evolutionary questions, including phylogenetic and adaptive genomic topics. Currently, it is possible to carry out complex genomic projects analyzing complete genomes and/or transcriptomes, even in non-model organisms.
In this thesis, we have performed two complementary studies using NGS data. Firstly, we have analyzed the transcriptome (RNAseq) of the main chemosensory organs of the chelicerate Macrothele calpeiana, Walckenaer, 1805, the only spider protected in Europe, to investigate the origin and evolution of the Chemosensory System (CS) in arthropods. The CS is an essential physiological process for the survival of organisms, and it is involved in vital biological processes, such as the detection of food, partners or predators and oviposition sites. This system, which is relatively well characterized in hexapods, is completely unknown in other arthropod lineages. Our transcriptome analysis allowed us to detect some genes expressed in the putative chemosensory organs of chelicerates, such as five NPC2s and two IRs. Furthermore, we detected 29 additional transcripts after including new CS members from recently available genomes in the HMM profiles, such as the SNMPs, ENaCs, TRPs, GRs and one OBP-like. Unfortunately, many of them were partial fragments.
Secondly, we have also developed some bioinformatics tools to analyze RNAseq data and to develop molecular markers. Researchers interested in the biological application of NGS data may lack the bioinformatic expertise required for the treatment of the large amount of data generated. In this context, the development of user-friendly tools for common data processing and the integration of utilities to perform downstream analysis are much needed. In this thesis, we have developed two bioinformatics tools with an easy-to-use graphical interface to perform all the basic steps of NGS data processing: i) TRUFA (TRanscriptome User-Friendly Analysis), which allows analyzing RNAseq data from non-model organisms, including functional annotation and differential gene expression analysis; and ii) DOMINO (Development of Molecular markers in Non-model Organisms), which allows identifying and selecting molecular markers appropriate for evolutionary biology analysis. These tools have been validated using computer simulations and experimental data, mainly from spiders.
Exploring interactions between host and gut microbiota in ulcerative colitis and primary sclerosing cholangitis associated inflammatory bowel disease: An appraisal through faecal microbiota transplantation and systems biology
Inflammatory bowel disease (IBD) has progressively become a global epidemic and now affects nearly 0.5% of the Western population. The aetiological factors that initiate and drive mechanisms associated with IBD remain unclear. A cure has been even more elusive. Changes in gut microbial diversity and profiles are a characteristic feature of individuals with this disease; however, a causal relationship has yet to be proven. In my PhD I have attempted to explore host-microbiota interactions and their influence on mechanisms of ulcerative colitis (UC) and primary sclerosing cholangitis associated inflammatory bowel disease (PSC-IBD).
Patients with UC have a greater abundance of Clostridiaceae at inflamed compared to non-inflamed sites. Immunophenotyping demonstrated significantly higher proportions of colonic mucosal Th17 and IL-17 producing CD4 cells in patients with UC and PSC-IBD compared to healthy controls. Through an open label study (STOP-Colitis pilot phase), I demonstrated that faecal microbiota transplantation (FMT) resulted in a clinical response in 47% of patients (8/17; intention to treat). This response was associated with a significant increase in colonic mucosal regulatory T cell (Treg), effector memory Treg, gut homing Treg and IL-10 producing CD4 T cell populations, along with a concurrent decrease in Th17, IL-17 producing CD4 T cell and CD8 populations. Colonic mucosal transcriptomics revealed that responders to FMT had significant downregulation of antimicrobial defence and proinflammatory immunological pathways and an increase in butanoate metabolic pathways compared to both baseline and non-responders. Finally, through a multi-omic exploration of colonic mucosal biology, I demonstrated that the gene expression profiles in patients with PSC-IBD were significantly different from those in UC and were associated with dysregulation of bile acid homeostasis and signalling in association with colonic dysbiosis.