84 research outputs found

    Statistical methods in detecting differential expressed genes, analyzing insertion tolerance for genes and group selection for survival data

    The thesis is composed of three independent projects: (i) analyzing transposon-sequencing data to infer the functions of genes in bacterial growth (chapter 2), (ii) developing a semi-parametric Bayesian method for differential gene expression analysis with RNA-sequencing data (chapter 3), and (iii) solving the group selection problem for survival data (chapter 4). All projects are motivated by statistical challenges raised in biological research. The first project is motivated by the need to develop statistical models to accommodate transposon insertion sequencing (Tn-Seq) data, which consist of sequence reads around each transposon insertion site. The detection of a transposon insertion at a given site indicates that disrupting the genomic sequence at this site does not cause loss of an essential function and that the bacteria can still grow. Hence, such measurements have been used to infer the role of each gene in bacterial growth. We propose a zero-inflated Poisson regression method for analyzing the Tn-Seq count data, and derive an Expectation-Maximization (EM) algorithm to obtain parameter estimates. We also propose a multiple testing procedure that categorizes genes into one of three states (hypo-tolerant, tolerant, and hyper-tolerant) while controlling the false discovery rate. Simulation studies show that our method provides good estimation of model parameters and inference on gene functions. In the second project, we model the count data from an RNA-sequencing experiment for each gene using a Poisson-Gamma hierarchical model, or equivalently, a negative binomial (NB) model. We derive a fully semi-parametric Bayesian approach with a Dirichlet process as the prior for the fold changes between two treatment means. An inference strategy using a Gibbs sampling algorithm is developed for differential expression analysis. 
We evaluate our method with several simulation studies, and the results demonstrate that our method outperforms other methods, including popular ones such as edgeR and DESeq. In the third project, we develop a new semi-parametric Bayesian method to address the group variable selection problem and study the dependence of survival outcomes on the grouped predictors using the Cox proportional hazards model. We use indicators for groups to induce sparseness and obtain the posterior inclusion probability for each group. Bayes factors are used to evaluate whether each group should be selected. We compare our method with a frequentist method (HPCox) in several simulation studies and show that our method performs better than HPCox. In summary, this dissertation tackles several statistical problems raised in biological research, including high-dimensional genomic data analysis and survival analysis. All proposed methods are evaluated with simulation studies and show satisfactory performance. We also apply the proposed methods to real data analysis.
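
The abstract above mentions an EM algorithm for a zero-inflated Poisson (ZIP) model. As a rough illustration only, the sketch below fits an intercept-only ZIP (one mixing weight `pi`, one Poisson rate `lam`) by EM. It is not the thesis's regression model: covariates and the multiple-testing step are left out, and the names `fit_zip_em` and `sample_zip` are hypothetical helpers introduced here.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_zip(n, pi, lam, rng):
    """Simulate n counts: a structural zero with probability pi, else Poisson(lam)."""
    return [0 if rng.random() < pi else sample_poisson(lam, rng) for _ in range(n)]


def fit_zip_em(counts, n_iter=500, tol=1e-10):
    """Fit an intercept-only ZIP by EM and return (pi_hat, lam_hat).

    Assumes at least one positive count, so the Poisson rate is identifiable.
    """
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    if total == 0:
        raise ValueError("all counts are zero; the Poisson rate is not identifiable")

    pi = 0.5 * n_zero / n      # crude starting value for the zero-inflation weight
    lam = total / n            # starting value for the Poisson rate
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_structural = n_zero * z
        # M-step: closed-form updates given the expected memberships.
        new_pi = expected_structural / n
        new_lam = total / (n - expected_structural)
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam
```

On simulated data, for example `fit_zip_em(sample_zip(5000, 0.3, 4.0, random.Random(0)))`, the estimates should land close to the simulation values. A regression version would replace `pi` and `lam` with link functions of covariates and solve weighted GLM problems in the M-step.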

    Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression

    BACKGROUND: Deep sequencing of transposon mutant libraries (TnSeq) is a powerful method for probing the essentiality of genomic loci under different environmental conditions. Various analytical methods have been described for identifying conditionally essential genes whose tolerance for insertions varies between two conditions. However, for large-scale experiments involving many conditions, a method is needed for identifying genes that exhibit significant variability in insertions across multiple conditions. RESULTS: In this paper, we introduce a novel statistical method, based on Zero-Inflated Negative Binomial (ZINB) regression, for identifying genes with significant variability of insertion counts across multiple conditions. Using likelihood ratio tests, we show that the ZINB distribution fits TnSeq data better than either ANOVA or a Negative Binomial (in a generalized linear model). We use ZINB regression to identify genes required for infection of C57BL/6 mice by M. tuberculosis H37Rv. We also use ZINB to perform an analysis of genes that are conditionally essential in H37Rv cultures exposed to multiple antibiotics. CONCLUSIONS: Our results show that ZINB not only identifies most of the genes found by pairwise resampling (and vastly outperforms ANOVA), but also identifies additional genes where variability is detectable only when the magnitudes of insertion counts are treated separately from local differences in saturation, as in the ZINB model.
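
To make the ZINB idea above concrete, here is a minimal sketch of the ZINB log-probability with fixed parameters. It uses the common mean/dispersion parameterization of the negative binomial (variance `mu + mu**2 / size`), which is an assumption here and not necessarily the paper's exact parameterization. The zero-inflation weight `pi` plays the role of local saturation and `mu` the magnitude of non-zero counts. Function names are hypothetical, and no regression or likelihood ratio test is implemented.

```python
import math


def nb_logpmf(y, mu, size):
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion size > 0."""
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )


def zinb_logpmf(y, mu, size, pi):
    """Log-pmf of a zero-inflated NB; pi in [0, 1) is the extra-zero probability."""
    if y == 0:
        # A zero is either structural (pi) or an ordinary NB zero.
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, size)))
    return math.log(1.0 - pi) + nb_logpmf(y, mu, size)


def zinb_loglik(counts, mu, size, pi):
    """Total log-likelihood of a list of counts under one ZINB distribution."""
    return sum(zinb_logpmf(y, mu, size, pi) for y in counts)
```

Comparing `zinb_loglik` between a model with one shared parameter set and a model with per-condition parameters is the kind of quantity a likelihood ratio test would be built on, once the parameters are actually estimated.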

    TnseqDiff: identification of conditionally essential genes in transposon sequencing studies

    BACKGROUND: Tn-Seq is a high-throughput technique for analyzing transposon mutant libraries to determine the conditional essentiality of a gene under an experimental condition. A special feature of Tn-Seq data is that multiple mutants in a gene provide independent evidence to prioritize that gene as being essential. Existing methods either do not account for this feature or rely on a high-density transposon library. Moreover, these methods are unable to accommodate complex designs. RESULTS: The method proposed here is specifically designed for the analysis of Tn-Seq data. It uses two steps to estimate the conditional essentiality of each gene in the genome. First, it collects evidence of conditional essentiality for each insertion by comparing read counts of that insertion between conditions. Second, it combines the insertion-level evidence for the corresponding gene. It handles data from both low- and high-density transposon libraries and accommodates complex designs. Moreover, it is very fast to run. The performance of the proposed method was tested on simulated data and on experimental Tn-Seq data from a Serratia marcescens transposon mutant library used to identify genes that contribute to fitness in a murine model of infection. CONCLUSION: We describe a new, efficient method for identifying conditionally essential genes in Tn-Seq experiments with high detection sensitivity and specificity. It is implemented as the TnseqDiff function in the R package Tnseq and can be installed from the Comprehensive R Archive Network (CRAN). https://deepblue.lib.umich.edu/bitstream/2027.42/137673/1/12859_2017_Article_1745.pd

    Fitness Landscape of the Fission Yeast Genome

    The relationship between DNA sequence, biochemical function and molecular evolution is relatively well-described for protein-coding regions of genomes, but far less clear in non-coding regions, particularly in eukaryote genomes. In part, this is because we lack a complete description of the essential non-coding elements in a eukaryote genome. To contribute to this challenge, we used saturating transposon mutagenesis to interrogate the Schizosaccharomyces pombe genome. We generated 31 million transposon insertions, a theoretical coverage of 2.4 insertions per genomic site. We applied a five-state hidden Markov model (HMM) to distinguish insertion-depleted regions from insertion biases. Both raw insertion-density and HMM-defined fitness estimates showed significant quantitative relationships to gene knockout fitness, genetic diversity, divergence and expected functional regions based on transcription and gene annotations. Through several analyses, we conclude that transposon insertions produced fitness effects in 66-90% of the genome, including substantial portions of the non-coding regions. Based on the HMM, we estimate that 10% of the insertion depleted sites in the genome showed no signal of conservation between species and were weakly transcribed, demonstrating limitations of comparative genomics and transcriptomics to detect functional units. In this species, 3' and 5' untranslated regions were the most prominent insertion-depleted regions that were not represented in measures of constraint from comparative genomics. We conclude that the combination of transposon mutagenesis, evolutionary and biochemical data can provide new insights into the relationship between genome function and molecular evolution
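
The coverage figure quoted above can be turned into a quick back-of-the-envelope check. The sketch below takes the two numbers stated in the abstract (31 million insertions, 2.4 insertions per site) and, purely as an illustrative assumption, treats insertions as landing uniformly at random so that per-site counts are approximately Poisson. Real insertion data are biased, which is exactly what the abstract's HMM is meant to handle, so this is a null baseline and not the authors' model.

```python
import math


def implied_site_count(n_insertions, mean_per_site):
    """Number of genomic sites implied by a total insertion count and mean coverage."""
    return n_insertions / mean_per_site


def expected_empty_fraction(mean_per_site):
    """Fraction of sites with zero insertions if per-site counts were Poisson."""
    return math.exp(-mean_per_site)


sites = implied_site_count(31_000_000, 2.4)   # roughly 12.9 million sites
empty = expected_empty_fraction(2.4)          # roughly 9% of sites empty by chance
print(f"implied sites: {sites:,.0f}; expected empty fraction: {empty:.3f}")
```

Under this idealized model, about one site in eleven would carry no insertion by chance alone, which illustrates why a single empty site is weak evidence and why region-level models are used to call insertion-depleted stretches.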

    Fitness Landscape of the Fission Yeast Genome.

    The relationship between DNA sequence, biochemical function, and molecular evolution is relatively well-described for protein-coding regions of genomes, but far less clear in noncoding regions, particularly, in eukaryote genomes. In part, this is because we lack a complete description of the essential noncoding elements in a eukaryote genome. To contribute to this challenge, we used saturating transposon mutagenesis to interrogate the Schizosaccharomyces pombe genome. We generated 31 million transposon insertions, a theoretical coverage of 2.4 insertions per genomic site. We applied a five-state hidden Markov model (HMM) to distinguish insertion-depleted regions from insertion biases. Both raw insertion-density and HMM-defined fitness estimates showed significant quantitative relationships to gene knockout fitness, genetic diversity, divergence, and expected functional regions based on transcription and gene annotations. Through several analyses, we conclude that transposon insertions produced fitness effects in 66-90% of the genome, including substantial portions of the noncoding regions. Based on the HMM, we estimate that 10% of the insertion depleted sites in the genome showed no signal of conservation between species and were weakly transcribed, demonstrating limitations of comparative genomics and transcriptomics to detect functional units. In this species, 3'- and 5'-untranslated regions were the most prominent insertion-depleted regions that were not represented in measures of constraint from comparative genomics. We conclude that the combination of transposon mutagenesis, evolutionary, and biochemical data can provide new insights into the relationship between genome function and molecular evolution

    A single-cell multi-omic approach to the analysis of T cell differentiation

    This thesis aims to investigate T cell differentiation through the bioinformatic analysis of single-cell multi-omic data. T cells are an important part of the adaptive immune system, involved in the immune response to infections and cancer. Single-cell technologies have advanced to the point where multiple modes of data, such as gene and protein expression, can be assayed on the same cells. Greater understanding of T cell differentiation pathways at the single-cell level can help in the design of immunotherapies to treat cancer and autoimmune disease. The thesis begins by presenting a multi-omic workflow that combines scRNA-seq and T cell receptor (TCR) sequence extraction. The principles developed for this workflow were applied to investigate T cell differentiation in two scenarios. The first scenario was an application of single-cell multi-omics to the study of CD8+ T cell peripheral tolerance mechanisms in a mouse model. This work demonstrated that tolerance is a differentiation program distinct from functional effector responses, and that T cells progressively commit to the tolerised state over the first 60 h after exposure to the triggering antigen. A gene signature for the tolerised state was identified, containing genes uniquely upregulated in tolerised cells. Quiescent and proliferating clusters were found in tolerised cells, indicating that a proportion of cells exit the cell cycle within each division. The second scenario was an investigation of the differentiation of CD4+ CAR T cells in vivo, and the evolution of a lymphoma derived from these cells. Three cell types (proliferating, cytotoxic, and resting) were observed within the malignant CAR T cells, and these types were also observed within non-malignant CAR T cells and endogenous CD4+ T cells. The lymphoma was characterised by expression of the NF-κB transcription factor in all three cell types, while each cell type had differing expression levels for several other known oncogenes. 
This thesis has contributed to the understanding of T cell differentiation in tolerance and CAR T therapy, and has helped meet the challenge of increasingly large and complex single-cell datasets through the development of bioinformatic workflows to integrate samples from multiple patients and sequencing technologies, and to integrate gene, protein, TCR sequence, cell division count, and somatic mutation data at the single-cell level.

    Modeling Indicators of Neonatal Tetanus Incidence in Infants in Indonesia Using Zero-Inflated Poisson Regression

    One of the causes of the high infant mortality rate (Angka Kematian Bayi, AKB) in Indonesia is neonatal tetanus. The disease is caused by infection with the bacterium Clostridium tetani, which attacks infants between the 3rd and 28th day after birth when the umbilical cord is cut with non-sterile instruments. The number of neonatal tetanus cases is count data assumed to follow a Poisson distribution, so it would ordinarily be analyzed with Poisson regression. However, Poisson regression is not appropriate for neonatal tetanus cases when the observations contain many zeros, because this violates the equidispersion assumption. This study therefore uses a different regression method, Zero-Inflated Poisson (ZIP) regression, because the proportion of zeros in the data is very large, at 57.6 percent. The data used in this final-project study are secondary data taken from the Profil Kesehatan Indonesia (Indonesia Health Profile) for 2013. The results show that the Poisson regression model exhibited overdispersion, so the case was handled with Zero-Inflated Poisson regression. In the ZIP regression model, the predictor variables that affect the Poisson state are the percentage of TT2+ immunization coverage relative to the number of pregnant women, the percentage of health workers relative to the number of infants, and the percentage of births attended by health workers. The predictor variable that affects the zero state is the percentage of health workers relative to the number of infants.
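
The reasoning in the abstract above (excess zeros break the Poisson equal mean and variance assumption) can be checked with two simple diagnostics. The sketch below is an intercept-only illustration, not the study's regression analysis: it compares the sample variance with the mean and the observed share of zeros with the share a Poisson distribution with the same mean would predict. The function name and the toy counts are made up for illustration and are not the study's data.

```python
import math


def poisson_zero_check(counts):
    """Return simple overdispersion and excess-zero diagnostics for count data."""
    n = len(counts)
    mean = sum(counts) / n
    if n < 2 or mean == 0:
        raise ValueError("need at least two observations and a positive mean")
    variance = sum((y - mean) ** 2 for y in counts) / (n - 1)
    return {
        "mean": mean,
        "variance": variance,
        # Close to 1 under a Poisson model; clearly above 1 suggests overdispersion.
        "dispersion_ratio": variance / mean,
        "observed_zero_fraction": sum(1 for y in counts if y == 0) / n,
        # Share of zeros a Poisson distribution with this mean would produce.
        "poisson_zero_fraction": math.exp(-mean),
    }


# Hypothetical toy counts with many zeros (illustrative only).
toy_counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]
diagnostics = poisson_zero_check(toy_counts)
```

When the dispersion ratio is well above 1 and the observed zero fraction is well above the Poisson prediction, a zero-inflated model such as ZIP is a natural candidate, which mirrors the choice described in the abstract.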

    Environmental activity of the inoculant Pseudomonas veronii 1YdBTEX2 during bioaugmentation in natural and polluted soils

    SUMMARY Bioaugmentation uses the capacities of specific bacterial strains inoculated into sites to enhance pollutant biodegradation. Bioaugmentation mainly involves introducing bacteria that deploy their metabolic properties and adaptation potential to survive and propagate in the contaminated environment using the target pollutant as a carbon, nutrient or energy source. Most of our understanding of biodegradation activity comes from experiments carried out under laboratory conditions. However, attempts to demonstrate the potential for bioaugmentation of biodegraders under in situ conditions have shown both success and failures, without clear understanding of the underlying reasons or mechanisms. We hypothesized that a better understanding is necessary of the processes occurring during the invasion of inoculants into complex environments, such as polluted soils. Only then can we identify the bottlenecks for bioaugmentation and fully exploit the natural diversity of biodegrader candidates. In this context, the overall aim of this work was to rationally study the soil survival capacity and contrast the metabolic and physiological strategies of the BTEX-degrading bacterium Pseudomonas veronii 1YdBTEX2 during adaptation and invasion to natural non-sterile and contaminated soils, with that in polluted liquid suspended cultures or standardized porous materials. In the general introduction, I present the state of the art in soil bioremediation and bioaugmentation, and degradation of BTEX compounds in particular. In the first research chapter, I focus on understanding the adaptive response of P. veronii 1YdBTEX2 during transition from liquid batch culture to contaminated soil. For this, I analyzed the short-term (1 h) changes in genome-wide gene expression in non-sterile sand compared to liquid medium, both in presence or absence of toluene. In a collaborative effort, we improved the genome sequence of P. 
veronii 1YdBTEX2, and showed that it actually comprises three individual replicons with a total size of 8 Mb, with a large proportion of genes of unknown function. One-hour exposure to toluene, both in soil and liquid, triggered massive transcription (up to 208-fold induction) of multiple gene clusters, such as those for toluene degradation pathway(s), chemotaxis, and toluene efflux pumps. This clearly underlines their key role in the adaptive response to toluene. In comparison to liquid medium, cells in soil drastically changed the expression of genes involved in membrane functioning (e.g., lipid composition, lipid metabolism, cell fatty acid synthesis), osmotic stress response (e.g., polyamine or trehalose synthesis, uptake of potassium), and putrescine metabolism, highlighting the adaptive mechanisms by which P. veronii readjusts its metabolism to sand. In the next chapter, I focused on understanding the potential changes in the P. veronii global transcriptome during actual growth and survival (stationary phase) in contaminated soils. In addition, I aimed to investigate differences in behaviour among three different non-sterile soils and a material historically contaminated with (polycyclic) aromatic hydrocarbons. I showed that P. veronii established itself in all except one soil, at the expense of toluene. The strain also grew in the absence of toluene, but to a lower population density. This indicates its capacity to survive in real field conditions under contamination stress. I compared the transcriptomic responses during transition phase, exponential growth, and stationary phase among the soils and with the behaviour induced in regular liquid culture or an inert silica matrix. The transcriptomic analysis revealed that P. veronii displayed a versatile global program, characterized by commonly shared and soil-specific strategies, and signs of nutrient limitation in later growth phases, which explain its behaviour in the soil environments. 
We also observed a strikingly conserved core metabolic expression in the exponential phase irrespective of the growth environment. The non-growth in one soil that carried the highest resident background microbial community may be the result of predation by higher loads of resident protists, and/or of possible competition for substrate by resident microbes. In the absence of sizeable P. veronii population, however, I did not manage to isolate sufficient RNA for transcriptomics, and signs of predatory or competition stress could thus not be uncovered. Finally, in the third chapter, I focused on characterizing the genes that might be important for growth and survival of P. veronii in (contaminated) soils. For this I used a transposon-insertion scanning approach. With help I generated two independent genome-wide insertion libraries of P. veronii through conjugation of a non-replicating hyperactive mini-Tn5 delivering an aph gene insertion (Kanamycin resistance gene). I inoculated the libraries then either in liquid suspended culture media or in two different soils under presence of toluene. Libraries were propagated for 50 generations in the same environments. Deep sequencing of amplified boundaries between the inserted aph- and host genes, served to quantify the relative abundances of insertions over time and condition. Fitness importance of genes for soil growth and survival was assessed from de- or increases of the relative abundance of insertions over generation times, and compared to random insertion models. Apart from typically considered ‘essential’ genes that were mostly absent already in the starting libraries, our data showed only a small number of genes implicated in fitness loss, that were overlapping between both soils and absent from liquid growth. These pointed to crucial functions of urea and short-chain fatty acid metabolism, oxidative stress defense, and nutrient/element transport for soil growth. 
Our analysis also showed that mutants inactivating flagella biosynthesis and motility had a strong fitness gain in soils. Long-term growth and survival of P. veronii in soils is thus a function of a variety of independent genes, but not necessarily of a coherent ‘soil-specific’ growth program. The results obtained in my thesis make it possible to draw a global picture of the natural behavior, strategies, and capabilities of P. veronii to grow and survive under environmentally relevant conditions, such as those posed by non-sterile and polluted soils. By comparing transcriptomic responses in different soils and materials, together with the transposon mutagenesis scanning at different growth phases, I am confident that I have covered a broad range of conditions and scenarios that help our understanding of the mechanisms necessary for strain adaptation and survival upon inoculation. The integration of the results obtained throughout the above-described chapters showed that P. veronii is a robust strain. Its fast adaptability is explained by its large and redundant genome, which encodes several mechanisms to maintain its central core metabolism regardless of the environment into which it is inoculated and to adjust efficiently to constantly changing soil conditions. Finally, this work also provides evidence of possible reasons why P. veronii would not be able to invade complex microbiomes. RÉSUMÉ Bioaugmentation uses the capacities of certain bacteria inoculated into a contaminated site to improve the degradation of pollutants. The process involves introducing bacteria that can deploy their metabolic and adaptive properties to survive and thrive in the contaminated site, using the pollutant as a source of carbon, nutrients, and energy. Most of what is known about biodegradation activity comes from experiments carried out under laboratory conditions. 
However, attempts to demonstrate the bioaugmentation potential of these bacteria on site have shown successes but also failures, without the reasons or mechanisms behind these outcomes really being understood. We believe that a better understanding is needed of the processes that take place during the invasion of inoculated bacteria into complex environments such as contaminated sites. In this way, one could identify the "problems" of bioaugmentation and thus make the best use of the natural diversity of bacteria capable of degrading pollutants. In this context, the main objective of this work was to study in a rational way the survival capacity and the metabolic and physiological strategies of the BTEX-degrading bacterium Pseudomonas veronii 1YdBTEX2. These were compared during the adaptation and invasion of this bacterium in natural non-sterile and contaminated soils with those in suspended cultures in polluted liquids or in standardized porous materials. In the general introduction, I present the state of knowledge in soil bioremediation and the concept of bioaugmentation. In particular, I describe the degradation of BTEX compounds (benzene, toluene, ethylbenzene, and xylenes), which are highly toxic volatile substances. In the first research chapter, I focus on understanding the adaptive responses of P. veronii 1YdBTEX2 during the transition from liquid cultures to contaminated soils. To this end, I studied the short-term (1 hour) changes in gene expression of bacteria inoculated into non-sterile sand versus liquid cultures, in both cases in the presence or absence of toluene. In a collaborative effort, we improved the genome sequence of P. 
veronii 1YdBTEX2 and showed that it actually consists of three individual replicons with a size of 8 Mb, including a large proportion of genes with unknown functions. One hour of exposure to toluene, in liquid culture or in soil, triggers massive transcription (up to 208-fold induction) of multiple gene clusters, such as those encoding the toluene degradation pathways, chemotaxis, and toluene efflux pumps. These results clearly show the key role of an adaptive response to toluene. Compared to liquid medium, cells in soil drastically change the expression of genes involved in membrane functioning (for example, lipid composition and metabolism and fatty acid synthesis in the cell), the osmotic stress response (for example, polyamine or trehalose synthesis and potassium uptake), and putrescine metabolism, highlighting the adaptive mechanisms by which P. veronii readjusts its metabolism to sand. In the next chapter, I try to understand the changes in the global transcriptome of P. veronii during growth and long-term survival (stationary phase) in contaminated soils. In addition, I sought to study the differences in behaviour in three types of non-sterile soil and a "soil" historically contaminated with polycyclic aromatic hydrocarbons. I found that P. veronii manages to establish itself in all of them except one, at the expense of toluene. The strain also grows in the absence of toluene, but to a lower population density. These results show a capacity to survive under real field conditions, under contamination stress. 
I compared the transcriptome responses during the transition phase (short term) as well as exponential and stationary growth in soil with the behaviour induced in liquid cultures or in an inert silica matrix. The analysis reveals a global and versatile program in P. veronii, characterized by strategies that are either shared or soil-specific and by evidence of nutrient limitation in the late growth phases, which explains its behaviour in soil environments. We also observed a remarkably conserved core metabolic expression in the exponential phase, irrespective of the growth environment. The absence of growth in one of the soils, which contained a complex microbial community, may be the result of predation by abundant resident protists and/or of possible competition for substrate by the microbes already present. Because the amount of P. veronii was insufficient to extract enough RNA for transcriptome analysis, signs of predation or competitive stress could not be uncovered. Finally, in the fourth chapter, I characterized the genes that might be important for the growth and survival of P. veronii in (contaminated) soils. To do so, I used the technique of massive transposon insertion. In this way, two independent genome-wide transposon insertion libraries of P. veronii could be generated using a non-replicating hyperactive mini-Tn5 transposon carrying a gene (aph) that confers kanamycin resistance. These libraries were inoculated into a liquid culture medium and into two different soils in the presence of toluene, and were left to propagate for 50 generations in the same environment. 
Deep sequencing of the amplified boundaries between the inserted aph gene and the bacterial genes served to quantify the relative abundance of insertions as a function of time and growth conditions. The fitness importance of genes for growth and survival in soil was assessed from the decrease or increase in the relative abundance of insertions over generations, and compared to random insertion models. Excluding the genes typically considered essential, which were mostly absent from the starting libraries, our results show only a small number of genes implicated in fitness loss that were common to both soils and absent from liquid culture. These results point to crucial functions of urea and short-chain fatty acid metabolism, oxidative stress defense, and nutrient/element transport for growth in soil. Our analysis also showed that mutants that inactivate flagella biosynthesis and motility have a strong fitness gain in soil. Long-term growth and survival of P. veronii in soil is thus a function of a variety of independent genes, and not necessarily of a coherent soil-specific program. The results obtained in my thesis make it possible to draw a global picture of the natural behaviour, strategies, and capacity of P. veronii to grow and survive under particular environmental conditions such as those of non-sterile and polluted soils. 
By comparing the transcriptomic responses in different soils and materials with the transposon mutant scan at different growth phases, I believe I have covered a wide variety of conditions and scenarios that help our understanding of the mechanisms necessary for strain adaptation and survival after inoculation. The integration of all the results shows that P. veronii is a robust strain. Its rapid adaptability is explained by its large and redundant genome, which encodes several mechanisms to maintain its core metabolism regardless of the environment into which it is inoculated, while adjusting efficiently to changing soil conditions. Finally, this work highlights possible reasons why P. veronii would not be able to invade complex microbiomes.

    The Marvelous World of tRNAs: From Accurate Mapping to Chemical Modifications

    Since the discovery of transfer RNAs (tRNAs) as decoders of the genetic code, life science has been transformed. In particular, once the importance of tRNAs in protein synthesis had been established, researchers recognized that the role of tRNAs in cellular regulation extends beyond this paradigm. A strong impetus for these discoveries came from advances in large-scale RNA sequencing (RNA-seq) and increasingly sophisticated algorithms. Sequencing tRNAs is challenging both experimentally and in terms of the subsequent computational analysis. In RNA-seq data analysis, mapping tRNA reads to a reference genome is an error-prone task. This is particularly true because chemical modifications introduce systematic reverse transcription errors, while at the same time the genomic loci are only approximately identical to the reads due to the post-transcriptional maturation of tRNAs. Additionally, the multi-copy nature of tRNA genes complicates the precise assignment of a read to its true genomic origin. In the course of the thesis, a computational workflow was established to enable accurate mapping of tRNA reads. The developed method removes most of the mapping artifacts introduced by simpler mapping schemes, as demonstrated using both simulated and human RNA-seq data. Subsequently, the resulting mapping profiles can be used for reliable identification of specific chemical tRNA modifications with a false discovery rate of only 2%. For that purpose, computational analysis methods were developed that facilitate the sensitive detection and even classification of most tRNA modifications based on their mapping profiles. The data comprised both untreated RNA-seq data from various species and treated data from Bacillus subtilis designed to display modifications as a specific read-out in the mapping profile. The discussion focuses on sources of artifacts that complicate the profiling of tRNA modifications and strategies to overcome them. 
Exemplary studies on the modification patterns of different human tissues and the developmental stages of Dictyostelium discoideum were carried out. These suggested regulatory functions of tRNA modifications in development and during cell differentiation. The main experimental difficulties of tRNA sequencing are caused by extensive, stable secondary structures and the presence of chemical modifications. Current RNA-seq methods do not sample the entire tRNA pool, lose short tRNA fragments, or lack specificity for tRNAs. Within this thesis, benchmarking and improving LOTTE-seq, a method for specific selection of tRNAs for high-throughput sequencing, showed that the method solves the experimental challenges and avoids the disadvantages of previous tRNA-seq protocols. Applying the accurate tRNA mapping strategy to LOTTE-seq and other tRNA-specific RNA-seq methods demonstrated that the content of mature tRNAs is highest in LOTTE-seq data, ranging from 90% in Spinacia oleracea to 100% in D. discoideum. Additionally, the thesis addressed the fact that tRNAs are multi-copy genes that undergo concerted evolution, which keeps the sequences of paralogous genes effectively identical. Therefore, it is impossible to distinguish orthologs from paralogs by sequence similarity alone. Synteny, the maintenance of relative genomic positions, helps to disambiguate evolutionary relationships in this situation. During this thesis, a workflow was developed for synteny-based orthology identification of tRNA genes. The workflow uses pre-computed genome-wide multiple sequence alignment blocks as anchors to establish syntenic conservation of sequence intervals. Syntenic clusters of concertedly evolving genes of different tRNA families are then subdivided and processed by cograph editing to recover their duplication histories. 
A useful outcome of this study is that it highlights the technical problems and difficulties associated with an accurate analysis of the evolution of multi-copy genes. To showcase the method, the evolution of tRNAs in primates and fruit flies was reconstructed. In the last decade, a number of reports have described novel aspects of tRNAs in terms of the diversity of their genes. For example, nuclear-encoded mitochondrial-derived tRNAs (nm-tRNAs) have been reported, whose presence raises intriguing questions about their functionality. Within this thesis, an annotation strategy was developed that led to the identification of 335 and 43 novel nm-tRNAs in human and mouse, respectively. Interestingly, downstream analyses showed that several nm-tRNAs are located in introns and that conserved RNA-binding sites of proteins involved in splicing are over-represented, suggesting a potential regulatory function of intronic nm-tRNAs in splicing.