64 research outputs found

    “One code to find them all”: a perl tool to conveniently parse RepeatMasker output files

    Background: Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Several situations can lead to multiple hits corresponding to a single copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus sequences corresponding to a single full-length sequence (as for long terminal repeat (LTR)-retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results: We have developed a Perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions: Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element. In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.
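    The hit-to-copy problem described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Perl tool: it only reads the query-side columns of the standard whitespace-separated RepeatMasker .out layout and merges same-strand hits of the same repeat that lie close together. The names `Hit`, `parse_out`, `merge_hits` and the `max_gap` threshold are hypothetical, and the sketch does not handle the LTR/internal-region case mentioned above.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass
class Hit:
    query: str    # name of the query sequence
    start: int    # first position of the hit in the query
    end: int      # last position of the hit in the query
    strand: str   # "+" or "-" (RepeatMasker writes "C" for complement)
    repeat: str   # name of the matching repeat
    family: str   # repeat class/family


def parse_out(lines: Iterable[str]) -> List[Hit]:
    """Read annotation rows of a RepeatMasker .out file, skipping headers."""
    hits = []
    for line in lines:
        fields = line.split()
        # Data rows have at least 15 columns and start with the numeric SW score.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        hits.append(Hit(
            query=fields[4],
            start=int(fields[5]),
            end=int(fields[6]),
            strand="-" if fields[8] == "C" else "+",
            repeat=fields[9],
            family=fields[10],
        ))
    return hits


def merge_hits(hits: List[Hit], max_gap: int = 100) -> List[Hit]:
    """Merge neighbouring hits of the same repeat and strand into one copy.

    max_gap is an illustrative threshold (assumption), not a value from the paper.
    """
    copies: List[Hit] = []
    ordered = sorted(hits, key=lambda h: (h.query, h.repeat, h.strand, h.start))
    for hit in ordered:
        last = copies[-1] if copies else None
        same_element = (
            last is not None
            and (last.query, last.repeat, last.strand)
            == (hit.query, hit.repeat, hit.strand)
        )
        if same_element and hit.start - last.end <= max_gap:
            last.end = max(last.end, hit.end)   # extend the current copy
        else:
            copies.append(replace(hit))         # start a new copy
    return copies
```

    Calling `merge_hits(parse_out(open(path)))` on a .out file would then give one record per putative copy; a real analysis would also need to consider the position of each fragment within the consensus.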

    Inference of sparse combinatorial-control networks from gene-expression data: a message passing approach

    Background: Transcriptional gene regulation is one of the most important mechanisms in controlling many essential cellular processes, including cell development, cell-cycle control, and the cellular response to variations in environmental conditions. Genes are regulated by transcription factors and other genes/proteins via a complex interconnection network. Such regulatory links may be predicted using microarray expression data, but most regulation models assume transcription factor independence, which leads to spurious links when many genes have highly correlated expression levels. Results: We propose a new algorithm to infer combinatorial control networks from gene-expression data. Based on a simple model of combinatorial gene regulation, it includes a message-passing approach which avoids explicit sampling over putative gene-regulatory networks. This algorithm is shown to recover the structure of a simple artificial cell-cycle network model for baker's yeast. It is then applied to a large-scale yeast gene expression dataset in order to identify combinatorial regulations, and to a data set of direct medical interest, namely the Pleiotropic Drug Resistance (PDR) network. Conclusions: The algorithm we designed is able to recover biologically meaningful interactions, as shown by recent experimental results [1]. Moreover, new cases of combinatorial control are predicted, showing how simple models taking this phenomenon into account can lead to informative predictions and allow more putative regulatory interactions to be extracted from microarray databases.

    Genome Expression Dynamics Reveal the Parasitism Regulatory Landscape of the Root-Knot Nematode Meloidogyne incognita and a Promoter Motif Associated with Effector Genes.

    Root-knot nematodes (genus Meloidogyne) are the major contributor to crop losses caused by nematodes. These nematodes secrete effector proteins into the plant, derived from two sets of pharyngeal gland cells, to manipulate host physiology and immunity. Successful completion of the life cycle, involving successive molts from egg to adult, covers morphologically and functionally distinct stages and requires precise control of gene expression, including that of effector genes. The details of how root-knot nematodes regulate transcription remain sparse. Here, we report a life stage-specific transcriptome of Meloidogyne incognita. Combined with an available annotated genome, we explore the spatio-temporal regulation of gene expression. We reveal gene expression clusters and predicted functions that accompany the major developmental transitions. Focusing on effectors, we identify a putative cis-regulatory motif associated with expression in the dorsal glands, providing an insight into effector regulation. We combine the presence of this motif with several other criteria to predict a novel set of putative dorsal gland effectors. Finally, we show this motif, and thereby its utility, is broadly conserved across the Meloidogyne genus, and we name it Mel-DOG. Taken together, we provide the first genome-wide analysis of spatio-temporal gene expression in a root-knot nematode and identify a new set of candidate effector genes that will guide future functional analyses.

    Biais de codons et régulation de la traduction chez les bactéries et leurs phages

    This thesis presents work on codon bias and its role in bacteria and phages, particularly regarding the regulation of translation and chromosome organization in bacteria. After an introduction describing i) translation processes in prokaryotes, and ii) the basics of classification and information theory, a new clustering algorithm designed to classify a set of genes according to their codon usage is presented. Its application to the genomes of E. coli and B. subtilis highlights multiple phenomena. Their genomes are respectively composed of 4 and 5 groups of genes with distinct codon usage. The genes of the same group tend to have similar functions, and are organized in coherent domains 10 to 15 genes long on the chromosome. This non-trivial organisation could be used to regulate the translation speed of genes depending on their similarity with their genetic context. In the second part, the codon bias and tRNA content of phages are analyzed, relative to those of their hosts. Statistical tests show that tRNA content in phage genomes is not random, but biased towards the tRNAs cognate to the frequent codons in the phage genome. A master equation model shows that this tRNA distribution could be the result of two processes: random acquisition of tRNAs among those of the host, and preferential loss of tRNAs cognate to codons used less in the phage genome than in its host. Such a selection could be adaptive by allowing the phage to keep only the tRNAs insufficiently represented inside its host during infection. Finally, more tRNAs are observed among lytic phages than among temperate ones, which leads to the hypothesis that the selective pressure acting on translation is stronger for them.

    Codon Usage Domains over Bacterial Chromosomes

    The geography of codon bias distributions over prokaryotic genomes and its impact upon chromosomal organization are analyzed. To this aim, we introduce a clustering method based on information theory, specifically designed to cluster genes according to their codon usage, and apply it to the coding sequences of Escherichia coli and Bacillus subtilis. One of the clusters identified in each of the organisms is found to be related to expression levels, as expected, but other groups feature an over-representation of genes belonging to different functional groups, namely horizontally transferred genes, motility, and intermediary metabolism. Furthermore, we show that genes with a similar bias tend to be close to each other on the chromosome and organized in coherent domains, more extended than operons, demonstrating a role of translation in structuring bacterial chromosomes. It is argued that a sizeable contribution to this effect comes from the dynamical compartmentalization induced by the recycling of tRNAs, leading to gene expression rates dependent on their genomic and expression context.
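    As a rough illustration of comparing genes by codon usage with an information-theoretic measure, the Python sketch below computes the codon frequency distribution of a coding sequence and the Jensen-Shannon divergence between two such distributions. The abstract does not specify the measure used by the clustering method, so the choice of Jensen-Shannon divergence is an assumption made here for illustration, and the function names `codon_usage` and `js_divergence` are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict


def codon_usage(cds: str) -> Dict[str, float]:
    """Return the relative frequency of each codon in a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence (in bits) between two codon distributions.

    Assumed measure for this sketch; ranges from 0 (identical) to 1 (disjoint).
    """
    divergence = 0.0
    for codon in set(p) | set(q):
        pi, qi = p.get(codon, 0.0), q.get(codon, 0.0)
        mi = (pi + qi) / 2                     # mixture distribution
        if pi:
            divergence += 0.5 * pi * log2(pi / mi)
        if qi:
            divergence += 0.5 * qi * log2(qi / mi)
    return divergence
```

    A pairwise matrix of such divergences could then be fed to any standard clustering routine; this sketch compares whole codon distributions and does not normalise by amino acid.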

    Causes for the intriguing presence of tRNAs in phages

    Phages have highly compact genomes with sizes reflecting their capacity to exploit the host resources. Here, we investigate the reasons for tRNAs being the only translation-associated genes frequently found in phages. We were able to unravel the selective processes shaping the tRNA distribution in phages by analyzing their genomes and those of their hosts. We found ample evidence against tRNAs being selected to facilitate phage integration in the prokaryotic chromosomes. Conversely, there is a significant association between tRNA distribution and codon usage. We support this observation by introducing a master equation model, where tRNAs are randomly gained from their hosts and then lost either neutrally or according to a set of different selection mechanisms. Those tRNAs present in phages tend to correspond to codons that are highly used by the phage genes while being rare in the host genome. Accordingly, we propose that a selective recruitment of tRNAs compensates for the compositional differences between the phage and the host genomes. To further understand the importance of these results in phage biology, we analyzed the differences between temperate and virulent phages. Virulent phages contain more tRNAs than temperate ones, higher codon usage biases, and larger compositional differences with respect to the host genome. These differences are thus in agreement with the results of our master equation model and further suggest that tRNA acquisition may contribute to higher virulence. Thus, even though phages use most of the cell’s translation machinery, they can complement it with their own genetic information to attain higher fitness. These results suggest that similar selection pressures may act upon other essential cellular genes that are being found in the recently uncovered large viruses.