42 research outputs found

    Modeling local repeats on genomic sequences

    This paper deals with the specification and search of repeats of biological interest, i.e. repeats that may have a role in genomic structures or functions. Although some particular repeats such as tandem repeats have been well formalized, the models developed so far remain of limited expressivity with respect to the known forms of repeats in biological sequences. This paper introduces new general and realistic concepts characterizing potentially useful repeats in a sequence: Locality and several refinements of the Maximality concept. Locality is related to the distribution of occurrences of repeated elements and characterizes the way occurrences are clustered in this distribution. The associated notion of neighborhood makes it possible to indirectly exhibit words whose distribution of occurrences is correlated with a given distribution. Maximality is related to the contextual delimitation of the repeated units. We have extended the usual notion of maximality by working on the inclusion relation between repeats and taking larger contexts into account. In particular, we introduce a new repeat concept, largest maximal repeats, which looks for the existence of a subset of maximal occurrences of a repeated word instead of a global maximization. We propose algorithms that check for local and refined maximal repeats using, at the conceptual level, a suffix tree data structure. Experiments on natural and artificial data further illustrate various aspects of this new setting. All programs are available on the GenOuest platform, at http://genouest.org/modulome.
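
    As a rough illustration of the usual maximality notion that the abstract starts from, the sketch below lists the maximal repeats of a short string by brute force. It is not the authors' algorithm: it uses no suffix tree, runs in quadratic space, and covers neither Locality nor largest maximal repeats. The function name `maximal_repeats` and the parameter `min_len` are hypothetical. A word is reported when it occurs at least twice and its occurrences do not all share the same left character, nor the same right character, so that no one-character extension keeps every occurrence.

```python
from collections import defaultdict


def maximal_repeats(text, min_len=2):
    """Return {word: start positions} for every maximal repeat of length >= min_len.

    Brute-force illustration only; a suffix tree would be used on real genomes.
    """
    n = len(text)
    occ = defaultdict(list)
    # Record the start positions of every substring of length >= min_len.
    for i in range(n):
        for j in range(i + min_len, n + 1):
            occ[text[i:j]].append(i)

    result = {}
    for word, starts in occ.items():
        if len(starts) < 2:
            continue  # not a repeat
        # Characters just before and just after each occurrence
        # (None stands for a sequence boundary).
        left = {text[s - 1] if s > 0 else None for s in starts}
        right = {text[s + len(word)] if s + len(word) < n else None for s in starts}
        # Maximal: extending by one character on either side loses an occurrence.
        if len(left) > 1 and len(right) > 1:
            result[word] = starts
    return result


print(maximal_repeats("abcxabcyabc"))  # {'abc': [0, 4, 8]}
```

    In this toy string, "ab" and "bc" are repeated too, but each can be extended to "abc" without losing an occurrence, so only "abc" is reported.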

    Genomic investigations of unexplained acute hepatitis in children

    Since its first identification in Scotland, over 1,000 cases of unexplained paediatric hepatitis have been reported worldwide, including 278 cases in the UK [1]. Here we report an investigation of 38 cases, 66 age-matched immunocompetent controls and 21 immunocompromised comparator participants, using a combination of genomic, transcriptomic, proteomic and immunohistochemical methods. We detected high levels of adeno-associated virus 2 (AAV2) DNA in the liver, blood, plasma or stool from 27 of 28 cases. We found low levels of adenovirus (HAdV) and human herpesvirus 6B (HHV-6B) in 23 of 31 and 16 of 23, respectively, of the cases tested. By contrast, AAV2 was infrequently detected and at low titre in the blood or the liver from control children with HAdV, even when profoundly immunosuppressed. AAV2, HAdV and HHV-6 phylogeny excluded the emergence of novel strains in cases. Histological analyses of explanted livers showed enrichment for T cells and B lineage cells. Proteomic comparison of liver tissue from cases and healthy controls identified increased expression of HLA class 2, immunoglobulin variable regions and complement proteins. HAdV and AAV2 proteins were not detected in the livers. Instead, we identified AAV2 DNA complexes reflecting both HAdV-mediated and HHV-6B-mediated replication. We hypothesize that high levels of abnormal AAV2 replication products aided by HAdV and, in severe cases, HHV-6B may have triggered immune-mediated hepatic disease in genetically and immunologically predisposed children.

    Dynamics of helitrons in the Arabidopsis thaliana genome: development of new strategies for analyzing transposable elements

    No full text
    The transposable element Helitron is a recently discovered type of transposon present in the genome of Arabidopsis thaliana. The thesis studies three aspects of the relationship between helitrons and their host genome: their mode of genome invasion, the modularity of their internal sequence and their impact on nearby genes. Helitrons are the most widespread transposable elements in this genome. However, they are only partially recognized by current alignment software. We have produced a formal grammatical model of these elements, made of two extremities separated by a highly variable sequence of bounded size. We have built the matrix of all occurrences of the models obtained by crossing all possible extremities. The combinations revealed preferential associations between extremities and new chimeric families of helitrons. The detection of ORFs including helicase and RPA transposition proteins confirmed the existence of a relation between autonomous and non-autonomous helitrons and offered some clues to the mechanisms that create chimeric helitrons from truncated helitron insertions. Finally, we propose a new nomenclature of helitrons in Arabidopsis based on their extremities instead of their global sequence. The observation of the sequences of one helitron family shows a deep reorganization of nucleic domains between the different copies within this family. In order to understand this organization, we have designed a software tool, DomainOrganizer, which establishes the domain composition of transposable elements. DomainOrganizer first detects domain borders from a multiple alignment and provides the list of domains. From this list, it looks for a minimal number of domains that maximizes the coverage of the set of sequences, using a combinatorial optimization approach. Finally, DomainOrganizer clusters and visualizes the set of sequences with respect to their domains. The analysis of domains in the family AtREP21, a family whose high variability precludes any direct phylogenetic analysis, has made it possible to decipher the nature of this variability and to trace back a possible evolutionary scenario for this family from the identification of its domains. We have also studied the localization of helitrons in the genomic sequence of Arabidopsis thaliana and shown their preferential insertion in gene promoters. Our work focused on the family AtREP3 and on helitrons located less than 1 kb from a START codon. The expression profiles of these genes reveal the existence of several clusters of similar profiles at the tissue level. The patterns of transcription factor sites are highly variable in promoters, except for helitron-containing promoters. This is in accordance with the hypothesis that non-autonomous helitrons carry patterns coding for transcription factor sites. Additional experiments have to be designed in order to fully understand the regulation of genes close to helitrons.
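
    The model described in this abstract, two extremities separated by a variable sequence of bounded size and counted for every combination of extremities, can be sketched as follows. This is only an illustration under stated assumptions: the function names `find_all` and `extremity_matrix`, the parameter `max_gap`, the toy sequence and the short motifs are all hypothetical, and the thesis uses a formal grammar rather than exact string matching.

```python
def find_all(text, word):
    """Return every start position of word in text, overlapping matches included."""
    positions = []
    i = text.find(word)
    while i != -1:
        positions.append(i)
        i = text.find(word, i + 1)
    return positions


def extremity_matrix(genome, five_prime, three_prime, max_gap):
    """Count, for each (5' extremity, 3' extremity) pair, the occurrences of
    5' extremity + internal sequence of at most max_gap bases + 3' extremity."""
    matrix = {}
    for start in five_prime:
        start_pos = find_all(genome, start)
        for end in three_prime:
            end_pos = find_all(genome, end)
            matrix[(start, end)] = sum(
                1
                for i in start_pos
                for j in end_pos
                # j - (i + len(start)) is the length of the internal sequence.
                if 0 <= j - (i + len(start)) <= max_gap
            )
    return matrix


# Toy sequence and toy motifs, chosen only to make the counts easy to check.
toy_genome = "TCAAACTAGGGGTCTTCTAA"
counts = extremity_matrix(toy_genome, ["TC", "GT"], ["CTAG", "CTAA"], max_gap=5)
for pair, n in counts.items():
    print(pair, n)
```

    On real data the bound on the internal sequence would be far larger than 5, and pairs with unexpectedly high counts are the kind of preferential association the abstract reports.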

    A Fast Ab Initio Method for Predicting miRNA Precursors in Genomes

    MicroRNAs (miRNAs) are non-coding RNAs only 21-25 nt long that are present in all sequenced higher eukaryotes ([1]). miRNA genes are cleaved into a 40-940 nt long precursor of miRNA sequences (pre-miRNAs). Pre-miRNAs, structured as hairpins, are transported into the cytoplasm and are cleaved into mature miRNAs ([1]). They act as negative regulators of gene expression by binding to specific mRNA targets ([1]). Bioinformatics methods that predict pre-miRNAs can be divided into three approaches: comparative genomics, homology-based approaches and ab initio approaches. Comparative genomics and homology-based approaches cannot detect miRNAs of unknown families and/or miRNAs with no close homologs in genomes. Furthermore, comparative approaches do not work on new genomes for which no closely related species has been sequenced. Ab initio methods are needed to predict new miRNAs in genomes. To our knowledge, there are very few ab initio algorithms that search for pre-miRNA structures in whole genomes, and all are specific to one or a few genomes. We present a new ab initio method, called miRNAFold, for predicting pre-miRNA structures in any genome. Our method considers a sliding window of a given size L, sufficiently long to contain a pre-miRNA. In a first step, we search for long exact Watson-Crick stems that satisfy some criteria. In a second step, we extend the selected stem in order to get the longest symmetrical non-exact Watson-Crick stem satisfying some criteria. This longest symmetrical non-exact stem can correspond to a large portion of a pre-miRNA. Possible pre-miRNA hairpins are then searched for in the subsequence associated with the selected symmetrical non-exact stem. At each step, several selection criteria are used, corresponding to several features observed on the exact stems, the symmetrical non-exact stems and the hairpins. Some of these criteria, for example G; ratio A, U, C and G, are also used in ([2,4]). Because a miRNA hairpin can present some of these features but not all, an exact stem, a symmetrical non-exact stem or a hairpin is selected when a certain percentage of the criteria are satisfied. This percentage is a parameter that can be set by the user. We compared our algorithm miRNAFold with RNALFold ([3]), which searches genomic sequences for all possible non-coding RNA secondary structures, including hairpins. We thus compared the hairpins predicted by RNALFold with the ones predicted by miRNAFold. We used RNALFold version 1.8.4, downloaded from the Vienna RNA Package (www.tbi.univie.ac.at/RNA/), and ran it with its default parameters. We used a sliding window of 150 nt for both programs. We tested miRNAFold and RNALFold on human, mouse, zebrafish and sea squirt genomic sequences. Each sequence contains a cluster of several known miRNAs. miRNAFold was run with a threshold of 70% for the minimum percentage of satisfied criteria. miRNAFold has better sensitivity and selectivity than RNALFold on the human, mouse, zebrafish and sea squirt genomic sequences. Moreover, miRNAFold is the faster of the two: its average execution time is 57 seconds for a sequence of 1 million nucleotides, whereas RNALFold takes 5 minutes and 46 seconds on average. miRNAFold is thus about 6 times faster than RNALFold.
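
    The first step described in this abstract, the search for a long exact Watson-Crick stem inside a window, can be illustrated with the brute-force sketch below. It is not the miRNAFold implementation: the function name `longest_exact_stem`, the `min_loop` parameter and the toy window are hypothetical, the sequence is assumed to use the RNA alphabet (U rather than T), and none of the paper's selection criteria are applied.

```python
# Canonical Watson-Crick base pairs (RNA alphabet assumed).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def longest_exact_stem(window, min_loop=3):
    """Return (length, i, j) for the longest exact stem in window.

    The stem pairs window[i + t] with window[j - t] for t = 0 .. length - 1
    and leaves at least min_loop unpaired bases between its two strands.
    """
    n = len(window)
    best = (0, -1, -1)
    for i in range(n):
        for j in range(n - 1, i, -1):
            k = 0
            # Extend inward while the next two bases pair and the loop
            # enclosed by that pair would still hold min_loop bases.
            while (i + k) + min_loop < (j - k) and (window[i + k], window[j - k]) in WATSON_CRICK:
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best


print(longest_exact_stem("GGGAAAACCC"))  # (3, 0, 9)
```

    In the toy window, the three G bases at positions 0-2 pair with the three C bases at positions 9-7, leaving a four-base loop. A real scan would slide such a window along the genome and keep only the stems that pass the selection criteria.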

    An Automatic Method for Identifying TE-derived Pre-miRNAs

    No full text