28 research outputs found

    Browsing repeats in genomes: Pygram and an application to non-coding region analysis

    BACKGROUND: A large number of studies on genome sequences have revealed the major role played by repeated sequences in the structure, function, dynamics and evolution of genomes. In-depth repeat analysis requires specialized methods, including visualization techniques, to achieve optimum exploratory power. RESULTS: This article presents Pygram, a new visualization application for investigating the organization of repeated sequences in complete genome sequences. The application projects data from a repeat index file on the analysed sequences, and by combining this principle with a query system, is capable of locating repeated sequences with specific properties. In short, Pygram provides an efficient, graphical browser for studying repeats. Implementation of the complete configuration is illustrated in an analysis of CRISPR structures in Archaea genomes and the detection of horizontal transfer between Archaea and Viruses. CONCLUSION: By proposing a new visualization environment to analyse repeated sequences, this application aims to increase the efficiency of laboratories involved in investigating repeat organization in single genomes or across several genomes

    Modeling local repeats on genomic sequences

    This paper deals with the specification and search of repeats of biological interest, i.e. repeats that may have a role in genomic structures or functions. Although some particular repeats such as tandem repeats have been well formalized, the models developed so far remain of limited expressivity with respect to the known forms of repeats in biological sequences. This paper introduces new general and realistic concepts characterizing potentially useful repeats in a sequence: Locality and several refinements of the Maximality concept. Locality is related to the distribution of occurrences of repeated elements and characterizes the way occurrences are clustered in this distribution. The associated notion of neighborhood makes it possible to indirectly exhibit words whose distribution of occurrences is correlated with a given distribution. Maximality is related to the contextual delimitation of the repeated units. We have extended the usual notion of maximality, working on the inclusion relation between repeats and taking into account larger contexts. Mainly, we introduce a new repeat concept, largest maximal repeats, which looks for the existence of a subset of maximal occurrences of a repeated word instead of a global maximization. We propose algorithms that check for local and refined maximal repeats, using a suffix tree data structure at the conceptual level. Experiments on natural and artificial data further illustrate various aspects of this new setting. All programs are available on the genouest platform, at http://genouest.org/modulome
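As a rough illustration of the usual notion of maximality that the abstract starts from, the sketch below enumerates maximal repeats by brute force. It is not the authors' algorithm: it does not use a suffix tree, it does not cover locality or largest maximal repeats, and the function name `maximal_repeats` and the `min_len` parameter are hypothetical. It assumes the common textbook definition, in which a word is a maximal repeat if it occurs at least twice and its occurrences are neither all preceded by the same character nor all followed by the same character.

```python
from collections import defaultdict


def maximal_repeats(seq, min_len=2):
    """Brute-force enumeration of maximal repeats (illustrative, quadratic in len(seq)).

    Returns a dict mapping each maximal repeat to the list of its start positions.
    """
    n = len(seq)
    # Record the start positions of every substring of length >= min_len.
    occurrences = defaultdict(list)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            occurrences[seq[i:j]].append(i)

    result = {}
    for word, starts in occurrences.items():
        if len(starts) < 2:
            continue  # not a repeat
        # Characters flanking each occurrence; "^" and "$" mark the sequence ends.
        left = {seq[s - 1] if s > 0 else "^" for s in starts}
        right = {seq[s + len(word)] if s + len(word) < n else "$" for s in starts}
        # Maximal: extending on either side would lose at least one occurrence.
        if len(left) > 1 and len(right) > 1:
            result[word] = starts
    return result
```

For the toy sequence `"xabcyabcz"`, the only maximal repeat of length at least 2 is `abc` (positions 1 and 5): `ab` is always followed by `c` and `bc` is always preceded by `a`, so both can be extended without losing an occurrence.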

    The dog and rat olfactory receptor repertoires

    BACKGROUND: Dogs and rats have a highly developed capability to detect and identify odorant molecules, even at minute concentrations. Previous analyses have shown that the olfactory receptors (ORs) that specifically bind odorant molecules are encoded by the largest gene family sequenced in mammals so far. RESULTS: We identified five amino acid patterns characteristic of ORs in the recently sequenced boxer dog and brown Norway rat genomes. Using these patterns, we retrieved 1,094 dog genes and 1,493 rat genes from these shotgun sequences. The retrieved sequences constitute the olfactory receptor repertoires of these two animals. Subsets of 20.3% (for the dog) and 19.5% (for the rat) of these genes were annotated as pseudogenes because they had one or several mutations interrupting their open reading frames. We performed phylogenetic studies and organized these two repertoires into classes, families and subfamilies. CONCLUSION: We have established a complete or almost complete list of OR genes in the dog and the rat and have compared the sequences of these genes within and between the two species. Our results provide insight into the evolutionary development of these genes and the local amplifications that have led to the specific expansion of many subfamilies. We have also compared the dog and rat ORs with the human and mouse OR repertoires
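The abstract does not say how interrupted open reading frames were detected. As a minimal sketch of one plausible check, the function below flags a coding sequence as a pseudogene candidate when its length is not a multiple of three (a possible frameshift) or when a stop codon appears before the last codon. The function name `has_interrupted_orf` is hypothetical, and the standard genetic code stop codons are assumed.

```python
# Stop codons of the standard genetic code (an assumption for this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_interrupted_orf(cds):
    """Return True if the coding sequence looks interrupted.

    A sequence is flagged when its length is not a multiple of 3
    or when a stop codon occurs before the final codon.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        return True  # possible frameshift
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Ignore the last codon, which is expected to be the normal stop.
    return any(codon in STOP_CODONS for codon in codons[:-1])
```

With this check, `"ATGAAATAA"` is reported as intact, while `"ATGTAAAAATAA"` (premature `TAA`) and `"ATGAA"` (length not divisible by three) are flagged.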

    Pyramid diagram: visualizing the organization of repetitive sequences in genomes.

    Research report no. 5798, Symbiose project. We introduce a new visualization method, the pyramid diagram, or pygram, capable of summarizing the hierarchical organization of repetitive structures in genome sequences. Pygrams improve over previous methods because they clearly display complex patterns of repeats located either within a single sequence or between several sequences, and are founded on the well-defined notion of maximal repeat. This report describes the design and implementation of Pyramid, a tool producing pygrams, and details some applications of the visualization method in Virus and Archaea genomics

    Large-scale discovery and analysis of signatures in amyloid proteins.

    National audience. The term "amyloid" describes intra- or extracellular deposits composed mainly of proteins assembled into fibers. These amyloid fibers have distinctive characteristics: a so-called "cross-beta" structure, green birefringence after Congo red staining, and resistance to proteases. Yet the proteins identified within these fibers belong to some thirty families with no obvious structural or functional resemblance, apart from this ability to form insoluble fibrillar aggregates. Amyloid fibers are characteristic, in particular, of several major neurodegenerative diseases, such as Alzheimer's disease. However, the molecular mechanisms leading to the formation of amyloid fibers are still largely unknown. Comparing the different amyloid protein families should make it possible to identify physico-chemical determinants involved in these aggregation mechanisms. The work presented in this article is divided into three main parts:
    • The first step was the construction of a knowledge base, called AMYPdb, dedicated to storing information on amyloid protein families and their sequence signatures. AMYPdb is the first database devoted to the bioinformatic identification of sequence signatures that may play a role in protein aggregation and assembly into fibers.
    • The second part was the large-scale discovery of signatures for each of the amyloid protein families. This generated 3332 motifs, which were then searched for in the 2 million sequences of UniProtKB. 14 million occurrences of these motifs were found in this way.
    • Third, we qualitatively analysed each signature using three criteria: sensitivity, specificity and correlation.
    We thereby brought to light signatures of better quality than the previously known motifs of amyloid proteins. These signatures were then used to identify new proteins belonging to the amyloid families.
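The three quality criteria named in this abstract can be illustrated from the counts of a signature's matches against a labelled sequence set. The sketch below is an assumption-laden example, not the AMYPdb procedure: the abstract does not define "correlation", so the Matthews correlation coefficient is used here as one common choice, and the function name `signature_quality` is hypothetical.

```python
import math


def signature_quality(tp, fp, tn, fn):
    """Score a sequence signature from its confusion counts.

    tp: family members matched       fn: family members missed
    fp: non-members matched          tn: non-members not matched
    Returns (sensitivity, specificity, correlation), where correlation
    is the Matthews correlation coefficient (an assumption of this sketch).
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    correlation = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, correlation
```

For example, a signature that matches 8 of 10 family members and 2 of 90 unrelated sequences gives a sensitivity of 0.8, a specificity of about 0.978 and a correlation of about 0.778.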

    Domain organization within repeated DNA sequences: application to the study of a family of transposable elements.

    DomainOrganizer web page is available at www.irisa.fr/symbiose/DomainOrganizer/ Motivation: The analysis of repeated elements in genomes is a fascinating domain of research that is lacking relevant tools for transposable elements (TEs), the most complex ones. The dynamics of TEs, which provides the main mechanism of mutation in some genomes, is an essential component of genome evolution. In this study we introduce a new concept of domain, a segmentation unit useful for describing the architecture of different copies of TEs. Our method extracts occurrences of a terminus-defined family of TEs, aligns the sequences, finds the domains in the alignment and searches the distribution of each domain in sequences. After a classification step relative to the presence or the absence of domains, the method results in a graphical view of sequences segmented into domains. Results: Analysis of the new non-autonomous TE AtREP21 in the model plant Arabidopsis thaliana reveals copies of very different sizes and various combinations of domains which show the potential of our method
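The classification step mentioned in this abstract, which groups copies by the presence or absence of domains, can be pictured with a short sketch. This is only an illustration under simple assumptions, not the DomainOrganizer implementation: the domains of each copy are taken as already known, and the function name `classify_by_domains` and the copy and domain labels are hypothetical.

```python
from collections import defaultdict


def classify_by_domains(copies):
    """Group TE copies that contain exactly the same set of domains.

    copies: dict mapping a copy name to an iterable of domain labels.
    Returns a dict mapping each distinct domain set (a frozenset)
    to the list of copy names that carry it, in input order.
    """
    classes = defaultdict(list)
    for name, domains in copies.items():
        # Only presence/absence matters, so order and repeats are ignored.
        classes[frozenset(domains)].append(name)
    return dict(classes)
```

Given `{"c1": ["D1", "D2"], "c2": ["D2", "D1"], "c3": ["D1"]}`, copies `c1` and `c2` fall in the same class and `c3` forms a class of its own.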
