59 research outputs found

    A review of the current methods for computational analysis of tandem repeats

    This paper considers some of the most important methods for computational tandem repeat analysis. The problem of repeat analysis is far from trivial because tandem repeats tend to be highly polymorphic motifs, i.e. several types of mutations within repeats have to be considered. Computational analysis of all types of mutations within repeats increases execution time, especially when chromosomes or whole genomes are the subject of the analysis. On the other hand, time complexity improves significantly if only exact tandem repeats are considered, but this has less practical application. Each of the methods considered has pros and cons, and the most suitable solution may be a compromise between the opposing approaches.
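    The trade-off described above can be illustrated with a minimal sketch of exact tandem repeat detection. This is not one of the methods reviewed in the paper; the function name `find_exact_tandem_repeats` and its parameters `max_unit` and `min_copies` are hypothetical, chosen only for illustration.

```python
def find_exact_tandem_repeats(seq, max_unit=6, min_copies=2):
    """Return (start, unit, copies) for runs of exactly repeated units.

    Hypothetical helper for illustration only: it scans every unit length
    from 1 to max_unit and tolerates no substitutions or indels.
    """
    hits = []
    n = len(seq)
    for unit_len in range(1, max_unit + 1):
        i = 0
        # Stop once fewer than min_copies units would fit in the rest of seq.
        while i + unit_len * min_copies <= n:
            unit = seq[i:i + unit_len]
            copies = 1
            # Extend the run while the next block equals the unit exactly.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * unit_len  # jump past the reported run
            else:
                i += 1
    return hits


# Three exact copies of "CA" are found ...
print(find_exact_tandem_repeats("GGCACACATT", min_copies=3))
# ... but a single substitution inside the repeat hides it completely.
print(find_exact_tandem_repeats("GGCATACATT", min_copies=3))
```

    The scan is simple and fast, but one point mutation is enough to make the repeat invisible, which is why approximate methods are needed in practice. The sketch also reports non-primitive units (for example "AA" within "AAAA"); a real tool would filter those.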

    Analysis Of DNA Motifs In The Human Genome

    DNA motifs include repeat elements, promoter elements and gene regulator elements, and play a critical role in the human genome. This thesis describes a genome-wide computational study on two groups of motifs: tandem repeats and core promoter elements. Tandem repeats in DNA sequences are extremely relevant in biological phenomena and diagnostic tools. Computational programs that discover tandem repeats generate a huge volume of data, which can be difficult to decipher without further organization. A new method is presented here to organize and rank detected tandem repeats through clustering and classification. Our work presents multiple ways of expressing tandem repeats using the n-gram model with different clustering distance measures. Analysis of the clusters for the tandem repeats in the human genome shows that the method yields a well-defined grouping in which similarity among repeats is apparent. Our new, alignment-free method facilitates the analysis of the myriad tandem repeats in the human genome. We believe that this work will lead to new discoveries on the roles, origins, and significance of tandem repeats. As with tandem repeats, promoter sequences of genes contain binding sites for proteins that play critical roles in mediating expression levels. Promoter region binding proteins and their co-factors influence the timing and context of transcription. Despite the critical regulatory role of these non-coding sequences, computational methods to identify and predict DNA binding sites are extremely limited. The work reported here analyzes the relative occurrence of core promoter elements (CPEs) in and around transcription start sites. We found that, across all the data sets, 49%-63% of upstream regions have either TATA box or DPE elements. Our results suggest the possibility of predicting transcription start sites by combining CPE signals with other promoter signals such as CpG islands and clusters of specific transcription factor binding sites.
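    As a rough illustration of an alignment-free, n-gram based comparison of repeats, the sketch below turns each sequence into a vector of n-gram counts and compares the vectors directly. The abstract does not say which distance measures the thesis uses; cosine distance is an assumption here, and the names `ngram_profile` and `cosine_distance` are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_profile(seq, n=3):
    """Count every overlapping n-gram in seq (hypothetical helper)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def cosine_distance(p, q):
    """1 - cosine similarity of two n-gram count profiles.

    Cosine distance is an assumed choice, not necessarily the thesis's.
    """
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / norm if norm else 1.0


# The same repeat read in a shifted phase has an identical 3-gram profile,
# so no alignment is needed to recognise the two as similar.
a = ngram_profile("CACACACA")
b = ngram_profile("ACACACAC")
c = ngram_profile("GTGTGTGT")
print(cosine_distance(a, b), cosine_distance(a, c))
```

    Pairwise distances of this kind could then be passed to any standard clustering procedure to group the detected repeats.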

    Integrated multiple sequence alignment

    Sammeth M. Integrated multiple sequence alignment. Bielefeld (Germany): Bielefeld University; 2005. The thesis presents enhancements for automated and manual multiple sequence alignment: existing alignment algorithms are made more easily accessible and new algorithms are designed for difficult cases. Firstly, we introduce the QAlign framework, a graphical user interface for multiple sequence alignment. It comprises several state-of-the-art algorithms and exposes their parameters through convenient dialogs. An alignment viewer with guided editing functionality can also highlight or print regions of the alignment. Phylogenetic features are also provided, e.g., distance-based tree reconstruction methods, corrections for multiple substitutions and a tree viewer. The modular concept and the platform-independent implementation make the framework easy to extend. Further, we develop a constrained version of the divide-and-conquer alignment such that it can be restricted by anchors found earlier with local alignments. It can be shown that this method shares attributes of both local and global aligners, in the quality of results as well as in the computation time. We further modify the local alignment step to work on bipartite (or even multipartite) sets for sequences where repeats overshadow valuable sequence information. The result is a technique that can accurately align sequences containing possibly repeated motifs. Finally, another algorithm is presented that allows tandem repeat sequences to be compared by aligning them with respect to their possible repeat histories. We describe an evolutionary model including tandem duplications and excisions, and give an exact algorithm to compare two sequences under this model.

    RIME: Repeat Identification

    We present an algorithm for detecting long similar fragments occurring at least twice in a set of biological sequences. The problem becomes computationally challenging when the frequency of a repeat is allowed to increase and when a non-negligible number of insertions, deletions and substitutions are allowed. We introduce in this paper an algorithm, Rime (for Repeat Identification: long, Multiple, and with Edits), that performs this task and manages instances whose size and combination of parameters cannot be handled by other currently existing methods. (Rime is also a reference to Coleridge's poem "The Rime of the Ancient Mariner", which contains many repetitions as a poetic device.) This is achieved by using a filter as a preprocessing step, and by then exploiting the information gathered by the filter in the subsequent repeat inference step. To the best of our knowledge, Rime is the first algorithm that can accurately deal with very long repeats (up to a few thousand), possibly occurring several times, and with a rate of differences (substitutions and indels) allowed among copies of the same repeat of 10-15% or even more.

    Genomic tools and sex determination in the extremophile brine shrimp Artemia franciscana

    The aim of this study was the construction of a genomic Artemia toolkit. Sex-specific AFLP-based genetic maps were constructed based on 433 AFLP markers segregating in a full-sib family of 112 individuals, revealing 21 male and 22 female linkage groups (2n = 42). Fifteen putatively homologous linkage groups, including the sex linkage groups, were identified between the female and male linkage maps. Eight sex-linked markers, heterozygous in female animals, mapped to a single locus on a female linkage group, supporting the hypothesis of a WZ/ZZ genetic sex-determining system and showing that primary sex determination is likely directed by a single gene. To fine-map the sex locus, bulked segregant analysis was performed. Candidate primary sex-determining genes were identified, including Cytochrome P450 which, through transcriptomic studies, is already known as a candidate sex-determining gene for Macrobrachium nipponense. The 1,310-Mbp Artemia draft genome sequence (N50 = 14,784 bp; GC-content = 35%; 176,667 scaffolds) was annotated, predicting 188,101 genes with an average length of 692 bp. Ninety-two percent of the transcriptome reads of Artemia in different conditions were present in the Artemia genome, indicating that the functional part of the genome under the RNAseq sampling conditions is virtually fully represented in the assembly. Several steps were taken in this study to introduce Artemia as a new genomic model for crustaceans. Although this coverage makes the assembly useful for qualitative research, genome finishing strategies will still be necessary to complete the genome project. The further development of genomic resources for Artemia will add a completely new dimension to Artemia research and its use as live food in aquaculture.

    Abstracts from the 3rd ECFG

    Abstracts from the 3rd European Congress on Fungal Genetics, held March 27-30, 1996, in Munster, Germany.

    Reticulate Evolution: Symbiogenesis, Lateral Gene Transfer, Hybridization and Infectious heredity


    Selective constraints and adaptation in humans: evolutionary history of microbial sensors (Contraintes sélectives et adaptation chez l'homme : histoire évolutive des senseurs microbiens)

    Pattern-recognition receptors (PRRs) are key actors in the recognition of microbes by the host. Detecting how natural selection has targeted their genes is a useful tool for distinguishing those that are essential from those that are more redundant in immune responses. Here, we studied the levels of naturally occurring variation in two major families of human intracellular PRRs: the NOD-like receptors (NLRs), which mainly sense bacteria and cellular danger signals, and the RIG-I-like receptors (RLRs), which are essentially involved in sensing viral RNA. To this end, we sequenced their 24 genes in a panel representative of worldwide diversity. First, we showed that most NALPs, a subfamily of NLRs, were strongly constrained, exhibiting a deficit in non-synonymous mutations. This suggests that they have played a major role in our survival and could be involved in severe diseases; their study should therefore be prioritized from a medical perspective. By contrast, most NOD/IPAF, another subfamily of NLRs, and the 3 RLRs seem to have less important or more redundant functions, accumulating a high level of amino acid changes. These data, together with those for the Toll-like receptors (TLRs), allowed us to propose a hierarchical model highlighting the relative contributions of the different families of microbial sensors to our survival. Furthermore, we identified some genes (NLRP1 in particular) and variants as being under positive selection: they could explain some of the current differences in resistance to infectious diseases.

    Molecular analysis of post-transcriptional gene silencing: mechanisms and roles

    This work is an investigation of post-transcriptional gene silencing (PTGS) in plants, a process that mediates sequence-specific degradation of RNA. Initially discovered in transgenic plants, PTGS has long been regarded as a curiosity, or even as an artefact of transgenesis. It is shown here that virus-induced gene silencing, in which recombinant viruses carrying elements of the host genome trigger PTGS of the corresponding plant gene (Chapter one), is a manifestation of a defence system. This defence is remarkable in its ability to adapt to potentially any virus because its specificity is not genetically programmed by the host but, instead, is dictated by the genome sequence of the viral intruder itself. It is demonstrated in chapters 4 and 5 that PTGS of a transgene can spread in plants from one part to another, indicating the existence of a systemic, sequence-specific silencing signal that is likely to have a nucleic acid component. From the demonstration that replication of potato virus X (PVX) also triggers production of a silencing signal in non-transgenic plants (Chapter 8), it is proposed that this long-distance signalling process represents the systemic arm of the host PTGS defence response. Collectively, these findings define the existence of a previously uncharacterised antiviral mechanism in higher plants, which may also operate in animals. This defence holds key features of an elaborate immune system, as it is adaptive, mobile and specific. It is also shown here that plant viruses have evolved counter-defensive measures to overcome the host PTGS response, by producing suppressor proteins that target various steps of the silencing mechanism (Chapters 6, 7). One of these factors, the PVX-encoded p25 protein, had previously been characterised as a facilitator of viral cell-to-cell movement. The finding that p25 specifically inhibits the signalling step of PTGS (Chapter 8) provides new ground for the investigation of virus movement in plants. In chapter 9, the role of PTGS in plants and its suppression by viruses is discussed in the broader context of plant development and biotechnological applications.