22 research outputs found

    RegExpBlasting (REB), a Regular Expression Blasting algorithm based on multiply aligned sequences

    Background: One of the most frequent uses of bioinformatics tools is the functional characterization of a newly produced nucleotide sequence (a query sequence) by running Blast or FASTA against a set of sequences (the subject sequences). However, in some specific contexts, it is useful to compare the query sequence against a cluster such as a MultiAlignment (MA). We present here the RegExpBlasting (REB) algorithm, which compares an unclassified sequence with a dataset of patterns defined by applying Regular Expression rules to an MA dataset given as input. The REB algorithm workflow consists of: i. the definition of a dataset of multialignments; ii. the association of each MA with a pattern, defined by application of regular expression rules; iii. automatic characterization of a submitted biosequence according to the function of the sequences described by the pattern best matching the query sequence. Results: An application of this algorithm is used in the "characterize your sequence" tool available in the PPNEMA resource. PPNEMA is a resource of Ribosomal Cistron sequences from various species, grouped according to nematode genera. It allows the retrieval of plant nematode multialigned sequences or the classification of new nematode rDNA sequences by applying REB. The same algorithm also supports automatic updating of the PPNEMA database. The present paper gives examples of the use of REB within PPNEMA. Conclusion: The use of REB in PPNEMA updating and in the PPNEMA "characterize your sequence" option demonstrates the power of the method. REB can also be applied to other bioinformatics problems in which a new sequence must be added to a pre-existing cluster. The statistical tests carried out here show the flexibility of the method.
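The three-step workflow described in this abstract can be illustrated with a minimal Python sketch. The abstract does not state the actual regular expression rules or how the "best matching" pattern is chosen, so everything here is an assumption made for illustration: each alignment column becomes a literal or a character class, a column containing a gap becomes optional, and the best pattern is the one giving the longest match. The function names `alignment_to_pattern` and `classify` and the toy clusters are hypothetical and are not taken from REB or PPNEMA.

```python
import re

def alignment_to_pattern(aligned_seqs):
    """Turn equal-length aligned sequences into a regex string, column by column.

    Assumed rules (not from the paper): one residue -> literal,
    several residues -> character class, any gap -> position is optional.
    """
    pattern = []
    for column in zip(*aligned_seqs):
        residues = sorted(set(column) - {"-"})
        if not residues:
            continue  # column made of gaps only: nothing to match
        token = residues[0] if len(residues) == 1 else "[" + "".join(residues) + "]"
        if "-" in column:
            token += "?"
        pattern.append(token)
    return "".join(pattern)

def classify(query, clusters):
    """Return the label of the cluster whose pattern gives the longest match, or None.

    Scoring by match length is an assumption standing in for "best matching".
    """
    best_label, best_len = None, 0
    for label, aligned_seqs in clusters.items():
        match = re.search(alignment_to_pattern(aligned_seqs), query)
        if match and len(match.group()) > best_len:
            best_label, best_len = label, len(match.group())
    return best_label

# Toy multialignments (hypothetical data), one per cluster.
clusters = {
    "genus_A": ["ACGTAC", "ACGAAC", "ACG-AC"],
    "genus_B": ["TTGCAA", "TTGCGA"],
}

print(alignment_to_pattern(clusters["genus_A"]))
print(classify("GGACGTACGG", clusters))
```

A real implementation would need a scoring scheme that tolerates mismatches and handles long alignments; this sketch only shows how a multialignment can be collapsed into one pattern and how a query can be assigned to the cluster that pattern describes.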

    The reference human nuclear mitochondrial sequences compilation validated and implemented on the UCSC genome browser

    Abstract Background: Eukaryotic nuclear genomes contain fragments of mitochondrial DNA called NumtS (Nuclear mitochondrial Sequences), whose mode and time of insertion, as well as their functional/structural role within the genome, are debated issues. Insertion sites match with chromosomal breaks, revealing that micro-deletions usually occurring at non-homologous end joining loci become reduced in presence of NumtS. Some NumtS are involved in recombination events leading to fragment duplication. Moreover, NumtS are polymorphic, a feature that renders them candidates as population markers. Finally, they are a cause of contamination during human mtDNA sequencing, leading to the generation of false heteroplasmies. Results: Here we present RHNumtS.2, the most exhaustive human NumtSome catalogue annotating 585 NumtS, 97% of which were here validated in a European individual and in HapMap samples. The NumtS complete dataset and related features have been made available at the UCSC Genome Browser. The produced sequences have been submitted to INSDC databases. The implementation of the RHNumtS.2 tracks within the UCSC Genome Browser has been carried out with the aim to facilitate browsing of the NumtS tracks to be exploited in a wide range of research applications. Conclusions: We aimed at providing the scientific community with the most exhaustive overview on the human NumtSome, a resource whose aim is to support several research applications, such as studies concerning human structural variation, diversity, and disease, as well as the detection of false heteroplasmic mtDNA variants. Upon implementation of the NumtS tracks, the application of the BLAT program on the UCSC Genome Browser has now become an additional tool to check for heteroplasmic artefacts, supported by data available through the NumtS tracks.

    The RD-Connect Genome-Phenome Analysis Platform: Accelerating diagnosis, research, and gene discovery for rare diseases.

    Rare disease patients are more likely to receive a rapid molecular diagnosis nowadays thanks to the wide adoption of next-generation sequencing. However, many cases remain undiagnosed even after exome or genome analysis, because the methods used missed the molecular cause in a known gene, or a novel causative gene could not be identified and/or confirmed. To address these challenges, the RD-Connect Genome-Phenome Analysis Platform (GPAP) facilitates the collation, discovery, sharing, and analysis of standardized genome-phenome data within a collaborative environment. Authorized clinicians and researchers submit pseudonymised phenotypic profiles encoded using the Human Phenotype Ontology, and raw genomic data, which are processed through a standardized pipeline. After an optional embargo period, the data are shared with other platform users, with the objective that similar cases in the system and queries from peers may help diagnose the case. Additionally, the platform enables bidirectional discovery of similar cases in other databases from the Matchmaker Exchange network. To facilitate genome-phenome analysis and interpretation by clinical researchers, the RD-Connect GPAP provides a powerful user-friendly interface and leverages tens of information sources. As a result, the resource has already helped diagnose hundreds of rare disease patients and discover new disease-causing genes.

    Primates and mouse NumtS in the UCSC Genome Browser

    Abstract BACKGROUND: NumtS (Nuclear MiTochondrial Sequences) are mitochondrial DNA sequences that, after stress events involving the mitochondrion, colonized the nuclear genome. Accurate mapping of NumtS avoids contamination during mtDNA PCR amplification, thus supplying a reliable basis for detecting false heteroplasmies. In addition, since they commonly populate mammalian genomes (especially those of primates) and are polymorphic in terms of presence/absence and SNP content, they may be used as evolutionary markers in intra- and inter-species population analyses. RESULTS: The need for an exhaustive NumtS annotation led us to produce the Reference Human NumtS compilation, followed, as reported in this paper, by those for chimpanzee, rhesus macaque and mouse. Identification of NumtS inside the UCSC Genome Browser and their inter-species comparison required the design and implementation of NumtS tracks, starting from the compilation data. NumtS retrieval through the UCSC Genome Browser, in the species examined, is now feasible at a glance. CONCLUSIONS: Analyses involving NumtS tracks, together with other genome element tracks publicly available at the UCSC Genome Browser, can provide deep insight into genome evolution and comparative genomics, thus improving studies dealing with the mechanisms that drove the generation of NumtS. In addition, the NumtS tracks constitute a useful tool in the design of mitochondrial DNA primers.

    MitoDrome: a database of Drosophila melanogaster nuclear genes encoding proteins targeted to the mitochondrion

    Mitochondria are organelles present in the cytoplasm of most eukaryotic cells; although they have their own DNA, the majority of the proteins necessary for a functional mitochondrion are encoded by the nuclear DNA and are imported into the mitochondrion as proteins only after transcription and translation. The primary role of the mitochondrion is electron transport and oxidative phosphorylation. Although mitochondria have been studied for a long time, researchers' interest in them is still alive thanks to the discovery of the mitochondrial role in apoptosis, aging and cancer. The aim of the MitoDrome database is to annotate the Drosophila melanogaster nuclear genes coding for mitochondrial proteins, in order to contribute to the functional characterization of nuclear genes coding for mitochondrial proteins and to knowledge of genetic diseases related to mitochondrial dysfunctions. Indeed, D. melanogaster is one of the most studied organisms and a model for the Human genome. Data are derived from the comparison of Human mitochondrial proteins versus the Drosophila genome, ESTs and cDNA sequence data available in the FlyBase database. Links from the MitoDrome entries to the related homologous entries available in MitoNuC will soon be implemented. The MitoDrome database is available at http://bighost.area.ba.cnr.it/BIG/MitoDrome. Data are organised in a flat-file format and can be retrieved using the SRS system.

    Mitochondrial DNA variability of West New Guinea populations

    This paper reports human mitochondrial DNA variability in West New Guinea (the least known, western side of the island of New Guinea), not yet described from a molecular perspective. The study was carried out on 202 subjects from 12 ethnic groups, belonging to six different Papuan language families, representative of both mountain and coastal plain areas. Mitochondrial DNA hypervariable region 1 (HVS 1) and the presence of the 9-bp deletion (intergenic region COII-tRNA(Lys)) were investigated. HVS 1 sequencing identified 73 polymorphic sites defining 89 haplotypes; the 9-bp deletion, which is considered a marker of Austronesian migration in the Pacific, was found to be absent in the whole West New Guinea study sample. Statistical analysis applied to the resulting haplotypes reveals high heterogeneity and an intersecting distribution of genetic variability in these populations, despite their cultural and geographic diversity. The results of subsequent phylogenetic approaches subdivide mtDNA diversity in West New Guinea into three main clusters (groups I-III), defined by sets of polymorphisms which are also shared by some individuals from Papua New Guinea. Comparisons with worldwide HVS 1 sequences stored in the MitBASE database show the absence of these patterns outside Oceania and a few Indonesian subjects, who also lack the 9-bp deletion. This finding, which is consistent with the effects of genetic drift and prolonged isolation of West New Guinea populations, leads us to regard these patterns as New Guinea population markers, which may harbor the genetic memory of the earliest human migrations to the island. (C) 2002 Wiley-Liss, Inc.

    Genetic analysis of mitochondrial DNA control region variations in four tribes of Khyber Pakhtunkhwa, Pakistan

    Due to its geostrategic position at the crossroads of Asia, Pakistan has played a pivotal role in successive human migratory events, both prehistoric and historic. This human movement became possible through an ancient overland network of trails called "The Silk Route", linking Asia Minor, the Middle East, China, Central Asia and Southeast Asia. This study was conducted to analyze complete mitochondrial control region samples of 100 individuals of four major Pashtun tribes, namely Bangash, Khattak, Mahsuds and Orakzai, in the province of Khyber Pakhtunkhwa, Pakistan. All Pashtun tribes revealed high genetic diversity, comparable to that of other Central Asian, Southeast Asian and European populations. The configuration of genetic variation and heterogeneity was further unveiled through Multidimensional Scaling, Principal Component Analysis and phylogenetic analysis. The results revealed that Pashtun are a composite mosaic of West Eurasian ancestry of numerous geographic origins. They received substantial gene flow during different invasive movements and have a high element of Western provenance. The most common haplogroups reported in this study are the South Asian haplogroups M (28%) and R (8%), whereas West Asian haplogroups are present at high frequencies (67%) and are widespread overall: HV (15%), U (17%), H (9%), J (8%), K (8%), W (4%), N (3%) and T (3%). Moreover, we investigated the previously unexplored genetic connection between Ashkenazi Jews and Pashtun. The presence of the specific haplotypes J1b (4%) and K1a1b1a (5%) pointed to a genetic connection with the Jewish conglomeration in the Khattak tribe. This was a result of an ancient genetic influx in the early Neolithic period that led to the formation of a diverse genetic substratum in present-day Pashtun.

    Searching for a needle in the haystack: Comparing six methods to evaluate heteroplasmy in difficult sequence context

    Abstract Mitochondrial DNA (mtDNA) mutations have been implicated in disease, aging and cancer, and have furthermore been exploited for evolutionary and forensic investigation. When investigating mtDNA mutations, the peculiar aspects of mitochondrial genetics, such as heteroplasmy and threshold effect, require suitable approaches, which must be sensitive enough to detect low-level heteroplasmy and precise enough to quantify the exact mutational load. In order to establish the optimal approach for the evaluation of heteroplasmy, six methods were experimentally compared for their capacity to reveal and quantify mtDNA variants. Drawbacks and advantages of cloning, Fluorescent PCR (F-PCR), denaturing High Performance Liquid Chromatography (dHPLC), quantitative Real-Time PCR (qRTPCR), High Resolution Melting (HRM) and 454 pyrosequencing were determined. In particular, detection and quantification of a mutation in a difficult sequence context were investigated, through analysis of an insertion in a homopolymeric stretch (m.3571insC).

    Clustering mtDNA sequences for human evolution studies

    A novel distance method for sequence classification and intraspecies phylogeny reconstruction is proposed. The method incorporates biologically motivated definitions of DNA sequence distance in the recently proposed Chaotic Map Clustering algorithm (CMC), which performs a hierarchical partition of data by exploiting the cooperative behavior of an inhomogeneous lattice of chaotic maps living in the space of data. Simulation results show that our method outperforms, on average, the simple and most widely used approach to intraspecies phylogeny reconstruction based on the Neighbor Joining (NJ) algorithm. The method has also been tested on real data, by applying it to two distinct datasets of human mtDNA HVRI haplotypes from different geographical origins. A comparison with results from other well-known methods, such as the Stochastic Stationary Markov method and the Reduced Median Network, has also been performed.