10,622 research outputs found

    An investigation into inter- and intragenomic variations of graphic genomic signatures

    We provide, on an extensive dataset and using several different distances, confirmation of the hypothesis that CGR patterns are preserved along a genomic DNA sequence, and are different for DNA sequences originating from genomes of different species. This finding lends support to the theory that CGRs of genomic sequences can act as graphic genomic signatures. In particular, we compare the CGR patterns of over five hundred different 150,000 bp genomic sequences originating from the genomes of six organisms, each belonging to one of the kingdoms of life: H. sapiens, S. cerevisiae, A. thaliana, P. falciparum, E. coli, and P. furiosus. We also provide preliminary evidence of this method's applicability to closely related species by comparing H. sapiens (chromosome 21) sequences and over one hundred and fifty genomic sequences, also 150,000 bp long, from P. troglodytes (Animalia; chromosome Y), for a total length of more than 101 million basepairs analyzed. We compute pairwise distances between CGRs of these genomic sequences using six different distances, and construct Molecular Distance Maps that visualize all sequences as points in a two-dimensional or three-dimensional space, to simultaneously display their interrelationships. Our analysis confirms that CGR patterns of DNA sequences from the same genome are in general quantitatively similar, while being different for DNA sequences from genomes of different species. Our analysis of the performance of the assessed distances uses three different quality measures and suggests that several distances outperform the Euclidean distance, which has so far been almost exclusively used for such studies. In particular we show that, for this dataset, DSSIM (Structural Dissimilarity Index) and the descriptor distance (introduced here) are best able to classify genomic sequences. Comment: 14 pages, 6 figures, 5 tables

    Genesis of ancestral haplotypes: RNA modifications and reverse transcription–mediated polymorphisms

    Understanding the genesis of the block haplotype structure of the genome is a major challenge. With the completion of the sequencing of the Human Genome and the initiation of the HapMap project, the concept that the chromosomes of the mammalian genome are a mosaic, or patchwork, of conserved extended block haplotype sequences is now accepted by the mainstream genomics research community. Ancestral Haplotypes (AHs) can be viewed as a recombined string of smaller Polymorphic Frozen Blocks (PFBs). How have such variant extended DNA sequence tracts emerged in evolution? Here the relevant literature on the problem is reviewed from various fields of molecular and cell biology, particularly molecular immunology and comparative and functional genomics. Based on our synthesis we then advance a testable molecular and cellular model. A critical part of the analysis concerns the origin of the strand-biased mutation signatures in the transcribed regions of the human and higher primate genome, A-to-G versus T-to-C (ratio ~1.5 fold) and C-to-T versus G-to-A (≥1.5 fold). A comparison and evaluation of the current state of the fields of immunoglobulin Somatic Hypermutation (SHM) and Transcription-Coupled DNA Repair focuses on how mutations in newly synthesized RNA might be copied back to DNA, thus accounting for some of the genome-wide strand biases (e.g., the A-to-G vs T-to-C component of the strand-biased spectrum). We hypothesize that the genesis of PFBs and extended AHs occurs during mutagenic episodes in evolution (e.g., retroviral infections) and that many of the critical DNA sequence diversifying events occur first at the RNA level, e.g., recombination between RNA strings resulting in tandem and dispersed RNA duplications (retroduplications), RNA mutations via adenosine-to-inosine pre-mRNA editing events, as well as error-prone RNA synthesis. These are then copied back into DNA by a cellular reverse transcription process (also likely to be error-prone) that we have called "reverse transcription-mediated long DNA conversion." Finally we suggest that all these activities and others can be envisaged as being brought physically under the umbrella of special sites in the nucleus involved in transcription known as "transcription factories."

    Resolving Prokaryotic Taxonomy without rRNA: Longer Oligonucleotide Word Lengths Improve Genome and Metagenome Taxonomic Classification

    Oligonucleotide signatures, especially tetranucleotide signatures, have been used as a method for homology binning by exploiting an organism’s inherent biases towards the use of specific oligonucleotide words. Tetranucleotide signatures have been especially useful in environmental metagenomics samples, as many of these samples contain organisms from poorly classified phyla which cannot be easily identified using traditional homology methods, including NCBI BLAST. This study examines oligonucleotide signatures across 1,424 completed genomes from across the tree of life, substantially expanding upon previous work. A comprehensive analysis of mononucleotide through nonanucleotide word lengths suggests that longer word lengths substantially improve the classification of DNA fragments across a range of sizes of relevance to high throughput sequencing. We find that, at present, heptanucleotide signatures represent an optimal balance between prediction accuracy and computational time for resolving taxonomy using both genomic and metagenomic fragments. We directly compare the ability of tetranucleotide and heptanucleotide word lengths (tetranucleotide signatures are the current standard for oligonucleotide word usage analyses) for taxonomic binning of metagenome reads. We present evidence that heptanucleotide word lengths consistently provide more taxonomic resolving power, particularly in distinguishing between closely related organisms that are often present in metagenomic samples. This implies that longer oligonucleotide word lengths should replace tetranucleotide signatures for most analyses. Finally, we show that the application of longer word lengths to metagenomic datasets leads to more accurate taxonomic binning of DNA scaffolds and has the potential to substantially improve taxonomic assignment and assembly of metagenomic data. The article is published at http://journals.plos.org/plosone/article?id=10.1371/journal.pone.006733
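
    The oligonucleotide word signatures described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `kmer_signature` and `manhattan`, the choice to skip windows containing symbols other than A, C, G, T, and the use of a Manhattan distance are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def kmer_signature(seq, k=4):
    """Return the normalised frequency of every k-letter word over ACGT.

    Hypothetical helper: windows containing other symbols (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(ALPHABET)
    )
    total = sum(counts.values())
    words = ("".join(p) for p in product(ALPHABET, repeat=k))
    # Every possible word gets an entry, so two signatures are directly comparable.
    return {w: (counts[w] / total if total else 0.0) for w in words}

def manhattan(sig_a, sig_b):
    """Sum of absolute frequency differences between two signatures (same k)."""
    return sum(abs(sig_a[w] - sig_b[w]) for w in sig_a)
```

    Calling `kmer_signature(fragment, k=7)` instead of `k=4` switches from tetranucleotide to heptanucleotide words; the signature then has 4^7 = 16,384 entries, which is the accuracy versus cost trade-off the abstract refers to.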

    Manual on application of molecular tools in aquaculture and inland fisheries management. Part 2. Laboratory protocols and data analysis

    The aim of this manual is to provide a comprehensive practical tool for the generation and analysis of genetic data for subsequent application in aquatic resources management in relation to genetic stock identification in inland fisheries and aquaculture. The material only covers general background on genetics in relation to aquaculture and fisheries resource management, and the techniques and relevant methods of data analysis that are commonly used to address questions relating to genetic resource characterisation and population genetic analyses. No attempt is made to include applications of genetic improvement techniques, e.g. selective breeding or producing genetically modified organisms (GMOs). The manual includes two ‘stand-alone’ parts, of which this is the second volume: Part 1 – Conceptual basis of population genetic approaches: will provide a basic foundation on genetics in general, and concepts of population genetics. Issues on the choices of molecular markers and project design are also discussed. Part 2 – Laboratory protocols, data management and analysis: will provide step-by-step protocols of the most commonly used molecular genetic techniques utilised in population genetics and systematic studies. In addition, a brief discussion and explanation of how these data are managed and analysed is also included. This manual is expected to enable NACA member country personnel to be trained to undertake molecular genetic studies in their own institutions, and as such is aimed at middle and higher level technical grades. The manual can also provide useful teaching material for specialised advanced level university courses in the region and postgraduate students. The manual has gone through two development/improvement stages. The initial material was tested at a regional workshop, and at the second stage feedback from participants was used to improve the contents.

    Mapping the Space of Genomic Signatures

    We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k=9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence homology and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. Comment: 14 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1307.375
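
    For readers unfamiliar with Chaos Game Representation, the construction can be sketched as follows. This is a generic illustration rather than the paper's code: the corner assignment in `CORNERS` is one common convention, the helper names `cgr_points` and `cgr_grid` are hypothetical, and skipping ambiguous symbols is an assumption. The DSSIM distance itself is not implemented here.

```python
# Assumed corner convention for the unit square; other assignments are in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Return the CGR point for each base: start at the centre of the unit
    square and move halfway towards the corner of each successive base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # assumption: ambiguous symbols such as N are skipped
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

def cgr_grid(seq, k=3):
    """Count CGR points in a 2^k x 2^k grid, giving an image-like matrix.

    Apart from the first k-1 points, each cell collects the positions that
    end in one particular k-letter word.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid
```

    Two sequences can then be compared by applying any image distance to their grids; with k=9, as in the abstract, each grid is 512 x 512.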

    Molecular Distance Maps: An alignment-free computational tool for analyzing and visualizing DNA sequences' interrelationships

    In an attempt to identify and classify species based on genetic evidence, we propose a novel combination of methods to quantify and visualize the interrelationships between thousands of species. This is possible by using Chaos Game Representation (CGR) of DNA sequences to compute genomic signatures, which we then compare by computing pairwise distances. In the last step, the original DNA sequences are embedded in a high dimensional space using Multi-Dimensional Scaling (MDS) before everything is projected on a Euclidean 3D space. To start with, we apply this method to a mitochondrial DNA dataset from NCBI containing over 3,000 species. The analysis shows that the oligomer composition of full mtDNA sequences can be a source of taxonomic information, suggesting that this method could be used for unclassified species and taxonomic controversies. Next, we test the hypothesis that the CGR-based genomic signature is preserved along a species' genome by comparing inter- and intra-genomic signatures of nuclear DNA sequences from six different organisms, one from each kingdom of life. We also compare six different distances and we assess their performance using statistical measures. Our results support the existence of a genomic signature for a species' genome at the kingdom level. In addition, we test whether CGR-based genomic signatures originating only from nuclear DNA can be used to distinguish between closely-related species and we answer in the negative. To overcome this limitation, we propose the concept of "composite signatures" which combine information from different types of DNA and we show that they can effectively distinguish all closely-related species under consideration. We also propose the concept of "assembled signatures" which, among other advantages, do not require a long contiguous DNA sequence but can be built from smaller ones consisting of ~100-300 base pairs. Finally, we design an interactive webtool, MoDMaps3D, for building three-dimensional Molecular Distance Maps. The user can explore an already existing map or build his/her own using NCBI's accession numbers as input. MoDMaps3D is platform independent, written in JavaScript, and can run in all major modern browsers.
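
    The embedding step mentioned in this abstract, turning a matrix of pairwise distances into 3D coordinates, can be sketched with classical (Torgerson) MDS. This is an assumption about the variant: the abstract does not say which MDS formulation is used, and the function name `classical_mds` is hypothetical. The sketch relies on NumPy.

```python
import numpy as np

def classical_mds(dist, dims=3):
    """Embed n items in `dims` dimensions from an n x n distance matrix.

    Classical MDS: double-centre the squared distances to get a Gram matrix,
    then use its leading eigenvectors, scaled by sqrt(eigenvalue), as coordinates.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dims]         # indices of the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # negative values clipped to 0
    return eigvecs[:, order] * scale
```

    Given a symmetric matrix of signature distances, `classical_mds(dist, dims=3)` returns one 3D point per sequence, which is the kind of input a three-dimensional map viewer would plot. Non-Euclidean distances generally cannot be reproduced exactly, so the map is an approximation.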

    Translesion DNA synthesis-assisted non-homologous end-joining of complex double-strand breaks prevents loss of DNA sequences in mammalian cells

    Double strand breaks (DSB) are severe DNA lesions, and if not properly repaired, may lead to cell death or cancer. While there is considerable data on the repair of simple DSB (sDSB) by non-homologous end-joining (NHEJ), little is known about the repair of complex DSBs (cDSB), namely breaks with a nearby modification, which precludes ligation without prior processing. To study the mechanism of cDSB repair we developed a plasmid-based shuttle assay for the repair of a defined site-specific cDSB in cultured mammalian cells. Using this assay we found that repair efficiency and accuracy of a cDSB with an abasic site in a 5′ overhang was reduced compared with a sDSB. Translesion DNA synthesis (TLS) across the abasic site located at the break prevented loss of DNA sequences, but was highly mutagenic also at the template base next to the abasic site. Similar to sDSB repair, cDSB repair was totally dependent on XrccIV, and altered in the absence of Ku80. In contrast, Artemis appears to be specifically involved in cDSB repair. These results may indicate that mammalian cells have a damage control strategy, whereby severe deletions are prevented at the expense of the less deleterious point mutations during NHEJ.

    Genomic approaches and their contributions to understanding the European Neolithisation

    The contribution of ancient DNA to the understanding of past events has been increasing exponentially in recent years. This is mainly due to the synergy of technical advances, such as the molecular technique of high-throughput DNA sequencing, which has allowed for the reconstruction of complete genomes as old as 750 000 years. Another step toward the cost-effective characterisation of ancient genomes is the sampling of petrous bone, which has allowed sequencing of the first ancient African genome. Here I review the significant contribution of ancient genomics to our understanding of the European Neolithisation process.

    Additive methods for genomic signatures
