6,475 research outputs found

Barcodes for genomes and applications

Author: Olman Victor
Xu Ying
Zhou Fengfeng
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Each genome has a stable distribution of the combined frequency for each k-mer and its reverse complement, measured in sequence fragments as short as 1000 bps across the whole genome, for 1 < k < 6. The collection of these k-mer frequency distributions is unique to each genome and is termed the genome's barcode. Results: We found that for each genome, the majority of its short sequence fragments have highly similar barcodes, while sequence fragments with different barcodes typically correspond to genes that are horizontally transferred or highly expressed. This observation has led to new and more effective ways of addressing two challenging problems: metagenome binning and the identification of horizontally transferred genes. Our barcode-based metagenome binning algorithm substantially improves the state of the art in terms of both binning accuracy and scope of applicability. Other attractive properties of genome barcodes include (a) the barcodes have different and identifiable characteristics for different classes of genomes, such as prokaryotes, eukaryotes, mitochondria and plastids, and (b) barcode similarities are generally proportional to the genomes' phylogenetic closeness. Conclusion: These and other properties of genome barcodes make them a new and effective tool for studying numerous genome and metagenome analysis problems.
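As a rough illustration of the barcode idea described in this abstract, the sketch below counts each k-mer together with its reverse complement in fixed-length fragments and normalizes the counts to frequencies. The function names, the non-overlapping 1000 bp windows, and the default k=4 are assumptions made for this example; the abstract does not specify how the authors implemented the calculation.

```python
from collections import Counter

# Translation table for complementing the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def fragment_profile(fragment, k):
    """Combined frequency of each k-mer and its reverse complement."""
    counts = Counter()
    for i in range(len(fragment) - k + 1):
        kmer = fragment[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def barcode(genome, k=4, fragment_len=1000):
    """One frequency profile per non-overlapping fragment (assumed layout)."""
    return [
        fragment_profile(genome[start:start + fragment_len], k)
        for start in range(0, len(genome) - fragment_len + 1, fragment_len)
    ]
```

Fragments whose profiles differ strongly from the majority would then be the candidates the abstract associates with horizontally transferred or highly expressed genes; how that difference is scored is left open here.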

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Inference of Ancestral Recombination Graphs through Topological Data Analysis

Author: Camara Pablo G.
Levine Arnold J.
Rabadan Raul
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. Comment: 33 pages, 12 figures. The accompanying software, instructions and example files used in the manuscript can be obtained from https://github.com/RabadanLab/TARGe

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

FigShare

From Pine Cones to Read Clouds: Rescaffolding the Megagenome of Sugar Pine (Pinus lambertiana).

Author: Crepeau Marc W
Langley Charles H
Stevens Kristian A
Publication venue: eScholarship, University of California
Publication date: 01/05/2017

We investigate the utility and scalability of new read cloud technologies to improve the draft genome assemblies of the colossal, and largely repetitive, genomes of conifers. Synthetic long read technologies have existed in various forms as a means of reducing complexity and resolving repeats since the outset of genome assembly. Recently, technologies that combine subhaploid pools of high molecular weight DNA with barcoding on a massive scale have brought new efficiencies to sample preparation and data generation. When combined with inexpensive light shotgun sequencing, the resulting data can be used to scaffold large genomes. The protocol is efficient enough to consider routinely for even the largest genomes. Conifers represent the largest reference genome projects executed to date. The largest of these is that of the conifer Pinus lambertiana (sugar pine), with a genome size of 31 billion bp. In this paper, we report on the molecular and computational protocols for scaffolding the P. lambertiana genome using the library technology from 10× Genomics. At 247,000 bp, the NG50 of the existing reference sequence is the highest scaffold contiguity among the currently published conifer assemblies; this new assembly's NG50 is 1.94 million bp, an eightfold increase
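For readers unfamiliar with the NG50 statistic quoted in this abstract, the sketch below computes it under the usual definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the estimated genome size. The function name and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the NG50 of an assembly, or None if half the genome is not covered.

    scaffold_lengths: iterable of scaffold lengths in bp.
    genome_size: estimated genome size in bp (not the assembly size,
    which would give N50 instead).
    """
    target = genome_size / 2
    covered = 0
    # Walk scaffolds from longest to shortest until half the genome is covered.
    for length in sorted(scaffold_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


# Toy example: 50 + 40 + 30 = 120 >= 100, so the NG50 is 30.
print(ng50([50, 40, 30, 20, 10], genome_size=200))
```

The "eightfold" figure in the abstract is consistent with the two reported values: 1,940,000 / 247,000 is approximately 7.9.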

Crossref

Directory of Open Access Journals

eScholarship - University of California

How and why DNA barcodes underestimate the diversity of microbial eukaryotes

Author: Adam Eyre-Walker
AR Boyko
AZ Worden
B Charlesworth
B Palenik
DT Jones
F Not
G Piganeau
Gwenael Piganeau
Hervé Moreau
J Coyne
J Crow
JJ Welch
K Romari
M Viprey
ML Cuvelier
Nigel Grimsley
P Flicek
P Lopez-Garcia
PD Keightley
Purification Lopez-Garcia
S Gourbiere
S Jancek
S Proost
SB Needleman
SJ Williamson
SL Baldauf
SY Moon-van der Staay
Z Yang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/02/2011

Background: Because many picoplanktonic eukaryotic species cannot currently be maintained in culture, direct sequencing of PCR-amplified 18S ribosomal gene DNA fragments from filtered seawater has been successfully used to investigate the astounding diversity of these organisms. The recognition of many novel planktonic organisms is thus based solely on their 18S rDNA sequence. However, a species delimited by its 18S rDNA sequence might contain many cryptic species, which are highly differentiated in their protein coding sequences. Principal Findings: Here, we investigate the issue of species identification from one gene to the whole genome sequence. Using 52 whole genome DNA sequences, we estimated the global genetic divergence in protein coding genes between organisms from different lineages and compared this to their ribosomal gene sequence divergences. We show that this relationship between proteome divergence and 18S divergence is lineage dependent. Unicellular lineages have especially low 18S divergences relative to their protein sequence divergences, suggesting that 18S ribosomal genes are too conservative to assess planktonic eukaryotic diversity. We provide an explanation for this lineage dependency, which suggests that most species with large effective population sizes will show far less divergence in 18S than in protein coding sequences. Conclusions: There is therefore a trade-off between using genes that are easy to amplify in all species, but which by their nature are highly conserved and underestimate the true number of species, and using genes that give a better description of the number of species, but which are more difficult to amplify. We have shown that this trade-off differs between unicellular and multicellular organisms as a likely consequence of differences in effective population sizes. We anticipate that biodiversity of microbial eukaryotic species is underestimated and that numerous "cryptic species" will become discernible with the future acquisition of genomic and metagenomic sequences

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Sussex Research Online

Mapping the Space of Genomic Signatures

Author: Bryans Nathaniel
Dattani Nikesh S.
Davis Katelyn
Hill Kathleen A.
Karamichalis Rallis
Kari Lila
Sayem Abu S.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 09/10/2014

We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k=9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence homology and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. Comment: 14 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1307.375
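To make the Chaos Game Representation mentioned in this abstract more concrete, here is a minimal sketch of the standard construction: start at the center of the unit square and, for each nucleotide, move halfway toward that nucleotide's corner. The corner assignment below is one common convention and is an assumption here, as are the function names; the DSSIM image distance and the MDS step are not implemented.

```python
# Assumed corner layout (a common convention, not confirmed by the abstract).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory of a DNA sequence as a list of (x, y) points."""
    x, y = 0.5, 0.5  # start at the center of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous bases such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_grid(seq, k):
    """Count CGR points in a 2^k x 2^k grid; each cell corresponds to one k-mer."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    # The first k-1 points do not yet encode a full k-mer, so skip them.
    for x, y in cgr_points(seq)[k - 1:]:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid
```

A grid produced this way is the kind of image on which a pairwise image distance could then be computed, which is why the comparison implicitly reflects oligomer occurrences up to length k.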

arXiv.org e-Print Archive

Directory of Open Access Journals

Mitochondrial metagenomics: letting the genes out of the bottle

Author: Crampton-Platt Alex
Vogler Alfried P.
Yu Douglas W.
Zhou Xin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

‘Mitochondrial metagenomics’ (MMG) is a methodology for shotgun sequencing of total DNA from specimen mixtures and subsequent bioinformatic extraction of mitochondrial sequences. The approach can be applied to phylogenetic analysis of taxonomically selected taxa, as an economical alternative to mitogenome sequencing from individual species, or to environmental samples of mixed specimens, such as from mass trapping of invertebrates. The routine generation of mitochondrial genome sequences has great potential both for systematics and community phylogenetics. Mapping of reads from low-coverage shotgun sequencing of environmental samples also makes it possible to obtain data on spatial and temporal turnover in whole-community phylogenetic and species composition, even in complex ecosystems where species-level taxonomy and biodiversity patterns are poorly known. In addition, read mapping can produce information on species biomass, and potentially allows quantification of within-species genetic variation. The success of MMG relies on the formation of numerous mitochondrial genome contigs, achievable with standard genome assemblers, but various challenges for the efficiency of assembly remain, particularly in the face of variable relative species abundance and intra-specific genetic variation. Nevertheless, several studies have demonstrated the power of mitogenomes from MMG for accurate phylogenetic placement, evolutionary analysis of species traits, biodiversity discovery and the establishment of species distribution patterns; it offers a promising avenue for unifying the ecological and evolutionary understanding of species diversity

Crossref

Springer - Publisher Connector

PubMed Central

Spiral - Imperial College Digital Repository

University of East Anglia digital repository

An Ultrahigh-throughput Microfluidic Platform for Single-cell Genome Sequencing.

Author: Abate Adam R
Demaree Benjamin
Lan Freeman
Weisgerber Daniel
Publication venue: eScholarship, University of California
Publication date: 01/05/2018

Sequencing technologies have undergone a paradigm shift from bulk to single-cell resolution in response to an evolving understanding of the role of cellular heterogeneity in biological systems. However, single-cell sequencing of large populations has been hampered by limitations in processing genomes for sequencing. In this paper, we describe a method for single-cell genome sequencing (SiC-seq) which uses droplet microfluidics to isolate, amplify, and barcode the genomes of single cells. Cell encapsulation in microgels allows the compartmentalized purification and tagmentation of DNA, while a microfluidic merger efficiently pairs each genome with a unique single-cell oligonucleotide barcode, allowing >50,000 single cells to be sequenced per run. The sequencing data is demultiplexed by barcode, generating groups of reads originating from single cells. As a high-throughput and low-bias method of single-cell sequencing, SiC-seq will enable a broader range of genomic studies targeted at diverse cell populations
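The demultiplexing step mentioned in this abstract amounts to grouping reads by their cell barcode. The sketch below shows that grouping only, assuming each read has already been split into a (barcode, sequence) pair; the read layout, barcode extraction, and any error correction used by SiC-seq are not described in the abstract and are not modeled here.

```python
from collections import defaultdict


def demultiplex(tagged_reads):
    """Group read sequences by barcode.

    tagged_reads: iterable of (barcode, sequence) pairs (hypothetical input
    format for this example).
    Returns a dict mapping each barcode to the list of its reads, in input order.
    """
    groups = defaultdict(list)
    for barcode, sequence in tagged_reads:
        groups[barcode].append(sequence)
    return dict(groups)


reads = [("AAC", "ACGT"), ("GGT", "TTTT"), ("AAC", "CCGA")]
print(demultiplex(reads))
```

Each resulting group corresponds to reads that carry the same barcode and therefore, in the scheme described above, originate from a single cell.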

eScholarship - University of California

DNA barcoding as a molecular tool to track down mislabeling and food piracy

Author: Barcaccia Gianni
Cassandro Martino
Lucchin Margherita
Publication venue: 'MDPI AG'
Publication date: 01/12/2015

DNA barcoding is a molecular technology that allows the identification of any biological species by amplifying, sequencing and querying the information from genic and/or intergenic standardized target regions belonging to the extranuclear genomes. Although these sequences represent a small fraction of the total DNA of a cell, both chloroplast and mitochondrial barcodes chosen for identifying plant and animal species, respectively, have shown sufficient nucleotide diversity to assess the taxonomic identity of the vast majority of organisms used in agriculture. Consequently, cpDNA and mtDNA barcoding protocols are being used more and more in the food industry and food supply chains for food labeling, not only to support food safety but also to uncover food piracy in freshly commercialized and technologically processed products. Since the extranuclear genomes are present in many copies within each cell, this technology is being more easily exploited to recover information even in degraded samples or transformed materials deriving from crop varieties and livestock species. The strong standardization that characterizes protocols used worldwide for DNA barcoding makes this technology particularly suitable for routine analyses required by agencies to safeguard food safety and quality. Here we conduct a critical review of the potentials of DNA barcoding for food labeling along with the main findings in the area of food piracy, with particular reference to agrifood and livestock foodstuffs

Multidisciplinary Digital Publishing Institute

Crossref

Directory of Open Access Journals

Archivio istituzionale della ricerca - Università di Padova

Next generation sequencing and comparative analyses of Xenopus mitogenomes

Author: Foster P.
Guille Matt
Littlewood D.
Lloyd Rhiannon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

Portsmouth University Research Portal (Pure)

A haplotype-resolved draft genome of the European sardine (Sardina pilchardus)

Author: Canario A.V.M.
Cox Cymon
De Moro Gianluca
Garcia Carlos
Louro Bruno
Sabatino Stephen J.
Santos António M.
Veríssimo Ana
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2019

The European sardine (Sardina pilchardus Walbaum, 1792) is culturally and economically important throughout its distribution. Monitoring studies of sardine populations report an alarming decrease in stocks due to overfishing and environmental change, which has resulted in historically low captures along the Iberian Atlantic coast. Important biological and ecological features such as population diversity, structure, and migratory patterns can be addressed with the development and use of genomics resources. Funding agency: Portuguese national funds from FCT-Foundation for Science and Technology: UID/Multi/04326/2016; European Regional Development Fund (FEDER): 22153-01/SAICT/2016; ALG-01-0145-FEDER-022121; ALG-01-0145-FEDER-022231; MAR2020 operational programme of the European Maritime and Fisheries Fund (project SARDI-NOMICS): MAR-01.04.02-FEAMP-0024; European Union's Horizon 2020 research and innovation programme: 654008.

Sapientia