41 research outputs found

    Safe and complete contig assembly via omnitigs

    Full text link
    Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs -- a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph GG (e.g. a de Bruijn, or a string graph), what are all the strings that can be safely reported from GG as contigs? In this paper we finally answer this question, and also give a polynomial time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs.Comment: Full version of the paper in the proceedings of RECOMB 201

    Meraculous: De Novo Genome Assembly with Short Paired-End Reads

    Get PDF
    We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no errors; half the assembled sequence is in contigs longer than 101 kilobases and in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (deBruijn) graph of oligonucleotides with unique high quality extensions in the dataset, avoiding an explicit error correction step as used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by ∼280 bp or ∼3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed

    Genetic diversity analysis of common beans based on molecular markers

    Get PDF
    A core collection of the common bean (Phaseolus vulgaris L.), representing genetic diversity in the entire Mexican holding, is kept at the INIFAP (Instituto Nacional de Investigaciones Forestales, Agricolas y Pecuarias, Mexico) Germplasm Bank. After evaluation, the genetic structure of this collection (200 accessions) was compared with that of landraces from the states of Oaxaca, Chiapas and Veracruz (10 genotypes from each), as well as a further 10 cultivars, by means of four amplified fragment length polymorphisms (AFLP) +3/+3 primer combinations and seven simple sequence repeats (SSR) loci, in order to define genetic diversity, variability and mutual relationships. Data underwent cluster (UPGMA) and molecular variance (AMOVA) analyses. AFLP analysis produced 530 bands (88.5% polymorphic) while SSR primers amplified 174 alleles, all polymorphic (8.2 alleles per locus). AFLP indicated that the highest genetic diversity was to be found in ten commercial-seed classes from two major groups of accessions from Central Mexico and Chiapas, which seems to be an important center of diversity in the south. A third group included genotypes from Nueva Granada, Mesoamerica, Jalisco and Durango races. Here, SSR analysis indicated a reduced number of shared haplotypes among accessions, whereas the highest genetic components of AMOVA variation were found within accessions. Genetic diversity observed in the common-bean core collection represents an important sample of the total Phaseolus genetic variability at the main Germplasm Bank of INIFAP. Molecular marker strategies could contribute to a better understanding of the genetic structure of the core collection as well as to its improvement and validation

    Major prospects for exploring canine vector borne diseases and novel intervention methods using 'omic technologies

    Get PDF
    Canine vector-borne diseases (CVBDs) are of major socioeconomic importance worldwide. Although many studies have provided insights into CVBDs, there has been limited exploration of fundamental molecular aspects of most pathogens, their vectors, pathogen-host relationships and disease and drug resistance using advanced, 'omic technologies. The aim of the present article is to take a prospective view of the impact that next-generation, 'omics technologies could have, with an emphasis on describing the principles of transcriptomic/genomic sequencing as well as bioinformatic technologies and their implications in both fundamental and applied areas of CVBD research. Tackling key biological questions employing these technologies will provide a 'systems biology' context and could lead to radically new intervention and management strategies against CVBDs

    Parameterized Pattern Matching

    No full text

    Chloroplast DNA Microsatellites Reveal Contrasting Phylogeographic Structure in Mahogany (Swietenia macrophylla King, Meliaceae) from Amazonia and Central America

    Get PDF
    Big-leaf mahogany (Swietenia macrophylla) is one of the most valuable and overharvested timber trees of tropical America. A description of the organization of genetic variation across its broad range would be useful for management of genetic diversity and for understanding its demographic history. Here we report on a phylogeographic analysis of mahogany based on six polymorphic cpDNA simple sequence repeat loci (cpSSRs) genotyped in 16 populations distributed across the Brazilian Amazon and Mesoamerica (N = 245 individuals). Of the 31 cpDNA haplotypes identified, 15 occurred in Amazonia and 16 in Mesoamerica with no single haplotype shared between the two regions. The populations from Central America showed moderate differentiation (FST = 0.36) while within population genetic diversity was generally high (mean Nei's HE = 0.639). In contrast, the Amazonian populations were strongly differentiated (FST = 0.95) and contained low haplotype diversity (mean HE = 0.176), with the exception of the highly diverse Marajoara population from the Eastern Amazon (HE = 0.925). SAMOVA identified a single Mesoamerican phylogroup and four Amazonian phylogroups, indicating stronger phylogeographic structure within Amazonia. The results demonstrate high levels of cpDNA variation and differentiation of regional S. macrophylla populations, and provide the first evidence of a major phylogeographic break between Mesoamerican and South American mahogany populations
    corecore