791 research outputs found

SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications

Author: Garrison Erik
Lee Wan-Ping
Marth Gabor T.
Zhao Mengyao
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Summary: The Smith-Waterman (SW) algorithm, which produces the optimal pairwise alignment between two sequences, is frequently used as a key component of fast heuristic read mapping and variation detection tools, but current implementations are either designed as monolithic protein database searching tools or are embedded into other tools. To facilitate easy integration of the fast Single Instruction Multiple Data (SIMD) SW algorithm into third-party software, we wrote a C/C++ library, which extends Farrar's Striped SW (SSW) to return alignment information in addition to the optimal SW score. Availability: SSW is available both as a C/C++ software library and as a stand-alone alignment tool wrapping the library's functionality at https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library. Contact: [email protected]. 3 pages, 2 figures
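
To make the algorithm named in this summary concrete, here is a minimal, unvectorized sketch of Smith-Waterman scoring with a linear gap penalty. It is not the SSW library's API and does not reproduce Farrar's striped SIMD layout; the function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Plain dynamic programming with a linear gap penalty. The scoring
    values are illustrative assumptions, not SSW defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTCC"))
```

This version returns only the score. Recovering the alignment itself, which is what the library adds on top of the score, would additionally require keeping the full matrix (or traceback pointers) and walking back from the best cell.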

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Analysis of concordance of different haplotype block partitioning algorithms

Author: Indap Amit R
Marth Gabor T
Olivier Michael
Struble Craig A
Tonellato Peter
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Different classes of haplotype block algorithms exist and the ideal dataset to assess their performance would be to comprehensively re-sequence a large genomic region in a large population. Such data sets are expensive to collect. Alternatively, we performed coalescent simulations to generate haplotypes with a high marker density and compared block partitioning results from diversity based, LD based, and information theoretic algorithms under different values of SNP density and allele frequency. RESULTS: We simulated 1000 haplotypes using the standard coalescent for three world populations – European, African American, and East Asian – and applied three classes of block partitioning algorithms – diversity based, LD based, and information theoretic. We assessed algorithm differences in number, size, and coverage of blocks inferred under different conditions of SNP density, allele frequency, and sample size. Each algorithm inferred blocks differing in number, size, and coverage under different density and allele frequency conditions. Different partitions had few if any matching block boundaries. However, they still overlapped, and a high percentage of the total chromosomal region was common to all methods. This percentage was generally higher with a higher density of SNPs and when rarer markers were included. CONCLUSION: A gold standard definition of a haplotype block is difficult to achieve, but collecting haplotypes covered with a high density of SNPs, partitioning them with a variety of block algorithms, and identifying regions common to all methods may be the best way to identify genomic regions that harbor SNP variants that cause disease.
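
The idea of a "region common to all methods" can be illustrated with a small sketch. This is not the authors' code: the function name, the half-open (start, end) block representation, and the example coordinates are all assumptions made for illustration.

```python
def common_coverage(partitions, region_length):
    """Fraction of positions [0, region_length) that lie inside a block
    in every partition.

    Each partition is a list of half-open (start, end) blocks; this
    representation is an assumption for the example.
    """
    covered_by_all = [True] * region_length
    for blocks in partitions:
        covered = [False] * region_length
        for start, end in blocks:
            for pos in range(max(start, 0), min(end, region_length)):
                covered[pos] = True
        covered_by_all = [a and c for a, c in zip(covered_by_all, covered)]
    return sum(covered_by_all) / region_length


# Three hypothetical partitions of a 100-position region.
partitions = [
    [(0, 50), (60, 100)],
    [(10, 40), (45, 90)],
    [(0, 100)],
]
print(common_coverage(partitions, 100))
```

In this toy input the first two partitions share no block boundary, yet most of the region is still covered by all three, which mirrors the qualitative pattern the abstract describes.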

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Soil Salinity between 1992 and 2000 in Hungary

Author: KOVÁCS D.
MARTH P.
TÓTH T.
Publication venue: 'Akademiai Kiado Zrt.'
Publication date: 01/01/2006

Repository of the Academy's Library

Whole genome profiling of spontaneous and chemically induced mutations in Toxoplasma gondii

Author: Benenati Brian
Blader Ira J
Brown Kevin M
Coleman Bradley I
Farrell Andrew
Gubbels Marc-Jan
Marth Gabor T
Publication venue: Digital Commons@Becker
Publication date: 01/01/2014

BACKGROUND: Next generation sequencing is helping to overcome limitations in organisms less accessible to classical or reverse genetic methods by facilitating whole genome mutational analysis studies. One traditionally intractable group, the Apicomplexa, contains several important pathogenic protozoan parasites, including the Plasmodium species that cause malaria. Here we apply whole genome analysis methods to the relatively accessible model apicomplexan, Toxoplasma gondii, to optimize forward genetic methods for chemical mutagenesis using N-ethyl-N-nitrosourea (ENU) and ethylmethane sulfonate (EMS) at varying dosages. RESULTS: By comparing three different lab-strains we show that spontaneously generated mutations reflect genome composition, without nucleotide bias. However, the single nucleotide variations (SNVs) are not distributed randomly over the genome; most of these mutations reside either in non-coding sequence or are silent with respect to protein coding. This is in contrast to the random genomic distribution of mutations induced by chemical mutagenesis. Additionally, we report a genome wide transition vs transversion ratio (ti/tv) of 0.91 for spontaneous mutations in Toxoplasma, with a slightly higher rate of 1.20 and 1.06 for variants induced by ENU and EMS respectively. We also show that in the Toxoplasma system, surprisingly, both ENU and EMS have a proclivity for inducing mutations at A/T base pairs (78.6% and 69.6%, respectively). CONCLUSIONS: The number of SNVs between related laboratory strains is relatively low and managed by purifying selection away from changes to amino acid sequence. From an experimental mutagenesis point of view, both ENU (24.7%) and EMS (29.1%) are more likely to generate variation within exons than would naturally accumulate over time in culture (19.1%), demonstrating the utility of these approaches for yielding proportionally greater changes to the amino acid sequence. 
These results will not only direct the methods of future chemical mutagenesis in Toxoplasma, but also aid in designing forward genetic approaches in less accessible pathogenic protozoa. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-354) contains supplementary material, which is available to authorized users.
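
The transition/transversion (ti/tv) ratio reported in this abstract is a simple count ratio: transitions are substitutions within purines (A, G) or within pyrimidines (C, T), and every other substitution is a transversion. A minimal sketch follows; the function names and the (ref, alt) input format are assumptions, and the example data are invented rather than taken from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True when ref -> alt is a purine-purine or pyrimidine-pyrimidine change."""
    if ref == alt:
        return False
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        if ref == alt:
            continue  # not a substitution, ignore
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


# Invented example: two transitions and two transversions.
print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]))
```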

Crossref

Springer - Publisher Connector

Digital Commons@Becker

PubMed Central

Tangram: A comprehensive toolbox for mobile element insertion detection

Author: Batzer Mark A.
Konkel Miriam K.
Lee Wan Ping
Marth Gabor T.
Walker Jerilyn A.
Ward Alistair
Wu Jiantao
Publication venue: LSU Digital Commons
Publication date: 01/01/2014

© 2014 Wu et al.; licensee BioMed Central Ltd. Background: Mobile elements (MEs) constitute greater than 50% of the human genome as a result of repeated insertion events during human genome evolution. Although most of these elements are now fixed in the population, some MEs, including ALU, L1, SVA and HERV-K elements, are still actively duplicating. Mobile element insertions (MEIs) have been associated with human genetic disorders, including Crohn's disease, hemophilia, and various types of cancer, motivating the need for accurate MEI detection methods. To comprehensively identify and accurately characterize these variants in whole genome next-generation sequencing (NGS) data, a computationally efficient detection and genotyping method is required. Current computational tools are unable to call MEI polymorphisms with sufficiently high sensitivity and specificity, or call individual genotypes with sufficiently high accuracy. Results: Here we report Tangram, a computationally efficient MEI detection program that integrates read-pair (RP) and split-read (SR) mapping signals to detect MEI events. By utilizing SR mapping in its primary detection module, a feature unique to this software, Tangram is able to pinpoint MEI breakpoints with single-nucleotide precision. To understand the role of MEI events in disease, it is essential to produce accurate individual genotypes in clinical samples. Tangram is able to determine sample genotypes with very high accuracy. Using simulations and experimental datasets, we demonstrate that Tangram has superior sensitivity, specificity, breakpoint resolution and genotyping accuracy, when compared to other, recently developed MEI detection methods. Conclusions: Tangram serves as the primary MEI detection tool in the 1000 Genomes Project, and is implemented as a highly portable, memory-efficient, easy-to-use C++ computer program, built under an open-source development model

Crossref

Springer - Publisher Connector

PubMed Central

LSU Scholarly Repository (Louisiana State Univ.)

The Sequence Alignment/Map format and SAMtools

Author: A. Wysoker
B. Handsaker
G. Abecasis
G. Marth
H. Li
J. Ruan
Langmead
Mardis
N. Homer
R. Durbin
T. Fennell
Publication venue: Oxford University Press
Publication date: 30/01/2013

Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments
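
As a rough illustration of the format this summary describes, the sketch below splits one tab-delimited SAM alignment line into its eleven mandatory fields plus optional tags. It does not use SAMtools; the function name is hypothetical, the field names follow the current SAM specification, and the example read is made up.

```python
SAM_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual")
INT_FIELDS = {"flag", "pos", "mapq", "pnext", "tlen"}


def parse_sam_line(line):
    """Parse one SAM alignment line (not a header line) into a dict."""
    if line.startswith("@"):
        raise ValueError("header lines are not alignment records")
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("expected at least 11 tab-separated fields")
    record = {
        name: int(value) if name in INT_FIELDS else value
        for name, value in zip(SAM_FIELDS, cols)
    }
    record["tags"] = cols[len(SAM_FIELDS):]  # optional TAG:TYPE:VALUE fields
    return record


# Made-up record: a 4-base read aligned at position 100 of chr1.
example = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
rec = parse_sam_line(example)
print(rec["rname"], rec["pos"], rec["cigar"], rec["tags"])
```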

CiteSeerX

Crossref

Harvard University - DASH

PubMed Central

A standard variation file format for human genome sequences

Author: Batchelor Colin
Cunningham Fiona
Eilbeck Karen
Flicek Paul
Marth Gabor T
Moore Barry
Reese Martin G
Salas Fidel
Stein Lincoln
Yandell Mark
Publication venue: BioMed Central
Publication date: 01/01/2010

Here we describe the Genome Variation Format (GVF) and the 10Gen dataset. GVF, an extension of Generic Feature Format version 3 (GFF3), is a simple tab-delimited format for DNA variant files, which uses Sequence Ontology to describe genome variation data. The 10Gen dataset, ten human genomes in GVF format, is freely available for community analysis from the Sequence Ontology website and from an Amazon elastic block storage (EBS) snapshot for use in Amazon's EC2 cloud computing environment
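
Because GVF is described here as a tab-delimited extension of GFF3, a record can be read with the nine standard GFF3 columns and a semicolon-separated attribute column. The sketch below assumes exactly that layout; the function name is hypothetical and the example line and its attribute values are invented for illustration.

```python
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gvf_line(line):
    """Parse one GVF feature line using the nine GFF3 columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(GFF3_COLUMNS):
        raise ValueError("expected 9 tab-separated columns")
    record = dict(zip(GFF3_COLUMNS, cols))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value.split(",")  # multi-valued attributes use commas
    record["attributes"] = attributes
    return record


# Invented example line describing a single-nucleotide variant.
example = "chr16\texample\tSNV\t49291141\t49291141\t.\t+\t.\tID=ID_1;Variant_seq=A,G;Reference_seq=G"
rec = parse_gvf_line(example)
print(rec["type"], rec["start"], rec["attributes"]["Variant_seq"])
```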

Crossref

Springer - Publisher Connector

PubMed Central

Expression divergence measured by transcriptome sequencing of four yeast species

Author: Barnett Derek
Busby Michele A
Chuang Jeffrey H
Costa Allen M
Gray Jesse M
Marth Gabor T
Springer Michael
Stewart Chip
Stromberg Michael P
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The evolution of gene expression is a challenging problem in evolutionary biology, for which accurate, well-calibrated measurements and methods are crucial. Results: We quantified gene expression with whole-transcriptome sequencing in four diploid, prototrophic strains of Saccharomyces species grown under the same condition to investigate the evolution of gene expression. We found that variation in expression is gene-dependent with large variations in each gene's expression between replicates of the same species. This confounds the identification of genes differentially expressed across species. To address this, we developed a statistical approach to establish significance bounds for inter-species differential expression in RNA-Seq data based on the variance measured across biological replicates. This metric estimates the combined effects of technical and environmental variance, as well as Poisson sampling noise by isolating each component. Despite a paucity of large expression changes, we found a strong correlation between the variance of gene expression change and species divergence (R² = 0.90). Conclusion: We provide an improved methodology for measuring gene expression changes in evolutionarily diverged species using RNA-Seq, where experimental artifacts can mimic evolutionary effects. GEO Accession Number: GSE32679

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

AutoSNPdb: an annotated single nucleotide polymorphism database for crop plants

Author: Altschul
Barker
Batley
C. Duran
D. Edwards
D. Wood
Huang
J. Batley
M. Imelfort
Marth
N. Appleby
Savage
Syvänen
T. Clark
Taillon-Miller
Publication venue: Oxford University Press
Publication date: 01/01/2009

Single nucleotide polymorphisms (SNPs) may be considered the ultimate genetic marker as they represent the finest resolution of a DNA sequence (a single nucleotide), are generally abundant in populations and have a low mutation rate. Analysis of assembled EST sequence data provides a cost-effective means to identify large numbers of SNPs associated with functional genes. We have developed an integrated SNP discovery pipeline, which identifies SNPs from assembled EST sequences. The results are maintained in a custom relational database along with EST source and annotation information. The current database hosts data for the important crops rice, barley and Brassica. Users may rapidly identify polymorphic sequences of interest through BLAST sequence comparison, keyword searches of annotations derived from UniRef90 and GenBank comparisons, GO annotations or in genes corresponding to syntenic regions of reference genomes. In addition, SNPs between specific varieties may be identified for targeted mapping and association studies. SNPs are viewed using a user-friendly graphical interface. The database is freely accessible at http://autosnpdb.qfab.org.au/

CiteSeerX

Crossref

PubMed Central

University of Queensland eSpace

The variant call format and VCFtools

Author: A. Auton
C. A. Albers
Durbin
E. Banks
G. Abecasis
G. Lunter
G. McVean
G. T. Marth
M. A. DePristo
P. Danecek
R. Durbin
R. E. Handsaker
S. T. Sherry
Publication venue: Oxford University Press
Publication date: 01/01/2011

Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API
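
To show what a VCF record looks like in practice, the following sketch parses the eight fixed columns of one uncompressed VCF data line, including the semicolon-separated INFO annotations. It is independent of VCFtools and its Perl API; the function name is hypothetical, and the example line is a made-up record.

```python
VCF_FIXED = ("chrom", "pos", "id", "ref", "alt", "qual", "filter", "info")


def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    cols = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(cols) < len(VCF_FIXED):
        raise ValueError("expected a VCF data line with at least 8 columns")
    record = dict(zip(VCF_FIXED, cols))
    record["pos"] = int(record["pos"])
    record["alt"] = record["alt"].split(",")  # several ALT alleles are comma-separated
    info = {}
    if record["info"] != ".":
        for item in record["info"].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True  # keys without '=' are flags
    record["info"] = info
    return record


# Made-up record: a G>A SNP at position 14370 on chromosome 20.
example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
rec = parse_vcf_line(example)
print(rec["chrom"], rec["pos"], rec["ref"], rec["alt"], rec["info"]["DP"])
```

INFO values are left as strings here; a fuller parser would convert them using the types declared in the file's header lines.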

Oxford University Research Archive