    The Sequence Alignment/Map format and SAMtools

    Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments
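    As a rough illustration of what a SAM alignment record looks like in practice, the sketch below splits one tab-separated line into the eleven mandatory fields defined by the SAM specification and decodes two bits of the FLAG field. The `SamRecord` class, the `parse_sam_line` helper, and the example read are hypothetical names and data made up for this illustration; they are not part of SAMtools.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SamRecord:
    """The eleven mandatory SAM fields plus any optional tags (hypothetical helper class)."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: List[str] = field(default_factory=list)

    @property
    def is_unmapped(self) -> bool:
        # FLAG bit 0x4: the read itself is unmapped
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        # FLAG bit 0x10: the read is aligned to the reverse strand
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; header lines (starting with '@') are rejected."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        qname=cols[0], flag=int(cols[1]), rname=cols[2], pos=int(cols[3]),
        mapq=int(cols[4]), cigar=cols[5], rnext=cols[6], pnext=int(cols[7]),
        tlen=int(cols[8]), seq=cols[9], qual=cols[10], tags=cols[11:],
    )


# Made-up example record: a 5 bp read on the reverse strand of chr1 at position 100
example = "read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"
record = parse_sam_line(example)
print(record.rname, record.pos, record.is_reverse)
```

    For real data, an established parser such as SAMtools itself is the safer choice; this sketch only shows the field layout.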

    chloroExtractor: extraction and assembly of the chloroplast genome from whole genome shotgun data

    The chloroExtractor is a Perl-based program that provides a pipeline for extracting chloroplast DNA from whole-genome plant data. The authors MJA and SP contributed equally to this work. MJA was supported by a grant of the German Excellence Initiative to the Graduate School of Life Sciences, University of Würzburg

    RNF: a general framework to evaluate NGS read mappers

    Aligning reads to a reference sequence is a fundamental step in numerous bioinformatics pipelines. As a consequence, the sensitivity and precision of the mapping tool, applied with certain parameters to certain data, can critically affect the accuracy of the results produced (e.g., in variant calling applications). Therefore, there has been an increasing demand for methods for comparing mappers and for measuring the effects of their parameters. Read simulators combined with alignment evaluation tools provide the most straightforward way to evaluate and compare mappers. Simulated reads are accompanied by information about their positions in the source genome. This information is then used to evaluate the alignments produced by the mapper. Finally, reports containing statistics of successful read alignments are created. In the absence of standards for encoding read origins, every evaluation tool has to be made explicitly compatible with the simulator used to generate the reads. To overcome this obstacle, we have created a generic format, RNF (Read Naming Format), for assigning read names with encoded information about original positions. Furthermore, we have developed an associated software package, RNF, containing two principal components. MIShmash applies one of several popular read simulation tools (among DwgSim, Art, Mason, CuReSim etc.) and transforms the generated reads into RNF format. LAVEnder then evaluates a given read mapper using simulated reads in RNF format. Special attention is paid to mapping qualities, which serve for parametrization of ROC curves, and to evaluation of the effect of read sample contamination

    Discovery of a large set of SNP and SSR genetic markers by high-throughput sequencing of pepper (Capsicum annuum)

    Genetic markers based on single nucleotide polymorphisms (SNPs) are in increasing demand for genome mapping and fingerprinting of breeding populations in crop plants. Recent advances in high-throughput sequencing provide the opportunity for whole-genome resequencing and identification of allelic variants by mapping the reads to a reference genome. However, for many species, such as pepper (Capsicum annuum), a reference genome sequence is not yet available. To this end, we sequenced the C. annuum cv. "Yolo Wonder" transcriptome using Roche 454 pyrosequencing and assembled de novo 23,748 isotigs and 60,370 singletons. Mapping of 10,886,425 reads obtained by Illumina GA II sequencing of C. annuum cv. "Criollo de Morelos 334" to the "Yolo Wonder" transcriptome allowed for SNP identification. By setting a threshold value that allows selecting reliable SNPs with minimal loss of information, 11,849 reliable SNPs spread across 5,919 isotigs were identified. In addition, 853 simple sequence repeats were obtained. This information has been made available online

    Systematic Analysis of Whole Exome Sequencing Determines RET G691S Polymorphism as Germline Variant in Melanoma

    The RET proto-oncogene encodes a receptor tyrosine kinase that is activated by glial cell-derived neurotrophic factor (GDNF). Previous studies have found that a single nucleotide polymorphism (SNP), RETp (G691S), in the juxtamembrane domain enhances the signaling pathway and promotes tumor growth by GDNF in pancreatic and thyroid cancer in addition to melanoma. It is uncertain, however, whether this SNP is a germline variant or a somatic mutation. A prior study reported that the RETp variant was a germline SNP in desmoplastic and non-desmoplastic melanomas. In the present study, we examined both melanoma tissue samples and matching peripheral blood DNA to determine whether RETp was 1) a germline or somatic variant, 2) more frequent in certain melanoma subtypes, and 3) associated with brain metastasis. We examined the peripheral blood of 197 melanoma patients who had at least one matched tumor, and 42 patients with brain metastasis. RETp was present as a germline SNP in 33% of patients. There were no significant differences in RETp frequency among the different melanoma subtypes, and RETp was not correlated with brain metastasis

    Beyond homozygosity mapping: family-control analysis based on Hamming distance for prioritizing variants in exome sequencing

    A major challenge in current exome sequencing in autosomal recessive (AR) families is the lack of an effective method to prioritize single-nucleotide variants (SNVs). AR families are generally too small for linkage analysis, and the length of homozygous regions is unreliable for identification of causative variants. Various common filtering steps usually result in a list of candidate variants that cannot be narrowed down further or ranked. To prioritize shortlisted SNVs, we consider each homozygous candidate variant together with a set of SNVs flanking it. We compare the resulting array of genotypes between an affected family member and a number of control individuals and argue that, in a family, differences between family member and controls should be larger for a pathogenic variant and the SNVs flanking it than for a random variant. We assess differences between the arrays of two individuals by the Hamming distance and develop a suitable test statistic, which is expected to be large for a causative variant and its flanking SNVs. We prioritize candidate variants based on this statistic. Applying our approach to six patients with known pathogenic variants, we found these variants to be in the top 2 to 10 percentiles of ranks
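    The comparison of genotype arrays described above can be sketched as follows. This is a minimal illustration, assuming genotypes are encoded as short strings and that the per-variant score is simply the mean Hamming distance between the affected individual and each control. That score is a placeholder chosen for this example; the abstract does not specify the authors' actual test statistic. The function names and genotype data are hypothetical.

```python
from typing import Sequence


def hamming_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the positions at which two equal-length genotype arrays differ."""
    if len(a) != len(b):
        raise ValueError("genotype arrays must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_distance_to_controls(case: Sequence[str],
                              controls: Sequence[Sequence[str]]) -> float:
    """Placeholder score (an assumption, not the published statistic):
    average Hamming distance between the case and every control."""
    if not controls:
        raise ValueError("at least one control is required")
    return sum(hamming_distance(case, c) for c in controls) / len(controls)


# Made-up genotypes at a candidate variant (middle position) and four flanking SNVs
case = ["AA", "GG", "TT", "CC", "AA"]
controls = [
    ["AA", "AG", "TT", "CT", "AA"],  # differs from the case at 2 positions
    ["AG", "AG", "CT", "CT", "AG"],  # differs from the case at all 5 positions
]

score = mean_distance_to_controls(case, controls)
print(score)
```

    Under this placeholder, candidate variants would be ranked by descending score, so that a variant whose flanking region differs most from the controls comes first.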