41 research outputs found

    mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications

    Cataloged from PDF version of article. High throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing and downstream analysis. While tools that report the 'best' mapping location of each read provide a fast way to process HTS data, they are not suitable for many types of downstream analysis such as structural variation detection, where it is important to report multiple mapping loci for each read. For this purpose we introduce mrsFAST-Ultra, a fast, cache oblivious, SNP-aware aligner that can handle the multi-mapping of HTS reads very efficiently. mrsFAST-Ultra improves mrsFAST, our first cache oblivious read aligner capable of handling multi-mapping reads, through new and compact index structures that reduce not only the overall memory usage but also the number of CPU operations per alignment. In fact, the size of the index generated by mrsFAST-Ultra is 10 times smaller than that of mrsFAST. Equally important, mrsFAST-Ultra introduces new features such as being able to (i) obtain the best mapping loci for each read, and (ii) return all reads that have at most n mapping loci (within an error threshold), together with these loci, for any user specified n. Furthermore, mrsFAST-Ultra is SNP-aware, i.e. it can map reads to the reference genome while discounting the mismatches that occur at common SNP locations provided by dbSNP; this significantly increases the number of reads that can be mapped to the reference genome. Notice that all of the above features are implemented within the index structure, rather than as simple post-processing steps, and are thus performed highly efficiently. Finally, mrsFAST-Ultra utilizes multiple available cores and processors and can be tuned for various memory settings. Our results show that mrsFAST-Ultra is roughly five times faster than its predecessor mrsFAST. In comparison to newly enhanced popular tools such as Bowtie2, it is more sensitive (it can report 10 times or more mappings per read) and much faster (six times or more) in the multi-mapping mode. Furthermore, mrsFAST-Ultra has an index size of 2GB for the entire human reference genome, which is roughly half of that of Bowtie2. mrsFAST-Ultra is open source and it can be accessed at http://mrsfast.sourceforge.net
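
The idea of SNP-aware multi-mapping described in this abstract can be illustrated with a toy brute-force scan. This is only a sketch under stated assumptions, not mrsFAST-Ultra's index-based algorithm: the function names, the 0-based coordinates, and the simplification that any mismatch at a listed SNP position is ignored (without checking which allele the read carries) are all hypothetical choices made for illustration.

```python
def snp_aware_mismatches(read, reference, offset, snp_positions):
    """Count mismatches between `read` and the reference window starting at
    `offset`, skipping mismatches that fall on a known SNP position.

    Assumption: `snp_positions` is a set of 0-based reference coordinates,
    and any mismatch there is discounted regardless of the allele.
    """
    window = reference[offset:offset + len(read)]
    if len(window) < len(read):
        raise ValueError("read extends past the end of the reference")
    return sum(
        1
        for i, (read_base, ref_base) in enumerate(zip(read, window))
        if read_base != ref_base and (offset + i) not in snp_positions
    )


def mapping_loci(read, reference, snp_positions, max_errors):
    """Return every offset where the read aligns within `max_errors`
    (multi-mapping: all loci are reported, not just the best one)."""
    return [
        offset
        for offset in range(len(reference) - len(read) + 1)
        if snp_aware_mismatches(read, reference, offset, snp_positions) <= max_errors
    ]


# Hypothetical toy data.
reference = "ACGTACGTAC"
read = "ACTT"

print(mapping_loci(read, reference, set(), 0))  # [] : no exact match
print(mapping_loci(read, reference, set(), 1))  # [0, 4] : two loci within one mismatch
print(mapping_loci(read, reference, {2}, 0))    # [0] : mismatch at position 2 is discounted
```

A linear scan like this costs time proportional to reference length times read length per read; the abstract's point is that the real tool performs the same filtering inside a compact index instead.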

    Molecular characterization and similarity relationships among sunflower (Helianthus annuus L.) inbred lines using some mapped simple sequence repeats

    Information about the genetic diversity and relationships among breeding lines and varieties is useful not only for germplasm conservation and inbred line identification, but also for the selection of parental lines for quantitative trait loci (QTL) mapping and for hybrid breeding in crops, including sunflower. In order to develop mapping populations, genetic distances among twenty-eight sunflower genotypes were evaluated using simple sequence repeat (SSR) markers. One hundred and two markers were generated by 38 SSR loci, and the mean number of alleles per locus was 2.32. Polymorphism information content (PIC) values ranged from 0.09 (locus ha3555) to 0.62 (locus ORS598) with an average of 0.41. Jaccard's coefficient similarity matrix for the studied sunflower genotypes varied from 0.25 to 0.9, indicating a broad genetic base. The maximum similarity (0.9) was observed between genotypes RT931 and ENSAT-R5, while the lowest similarity (0.25) was between genotypes LC1064C and LR64. Based on the unweighted pair group method with arithmetic mean (UPGMA) clustering algorithm, the studied genotypes were clustered in four groups. However, some genotypes have the same specific characters that influence their clustering, and as a result, the results of the principal coordinate analysis (PCoA) largely corresponded to those obtained through cluster analysis. Key words: Cluster analysis, genetic diversity, principal coordinate analysis, sunflower, simple sequence repeat
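
The two statistics named in this abstract can be sketched in a few lines. The abstract does not state which PIC formula was used, so the version below assumes the Botstein et al. definition, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j²; the function names and the example profiles are hypothetical.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard coefficient for two equal-length 0/1 marker profiles:
    bands present in both genotypes divided by bands present in either."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same markers")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def pic(allele_freqs):
    """Polymorphism information content of one locus.

    Assumption: the Botstein et al. formula; `allele_freqs` are the
    allele frequencies at the locus and should sum to 1.
    """
    n = len(allele_freqs)
    homozygosity = sum(p * p for p in allele_freqs)
    cross_term = sum(
        2 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - homozygosity - cross_term


# Hypothetical example: two genotypes scored at four marker bands.
print(jaccard_similarity([1, 1, 0, 1], [1, 0, 0, 1]))  # 2 shared / 3 present = 0.666...
print(pic([0.5, 0.5]))                                 # 0.375
```

A matrix of pairwise Jaccard values computed this way is the usual input to UPGMA clustering and PCoA.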

    Pan-cancer analysis of whole genomes

    Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale (1-3). Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter (4); identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation (5,6); analyses timings and patterns of tumour evolution (7); describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity (8,9); and evaluates a range of more-specialized features of cancer genomes (8,10-18). Peer reviewed

    Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples

    Funder: NCI U24CA211006. Abstract: The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and un-observed mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.
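
The notions of overlapping and "private" mutations in this abstract amount to set operations on two call sets. The sketch below is illustrative only: the abstract does not define how the overlap percentage was computed, so the shared-over-union fraction, the `(chrom, pos, ref, alt)` tuple key, and the toy calls are all assumptions.

```python
def compare_call_sets(wes_calls, wgs_calls):
    """Split two somatic call sets into shared and private mutations.

    Assumption: each call is a hashable (chrom, pos, ref, alt) tuple and
    both sets were already restricted to the same covered regions.
    """
    wes, wgs = set(wes_calls), set(wgs_calls)
    shared = wes & wgs
    union = wes | wgs
    return {
        "shared": shared,
        "private_wes": wes - wgs,  # called only by exome sequencing
        "private_wgs": wgs - wes,  # called only by genome sequencing
        "overlap_fraction": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical toy calls, not data from the study.
wes = [("chr1", 100, "A", "T"), ("chr1", 250, "G", "C"), ("chr2", 40, "C", "T")]
wgs = [("chr1", 250, "G", "C"), ("chr2", 40, "C", "T"),
       ("chr2", 90, "T", "G"), ("chr3", 7, "A", "G")]

result = compare_call_sets(wes, wgs)
print(len(result["shared"]), len(result["private_wes"]), len(result["private_wgs"]))  # 2 1 2
print(result["overlap_fraction"])  # 0.4
```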

    Induction of embryo development and fixation of partial interspecific lines after pollination of F-1 cotton interspecific hybrids (Gossypium barbadense x Gossypium hirsutum) with pollen from Hibiscus cannabinus

    The possibility of inducing embryo development after pollination of F-1 interspecific cotton hybrids (Gossypium barbadense x Gossypium hirsutum) and their reciprocals with pollen from Hibiscus cannabinus was investigated. For this, flowers of F-1 plants from 4 G. barbadense x G. hirsutum interspecific hybrids (B403 x Acala Sindos, Carnak x 4S, B403 x Coker 310, and Carnak x Acala Sindos) and their reciprocals grown in the field were pollinated with pollen from Hibiscus cannabinus. From the 443 pollinated flowers, 276 were left on the plant to grow naturally, and 167 were collected 5 days after pollination. Young ovules from the collected buds were cultured in vitro for embryo development. It was observed that, from the buds left to grow naturally on the mother plant, 21 bolls reached maturity. The mature bolls originated only from the 4 G. barbadense x G. hirsutum hybrids and contained 82 mature seeds. Finally, 38 plants (Pa0) were produced. From the in-ovule culture method, 10 young embryos were isolated from both G. barbadense x G. hirsutum and G. hirsutum x G. barbadense hybrids and finally 3 plants were produced. The plants produced from both approaches originated only from the G. barbadense x G. hirsutum hybrids. These plants exhibited morphological traits from both cotton species and they were partially fertile. No signs of H. cannabinus morphological traits were observed in the plants produced. Root-tip chromosome counts revealed that chromosome number among cells of the Pa0 plants ranged from 27 to 42 and the difference in chromosome number observed among cells of the same plant ranged from 1 to 3. The chromosome number, however, increased progressively from generation to generation and in Pa3 it ranged from 46 to 52. Plants with 52 chromosomes were identified even from the Pa1 generation. In addition, flow cytometric analysis indicated that the parental plants had a similar DNA profile to the F-1 and F-2 interspecific hybrids but a different one from the Pa0 plants. Thus, alien pollination of cotton flowers from interspecific (G. barbadense x G. hirsutum and reciprocals) hybrids with pollen from H. cannabinus most likely induced parthenogenetic (Pa) egg cell development which, after a progressive chromosome increase, produced fully fertile plants with most of the cells at the tetraploid or near-tetraploid level. It was concluded that a combination of the in situ boll development with an optimised in vitro ovule culture technique could establish the 'cannabinus method' in cotton, as a method for the production of genotype-independent partial interspecific lines.