11 research outputs found

    Sensitive Long-Indel-Aware Alignment of Sequencing Reads

    The tremendous advances in high-throughput sequencing technologies have made population-scale sequencing possible, as performed in the 1000 Genomes project and the Genome of the Netherlands project. Next-generation sequencing has allowed genome-wide discovery of variations beyond single-nucleotide polymorphisms (SNPs), in particular of structural variations (SVs) like deletions, insertions, duplications, translocations, inversions, and even more complex rearrangements. Here, we design a read aligner with special emphasis on the following properties: (1) high sensitivity, i.e. find all (reasonable) alignments; (2) ability to find (long) indels; (3) statistically sound alignment scores; and (4) runtime fast enough to be applied to whole genome data. We compare performance to BWA, bowtie2, and stampy and find that our method is especially advantageous on reads containing larger indels.

    Next Generation Cluster Editing

    This work aims to improve the quality of structural variant prediction from the mapped reads of a sequenced genome. We suggest a new model based on cluster editing in weighted graphs and introduce a new heuristic algorithm that solves this problem quickly and with a good approximation on the huge graphs that arise from biological datasets.
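
    As a rough illustration of the objective behind weighted cluster editing (not the heuristic from this work), the sketch below scores a candidate clustering. It assumes the common formulation in which each node pair carries a signed weight: positive means an edge is present, negative means it is absent, and the magnitude is the cost of flipping it. The function name `cluster_editing_cost` and the input layout are hypothetical.

```python
from itertools import combinations


def cluster_editing_cost(weights, clusters):
    """Cost of turning a weighted graph into the given disjoint cliques.

    weights:  dict mapping frozenset({u, v}) -> signed weight (assumed layout).
              Pairs that are not listed are treated as weight 0.
    clusters: iterable of node collections that partition the nodes.
    """
    # Map every node to the index of its cluster.
    label = {}
    for idx, cluster in enumerate(clusters):
        for node in cluster:
            label[node] = idx

    cost = 0.0
    for u, v in combinations(sorted(label), 2):
        w = weights.get(frozenset((u, v)), 0.0)
        same_cluster = label[u] == label[v]
        if same_cluster and w < 0:
            cost += -w  # a missing edge inside a cluster must be inserted
        elif not same_cluster and w > 0:
            cost += w   # an existing edge between clusters must be deleted
    return cost


weights = {
    frozenset(("a", "b")): 2.0,
    frozenset(("b", "c")): 1.0,
    frozenset(("a", "c")): -3.0,
}
split_cost = cluster_editing_cost(weights, [{"a", "b"}, {"c"}])
merged_cost = cluster_editing_cost(weights, [{"a", "b", "c"}])
```

    In this toy graph, separating `c` only requires deleting the weak `b`-`c` edge, whereas merging all three nodes requires inserting the expensive `a`-`c` edge, so the split clustering is cheaper.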

    CLEVER: Clique-Enumerating Variant Finder

    Next-generation sequencing techniques have facilitated a large-scale analysis of human genetic variation. Despite the advances in sequencing speeds, the computational discovery of structural variants is not yet standard. It is likely that many variants have remained undiscovered in most sequenced individuals. Here we present a novel internal segment size based approach, which organizes all reads, including concordant ones, into a read alignment graph where max-cliques represent maximal contradiction-free groups of alignments. A specifically engineered algorithm then enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions (indels). For the first time in the literature, we compare a large range of state-of-the-art approaches using simulated Illumina reads from a fully annotated genome and present various relevant performance statistics. We achieve superior performance rates in particular on indels of sizes 20-100, which have been exposed as a current major challenge in the SV discovery literature and where prior insert size based approaches have limitations. In that size range, we outperform even split-read aligners. We also achieve good results on real data, where we make a substantial number of correct predictions as the only tool, which complement the predictions of split-read aligners. CLEVER is open source (GPL) and available from http://clever-sv.googlecode.com. Comment: 30 pages, 8 figures
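
    The read alignment graph idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not CLEVER's engineered algorithm: alignments are reduced to hypothetical `(start, end, insert_size)` tuples, two alignments are joined by an edge when their internal segments overlap and their insert sizes differ by at most a fixed threshold `max_diff` (standing in for the statistical compatibility test), and maximal cliques are enumerated with the plain Bron-Kerbosch procedure.

```python
def compatible(a, b, max_diff):
    """True if two (start, end, insert_size) alignments do not contradict.

    Assumed rule: internal segments overlap and insert sizes are close.
    """
    overlap = a[0] <= b[1] and b[0] <= a[1]
    return overlap and abs(a[2] - b[2]) <= max_diff


def maximal_cliques(alignments, max_diff):
    """Enumerate maximal cliques of the compatibility graph (Bron-Kerbosch)."""
    n = len(alignments)
    adj = {
        i: {
            j
            for j in range(n)
            if j != i and compatible(alignments[i], alignments[j], max_diff)
        }
        for i in range(n)
    }
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored.
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(n)), set())
    return sorted(cliques)


alignments = [
    (100, 300, 200),
    (150, 350, 210),
    (200, 400, 205),
    (1000, 1200, 200),
]
groups = maximal_cliques(alignments, max_diff=20)
```

    Here the first three alignments overlap and agree on insert size, so they form one maximal clique, while the distant fourth alignment forms a clique of its own. Each clique would then be evaluated statistically for evidence of an indel.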

    An Exome-seq Based Tool for Mapping and Selection of Candidate Genes in Maize Deletion Mutants

    Despite the large number of genomic and transcriptomic resources in maize, there is still much to learn about the function of genes in developmental and biochemical processes. Some maize mutants that were generated by gamma-irradiation showed clear segregation for the kernel phenotypes in B73 X Mo17 F2 ears. To better understand the functional genomics of kernel development, we developed a mapping and gene identification pipeline, bulked segregant exome sequencing (BSEx-seq), to map mutants with kernel phenotypes including opaque endosperm and reduced kernel size. BSEx-seq generates and compares the sequence of the exon fraction from mutant and normal plant F2 DNA pools. The comparison can derive mapping peaks, identify deletions within the mapping peak, and suggest candidate genes within the deleted regions. We then used the public kernel-specific expression data to narrow down the list of candidate genes/mutations and identified deletions ranging from several kb to more than 1 Mb. A full deletion allele of the Opaque-2 gene was identified in mutant 531, which occurs within a ~200-kb deletion. Opaque mutant 1486 has a 6248-bp deletion in the mapping interval containing two candidate genes encoding RNA-directed DNA methylation 4 (RdDM4) and AMP-binding protein, respectively. This study demonstrates the efficiency and cost-effectiveness of BSEx-seq for causal mutation mapping and candidate gene selection, providing a new option in mapping-by-sequencing for maize functional genomics studies

    Next generation cluster editing


    CNNdel: Calling Structural Variations on Low Coverage Data Based on Convolutional Neural Networks


    Detection of genome-wide structural variations in the Shanghai Holstein cattle population using next-generation sequencing

    Objective: The Shanghai Holstein cattle breed is susceptible to severe mastitis and other diseases due to the hot weather and long-term humidity in Shanghai, which is the main distribution centre for providing Holstein semen to various farms throughout China. Our objective was to determine the genetic mechanisms influencing economically important traits, especially diseases that have a huge impact on the yield and quality of milk as well as reproduction. Methods: In our study, we detected the structural variations of 1,092 Shanghai Holstein cows by using next-generation sequencing. We used the DELLY software to identify deletions and insertions, and cn.MOPS to identify copy-number variants (CNVs). Furthermore, we annotated these structural variations using different bioinformatics tools, such as gene ontology, the cattle quantitative trait locus (QTL) database and ingenuity pathway analysis (IPA). Results: The average number of high-quality reads was 3,046,279. After filtering, a total of 16,831 deletions, 12,735 insertions and 490 CNVs were identified. The annotation results showed that these mapped genes were significantly enriched for specific biological functions, such as disease and reproduction. In addition, the enrichment results based on the cattle QTL database showed that the number of variants related to milk and reproduction was higher than the number of variants related to other traits. IPA core analysis found that the structural variations were related to reproduction, lipid metabolism, and inflammation. According to the functional analysis, structural variations were important factors affecting the variation of different traits in Shanghai Holstein cattle. Our results provide meaningful information about structural variations, which may be useful in future assessments of the associations between variations and important phenotypes in Shanghai Holstein cattle.
    Conclusion: Structural variations identified in this study were extremely different from those of previous studies. Many structural variations were found to be associated with mastitis and reproductive system diseases; these results are in accordance with the characteristics of the environment that Shanghai Holstein cattle experience.

    Chasing the Genetics of Ascites in Broilers using Whole Genome Resequencing

    We are using whole genome resequencing (WGR) to identify chromosomal regions associated with resistance or susceptibility to ascites, a form of pulmonary hypertension syndrome, in meat-type chickens. Previous Genome Wide Association Studies (GWAS) based on Single Nucleotide Polymorphisms (SNPs) have identified regions on chromosomes 2, 9 and Z. Despite several GWAS and further genotyping, there are no reliable or potential markers for the ascites phenotype. We have completed screening of Copy Number Variations (CNVs) and SNPs in ascites-resistant and ascites-susceptible birds from the relaxed (REL) line derived from a commercial elite broiler line. DNA samples from resistant and susceptible birds were purified, quantified and pooled in two pools of 10 DNAs from each phenotype for both genders. Eight pools (2 pools x 2 phenotypes x 2 genders) were generated. Each pool was submitted for bar-coded library generation and sequenced as 2x125 paired-end reads on an Illumina HiSeq 2500 at 66X genome coverage. The sequence reads were mapped onto Galgal5 using Bowtie, with initial CNV mapping done using cn.mops (R package). Further mapping to chromosomes was done using NGen and ArrayStar (DNAStar ver 13). So far, we have identified two potential regions for CNVs and 31 regions for SNPs with potential association with the ascites phenotype. The CPQ gene on chromosome 2 and the LRRTM4 gene on chromosome 22 have been validated as containing ascites QTLs. However, their exact role in ascites is yet to be discovered. Further, we screened the regions from the REL line in DNAs from an unrelated commercial broiler line using WGR.