36 research outputs found

    Competitive enzymatic reaction to control allele-specific extensions

    Here, we present a novel method for SNP genotyping based on protease-mediated allele-specific primer extension (PrASE), where the two allele-specific extension primers differ only at their 3′-positions. As reported previously [Ahmadian,A., Gharizadeh,B., O'Meara,D., Odeberg,J. and Lundeberg,J. (2001), Nucleic Acids Res., 29, e121], the kinetics of perfectly matched primer extension is faster than that of mismatched primer extension. In this study, we have utilized this difference in kinetics by adding protease, a protein-degrading enzyme, to discriminate between the extension reactions. The competition between the polymerase activity and the enzymatic degradation yields extension of the perfectly matched primer, while the slower extension of the mismatched primer is eliminated. To allow multiplex and simultaneous detection of the investigated single nucleotide polymorphisms (SNPs), each extension primer was given a unique signature tag sequence on its 5′ end, complementary to a tag on a generic array. A multiplex nested PCR with 13 SNPs was performed in a total of 36 individuals and their alleles were scored. To demonstrate the improvements in scoring SNPs by PrASE, we also genotyped the individuals without inclusion of protease in the extension. We conclude that the developed assay is highly allele-specific, with excellent multiplex SNP capabilities.

    Arrayed identification of DNA signatures

    In this thesis techniques are presented that aim to determine individual DNA signatures by controlled synthesis of nucleic acid multimers. Allele-specific extension reactions with an improved specificity were applied for several genomic purposes. Since DNA polymerases extend some mismatched 3’-end primers, an improved specificity is a concern. This has been possible by exploiting the faster extension of matched primers and applying the enzymes apyrase or Proteinase K. The findings were applied to methods for resequencing and viral and single nucleotide polymorphism (SNP) genotyping. P53 mutation is the most frequent event in human cancers. Here, a model system for resequencing of 15 bps in p53 based on apyrase-mediated allele-specific extension (AMASE) is described, investigated and evaluated (Paper I). A microarray format with fluorescence detection was used. On each array, four oligonucleotides were printed for each base to resequence. Target PCR products were hybridized and an AMASE-reaction performed in situ to distinguish which of the printed oligonucleotides matched the target. The results showed that without the inclusion of apyrase, the resulting sequence was unreadable. The results open the possibilities for developing large-scale resequencing tools. The presence of certain types of human papillomaviruses (HPV) transforms normal cells into cervical cancer cells. Thus, HPV type determination is clinically important. Also, multiple HPV infections are common but difficult to distinguish. Therefore, a genotyping platform based on competitive hybridization and AMASE is described, used on clinical sample material and evaluated by comparison to Sanger DNA sequencing (Papers II and III). A flexible tag-microarray was used for detection and the two levels of discrimination gave a high level of specificity. Easy identification of multiple infections was possible which provides new opportunities to investigate the importance of multiply infected samples. 
Highly multiplexed allele-specific extension reactions require large numbers of primers, which leads to spurious hybridizations. Papers IV to VI focus on an alternative approach to control oligomerization by using protease-mediated allele-specific extension (PrASE). In order to maintain stringency at higher temperatures, Proteinase K was used instead of apyrase, leading to DNA polymerase degradation and preventing unspecific extensions. An automated assay with tag-array detection for SNP genotyping was established. First PrASE was introduced and characterized (Paper IV), then used for genotyping of 10 SNPs in 442 samples (Paper V). A 99.8 % concordance to pyrosequencing was found. PrASE is a flexible tool for association studies and the results indicate an improved assay conversion rate as compared to plain allele-specific extension. The highly polymorphic melanocortin-1 receptor gene (MC1R) is involved in melanogenesis. Twenty-one MC1R variants were genotyped with PrASE since variants in the gene have been associated with an increased risk of developing melanoma. A pilot study was performed to establish the assay (Paper VI) and subsequently a larger study was executed to investigate allele frequencies in the Swedish population (Paper VII). The case and control groups consisted of 1001 and 721 samples respectively. A two- to sevenfold increased risk of developing melanoma was observed for carriers of variants.

    Cluster Flow: A user-friendly bioinformatics workflow tool [version 1; referees: 3 approved]

    Pipeline tools are becoming increasingly important within the field of bioinformatics. Using a pipeline manager to manage and run workflows comprised of multiple tools reduces workload and makes analysis results more reproducible. Existing tools require significant work to install and get running, typically needing pipeline scripts to be written from scratch before running any analysis. We present Cluster Flow, a simple and flexible bioinformatics pipeline tool designed to be quick and easy to install. Cluster Flow comes with 40 modules for common NGS processing steps, ready to work out of the box. Pipelines are assembled using these modules with a simple syntax that can be easily modified as required. Core helper functions automate many common NGS procedures, making running pipelines simple. Cluster Flow is available with a GNU GPLv3 license on GitHub. Documentation, examples and an online demo are available at http://clusterflow.io

    Cluster Flow: A user-friendly bioinformatics workflow tool [version 2; referees: 3 approved]

    Pipeline tools are becoming increasingly important within the field of bioinformatics. Using a pipeline manager to manage and run workflows comprised of multiple tools reduces workload and makes analysis results more reproducible. Existing tools require significant work to install and get running, typically needing pipeline scripts to be written from scratch before running any analysis. We present Cluster Flow, a simple and flexible bioinformatics pipeline tool designed to be quick and easy to install. Cluster Flow comes with 40 modules for common NGS processing steps, ready to work out of the box. Pipelines are assembled using these modules with a simple syntax that can be easily modified as required. Core helper functions automate many common NGS procedures, making running pipelines simple. Cluster Flow is available with a GNU GPLv3 license on GitHub. Documentation, examples and an online demo are available at http://clusterflow.io

    MultiQC: summarize analysis results for multiple tools and samples in a single report

    <h2>Highlights</h2> <h3>Better configs</h3> <p>As of this release, you can now set all of your config variables via environment variables! (see <a href="https://multiqc.info/docs/getting_started/config/#config-with-environment-variables">docs</a>).</p> <p>Better still, YAML config files can now use string interpolation to parse environment variables within strings (see <a href="https://multiqc.info/docs/getting_started/config/#referencing-environment-variables-in-yaml-configs">docs</a>), e.g.:</p> <pre><code class="language-yaml">report_header_info:
  - Contact E-mail: !ENV "${NAME:info}@${DOMAIN:example.com}"
</code></pre> <h3>Picard refactoring</h3> <p>In this release, there was a significant refactoring of the Picard module. It has been generalized for better code sharing with other Picard-based software, like Sentieon and Parabricks. As a result of this, the standalone Sentieon module was removed: Sentieon QC files will be interpreted directly as Picard QC files.</p> <p>If you were using the Sentieon module in your pipelines, make sure to update any places that reference the module name:</p> <ul> <li>MultiQC command line (e.g. replace <code>--module sentieon</code> with <code>--module picard</code>).</li> <li>MultiQC configs (e.g. replace <code>sentieon</code> with <code>picard</code> in options like <code>run_modules</code>, <code>exclude_modules</code>, <code>module_order</code>).</li> <li>Downstream code that relies on names of the files in <code>multiqc_data</code> or <code>multiqc_plots</code> saves (e.g., <code>multiqc_data/multiqc_sentieon_AlignmentSummaryMetrics.txt</code> becomes <code>multiqc_data/multiqc_picard_AlignmentSummaryMetrics.txt</code>).</li> <li>Code that parses data files like <code>multiqc_data/multiqc_data.json</code>.</li> <li>Custom plugins and templates that rely on HTML anchors (e.g. 
<code>#sentieon_aligned_reads</code> becomes <code>#picard_AlignmentSummaryMetrics</code>).</li> <li>Also, note that Picard fetches sample names from the commands it finds inside the QC headers (e.g. <code># net.sf.picard.analysis.CollectMultipleMetrics INPUT=Szabo_160930_SN583_0215_AC9H20ACXX.bam ...</code> -> <code>Szabo_160930_SN583_0215_AC9H20ACXX</code>), whereas the removed Sentieon module prioritized the QC file names. To revert to the old Sentieon approach, use the <a href="https://multiqc.info/docs/getting_started/config/#using-log-filenames-as-sample-names"><code>use_filename_as_sample_name</code> config flag</a>.</li> </ul> <h2>MultiQC updates</h2> <ul> <li>Config can be set with environment variables, including env var interpolation (<a href="https://github.com/ewels/MultiQC/pull/2178">#2178</a>)</li> <li>Try to find config in <code>~/.config</code> or <code>$XDG_CONFIG_HOME</code> (<a href="https://github.com/ewels/MultiQC/pull/2183">#2183</a>)</li> <li>Better sample name cleaning with pairs of input filenames (<a href="https://github.com/ewels/MultiQC/pull/2181">#2181</a>)</li> <li>Software versions: allow any string as a version tag (<a href="https://github.com/ewels/MultiQC/pull/2166">#2166</a>)</li> <li>Table columns with non-numeric values now trigger a linting error if <code>scale</code> is set (<a href="https://github.com/ewels/MultiQC/pull/2176">#2176</a>)</li> <li>Stricter config variable typing (<a href="https://github.com/ewels/MultiQC/pull/2178">#2178</a>)</li> <li>Remove <code>position:absolute</code> CSS from table values (<a href="https://github.com/ewels/MultiQC/pull/2169">#2169</a>)</li> <li>Fix column sorting in exported TSV files from a matplotlib linegraph plot (<a href="https://github.com/ewels/MultiQC/pull/2143">#2143</a>)</li> <li>Fix custom anchors for kraken (<a href="https://github.com/ewels/MultiQC/pull/2170">#2170</a>)</li> <li>Fix logging spillover bug (<a href="https://github.com/ewels/MultiQC/pull/2174">#2174</a>)</li> 
</ul> <h2>New Modules</h2> <ul> <li><a href="https://github.com/seqeralabs/tower-cli"><strong>Seqera Platform CLI</strong></a> (<a href="https://github.com/ewels/MultiQC/pull/2151">#2151</a>)<ul> <li>Seqera Platform CLI reports statistics generated by the Seqera Platform CLI.</li> </ul> </li> <li><a href="https://github.com/data61/gossamer/blob/master/docs/xenome.md"><strong>Xenome</strong></a> (<a href="https://github.com/ewels/MultiQC/pull/1860">#1860</a>)<ul> <li>A tool for classifying reads from xenograft sources.</li> </ul> </li> <li><a href="https://gitlab.com/genomeinformatics/xengsort"><strong>xengsort</strong></a> (<a href="https://github.com/ewels/MultiQC/pull/2168">#2168</a>)<ul> <li>xengsort is a fast xenograft read sorter based on space-efficient k-mer hashing</li> </ul> </li> </ul> <h2>Module updates</h2> <ul> <li><strong>fastp</strong>: add version parsing (<a href="https://github.com/ewels/MultiQC/pull/2159">#2159</a>)</li> <li><strong>fastp</strong>: correctly parse sample name from <code>--in1</code>/<code>--in2</code> in bash command. 
Prefer file name if not <code>fastp.json</code>; fallback to file name when error (<a href="https://github.com/ewels/MultiQC/pull/2139">#2139</a>)</li> <li><strong>Kaiju</strong>: fix <code>division by zero</code> error (<a href="https://github.com/ewels/MultiQC/pull/2179">#2179</a>)</li> <li><strong>Nanostat</strong>: account for both tab and spaces in <code>v1.41+</code> search pattern (<a href="https://github.com/ewels/MultiQC/pull/2155">#2155</a>)</li> <li><strong>Pangolin</strong>: update for v4: add QC Note, update tool versions columns (<a href="https://github.com/ewels/MultiQC/pull/2157">#2157</a>)</li> <li><strong>Picard</strong>: Generalize to directly support Sentieon and Parabricks outputs (<a href="https://github.com/ewels/MultiQC/pull/2110">#2110</a>)</li> <li><strong>Sentieon</strong>: Removed the module in favour of directly supporting parsing by the <strong>Picard</strong> module (<a href="https://github.com/ewels/MultiQC/pull/2110">#2110</a>)<ul> <li>Note that any code that relies on the module name needs to be updated, e.g. <code>-m sentieon</code> will no longer work</li> <li>The exported plot and data files will now be prefixed as <code>picard</code> instead of <code>sentieon</code>, etc.</li> <li>Note that the Sentieon module used to fetch the sample names from the file names by default, and now it follows the Picard module's logic, and prioritizes the commands recorded in the logs. To override, use the <code>use_filename_as_sample_name</code> config flag</li> </ul> </li> </ul>Please consider citing MultiQC if you use it in your analysis.
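
The config change described in the release notes above (replacing `sentieon` with `picard` in `run_modules`, `exclude_modules` and `module_order`) could be sketched as a small helper. This is an illustration only, not part of MultiQC: the function name `migrate_sentieon_to_picard` is hypothetical, the config is assumed to be already loaded into a plain Python dict, and only plain string entries in the three lists are handled.

```python
# Hypothetical helper, not a MultiQC API: rename the removed "sentieon"
# module to "picard" in the module lists named in the release notes.
MODULE_KEYS = ("run_modules", "exclude_modules", "module_order")


def migrate_sentieon_to_picard(config):
    """Return a shallow copy of config with 'sentieon' renamed to 'picard'."""
    migrated = dict(config)
    for key in MODULE_KEYS:
        entries = config.get(key)
        if not isinstance(entries, list):
            continue  # option absent or not a list: leave untouched
        renamed = []
        for entry in entries:
            if entry == "sentieon":
                entry = "picard"
            # Drop a string entry that would duplicate one already kept
            if isinstance(entry, str) and entry in renamed:
                continue
            renamed.append(entry)
        migrated[key] = renamed
    return migrated


# Assumed example config; the keys other than the module lists are arbitrary.
example = {"run_modules": ["fastqc", "sentieon", "picard"], "title": "Run 42"}
migrated_example = migrate_sentieon_to_picard(example)
```

The helper returns a new dict and leaves the input unchanged, so the old and new configs can be compared before anything is written back. File names under `multiqc_data` and HTML anchors mentioned in the notes would still need to be updated separately.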

    Fast, accurate, and lightweight analysis of BS-treated reads with ERNE 2

    Background: Bisulfite treatment of DNA followed by sequencing (BS-seq) has become a standard technique in epigenetic studies, providing researchers with tools for generating single-base resolution maps of whole methylomes. Aligning bisulfite-treated reads, however, is a computationally difficult task: bisulfite treatment decreases the (lexical) complexity of low-methylated genomic regions, and C-to-T mismatches may reflect cytosine unmethylation rather than SNPs or sequencing errors. Further challenges arise both during and after the alignment phase: data structures used by the aligner should be fast and should fit into main memory, and the methylation-caller output should be compressed, due to its significant size. Methods: As far as data structures employed to align bisulfite-treated reads are concerned, solutions proposed in the literature can be roughly grouped into two main categories: those storing pointers at each text position (e.g. hash tables, suffix trees/arrays), and those using the information-theoretic minimum number of bits (e.g. FM indexes and compressed suffix arrays). The former are fast but memory-consuming; the latter are light but much slower. In this paper, we try to close this gap by proposing a data structure for aligning bisulfite-treated reads which is at the same time fast, light, and very accurate. We reach this objective by combining a recent theoretical result on succinct hashing with a bisulfite-aware hash function. Furthermore, the new versions of the tools implementing our ideas (the aligner ERNE-BS5 2 and the caller ERNE-METH 2) have been extended with increased downstream compatibility (EPP/Bismark cov output formats), output compression, and support for target enrichment protocols. 
Results: Experimental results on public and simulated WGBS libraries show that our algorithmic solution is a competitive tradeoff between hash-based and BWT-based indexes, being as fast and accurate as the former, and as memory-efficient as the latter. Conclusions: The new functionalities of our bisulfite aligner and caller make it a fast and memory-efficient tool, useful to analyze big datasets with limited computational resources, to easily process target enrichment data, and to produce statistics such as protocol efficiency and coverage as a function of the distance from target regions.
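
As a rough illustration of what a "bisulfite-aware hash function" can mean, the sketch below collapses C into T before a k-mer is used as a lookup key, so a read finds the same bucket whether its cytosines were converted or not. This is not ERNE's actual hash function or its succinct index: it uses an ordinary Python dictionary, handles the forward strand only, and all names (`bs_reduce`, `build_index`, `candidate_positions`) are hypothetical.

```python
from collections import defaultdict


def bs_reduce(seq):
    """Collapse C into T so that methylation state does not change the key."""
    return seq.upper().replace("C", "T")


def build_index(reference, k):
    """Map every bisulfite-reduced k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[bs_reduce(reference[pos:pos + k])].append(pos)
    return index


def candidate_positions(index, read, k):
    """Look up the first k bases of a read; hits still need full verification."""
    return index.get(bs_reduce(read[:k]), [])


reference = "ACGTTCGACCGT"
index = build_index(reference, 4)

# reference[4:8] is "TCGA"; an unmethylated C reads as T after treatment.
converted_hits = candidate_positions(index, "TTGA", 4)
# The same locus with a methylated (unconverted) C lands in the same bucket.
methylated_hits = candidate_positions(index, "TCGA", 4)
```

Reducing the alphabet is also what lowers the lexical complexity mentioned in the abstract: distinct reference k-mers can collapse to the same key, so candidate positions have to be checked against the original sequence.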

    Comprehensive haplotyping of the HLA gene family using nanopore sequencing

    The HLA gene family comprises the most polymorphic loci in the human genome; it encodes the major histocompatibility complex (MHC) proteins, which mediate the immune response in terms of cellular interactions with antigens. Compatibility between HLA alleles is thus of great medical interest for recipients of allogeneic transplantations. Traditional serological techniques to evaluate compatibility are now being replaced by more accurate DNA sequencing-based methods. However, short read sequencing data typically result in collapsed sequences representing a mixture of variants from native haplotypes. In addition, most previous studies have been limited to a few highly polymorphic exons of various HLA genes. Here we present haplotype-resolved full-length sequencing of the six most clinically relevant MHC Class I and Class II genes, to characterize the haplotypes of eight reference individuals, using a single MinION flow cell. The results show that full-length sequencing of single molecules enables haplotypes to be resolved to the highest degree of accuracy (four-field resolution). In this study, a majority of the alleles were classified with four-field resolution and could be verified through previously published genotyping studies. These results support the notion that nanopore sequencing could be a viable solution for highly accurate clinical evaluation of histocompatibility.