719 research outputs found

    HLA predictions from long sequence read alignments, streamed directly into HLAminer

    The rapidly changing landscape of sequencing technologies brings new opportunities to genomics research. Longer sequence reads and higher sequence throughput, coupled with ever-improving base accuracy and decreasing per-base cost, are now making long reads suitable for analyzing polymorphic regions of the human genome, such as those of the human leucocyte antigen (HLA) gene complex. Here I present a simple protocol for predicting HLA signatures from whole genome shotgun (WGS) long sequencing reads, by directly streaming sequence alignments into HLAminer. The method is as simple as running minimap2, it scales with the number of sequences to align, and it can be used with any read aligner capable of SAM format output, without the need to store bulky alignment files on disk. I show that the predictions are robust even with older and less [base] accurate WGS nanopore datasets and relatively low (10X) sequence coverage, and present a step-by-step protocol to predict HLA class I and II genes from the long sequencing reads of modern third-generation technologies. Comment: 4 pages, 3 tables
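
The streaming idea in this abstract, consuming SAM records as the aligner emits them instead of writing an alignment file, can be illustrated with a minimal Python sketch. This is not HLAminer's scoring logic: the function name `tally_sam_targets` and the reference names in the demo are hypothetical, and the only facts relied on are the standard SAM layout (header lines start with `@`, the flag is column 2, the reference name is column 3, and flag bit 0x4 marks an unmapped read).

```python
from collections import Counter


def tally_sam_targets(lines):
    """Count mapped reads per reference name from an iterable of SAM lines.

    The iterable is consumed one line at a time, so it can be a pipe
    (for example sys.stdin) and nothing needs to be stored on disk.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("@"):          # SAM header line, no alignment
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # SAM has 11 mandatory columns
            continue
        flag = int(fields[1])
        reference = fields[2]
        if flag & 0x4 or reference == "*":  # unmapped read
            continue
        counts[reference] += 1
    return counts


# Tiny in-memory demo; the allele-like reference names are made up.
demo = [
    "@SQ\tSN:HLA-A_example\tLN:3500\n",
    "read1\t0\tHLA-A_example\t10\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    "read2\t16\tHLA-A_example\t42\t60\t8M\t*\t0\t0\tTTGCAACG\tIIIIIIII\n",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII\n",
]
for name, n in tally_sam_targets(demo).most_common():
    print(f"{name}\t{n}")
```

Passing `sys.stdin` instead of the `demo` list would let the same function sit at the end of a shell pipe fed by any aligner that writes SAM to standard output.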

    Targeted Assembly of Short Sequence Reads

    As next-generation sequence (NGS) production continues to increase, analysis is becoming a significant bottleneck. However, in situations where information is required only for specific sequence variants, it is not necessary to assemble or align whole genome data sets in their entirety. Rather, NGS data sets can be mined for the presence of sequence variants of interest by localized assembly, which is a faster, easier, and more accurate approach. We present TASR, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants, by only considering reads within the sequence space of input target sequences provided by the user. The NGS data set is searched for reads with an exact match to all possible short words within the target sequence, and these reads are then assembled stringently to generate a consensus of the target and flanking sequence. Typically, variants of a particular locus are provided as different target sequences, and the presence of the variant in the data set being interrogated is revealed by a successful assembly outcome. However, TASR can also be used to find unknown sequences that flank a given target. We demonstrate that TASR has utility in finding or confirming genomic mutations, polymorphism, fusion and integration events. Targeted assembly is a powerful method for interrogating large data sets for the presence of sequence variants of interest. TASR is a fast, flexible and easy to use tool for targeted assembly.
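
The read-recruitment step described here (keeping only reads that share an exact short word with a target) can be sketched as follows. This is an illustration, not TASR's implementation: the function names, the default word length `k=15`, and the decision to index both strands of the target are assumptions made for the example.

```python
def kmers(seq, k):
    """Return the set of all length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def recruit_reads(target, reads, k=15):
    """Keep reads that share at least one exact k-mer with the target.

    Words from both strands of the target are indexed (an assumption),
    so reads sequenced from either strand can be recruited.
    """
    words = kmers(target, k) | kmers(revcomp(target), k)
    return [
        read for read in reads
        if any(read[i:i + k] in words for i in range(len(read) - k + 1))
    ]


# Toy data with a short word length so the example stays readable.
target = "ACGTACGTTAGC"
reads = ["GGTACGTT", "CCCCCCCC", "AGCTAACG"]
print(recruit_reads(target, reads, k=5))
```

Only the recruited reads would then go on to the stringent assembly step, which is why the search space stays small even for very large data sets.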

    ntLink: a toolkit for de novo genome assembly scaffolding and mapping using long reads

    With the increasing affordability and accessibility of genome sequencing data, de novo genome assembly is an important first step to a wide variety of downstream studies and analyses. Therefore, bioinformatics tools that enable the generation of high-quality genome assemblies in a computationally efficient manner are essential. Recent developments in long-read sequencing technologies have greatly benefited genome assembly work, including scaffolding, by providing long-range evidence that can aid in resolving the challenging repetitive regions of complex genomes. ntLink is a flexible and resource-efficient genome scaffolding tool that utilizes long-read sequencing data to improve upon draft genome assemblies built from any sequencing technology, including the same long reads. Instead of using read alignments to identify candidate joins, ntLink utilizes minimizer-based mappings to infer how input sequences should be ordered and oriented into scaffolds. Recent improvements to ntLink have added important features such as overlap detection, gap-filling and in-code scaffolding iterations. Here, we present three basic protocols demonstrating how to use each of these new features to yield highly contiguous genome assemblies, while still maintaining ntLink's proven computational efficiency. Further, as we illustrate in the alternate protocols, the lightweight minimizer-based mappings that enable ntLink scaffolding can also be utilized for other downstream applications, such as misassembly detection. With its modularity and multiple modes of execution, ntLink has broad benefit to the genomics community, from genome scaffolding and beyond. ntLink is an open-source project and is freely available from https://github.com/bcgsc/ntLink. Comment: 23 pages, 2 figures
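
To make "minimizer-based mappings" concrete, the sketch below computes window minimizers for a sequence. It is a simplified illustration under stated assumptions: k-mers are ordered lexicographically here, whereas production tools typically order them by hash value, and the function name and the toy parameters `k=3`, `w=3` are chosen only for the example.

```python
def minimizers(seq, k, w):
    """Return (position, k-mer) minimizers of seq.

    For every window of w consecutive k-mers, the lexicographically
    smallest k-mer is selected (leftmost on ties). A k-mer selected by
    several consecutive windows is reported once.
    """
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = []
    for start in range(len(words) - w + 1):
        window = words[start:start + w]
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        if not picked or picked[-1][0] != pos:
            picked.append((pos, words[pos]))
    return picked


print(minimizers("ACGTTGCA", k=3, w=3))
```

Two sequences that share a run of minimizers in the same order are likely to overlap, which is the kind of lightweight evidence a scaffolder can use in place of full read alignments.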

    Porous covalent organic polymers used in luminescent methods of analysis

    The chemical industry has recently been growing at a tremendous pace, and the volume of chemical products in use is rising rapidly as a result, which in turn leads to contamination of soil, aquatic biological systems, and the wider environment. Various analytical methods are used to monitor environmental quality; we chose to examine one of the fastest and most sensitive of them, the luminescent method. We therefore set out to prepare five different samples of porous covalent materials that can serve as analyzers in the luminescent method.

    Activation of an Endogenous Retrovirus-Associated Long Non-Coding RNA in Human Adenocarcinoma

    Background: Long non-coding RNAs (lncRNAs) are emerging as molecules that significantly impact many cellular processes and have been associated with almost every human cancer. Compared to protein-coding genes, lncRNA genes are often associated with transposable elements, particularly with endogenous retroviral elements (ERVs). ERVs can have potentially deleterious effects on genome structure and function, so these elements are typically silenced in normal somatic tissues, albeit with varying efficiency. The aberrant regulation of ERVs associated with lncRNAs (ERV-lncRNAs), coupled with the diverse range of lncRNA functions, creates significant potential for ERV-lncRNAs to impact cancer biology. Methods: We used RNA-seq analysis to identify and profile the expression of a novel lncRNA in six large cohorts, including over 7,500 samples from The Cancer Genome Atlas (TCGA). Results: We identified the tumor-specific expression of a novel lncRNA that we have named Endogenous retroViral-associated ADenocarcinoma RNA or ‘EVADR’, by analyzing RNA-seq data derived from colorectal tumors and matched normal control tissues. Subsequent analysis of TCGA RNA-seq data revealed the striking association of EVADR with adenocarcinomas, which are tumors of glandular origin. Moderate to high levels of EVADR were detected in 25 to 53% of colon, rectal, lung, pancreas and stomach adenocarcinomas (mean = 30 to 144 FPKM), and EVADR expression correlated with decreased patient survival (Cox regression; hazard ratio = 1.47, 95% confidence interval = 1.06 to 2.04, P = 0.02). In tumor sites of non-glandular origin, EVADR expression was detectable at only very low levels and in less than 10% of patients. For EVADR, a MER48 ERV element provides an active promoter to drive its transcription. Genome-wide, MER48 insertions are associated with nine lncRNAs, but none of the MER48-associated lncRNAs other than EVADR were consistently expressed in adenocarcinomas, demonstrating the specific activation of EVADR. The sequence and structure of the EVADR locus are highly conserved among Old World monkeys and apes but not New World monkeys or prosimians, where the MER48 insertion is absent. Conservation of the EVADR locus suggests a functional role for this novel lncRNA in humans and our closest primate relatives. Conclusions: Our results describe the specific activation of a highly conserved ERV-lncRNA in numerous cancers of glandular origin, a finding with diagnostic, prognostic and therapeutic implications.
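
As a reader-side consistency check on the reported survival statistics, the hazard ratio, its 95% confidence interval, and the P value can be related with a short calculation. The sketch assumes a Wald-type interval that is symmetric on the log scale and a two-sided normal test; the abstract does not state how the interval or P value were computed, so this is an assumption, and the function name is hypothetical.

```python
import math


def wald_from_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-scale standard error, z statistic and two-sided
    p-value from a hazard ratio and its 95% confidence interval,
    assuming the interval is exp(log(hr) +/- z_crit * se)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return se, z, p


se, z, p = wald_from_ci(1.47, 1.06, 2.04)
print(f"se(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the recovered P value rounds to 0.02, in line with the figure quoted in the abstract.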

    The Sensitivity of Massively Parallel Sequencing for Detecting Candidate Infectious Agents Associated with Human Tissue

    Massively parallel sequencing technology now provides the opportunity to sample the transcriptome of a given tissue comprehensively. Transcripts at only a few copies per cell are readily detectable, allowing the discovery of low abundance viral and bacterial transcripts in human tissue samples. Here we describe an approach for mining large sequence data sets for the presence of microbial sequences. Further, we demonstrate the sensitivity of this approach by sequencing human RNA-seq libraries spiked with decreasing amounts of an RNA virus. At a modest depth of sequencing, viral transcripts can be detected at frequencies less than 1 in 1,000,000. With current sequencing platforms approaching outputs of one billion reads per run, this is a highly sensitive method for detecting putative infectious agents associated with human tissues.
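
The relationship between sequencing depth and detectable transcript frequency can be made concrete with a back-of-the-envelope model. The sketch below treats reads from a rare transcript as Poisson-distributed with mean equal to depth times frequency; this sampling model, the function name, and the `min_reads` threshold are assumptions for illustration and are not taken from the study.

```python
import math


def detection_probability(total_reads, frequency, min_reads=1):
    """Probability of observing at least min_reads reads from a transcript
    present at the given frequency, under a Poisson sampling model."""
    expected = total_reads * frequency
    p_fewer = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(min_reads)
    )
    return 1.0 - p_fewer


# A transcript at 1 in 1,000,000, sequenced at three depths.
for depth in (1_000_000, 10_000_000, 1_000_000_000):
    expected = depth * 1e-6
    prob = detection_probability(depth, 1e-6, min_reads=5)
    print(f"{depth:>13,} reads: ~{expected:,.0f} expected, P(>=5 reads) = {prob:.3f}")
```

At one billion reads the expected count for a 1-in-1,000,000 transcript is about 1,000 reads, so under this model the limiting factor becomes distinguishing true microbial reads from contamination and misassignment, not sampling depth.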

    The cumate gene-switch: a system for regulated expression in mammalian cells

    BACKGROUND: A number of expression systems have been developed where transgene expression can be regulated. They all have specific characteristics making them more suitable for certain applications than for others. Since some applications require the regulation of several genes, there is a need for a variety of independent yet compatible systems. RESULTS: We have used the regulatory mechanisms of bacterial operons (cmt and cym) to regulate gene expression in mammalian cells using three different strategies. In the repressor configuration, regulation is mediated by the binding of the repressor (CymR) to the operator site (CuO), placed downstream of a strong constitutive promoter. Addition of cumate, a small molecule, relieves the repression. In the transactivator configuration, a chimaeric transactivator (cTA) protein, formed by the fusion of CymR with the activation domain of VP16, is able to activate transcription when bound to multiple copies of CuO, placed upstream of the CMV minimal promoter. Cumate addition abrogates DNA binding and therefore transactivation by cTA. Finally, an adenoviral library of cTA mutants was screened to identify a reverse cumate activator (rcTA), which activates transcription in the presence rather than the absence of cumate. CONCLUSION: We report the generation of a new versatile inducible expression system.

    Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach

    BACKGROUND: High throughput sequencing-by-synthesis is an emerging technology that allows the rapid production of millions of bases of data. Although the sequence reads are short, they can readily be used for re-sequencing. By re-sequencing the mRNA products of a cell, one may rapidly discover polymorphisms and splice variants particular to that cell. RESULTS: We present the utility of massively parallel sequencing by synthesis for profiling the transcriptome of a human prostate cancer cell line, LNCaP, that has been treated with the synthetic androgen R1881. Through the generation of approximately 20 megabases (MB) of EST data, we detect transcription from over 10,000 gene loci, 25 previously undescribed alternative splicing events involving known exons, and over 1,500 high quality single nucleotide discrepancies with the reference human sequence. Further, we map nearly 10,000 ESTs to positions on the genome where no transcription is currently predicted to occur. We also characterize various obstacles to using sequencing by synthesis for transcriptome analysis and propose solutions to these problems. CONCLUSION: The use of high-throughput sequencing-by-synthesis methods for transcript profiling allows the specific and sensitive detection of many of a cell's transcripts, and also allows the discovery of high quality base discrepancies and alternative splice variants. Thus, this technology may provide an effective means of understanding various disease states, discovering novel targets for disease treatment, and discovering novel transcripts.