481 research outputs found

    HLA predictions from long sequence read alignments, streamed directly into HLAminer

    The rapidly changing landscape of sequencing technologies brings new opportunities to genomics research. Longer sequence reads and higher sequence throughput, coupled with ever-improving base accuracy and decreasing per-base cost, are now making long reads suitable for analyzing polymorphic regions of the human genome, such as those of the human leucocyte antigen (HLA) gene complex. Here I present a simple protocol for predicting HLA signatures from whole genome shotgun (WGS) long sequencing reads, by directly streaming sequence alignments into HLAminer. The method is as simple as running minimap2, it scales with the number of sequences to align, and it can be used with any read aligner capable of SAM format output, without the need to store bulky alignment files to disk. I show that the predictions are robust even with older and less [base] accurate WGS nanopore datasets and relatively low (10X) sequence coverage, and present a step-by-step protocol to predict HLA class I and II genes from the long sequencing reads of modern third-generation technologies.
    Comment: 4 pages, 3 tables
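The streaming idea in the abstract above (consuming SAM records as the aligner emits them, rather than writing an alignment file first) can be illustrated with a minimal Python sketch. This is not HLAminer's scoring logic; the function name `tally_reference_hits` is hypothetical, and the only assumptions about the input are the standard SAM column order (FLAG in column 2, RNAME in column 3) and the `0x4` unmapped flag bit.

```python
from collections import Counter
from typing import Iterable

UNMAPPED = 0x4  # SAM FLAG bit that marks a read with no alignment


def tally_reference_hits(sam_lines: Iterable[str]) -> Counter:
    """Count aligned records per reference name from a stream of SAM lines.

    Illustrative only: a real HLA caller scores alignments far more carefully.
    """
    hits: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # a SAM record has 11 mandatory columns
        flag, rname = int(fields[1]), fields[2]
        if flag & UNMAPPED or rname == "*":
            continue  # ignore unaligned reads
        hits[rname] += 1  # one more alignment against this reference
    return hits
```

Because the function accepts any iterable of lines, it could be handed `sys.stdin` at the end of a pipe fed by an aligner that writes SAM to standard output (for example minimap2 run with `-a`), so each record is processed once and nothing is stored on disk.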

    Targeted Assembly of Short Sequence Reads

    As next-generation sequence (NGS) production continues to increase, analysis is becoming a significant bottleneck. However, in situations where information is required only for specific sequence variants, it is not necessary to assemble or align whole genome data sets in their entirety. Rather, NGS data sets can be mined for the presence of sequence variants of interest by localized assembly, which is a faster, easier, and more accurate approach. We present TASR, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants, by only considering reads within the sequence space of input target sequences provided by the user. The NGS data set is searched for reads with an exact match to all possible short words within the target sequence, and these reads are then assembled stringently to generate a consensus of the target and flanking sequence. Typically, variants of a particular locus are provided as different target sequences, and the presence of the variant in the data set being interrogated is revealed by a successful assembly outcome. However, TASR can also be used to find unknown sequences that flank a given target. We demonstrate that TASR has utility in finding or confirming genomic mutations, polymorphism, fusion and integration events. Targeted assembly is a powerful method for interrogating large data sets for the presence of sequence variants of interest. TASR is a fast, flexible and easy-to-use tool for targeted assembly.
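The read-recruitment step described in this abstract (keeping only reads that share an exact short word with a target sequence) can be sketched in Python as follows. This is an assumption-laden illustration, not TASR's implementation: the function names, the default word length `k = 15`, and the choice to index both strands of the target are all hypothetical, and the assembly step itself is not shown.

```python
from typing import Iterable, List, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def target_words(target: str, k: int) -> Set[str]:
    """Collect every k-length word on both strands of the target."""
    words: Set[str] = set()
    for strand in (target, reverse_complement(target)):
        for i in range(len(strand) - k + 1):
            words.add(strand[i:i + k])
    return words


def recruit_reads(reads: Iterable[str], target: str, k: int = 15) -> List[str]:
    """Keep reads that contain at least one exact k-mer from the target."""
    words = target_words(target.upper(), k)
    recruited: List[str] = []
    for read in reads:
        read = read.upper()
        # Reads shorter than k yield an empty range and are never recruited.
        if any(read[i:i + k] in words for i in range(len(read) - k + 1)):
            recruited.append(read)
    return recruited
```

Filtering on exact word matches with a set lookup keeps the work per read proportional to its length, which is why restricting attention to the sequence space of the target can be much cheaper than aligning or assembling the whole data set.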

    Activation of an Endogenous Retrovirus-Associated Long Non-Coding RNA in Human Adenocarcinoma

    Background: Long non-coding RNAs (lncRNAs) are emerging as molecules that significantly impact many cellular processes and have been associated with almost every human cancer. Compared to protein-coding genes, lncRNA genes are often associated with transposable elements, particularly with endogenous retroviral elements (ERVs). ERVs can have potentially deleterious effects on genome structure and function, so these elements are typically silenced in normal somatic tissues, albeit with varying efficiency. The aberrant regulation of ERVs associated with lncRNAs (ERV-lncRNAs), coupled with the diverse range of lncRNA functions, creates significant potential for ERV-lncRNAs to impact cancer biology. Methods: We used RNA-seq analysis to identify and profile the expression of a novel lncRNA in six large cohorts, including over 7,500 samples from The Cancer Genome Atlas (TCGA). Results: We identified the tumor-specific expression of a novel lncRNA that we have named Endogenous retroViral-associated ADenocarcinoma RNA or ‘EVADR’, by analyzing RNA-seq data derived from colorectal tumors and matched normal control tissues. Subsequent analysis of TCGA RNA-seq data revealed the striking association of EVADR with adenocarcinomas, which are tumors of glandular origin. Moderate to high levels of EVADR were detected in 25 to 53% of colon, rectal, lung, pancreas and stomach adenocarcinomas (mean = 30 to 144 FPKM), and EVADR expression correlated with decreased patient survival (Cox regression; hazard ratio = 1.47, 95% confidence interval = 1.06 to 2.04, P = 0.02). In tumor sites of non-glandular origin, EVADR expression was detectable at only very low levels and in less than 10% of patients. For EVADR, a MER48 ERV element provides an active promoter to drive its transcription. Genome-wide, MER48 insertions are associated with nine lncRNAs, but none of the MER48-associated lncRNAs other than EVADR were consistently expressed in adenocarcinomas, demonstrating the specific activation of EVADR. The sequence and structure of the EVADR locus are highly conserved among Old World monkeys and apes but not New World monkeys or prosimians, where the MER48 insertion is absent. Conservation of the EVADR locus suggests a functional role for this novel lncRNA in humans and our closest primate relatives. Conclusions: Our results describe the specific activation of a highly conserved ERV-lncRNA in numerous cancers of glandular origin, a finding with diagnostic, prognostic and therapeutic implications.

    Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach

    BACKGROUND: High-throughput sequencing-by-synthesis is an emerging technology that allows the rapid production of millions of bases of data. Although the sequence reads are short, they can readily be used for re-sequencing. By re-sequencing the mRNA products of a cell, one may rapidly discover polymorphisms and splice variants particular to that cell. RESULTS: We present the utility of massively parallel sequencing by synthesis for profiling the transcriptome of a human prostate cancer cell line, LNCaP, that has been treated with the synthetic androgen R1881. Through the generation of approximately 20 megabases (MB) of EST data, we detect transcription from over 10,000 gene loci, 25 previously undescribed alternative splicing events involving known exons, and over 1,500 high-quality single nucleotide discrepancies with the reference human sequence. Further, we map nearly 10,000 ESTs to positions on the genome where no transcription is currently predicted to occur. We also characterize various obstacles to using sequencing by synthesis for transcriptome analysis and propose solutions to these problems. CONCLUSION: The use of high-throughput sequencing-by-synthesis methods for transcript profiling allows the specific and sensitive detection of many of a cell's transcripts, and also allows the discovery of high-quality base discrepancies and alternative splice variants. Thus, this technology may provide an effective means of understanding various disease states, discovering novel targets for disease treatment, and discovering novel transcripts.

    3D genomics across the tree of life reveals condensin II as a determinant of architecture type

    We investigated genome folding across the eukaryotic tree of life. We find two types of three-dimensional (3D) genome architectures at the chromosome scale. Each type appears and disappears repeatedly during eukaryotic evolution. The type of genome architecture that an organism exhibits correlates with the absence of condensin II subunits. Moreover, condensin II depletion converts the architecture of the human genome to a state resembling that seen in organisms such as fungi or mosquitoes. In this state, centromeres cluster together at nucleoli, and heterochromatin domains merge. We propose a physical model in which lengthwise compaction of chromosomes by condensin II during mitosis determines chromosome-scale genome architecture, with effects that are retained during the subsequent interphase. This mechanism likely has been conserved since the last common ancestor of all eukaryotes.
    C.H. is supported by the Boehringer Ingelheim Fonds; C.H., Á.S.C., and B.D.R. are supported by an ERC CoG (772471, “CohesinLooping”); A.M.O.E. and B.D.R. are supported by the Dutch Research Council (NWO-Echo); and J.A.R. and R.H.M. are supported by the Dutch Cancer Society (KWF). T.v.S. and B.v.S. are supported by NIH Common Fund “4D Nucleome” Program grant U54DK107965. H.T. and E.d.W. are supported by an ERC StG (637597, “HAP-PHEN”). J.A.R., T.v.S., H.T., R.H.M., B.v.S., and E.d.W. are part of the Oncode Institute, which is partly financed by the Dutch Cancer Society. Work at the Center for Theoretical Biological Physics is sponsored by the NSF (grants PHY-2019745 and CHE-1614101) and by the Welch Foundation (grant C-1792). V.G.C. is funded by FAPESP (São Paulo State Research Foundation and Higher Education Personnel) grants 2016/13998-8 and 2017/09662-7. J.N.O. is a CPRIT Scholar in Cancer Research. E.L.A. was supported by an NSF Physics Frontiers Center Award (PHY-2019745), the Welch Foundation (Q-1866), a USDA Agriculture and Food Research Initiative grant (2017-05741), the Behavioral Plasticity Research Institute (NSF DBI-2021795), and an NIH Encyclopedia of DNA Elements Mapping Center Award (UM1HG009375). Hi-C data for the 24 species were created by the DNA Zoo Consortium (www.dnazoo.org). DNA Zoo is supported by Illumina, Inc.; IBM; and the Pawsey Supercomputing Center. P.K. is supported by the University of Western Australia. L.L.M. was supported by NIH (1R01NS114491) and NSF awards (1557923, 1548121, and 1645219) and the Human Frontiers Science Program (RGP0060/2017). The draft A. californica project was supported by NHGRI. J.L.G.-S. received funding from the ERC (grant agreement no. 740041), the Spanish Ministerio de Economía y Competitividad (grant no. BFU2016-74961-P), and the institutional grant Unidad de Excelencia María de Maeztu (MDM-2016-0687). R.D.K. is supported by NIH grant RO1DK121366. V.H. is supported by NIH grant NIH1P41HD071837. K.M. is supported by a MEXT grant (20H05936). M.C.W. is supported by the NIH grants R01AG045183, R01AT009050, R01AG062257, and DP1DK113644 and by the Welch Foundation. E.F. was supported by NHGR

    A Giant Planet Candidate Transiting a White Dwarf

    Astronomers have discovered thousands of planets outside the solar system, most of which orbit stars that will eventually evolve into red giants and then into white dwarfs. During the red giant phase, any close-orbiting planets will be engulfed by the star, but more distant planets can survive this phase and remain in orbit around the white dwarf. Some white dwarfs show evidence for rocky material floating in their atmospheres, in warm debris disks, or orbiting very closely, which has been interpreted as the debris of rocky planets that were scattered inward and tidally disrupted. Recently, the discovery of a gaseous debris disk with a composition similar to ice giant planets demonstrated that massive planets might also find their way into tight orbits around white dwarfs, but it is unclear whether the planets can survive the journey. So far, the detection of intact planets in close orbits around white dwarfs has remained elusive. Here, we report the discovery of a giant planet candidate transiting the white dwarf WD 1856+534 (TIC 267574918) every 1.4 days. The planet candidate is roughly the same size as Jupiter and is no more than 14 times as massive (with 95% confidence). Other cases of white dwarfs with close brown dwarf or stellar companions are explained as the consequence of common-envelope evolution, wherein the original orbit is enveloped during the red-giant phase and shrinks due to friction. In this case, though, the low mass and relatively long orbital period of the planet candidate make common-envelope evolution less likely. Instead, the WD 1856+534 system seems to demonstrate that giant planets can be scattered into tight orbits without being tidally disrupted, and motivates searches for smaller transiting planets around white dwarfs.
    Comment: 50 pages, 12 figures, 2 tables. Published in Nature on Sept. 17, 2020. The final authenticated version is available online at: https://www.nature.com/articles/s41586-020-2713-

    Rare and low-frequency coding variants alter human adult height

    Height is a highly heritable, classic polygenic trait with ~700 common associated variants identified so far through genome-wide association studies. Here, we report 83 height-associated coding variants with lower minor allele frequencies (range of 0.1-4.8%) and effects of up to 2 cm/allele (e.g. in IHH, STC2, AR and CRISPLD2), >10 times the average effect of common variants. In functional follow-up studies, rare height-increasing alleles of STC2 (+1-2 cm/allele) compromised proteolytic inhibition of PAPP-A and increased cleavage of IGFBP-4 in vitro, resulting in higher bioavailability of insulin-like growth factors. These 83 height-associated variants overlap genes mutated in monogenic growth disorders and highlight new biological candidates (e.g. ADAMTS3, IL11RA, NOX4) and pathways (e.g. proteoglycan/glycosaminoglycan synthesis) involved in growth. Our results demonstrate that sufficiently large sample sizes can uncover rare and low-frequency variants of moderate to large effect associated with polygenic human phenotypes, and that these variants implicate relevant genes and pathways.

    Dissecting the Shared Genetic Architecture of Suicide Attempt, Psychiatric Disorders, and Known Risk Factors

    Background: Suicide is a leading cause of death worldwide, and nonfatal suicide attempts, which occur far more frequently, are a major source of disability and social and economic burden. Both have substantial genetic etiology, which is partially shared and partially distinct from that of related psychiatric disorders. Methods: We conducted a genome-wide association study (GWAS) of 29,782 suicide attempt (SA) cases and 519,961 controls in the International Suicide Genetics Consortium (ISGC). The GWAS of SA was conditioned on psychiatric disorders using GWAS summary statistics via multitrait-based conditional and joint analysis, to remove genetic effects on SA mediated by psychiatric disorders. We investigated the shared and divergent genetic architectures of SA, psychiatric disorders, and other known risk factors. Results: Two loci reached genome-wide significance for SA: the major histocompatibility complex and an intergenic locus on chromosome 7, the latter of which remained associated with SA after conditioning on psychiatric disorders and replicated in an independent cohort from the Million Veteran Program. This locus has been implicated in risk-taking behavior, smoking, and insomnia. SA showed strong genetic correlation with psychiatric disorders, particularly major depression, and also with smoking, pain, risk-taking behavior, sleep disturbances, lower educational attainment, reproductive traits, lower socioeconomic status, and poorer general health. After conditioning on psychiatric disorders, the genetic correlations between SA and psychiatric disorders decreased, whereas those with nonpsychiatric traits remained largely unchanged. Conclusions: Our results identify a risk locus that contributes more strongly to SA than other phenotypes and suggest a shared underlying biology between SA and known risk factors that is not mediated by psychiatric disorders.
    Peer reviewed

    Measurement of $t\bar{t}$ normalised multi-differential cross sections in pp collisions at $\sqrt{s} = 13$ TeV, and simultaneous determination of the strong coupling strength, top quark pole mass, and parton distribution functions

    Peer reviewed