55 research outputs found

    Predicting enhancers using a small subset of high confidence examples and co-training

    Enhancers are important regulatory regions located throughout the genome, primarily in non-coding regions. Several experimental methods have been developed over the last several years to identify their location, but the search space is large and the overlap between the putative enhancers identified using these methods tends to be very small. Computational methods for enhancer prediction often use one large set of experimentally identified enhancer regions as input, and therefore rely critically on their correctness. We chose to take a different approach and start with a high-confidence set of 21 enhancers that lie in the intersection of enhancers identified using three completely unrelated experimental approaches: deepCAGE, HiCap and classical enhancer reporter assays. Because this starting set is so small, we use a semi-supervised approach called co-training, rather than a fully supervised approach, to progressively predict enhancers from unlabeled regions. Using this approach we are able to outperform supervised learning as well as simpler semi-supervised learning methods and achieve an average area under the ROC curve of 0.84.

    SARS-CoV-2 Omicron variants BA.1 and BA.2 both show similarly reduced disease severity of COVID-19 compared to Delta, Germany, 2021 to 2022

    German national surveillance data analysis shows that hospitalisation odds associated with Omicron lineage BA.1 or BA.2 infections are up to 80% lower than with Delta infection, primarily in ≥ 35-year-olds. Hospitalised vaccinated Omicron cases’ proportions (2.3% for both lineages) seemed lower than those of the unvaccinated (4.4% for both lineages). Independent of vaccination status, the hospitalisation frequency among cases with Delta seemed nearly threefold higher (8.3%) than with Omicron (3.0% for both lineages), suggesting that Omicron inherently causes less severe disease.

    Recent developments in StemBase: a tool to study gene expression in human and murine stem cells

    Background: Currently one of the largest online repositories for human and mouse stem cell gene expression data, StemBase was first designed as a simple web interface to DNA microarray data generated by the Canadian Stem Cell Network to facilitate the discovery of gene functions relevant to stem cell control and differentiation. Findings: Since its creation, StemBase has grown in both size and scope into a system with analysis tools that examine either the whole database at once, or slices of data, based on tissue type, cell type or gene of interest. As of September 1, 2008, StemBase contains gene expression data (microarray and Serial Analysis of Gene Expression) from 210 stem cell samples in 60 different experiments. Conclusion: StemBase can be used to study gene expression in human and murine stem cells and is available at http://www.stembase.ca.

    Refphase: Multi-sample phasing reveals haplotype-specific copy number heterogeneity

    Most computational methods that infer somatic copy number alterations (SCNAs) from bulk sequencing of DNA analyse tumour samples individually. However, the sequencing of multiple tumour samples from a patient’s disease is an increasingly common practice. We introduce Refphase, an algorithm that leverages this multi-sampling approach to infer haplotype-specific copy numbers through multi-sample phasing. We demonstrate Refphase’s ability to infer haplotype-specific SCNAs and characterise their intra-tumour heterogeneity, to uncover previously undetected allelic imbalance in low-purity samples, and to identify parallel evolution in the context of whole genome doubling in a pan-cancer cohort of 336 samples from 99 tumours.

    MedlineRanker: flexible ranking of biomedical literature

    The biomedical literature is represented by millions of abstracts available in the Medline database. These abstracts can be queried with the PubMed interface, which provides a keyword-based Boolean search engine. This approach shows limitations in the retrieval of abstracts related to very specific topics, as it is difficult for a non-expert user to find all of the most relevant keywords related to a biomedical topic. Additionally, when searching for more general topics, the same approach may return hundreds of unranked references. To address these issues, text mining tools have been developed to help scientists focus on relevant abstracts. We have implemented the MedlineRanker webserver, which allows a flexible ranking of Medline for a topic of interest without expert knowledge. Given some abstracts related to a topic, the program automatically deduces the words that are most discriminative in comparison to a random selection. These words are used to score other abstracts, including those from recent publications that are not yet annotated, which can then be ranked by relevance. We show that our tool can be highly accurate and that it is able to process millions of abstracts in a practical amount of time. MedlineRanker is free for use and is available at http://cbdm.mdc-berlin.de/tools/medlineranker

    A local human Vδ1 T cell population is associated with survival in nonsmall-cell lung cancer

    Funding Information: D.B. has consulted for NanoString, reports honoraria from AstraZeneca and has a patent (PCT/GB2020/050221) issued on methods for cancer prognostication. J.R. and M.A.B. have consulted for Achilles Therapeutics. N.M. has stock options in and has consulted for Achilles Therapeutics. N.M. holds European patents relating to targeting neoantigens (PCT/EP2016/059401), identifying patient response to immune checkpoint blockade (PCT/EP2016/071471), determining HLA loss of heterozygosity (PCT/GB2018/052004) and predicting survival rates of patients with cancer (PCT/GB2020/050221). A.H. attended one advisory board for Abbvie, Roche and GRAIL, and reports personal fees from Abbvie, Boehringer Ingelheim, Takeda, AstraZeneca, Daiichi Sankyo, Merck Serono, Merck/MSD, UCB and Roche for delivering general education/training in clinical trials. A.H. owned shares in Illumina and Thermo Fisher Scientific (sold in 2020) and receives fees for membership of Independent Data Monitoring Committees for Roche-sponsored clinical trials. S.A.Q. is co-founder and Chief Scientific Officer of Achilles Therapeutics. A.C.H. is a board member and equity holder in ImmunoQure, AG and Gamma Delta Therapeutics, and is an equity holder in Adaptate Biotherapeutics and chair of the scientific advisory board. C.S. acknowledges grant support from Pfizer, AstraZeneca, Bristol Myers Squibb, Roche-Ventana, Boehringer Ingelheim, Archer Dx Inc (collaboration in minimal residual disease-sequencing technologies) and Ono Pharmaceuticals, is an AstraZeneca Advisory Board member and Chief Investigator for the MeRmaiD1 clinical trial. C.S has consulted for Amgen, AstraZeneca, Bicycle Therapeutics, Bristol Myers Squibb, Celgene, Genentech, GlaxoSmithKline, GRAIL, Illumina, Medixci, Metabomed, MSD, Novartis, Pfizer, Roche-Ventana and Sarah Cannon Research Institute. C.S. 
has stock options in Apogen Biotechnologies, Epic Biosciences and GRAIL, and has stock options and is co-founder of Achilles Therapeutics. C.S. holds patents relating: to assay technology to detect tumor recurrence (PCT/GB2017/053289); to targeting neoantigens (PCT/EP2016/059401), identifying patient response to immune checkpoint blockade (PCT/EP2016/071471), determining HLA loss of heterozygosity (PCT/GB2018/052004), predicting survival rates of patients with cancer (PCT/GB2020/050221); to treating cancer by targeting insertion/deletion (indel) mutations (PCT/GB2018/051893); to identifying indel mutation targets (PCT/GB2018/051892); to methods for lung cancer detection (PCT/US2017/028013); and to identifying responders to cancer treatment (PCT/GB2018/051912). The remaining authors declare no competing interests. Funding Information: We thank the Oxford Genomics Centre at the Wellcome Centre for Human Genetics (funded by Wellcome Trust grant no. 203141/Z/16/Z) for the generation and initial processing of the RNA-seq data from sorted TILs. We thank S. Bola for technical support and S. Vanloo for administrative support. The GTEx project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by the NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS. Y.W. was supported by a Wellcome Trust Clinical Research Career Development Fellowship (no. 220589/Z/20/Z), an Academy of Medical Sciences Starter Grant for Clinical Lecturers, a National Institute for Health Research (NIHR) Academic Clinical Lectureship and the NIHR University College London Hospitals Biomedical Research Centre. D.B. was supported by funding from the NIHR University College London Hospitals Biomedical Research Centre, the ideas 2 innovation translation scheme at the Francis Crick Institute, the Breast Cancer Research Foundation (BCRF) and a Cancer Research UK (CRUK) Early Detection and Diagnosis Project award. M.J.H.
is a CRUK Fellow and has received funding from CRUK, NIHR, Rosetrees Trust, UKI NETs and the NIHR University College London Hospitals Biomedical Research Centre. C.S. is Royal Society Napier Research Professor. This work was supported by the Francis Crick Institute which receives its core funding from CRUK (no. FC001169), the UK Medical Research Council (no. FC001169) and the Wellcome Trust (no. FC001169). This research was funded in whole, or in part, by the Wellcome Trust (no. FC001169). For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. C.S. is funded by CRUK (TRACERx, PEACE and CRUK Cancer Immunotherapy Catalyst Network), CRUK Lung Cancer Centre of Excellence (no. C11496/A30025), the Rosetrees Trust, Butterfield and Stoneygate Trusts, NovoNordisk Foundation (ID16584), Royal Society Professorship Enhancement Award (no. RP/EA/180007), the NIHR Biomedical Research Centre at University College London Hospitals, the CRUK–University College London Centre, Experimental Cancer Medicine Centre and the BCRF. This work was supported by a Stand Up To Cancer‐LUNGevity-American Lung Association Lung Cancer Interception Dream Team Translational Research Grant (grant no. SU2C-AACR-DT23-17 to S. M. Dubinett and A. E. Spira). Stand Up To Cancer is a division of the Entertainment Industry Foundation. Research grants are administered by the American Association for Cancer Research, the Scientific Partner of SU2C. C.S. receives funding from the European Research Council (ERC) under the European Union’s Seventh Framework Programme (no. FP7/2007-2013) Consolidator Grant (no. FP7-THESEUS-617844), European Commission ITN (no. FP7-PloidyNet 607722), an ERC Advanced Grant (PROTEUS) from the ERC under the European Union’s Horizon 2020 research and innovation program (grant no. 
835297), and Chromavision from the European Union’s Horizon 2020 research and innovation program (grant no. 665233). Publisher Copyright: © 2022, The Author(s).

    Detection of Alpha-Rod Protein Repeats Using a Neural Network and Application to Huntingtin

    A growing number of solved protein structures display an elongated structural domain, denoted here as alpha-rod, composed of stacked pairs of anti-parallel alpha-helices. Alpha-rods are flexible and expose a large surface, which makes them suitable for protein interaction. Although most likely originating by tandem duplication of a two-helix unit, their detection using sequence similarity between repeats is poor. Here, we show that alpha-rod repeats can be detected using a neural network. The network detects more repeats than are identified by domain databases using multiple profiles, with a low level of false positives (<10%). We identify alpha-rod repeats in approximately 0.4% of proteins in eukaryotic genomes. We then investigate the results for all human proteins, identifying alpha-rod repeats for the first time in six protein families, including proteins STAG1-3, SERAC1, and PSMD1-2 & 5. We also characterize a short version of these repeats in eight protein families of Archaeal, Bacterial, and Fungal species. Finally, we demonstrate the utility of these predictions in directing experimental work to demarcate three alpha-rods in huntingtin, a protein mutated in Huntington's disease. Using yeast two-hybrid analysis and an immunoprecipitation technique, we show that the huntingtin fragments containing alpha-rods associate with each other. This is the first definition of domains in huntingtin and the first validation of predicted interactions between fragments of huntingtin, which sets up directions toward functional characterization of this protein. An implementation of the repeat detection algorithm is available as a Web server with a simple graphical output: http://www.ogic.ca/projects/ard. This can be further visualized using BiasViz, a graphic tool for representation of multiple sequence alignments.

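    Synopsis of virological analyses at the National Reference Centre for Influenza Viruses during the COVID-19 pandemic (original title: Synopse virologischer Analysen im Nationalen Referenzzentrum für Influenzaviren während der COVID-19-Pandemie)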

    Through the continuous testing of samples from the sentinel practices of the Arbeitsgemeinschaft Influenza, the National Reference Centre for Influenza Viruses obtains a comprehensive overview of the respiratory pathogens circulating in Germany. Besides SARS-CoV-2 and the influenza viruses, these include respiratory syncytial virus, parainfluenza viruses, human metapneumoviruses, human seasonal coronaviruses and human rhinoviruses. The analysis results for 15,660 sentinel samples and further isolates from calendar week 5/2020 to calendar week 21/2022 are presented in Epidemiologisches Bulletin 22/2022. Also described are the circulation of respiratory pathogens compared with pre-pandemic seasons, the molecular characterisation and phylogenetic analyses, the assessment of how well the influenza vaccines used matched the circulating viruses, and the resistance testing of influenza viruses.

    Genomic basis for RNA alterations in cancer.

    Transcript alterations often result from somatic changes in cancer genomes. Various forms of RNA alterations have been described in cancer, including overexpression, altered splicing and gene fusions; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). Using matched whole-genome sequencing data, we associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms. Somatic copy-number alterations were the major drivers of variations in total gene and allele-specific expression. We identified 649 associations of somatic single-nucleotide variants with gene expression in cis, of which 68.4% involved associations with flanking non-coding regions of the gene. We found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements. In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed 'bridged' fusions, in which a third genomic location bridges two genes. We observed transcriptomic alteration signatures that differ between cancer types and have associations with variations in DNA mutational signatures. This compendium of RNA alterations in the genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.