Immuno-transcriptomic profiling of extracranial pediatric solid malignancies.
We perform an immunogenomics analysis utilizing whole-transcriptome sequencing of 657 pediatric extracranial solid cancer samples representing 14 diagnoses, and additionally utilize transcriptomes of 131 pediatric cancer cell lines and 147 normal tissue samples for comparison. We describe patterns of infiltrating immune cells, T cell receptor (TCR) clonal expansion, and translationally relevant immune checkpoints. We find that tumor-infiltrating lymphocytes and TCR counts vary widely across cancer types and within each diagnosis, and notably are significantly predictive of survival in osteosarcoma patients. We identify potential cancer-specific immunotherapeutic targets for adoptive cell therapies including cell-surface proteins, tumor germline antigens, and lineage-specific transcription factors. Using an orthogonal immunopeptidomics approach, we find several potential immunotherapeutic targets in osteosarcoma and Ewing sarcoma and validate PRAME as a bona fide multi-pediatric cancer target. Importantly, this work provides a critical framework for immune targeting of extracranial solid tumors using parallel immuno-transcriptomic and -peptidomic approaches.
Guanine Holes Are Prominent Targets for Mutation in Cancer and Inherited Disease
Albino Bacolla, Guliang Wang, Aklank Jain, Karen M. Vasquez: Division of Pharmacology and Toxicology, The University of Texas at Austin, Dell Pediatric Research Institute, Austin, Texas, United States of America
Albino Bacolla, Nuri A. Temiz, Ming Yi, Joseph Ivanic, Regina Z. Cer, Duncan E. Donohue, Uma S. Mudunuri, Natalia Volfovsky, Brian T. Luke, Robert M. Stephens, Jack R. Collins: Advanced Biomedical Computing Center, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America
Edward V. Ball, David N. Cooper: Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom
Single base substitutions constitute the most frequent type of human gene mutation and are a leading cause of cancer and inherited disease. These alterations occur non-randomly in DNA, being strongly influenced by the local nucleotide sequence context. However, the molecular mechanisms underlying such sequence context-dependent mutagenesis are not fully understood. Using bioinformatics, computational and molecular modeling analyses, we have determined the frequencies of mutation at G•C bp in the context of all 64 5′-NGNN-3′ motifs that contain the mutation at the second position. Twenty-four datasets were employed, comprising >530,000 somatic single base substitutions from 21 cancer genomes, >77,000 germline single-base substitutions causing or associated with human inherited disease and 16.7 million benign germline single-nucleotide variants. In several cancer types, the number of mutated motifs correlated both with the free energies of base stacking and the energies required for abstracting an electron from the target guanines (ionization potentials). Similar correlations were also evident for the pathological missense and nonsense germline mutations, but only when the target guanines were located on the non-transcribed DNA strand. Likewise, pathogenic splicing mutations predominantly affected positions in which a purine was located on the non-transcribed DNA strand. Novel candidate driver mutations and tissue-specific mutational patterns were also identified in the cancer datasets. We conclude that electron transfer reactions within the DNA molecule contribute to sequence context-dependent mutagenesis, involving both somatic driver and passenger mutations in cancer, as well as germline alterations causing or associated with inherited disease.
This work was supported by grants from the NIH (CA097175 and CA093729) to KMV, NCI/NIH contract HHSN261200800001E to AB and the Frederick National Laboratory for Cancer Research, and CBIIT/caBIG ISRCE yellow task #09-260 to the Frederick National Laboratory for Cancer Research. DNC and EVB received financial support from BIOBASE GmbH through a license agreement (for HGMD) with Cardiff University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
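The motif tally described in this abstract (mutations at G•C base pairs read in the context of the 64 5′-NGNN-3′ motifs, with the mutated guanine at the second position) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the 0-based position convention, and the toy sequence are assumptions made for the example. A substitution at a C is read on the opposite strand, where the partner G sits, by reverse-complementing the surrounding window.

```python
from collections import Counter
from itertools import product
from typing import Iterable, Optional

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# All 64 motifs of the form 5'-NGNN-3'; the mutated G is at index 1.
NGNN_MOTIFS = ["".join((a, "G", c, d)) for a, c, d in product(BASES, repeat=3)]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_at(seq: str, pos: int) -> Optional[str]:
    """Return the NGNN motif for a substitution at 0-based `pos`.

    The motif is read 5'->3' on whichever strand carries the G.
    Returns None if the base is not G or C, if the window runs off
    the sequence, or if the window contains a non-ACGT character.
    """
    base = seq[pos]
    if base == "G":
        if pos < 1 or pos + 2 >= len(seq):
            return None
        motif = seq[pos - 1:pos + 3]
    elif base == "C":
        # The G is on the opposite strand, so take the mirrored window
        # on this strand and reverse-complement it.
        if pos < 2 or pos + 1 >= len(seq):
            return None
        motif = reverse_complement(seq[pos - 2:pos + 2])
    else:
        return None
    return motif if set(motif) <= set(BASES) else None


def count_motifs(seq: str, positions: Iterable[int]) -> Counter:
    """Tally NGNN motifs over a list of substitution positions."""
    counts = Counter()
    for pos in positions:
        motif = motif_at(seq, pos)
        if motif is not None:
            counts[motif] += 1
    return counts


# Toy (made-up) sequence with substitutions at a G (index 2) and a C (index 5).
example_counts = count_motifs("ACGTTCGA", [2, 5])
```

In this toy case the G at index 2 is tallied under `CGTT` and the C at index 5 under `CGAA`. A real analysis would also track which strand is transcribed, which this sketch does not attempt.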
Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus
Background: Approximately 4–8% of the world's population suffers from a rare disease. Rare diseases are often difficult to diagnose, and many do not have approved therapies. Genetic sequencing has the potential to shorten the current diagnostic process, increase mechanistic understanding, and facilitate research on therapeutic approaches but is limited by the difficulty of novel variant pathogenicity interpretation and the communication of known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain. Results: This study investigated the translation of knowledge of variants reported in published manuscripts to publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function from literature on the SLC6A8 gene associated with X-linked Creatine Transporter Deficiency (CTD) were curated and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed, and pathogenicity assignments were compared with impact algorithm predictions. 24% of the pathogenic variants found in PubMed articles were not captured in any database used in this analysis, while only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm. Conclusions: Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important to make pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single-nucleotide exonic variants; however, such predictions are less accurate or unavailable for intronic and multi-nucleotide variants. Developing text mining workflows that use natural language processing to identify diseases, genes, and variants, combined with impact prediction algorithms and integrated with details on clinical phenotypes and functional assessments, might be a promising approach to scaling literature mining of variants and assigning correct pathogenicity. The curated variant list created by this effort includes context details to support such efforts on variant curation for rare diseases.
Knowledge and Theme Discovery across Very Large Biological Data Sets Using Distributed Queries: A Prototype Combining Unstructured and Structured Data
As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data such as human variation or expression data from very large cohorts, is especially urgent. For our study, we investigated a likely biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to utilize and apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework.
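The abstract mentions MapReduce queries over literature but gives no implementation details. As a rough illustration of the map, shuffle, and reduce pattern only, the sketch below counts how many articles mention each lexicon term. It is plain single-process Python, not Hadoop code, and the record layout, function names, and whitespace tokenization are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(pmid: str, text: str, lexicon: List[str]) -> Iterator[Tuple[str, str]]:
    """Emit (term, pmid) for every lexicon term found in one article."""
    tokens = set(text.lower().split())  # naive whitespace tokenization
    for term in lexicon:
        if term.lower() in tokens:
            yield term, pmid


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term: str, pmids: List[str]) -> Tuple[str, int]:
    """Collapse one term's PMIDs into a distinct-article count."""
    return term, len(set(pmids))


def run_query(records: Iterable[Tuple[str, str]], lexicon: List[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over (pmid, text) records."""
    pairs = (kv for pmid, text in records for kv in map_phase(pmid, text, lexicon))
    return dict(reduce_phase(term, pmids) for term, pmids in shuffle(pairs).items())


# Hypothetical records; the identifiers and texts are made up.
records = [("1", "melanoma and glioma study"), ("2", "glioma cohort")]
term_counts = run_query(records, ["glioma", "melanoma", "sarcoma"])
```

In a distributed setting the map calls would run in parallel over partitions of the corpus and the framework would perform the grouping; here everything runs in one process so the data flow is easy to follow.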
Bubble chart of Cancer-Gene associations from literature.
A bubble chart representation with cancer terms on the x-axis and genes on the y-axis. The size of the bubble is directly proportional to the number of literature articles where the cancer and gene terms co-occur.
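The bubble sizes in such a chart come from per-pair co-occurrence counts. A minimal sketch of that counting step is shown below, assuming single-word terms and a simple alphanumeric tokenizer. The function name, the article dictionary, and the example texts are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter
from itertools import product
from typing import Dict, Iterable


def cooccurrence_counts(
    articles: Dict[str, str],
    cancer_terms: Iterable[str],
    gene_terms: Iterable[str],
) -> Counter:
    """Count articles in which a cancer term and a gene term both appear.

    Assumes each term is a single word and each term list has no
    duplicates, so an article contributes at most 1 to any pair.
    """
    cancer_terms = list(cancer_terms)
    gene_terms = list(gene_terms)
    counts: Counter = Counter()
    for text in articles.values():
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        cancers = [c for c in cancer_terms if c.lower() in tokens]
        genes = [g for g in gene_terms if g.lower() in tokens]
        counts.update(product(cancers, genes))
    return counts


# Made-up article snippets keyed by placeholder identifiers.
articles = {
    "a1": "TP53 mutations in osteosarcoma",
    "a2": "MYCN amplification in neuroblastoma",
    "a3": "TP53 and MYCN in neuroblastoma",
}
pair_counts = cooccurrence_counts(
    articles, ["osteosarcoma", "neuroblastoma"], ["TP53", "MYCN"]
)
```

Each `(cancer, gene)` key in `pair_counts` would map to one bubble, with the count setting its size. Multi-word cancer names would need phrase matching, which this sketch leaves out.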
Cancer term occurrences in the literature.
A bar chart representation with cancer terms on the y-axis and publication counts on the x-axis. Only the cancer terms with high literature occurrences are shown.
Architecture for integrating structured and unstructured data in Hadoop.
Architectural diagram detailing the steps in creating the categorical lexicons and using them to get the PMID counts from literature. DEG stands for differentially expressed gene, while DE miRNA stands for differentially expressed miRNA.
Network of Cancer-Gene associations from literature.
Network of Cancer-Gene associations displaying shared genes between cancers and genes specific to certain cancer types based on literature evidence. Cancer terms are represented as labeled nodes, genes are unlabeled pink nodes, and the edges represent at least one publication with a co-occurrence of the cancer term and gene.
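Given an edge list of this kind, separating genes shared between cancers from genes linked to a single cancer type amounts to grouping edges by gene. The sketch below shows one way to do that. The function name and the example edges are hypothetical, and an edge is assumed to exist whenever at least one publication mentions both terms.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def split_shared_and_specific(
    edges: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Partition genes by how many cancer nodes they connect to.

    `edges` holds (cancer, gene) pairs. Returns:
      shared   - gene -> sorted list of cancers (two or more)
      specific - gene -> its single cancer
    """
    cancers_by_gene: Dict[str, Set[str]] = defaultdict(set)
    for cancer, gene in edges:
        cancers_by_gene[gene].add(cancer)

    shared = {g: sorted(cs) for g, cs in cancers_by_gene.items() if len(cs) > 1}
    specific = {g: next(iter(cs)) for g, cs in cancers_by_gene.items() if len(cs) == 1}
    return shared, specific


# Made-up edges for illustration only.
edges = [
    ("neuroblastoma", "MYCN"),
    ("neuroblastoma", "TP53"),
    ("osteosarcoma", "TP53"),
    ("osteosarcoma", "TP53"),  # duplicate edges collapse into one
]
shared_genes, specific_genes = split_shared_and_specific(edges)
```

Here `TP53` lands in `shared_genes` because it connects two cancer nodes, while `MYCN` lands in `specific_genes`.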
Load and query times using simulated gene expression data.
* The query to get the DEG list was not run on the 8 TB data due to time constraints.
Growth of articles in MEDLINE.
A bar chart displaying the number of baseline records from NLM MEDLINE’s 2001 baseline release to its 2012 baseline release (http://www.nlm.nih.gov/bsd/licensee/2012_stats/baseline_doc.html).