21 research outputs found

ENCODE whole-genome data in the UCSC genome browser (2011 update)

Author: Andy Pohl
Angie S. Hinrichs
Ann S. Zweig
Baroni
Bernard B. Suh
Birney
Brian J. Raney
Brooke Rhead
Celniker
Cricket A. Sloan
David Haussler
Donna Karolchik
Galt P. Barber
Greenbaum
Harrow
Hershey
Hesselberth
Hiram Clawson
Kan
Kate R. Rosenbloom
Katrina Learned
Kayla E. Smith
Kent
Khatun
King
Krishna M. Roskin
Kuhn
Laurence R. Meyer
Li
Melissa S. Cline
Pauline A. Fujita
Robert M. Kuhn
Rosenbloom
Timothy R. Dreszer
Vanessa Kirkup
Venkat S. Malladi
Via
W. James Kent
Weirauch
Publication venue: Oxford University Press
Publication date: 01/01/2010

The ENCODE project is an international consortium with a goal of cataloguing all the functional elements in the human genome. The ENCODE Data Coordination Center (DCC) at the University of California, Santa Cruz serves as the central repository for ENCODE data. In this role, the DCC offers a collection of high-throughput, genome-wide data generated with technologies such as ChIP-Seq, RNA-Seq, DNA digestion and others. This data helps illuminate transcription factor-binding sites, histone marks, chromatin accessibility, DNA methylation, RNA expression, RNA binding and other cell-state indicators. It includes sequences with quality scores, alignments, signals calculated from the alignments, and in most cases, element or peak calls calculated from the signal data. Each data set is available for visualization and download via the UCSC Genome Browser (http://genome.ucsc.edu/). ENCODE data can also be retrieved using a metadata system that captures the experimental parameters of each assay. The ENCODE web portal at UCSC (http://encodeproject.org/) provides information about the ENCODE data and links for access

CiteSeerX

Crossref

PubMed Central

Barriers to accessing public cancer genomic data.

Author: Learned Katrina
Publication date: 28/09/2020

Ezid

Barriers to accessing public cancer genomic data.

Author: Beale Holly C
Bjork Isabel M
Currie Robert
Durbin Ann
Goldstein Theodore C
Haussler David
Kephart Ellen Towle
Learned Katrina
Pfeil Jacob
Salama Sofie R
Sanders Lauren M
Vaske Olena Morozova
Publication venue: eScholarship, University of California
Publication date: 01/06/2019

Although increasingly recognized as critical to genomic research, genomic data sharing is hindered by an absence of standards regarding timing, patient privacy, use agreement standards, and data characterization and quality. Only after months of identifying, permissioning for use, committing to terms restricting use and sharing, downloading, and assessing quality, is it possible to know whether or not a dataset can be used. In this paper, we evaluate the barriers to data sharing based on the Treehouse experience and offer recommendations for use agreement standards, data characterization and metadata standardization to enhance data sharing and outcomes for all pediatric cancer patients

eScholarship - University of California

Identification of a differentiation stall in epithelial mesenchymal transition in histone H3-mutant diffuse midline glioma.

Author: Beale Holly C
Bjork Isabel
Chen Marissa
Cheney Allison
Haussler David
Kephart Ellen Towle
Learned Katrina
Lyle A Geoffrey
Pfeil Jacob
Salama Sofie R
Sanders Lauren M
Seninge Lucas
van den Bout Anouk
Vaske Olena M
Publication venue: eScholarship, University of California
Publication date: 01/12/2020

Background: Diffuse midline gliomas with histone H3 K27M (H3K27M) mutations occur in early childhood and are marked by an invasive phenotype and global decrease in H3K27me3, an epigenetic mark that regulates differentiation and development. H3K27M mutation timing and effect on early embryonic brain development are not fully characterized. Results: We analyzed multiple publicly available RNA sequencing datasets to identify differentially expressed genes between H3K27M and non-K27M pediatric gliomas. We found that genes involved in the epithelial-mesenchymal transition (EMT) were significantly overrepresented among differentially expressed genes. Overall, the expression of pre-EMT genes was increased in the H3K27M tumors as compared to non-K27M tumors, while the expression of post-EMT genes was decreased. We hypothesized that H3K27M may contribute to gliomagenesis by stalling an EMT required for early brain development, and evaluated this hypothesis by using another publicly available dataset of single-cell and bulk RNA sequencing data from developing cerebral organoids. This analysis revealed similarities between H3K27M tumors and pre-EMT normal brain cells. Finally, a previously published single-cell RNA sequencing dataset of H3K27M and non-K27M gliomas revealed subgroups of cells at different stages of EMT. In particular, H3.1K27M tumors resemble a later EMT stage compared to H3.3K27M tumors. Conclusions: Our data analyses indicate that this mutation may be associated with a differentiation stall evident from the failure to proceed through the EMT-like developmental processes, and that H3K27M cells preferentially exist in a pre-EMT cell phenotype. This study demonstrates how novel biological insights could be derived from combined analysis of several previously published datasets, highlighting the importance of making genomic data available to the community in a timely manner

eScholarship - University of California

Machine learning multi-omics analysis reveals cancer driver dysregulation in pan-cancer cell lines compared to primary tumors

Author: Beale Holly C
Chandra Rahul
Cheney Allison
Currie Rob
Gitlin Leonid
Haussler David
Kephart Ellen Towle
Learned Katrina
Lyle A Geoffrey
Pfeil Jacob
Rodriguez Analiz
Salama Sofie R
Sanders Lauren M
Vaske Olena M
Vengerov David
Zebarjadi Navid
Publication venue: eScholarship, University of California
Publication date: 01/01/2022

Cancer cell lines have been widely used for decades to study biological processes driving cancer development, and to identify biomarkers of response to therapeutic agents. Advances in genomic sequencing have made possible large-scale genomic characterizations of collections of cancer cell lines and primary tumors, such as the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA). These studies allow for the first time a comprehensive evaluation of the comparability of cancer cell lines and primary tumors on the genomic and proteomic level. Here we employ bulk mRNA and micro-RNA sequencing data from thousands of samples in CCLE and TCGA, and proteomic data from partner studies in the MD Anderson Cell Line Project (MCLP) and The Cancer Proteome Atlas (TCPA), to characterize the extent to which cancer cell lines recapitulate tumors. We identify dysregulation of a long non-coding RNA and microRNA regulatory network in cancer cell lines, associated with differential expression between cell lines and primary tumors in four key cancer driver pathways: KRAS signaling, NFKB signaling, IL2/STAT5 signaling and TP53 signaling. Our results emphasize the necessity for careful interpretation of cancer cell line experiments, particularly with respect to therapeutic treatments targeting these important cancer pathways

PubMed Central

eScholarship - University of California

The case for using mapped exonic non-duplicate reads when reporting RNA-sequencing depth: examples from pediatric cancer datasets.

Author: Beale Holly C
Bjork Isabel
Cattle Matthew A
Currie Rob
Haussler David
Kephart Ellen T
Lam Du Linh
Learned Katrina
Lyle A Geoffrey
McKay Liam T
Pfeil Jacob
Roger Jacquelyn M
Salama Sofie R
Sanders Lauren
Thompson Drew KA
Vaske Olena M
Vivian John
Publication venue: eScholarship, University of California
Publication date: 01/03/2021

Background: The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) is dependent on the sequencing depth. While unmapped or non-exonic reads do not contribute to gene expression quantification, duplicate reads contribute to the quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets used for gene expression analysis. Findings: In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1-77% of all reads (median [IQR], 3% [3-6%]); duplicate reads constitute 3-100% of mapped reads (median [IQR], 27% [13-43%]); and non-exonic reads constitute 4-97% of mapped, non-duplicate reads (median [IQR], 25% [16-37%]). MEND reads constitute 0-79% of total reads (median [IQR], 50% [30-61%]). Conclusions: Because not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements and the fraction of reads that are informative varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth
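
The three percentages in the abstract are nested: unmapped reads are a share of all reads, duplicates a share of mapped reads, and non-exonic reads a share of mapped, non-duplicate reads. A minimal Python sketch of how those shares combine into a MEND fraction for a single sample is given here. It is not the authors' script, which works from RNA-Seq data files with RSeQC, sambamba, and samblaster; the function names `mend_fraction` and `mend_reads` are hypothetical, and the inputs are assumed to be per-sample fractions between 0 and 1.

```python
def mend_fraction(unmapped: float, duplicate: float, non_exonic: float) -> float:
    """Return the fraction of all reads that are mapped, exonic and non-duplicate.

    unmapped   -- fraction of all reads that did not map
    duplicate  -- fraction of mapped reads that are duplicates
    non_exonic -- fraction of mapped, non-duplicate reads outside exons
    (Hypothetical helper; the nesting follows the abstract's definitions.)
    """
    for name, value in (
        ("unmapped", unmapped),
        ("duplicate", duplicate),
        ("non_exonic", non_exonic),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    mapped = 1.0 - unmapped                             # share of all reads
    mapped_non_duplicate = mapped * (1.0 - duplicate)   # share of all reads
    return mapped_non_duplicate * (1.0 - non_exonic)    # MEND share of all reads


def mend_reads(total_reads: int, unmapped: float, duplicate: float,
               non_exonic: float) -> int:
    """Convert a total read count into an approximate MEND read count."""
    return round(total_reads * mend_fraction(unmapped, duplicate, non_exonic))


if __name__ == "__main__":
    # Illustrative inputs only: the cohort medians quoted in the abstract,
    # treated as if they all described one sample.
    fraction = mend_fraction(unmapped=0.03, duplicate=0.27, non_exonic=0.25)
    print(f"MEND fraction: {fraction:.3f}")
    print(f"MEND reads per 100 million total: "
          f"{mend_reads(100_000_000, 0.03, 0.27, 0.25):,}")
```

Plugging in the three medians gives about 53% of reads, close to but not the same as the reported median MEND fraction of 50%. That gap is expected, since medians of separate distributions do not multiply; the sketch is meaningful only when all three fractions come from the same sample.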

eScholarship - University of California

The UCSC Genome Browser database: 2016 update.

Author: Barber Galt P
Casper Jonathan
Clawson Hiram
Diekhans Mark
Eisenhart Christopher
Fujita Pauline A
Guruvadoo Luvina
Haeussler Maximilian
Harte Rachel A
Haussler David
Heitner Steve
Hinrichs Angie S
Karolchik Donna
Kent W James
Kuhn Robert M
Learned Katrina
Lee Brian T
Nejad Parisa
Paten Benedict
Raney Brian J
Rosenbloom Kate R
Speir Matthew L
Zweig Ann S
Publication venue: eScholarship, University of California
Publication date: 20/11/2015

For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the "Data Integrator", for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment

PubMed Central

eScholarship - University of California

The UCSC Genome Browser database: 2016 update

Author: Angie S. Hinrichs
Ann S. Zweig
Benedict Paten
Brian J. Raney
Brian T. Lee
Christopher Eisenhart
David Haussler
Donna Karolchik
Galt P. Barber
Hiram Clawson
Jonathan Casper
Kate R. Rosenbloom
Katrina Learned
Luvina Guruvadoo
Mark Diekhans
Matthew L. Speir
Maximilian Haeussler
Parisa Nejad
Pauline A. Fujita
Rachel A. Harte
Robert M. Kuhn
Steve Heitner
W. James Kent
Publication venue: Oxford University Press (OUP)
Publication date: 20/11/2015

For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the “Data Integrator”, for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment

Crossref

PubMed Central

eScholarship - University of California