A mathematical and computational framework for quantitative comparison and integration of large-scale gene expression data

Author: Bornstein Benjamin J.
Hart Christopher E.
King Brandon
Mjolsness Eric
Sharenbroich Lucas
Trout Diane
Wold Barbara J.
Publication venue: Oxford University Press
Publication date: 01/04/2005

Analysis of large-scale gene expression studies usually begins with gene clustering. A ubiquitous problem is that different algorithms applied to the same data inevitably give different results, and the differences are often substantial, involving a quarter or more of the genes analyzed. This raises a series of important but nettlesome questions: How are different clustering results related to each other and to the underlying data structure? Is one clustering objectively superior to another? Which differences, if any, are likely candidates to be biologically important? A systematic and quantitative way to address these questions is needed, together with an effective way to integrate and leverage expression results with other kinds of large-scale data and annotations. We developed a mathematical and computational framework to help quantify, compare, visualize and interactively mine clusterings. We show that by coupling confusion matrices with appropriate metrics (linear assignment and normalized mutual information scores), one can quantify and map differences between clusterings. A version of receiver operating characteristic analysis proved effective for quantifying and visualizing cluster quality and overlap. These methods, plus a flexible library of clustering algorithms, can be called from a new expandable set of software tools called CompClust 1.0. CompClust also makes it possible to relate expression clustering patterns to DNA sequence motif occurrences, protein–DNA interaction measurements and various kinds of functional annotations. Test analyses used yeast cell cycle data and revealed data structure not obvious under all algorithms. These results were then integrated with transcription motif and global protein–DNA interaction data to identify G1 regulatory modules.
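
The comparison measures named in the abstract can be illustrated in a few lines of Python. This is a minimal sketch, not CompClust's implementation, and all function names are hypothetical. It assumes the linear assignment score is the fraction of genes lying on the best one-to-one pairing of clusters (found here by brute force, so only practical for a handful of clusters), uses the common I(A;B)/sqrt(H(A)H(B)) normalization for mutual information, and computes the ROC area from a per-gene score (for example, correlation to a cluster mean) that the caller supplies.

```python
import math
from collections import Counter
from itertools import permutations


def confusion_matrix(labels_a, labels_b):
    """Rows: clusters of A; columns: clusters of B; cells: shared gene counts."""
    rows = sorted(set(labels_a))
    cols = sorted(set(labels_b))
    counts = Counter(zip(labels_a, labels_b))
    return [[counts[(r, c)] for c in cols] for r in rows]


def linear_assignment_score(matrix):
    """Fraction of genes on the best one-to-one pairing of clusters.

    Assumed definition; solved by brute force, so keep the cluster count small.
    """
    total = sum(map(sum, matrix))
    if len(matrix) > len(matrix[0]):
        matrix = [list(col) for col in zip(*matrix)]  # ensure rows <= columns
    n_rows, n_cols = len(matrix), len(matrix[0])
    best = max(
        sum(matrix[i][cols[i]] for i in range(n_rows))
        for cols in permutations(range(n_cols), n_rows)
    )
    return best / total


def normalized_mutual_information(matrix):
    """I(A;B) / sqrt(H(A) * H(B)), one common NMI normalization (assumed)."""
    total = sum(map(sum, matrix))
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    mi = 0.0
    for i, row in enumerate(matrix):
        for j, n_ij in enumerate(row):
            if n_ij:
                mi += n_ij / total * math.log(n_ij * total / (row_sums[i] * col_sums[j]))

    def entropy(sums):
        return -sum(n / total * math.log(n / total) for n in sums if n)

    h_a, h_b = entropy(row_sums), entropy(col_sums)
    if h_a == 0 or h_b == 0:
        # A single-cluster partition carries no information about the other one.
        return 1.0 if h_a == h_b else 0.0
    return mi / math.sqrt(h_a * h_b)


def roc_auc(member_scores, nonmember_scores):
    """ROC area: chance a cluster member outscores a non-member (ties count 1/2)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            wins += 1.0 if m > n else 0.5 if m == n else 0.0
    return wins / (len(member_scores) * len(nonmember_scores))


a = [0, 0, 0, 1, 1, 1, 2, 2]
b = ["x", "x", "y", "y", "y", "y", "z", "z"]
m = confusion_matrix(a, b)
print(m)                           # [[2, 1, 0], [0, 3, 0], [0, 0, 2]]
print(linear_assignment_score(m))  # 0.875
print(normalized_mutual_information(m))
```

The confusion matrix is the shared starting point: both scores are computed from its cell counts alone, so any pair of clusterings of the same genes can be compared regardless of how many clusters each produced.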

Crossref

PubMed Central

eScholarship - University of California

Caltech Authors

A mathematical and computational framework for quantitative comparison and integration of large-scale gene expression data

Author: Bornstein Benjamin J.
Hart Christopher E.
King Brandon
Mjolsness Eric
Sharenbroich Lucas
Trout Diane
Wold Barbara J.
Publication venue: eScholarship, University of California
Publication date: 10/05/2005

eScholarship - University of California

FPKM values computed from RNA-seq measurements of single cells taken from developing forelimbs of C57BL/6 mice

Author: Amrhein Henry
Dickel Diane E.
He Peng
Marinov Georgi K.
Pennacchio Len A.
Ren Bing
Trout Diane
Visel Axel
Williams Brian A.
Wold Barbara
Zhang Yu
Publication venue: CaltechDATA
Publication date: 12/10/2018

This table displays FPKM values computed from RNA-seq measurements of single cells taken from developing forelimbs of C57BL/6 mice. The reads were aligned with STAR version 2.5.2a, and quantifications were made using RSEM version 1.2.15. We used index files provided by www.encodeproject.org: for STAR, the index files from ENCFF483PAE, and for RSEM, the index files from ENCFF064YNQ. Both were built from male mm10 with the GENCODE M4 comprehensive set, including tRNAs and ERCC spike-ins, all of which are available from ENCFF533JRE.
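
FPKM (fragments per kilobase of transcript per million mapped fragments) normalizes a gene's fragment count by both transcript length and sequencing depth. The function below is a sketch of the textbook definition only; the function name and example numbers are hypothetical. RSEM derives its FPKM values from expected fragment counts and effective transcript lengths estimated by its own model, so its output will not match this naive formula exactly.

```python
def fpkm(fragment_count, length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = length_bp / 1e3
    millions = total_mapped_fragments / 1e6
    return fragment_count / (kilobases * millions)


# Hypothetical gene: 500 fragments on a 2,000 bp transcript, 10 million mapped fragments.
print(fpkm(500, 2000, 10_000_000))  # 25.0
```

Dividing by length first and depth second makes the two normalizations explicit; a gene twice as long, or a library twice as deep, halves the value for the same raw count.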

CaltechDATA (California Institute of Technology Research Data Repository)

Integrating expression data, regulatory motif conservation and protein–DNA binding information

Author: Barbara J. Wold
Benjamin J. Bornstein
Brandon King
Christopher E. Hart
Diane Trout
Eric Mjolsness
Lucas Sharenbroich

Copyright information: Taken from "A mathematical and computational framework for quantitative comparison and integration of large-scale gene expression data", Nucleic Acids Research 2005;33(8):2580-2594. Published online 10 May 2005. PMCID: PMC1092273. © The Author 2005. Published by Oxford University Press. All rights reserved.

(A) Binding site enrichment in genes from the four confusion matrix cells that dissect genes in the G1 cell cycle phase. Shown in red are the observed numbers of genes with an MCS score above threshold for each motif. Shown in blue are the numbers of genes expected by chance, as computed by bootstrap simulations. The total number of genes in each cell is shown in the upper left. (B–D) Heat-map displays showing expression data on the left, followed by MCS scores for a specified motif, followed by protein–DNA binding data for transcription factors implicated in binding to the specified consensus. Color scales for each panel are at the bottom of the figure. For the MCS scores, the color map ranges from 0 to the 99th percentile to minimize the influence of extreme outliers on interpretation. (B) The 14 genes that fall within the EM1/Early G1 intersection cell and show conserved enrichment for the SWI5 consensus, as measured by MCS scores (see Methods). (C) The 79 genes that fall within the EM2/Late G1 intersection cell and have a high MCS score for MCB. (D) The 20 genes that fall within the EM2/Late G1 intersection cell and have a high MCS score for SCB. In each heat map, genes are ordered by decreasing MCS score. Significant correlation can be seen between a high MCS score, protein–DNA binding and the expected expression pattern.
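
The caption's counts of genes "expected by chance" come from bootstrap simulations. The sketch below shows one way such a null could be built: repeatedly draw random gene sets the same size as a confusion matrix cell, count how many exceed the MCS threshold, and compare the observed count against that distribution. The function names, the number of draws, the without-replacement sampling, the random example scores, and the add-one empirical p-value are all assumptions for illustration, not the published procedure.

```python
import random


def expected_by_chance(all_scores, cell_size, threshold, n_draws=1000, seed=0):
    """Mean number of above-threshold genes in random gene sets of size cell_size.

    Each set is drawn without replacement from all_scores (an assumption).
    Returns the mean and the per-draw counts (the null distribution).
    """
    rng = random.Random(seed)
    null_hits = []
    for _ in range(n_draws):
        sample = rng.sample(all_scores, cell_size)
        null_hits.append(sum(1 for score in sample if score > threshold))
    return sum(null_hits) / n_draws, null_hits


def empirical_p_value(observed, null_hits):
    """One-sided p-value with an add-one correction so it is never exactly zero."""
    at_least = sum(1 for hits in null_hits if hits >= observed)
    return (1 + at_least) / (1 + len(null_hits))


# Hypothetical genome: 1,000 genes with random MCS-like scores in [0, 1).
rng = random.Random(42)
genome_scores = [rng.random() for _ in range(1000)]
mean_hits, null = expected_by_chance(genome_scores, cell_size=50, threshold=0.9)
print(round(mean_hits, 1), empirical_p_value(observed=15, null_hits=null))
```

Fixing the seed keeps the simulated "expected" bar reproducible between runs; the observed count for a cell is then judged enriched when it sits far in the upper tail of the null counts.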

FigShare

Spatiotemporal DNA methylome dynamics of the developing mouse fetus.

Author: Amrhein Henry
Castanon Rosa G
Chen Huaming
Dickel Diane E
Ecker Joseph R
Fang Rongxin
Gorkin David U
Hariharan Manoj
He Yupeng
Huang Hui
Lee Ah Young
Li Bin
Luo Chongyuan
Nery Joseph R
Pennacchio Len A
Ren Bing
Trout Diane
Visel Axel
Williams Brian A
Zhao Yuan
Publication venue: eScholarship, University of California
Publication date: 01/07/2020

Cytosine DNA methylation is essential for mammalian development, but understanding of its spatiotemporal distribution in the developing embryo remains limited [1,2]. Here, as part of the mouse Encyclopedia of DNA Elements (ENCODE) project, we profiled 168 methylomes from 12 mouse tissues or organs at 9 developmental stages from embryogenesis to adulthood. We identified 1,808,810 genomic regions that showed variations in CG methylation by comparing the methylomes of different tissues or organs from different developmental stages. These DNA elements predominantly lose CG methylation during fetal development, whereas the trend is reversed after birth. During late stages of fetal development, non-CG methylation accumulated within the bodies of key developmental transcription factor genes, coinciding with their transcriptional repression. Integration of genome-wide DNA methylation, histone modification and chromatin accessibility data enabled us to predict 461,141 putative developmental tissue-specific enhancers, the human orthologues of which were enriched for disease-associated genetic variants. These spatiotemporal epigenome maps provide a resource for studies of gene regulation during tissue or organ progression, and a starting point for investigating regulatory elements that are involved in human developmental disorders.

eScholarship - University of California

Author Correction: An atlas of dynamic chromatin landscapes in mouse fetal development.

Author: Afzal Veena
Akiyama Jennifer A
Amrhein Henry
Barozzi Iros
Chee Sora
Cherry J Michael
Chiou Joshua
Davidson Jean M
Dickel Diane E
Ding Bo
Ecker Joseph R
Fukuda-Yuzawa Yoko
Garvin Tyler H
Gaulton Kyle
Gorkin David U
Han Jee Yun
Harrington Anne N
He Yupeng
Huang Hui
Kato Momoe
Lee Ah Young
Lee Elizabeth A
Li Bin
Mannion Brandon J
Novak Catherine S
Pennacchio Len A
Pham Quan T
Plajzer-Frick Ingrid
Preissl Sebastian
Qiu Yunjiang
Ren Bing
Shen Yin
Strattan J Seth
Trout Diane
Visel Axel
Wang Mengchi
Wang Wei
Wildberg Andre
Williams Brian A
Yang Hongbo
Zhang Bo
Zhang Yanxiao
Zhao Yuan
Publication venue: eScholarship, University of California
Publication date: 01/10/2020

A Correction to this paper has been published: https://doi.org/10.1038/s41586-020-03089-4

eScholarship - University of California

Author Correction: An atlas of dynamic chromatin landscapes in mouse fetal development.

Author: Afzal Veena
Akiyama Jennifer A
Amrhein Henry
Barozzi Iros
Chee Sora
Cherry J Michael
Chiou Joshua
Davidson Jean M
Dickel Diane E
Ding Bo
Ecker Joseph R
Fukuda-Yuzawa Yoko
Garvin Tyler H
Gaulton Kyle
Gorkin David U
Han Jee Yun
Harrington Anne N
He Yupeng
Huang Hui
Kato Momoe
Lee Ah Young
Lee Elizabeth A
Li Bin
Mannion Brandon J
Novak Catherine S
Pennacchio Len A
Pham Quan T
Plajzer-Frick Ingrid
Preissl Sebastian
Qiu Yunjiang
Ren Bing
Shen Yin
Strattan J Seth
Trout Diane
Visel Axel
Wang Mengchi
Wang Wei
Wildberg Andre
Williams Brian A
Yang Hongbo
Zhang Bo
Zhang Yanxiao
Zhao Yuan
Publication venue: eScholarship, University of California
Publication date: 01/01/2021

A Correction to this paper has been published: https://doi.org/10.1038/s41586-020-03089-4

eScholarship - University of California