21 research outputs found

Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata

Author: B. Hayete
Basso
Beer
Brazma
di Bernardo
E. J. Cosgrove
F. S. Juhn
Faith
Faith
Glasner
Hughes
Irizarry
J. J. Faith
Keseler
M. E. Driscoll
S. J. Schneider
Salgado
T. S. Gardner
V. A. Fusaro
Publication venue: Oxford University Press
Publication date: 01/01/2008

Many Microbe Microarrays Database (M3D) is designed to facilitate the analysis and visualization of expression data in compendia compiled from multiple laboratories. M3D contains over a thousand Affymetrix microarrays for Escherichia coli, Saccharomyces cerevisiae and Shewanella oneidensis. The expression data is uniformly normalized to make the data generated by different laboratories and researchers more comparable. To facilitate computational analyses, M3D provides raw data (CEL file) and normalized data downloads of each compendium. In addition, web-based construction, visualization and download of custom datasets are provided to facilitate efficient interrogation of the compendium for more focused analyses. The experimental condition metadata in M3D is human curated with each chemical and growth attribute stored as a structured and computable set of experimental features with consistent naming conventions and units. All versions of the normalized compendia constructed for each species are maintained and accessible in perpetuity to facilitate the future interpretation and comparison of results published on M3D data. M3D is accessible at http://m3d.bu.edu/

Crossref

Boston University Institutional Repository (OpenBU)

PubMed Central

Predicting gene function using hierarchical multi-label decision tree ensembles

Author: A Clare
A Clare
A Clare
B Hayete
C Vens
Celine Vens
D Kocev
Dragi Kocev
E Zdobnov
F Provost
F Wilcoxon
G Obozinski
GR Lanckriet
H Blockeel
H Blockeel
H Blockeel
H Chua
H Drucker
H Lee
H Mewes
Hendrik Blockeel
J Davis
J Gough
J Quinlan
J Rousu
J Struyf
Jan Struyf
L Breiman
L Breiman
L Breiman
L Breiman
L Pena-Castillo
Leander Schietgat
M Ashburner
M Deng
M Ouali
N Cesa-Bianchi
O Troyanskaya
R Caruana
S Altschul
S Mostafavi
Sašo Džeroski
T Hughes
T Joachims
U Karaoz
W Kim
W Tian
Y Chen
Y Guan
Z Barutcuoglu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability.

Results: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use.

Conclusions: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Leiden University Scholarly Publications

DENSE STRUCTURAL EXPECTATION MAXIMISATION WITH PARALLELISATION FOR EFFICIENT LARGE-NETWORK STRUCTURAL INFERENCE

Author: Chickering D. M.
CHRISTOPHER FOGELBERG
Hayete B.
Langseth H.
Roy S.
Spirtes P.
VASILE PALADE
Publication venue: World Scientific Pub Co Pte Lt

Crossref

Inferring Gene Networks for Strains of Dehalococcoides Highlights Conserved Relationships between Genes Encoding Core Catabolic and Cell-Wall Structural Proteins

Author: Annette R. Rowe (1969924)
Boris Hayete (25337)
Bruce W. Church (3339357)
Cresten B. Mansfeldt (695827)
Gretchen W. Heavner (3339360)
Ruth E. Richardson (695829)
Publication date: 09/11/2016

The interpretation of high-throughput gene expression data for non-model microorganisms remains obscured because of the high fraction of hypothetical genes and the limited number of methods for the robust inference of gene networks. Therefore, to elucidate gene-gene and gene-condition linkages in the bioremediation-important genus Dehalococcoides, we applied a Bayesian inference strategy called Reverse Engineering/Forward Simulation (REFS™) on transcriptomic data collected from two organohalide-respiring communities containing different Dehalococcoides mccartyi strains: the Cornell University mixed community D2 and the commercially available KB-1® bioaugmentation culture. In total, 49 and 24 microarray datasets were included in the REFS™ analysis to generate an ensemble of 1,000 networks for the Dehalococcoides population in the Cornell D2 and KB-1® culture, respectively. Considering only linkages that appeared in the consensus network for each culture (exceeding the determined frequency cutoff of ≥ 60%), the resulting Cornell D2 and KB-1® consensus networks maintained 1,105 nodes (genes or conditions) with 974 edges and 1,714 nodes with 1,455 edges, respectively. These consensus networks captured multiple strong and biologically informative relationships. One of the main highlighted relationships shared between these two cultures was a direct edge between the transcript encoding for the major reductive dehalogenase (tceA (D2) or vcrA (KB-1®)) and the transcript for the putative S-layer cell wall protein (DET1407 (D2) or KB1_1396 (KB-1®)). Additionally, transcripts for two key oxidoreductases (a [Ni Fe] hydrogenase, Hup, and a protein with similarity to a formate dehydrogenase, “Fdh”) were strongly linked, generalizing a strong relationship noted previously for Dehalococcoides mccartyi strain 195 to multiple strains of Dehalococcoides. Notably, the pangenome array utilized when monitoring the KB-1® culture was capable of resolving signals from multiple strains, and the network inference engine was able to reconstruct gene networks in the distinct strain populations.

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

REFS™ consensus network summary for the hup and fdh transcripts.

Author: Annette R. Rowe (1969924)
Boris Hayete (25337)
Bruce W. Church (3339357)
Cresten B. Mansfeldt (695827)
Gretchen W. Heavner (3339360)
Ruth E. Richardson (695829)

(a) D2 and (b) KB-1®. The connecting lines indicate edge strength scores that exceeded 0.6. Gray text in the KB-1® culture indicates the minor strain of the Cornell/Victoria type. All relationships identified in the model between these transcripts were positive.

The Francis Crick Institute

Ordering Dhc pangenome array probes based on sequence similarity and captured expression profiles for the KB-1® culture.

Author: Annette R. Rowe (1969924)
Boris Hayete (25337)
Bruce W. Church (3339357)
Cresten B. Mansfeldt (695827)
Gretchen W. Heavner (3339360)
Ruth E. Richardson (695829)

The array contains multiple probes for Dhc orthologs. The white-to-blue shaded columns (left) display the genomic % identity of the probe sequence to gene sequences for representative members of the Cornell, Victoria, and Pinellas groups of Dhc. The yellow-to-purple columns (right) represent the correlation relationship scores of the probe intensity across all cDNA pools from all samples. Bolded (*) probes indicate those that were retained for the REFS™ analysis of the KB-1® data.

The Francis Crick Institute

Transcripts in the consensus networks that are a maximum of two edges away from connecting with reductive dehalogenases in D2 (left) and KB-1® (right).

Author: Annette R. Rowe (1969924)
Boris Hayete (25337)
Bruce W. Church (3339357)
Cresten B. Mansfeldt (695827)
Gretchen W. Heavner (3339360)
Ruth E. Richardson (695829)

The four most highly transcribed RDases in the D2 culture and the five most highly transcribed RDases in the KB-1® culture are displayed. Other RDases are present in the final consensus network as well. Dashed lines indicate negative relationships, and solid lines indicate positive relationships. For the D2 consensus network, the transcript ID and a brief description are provided. For the KB-1® consensus network, the probe ID, orthologous transcript ID in Dhc strain 195 (where applicable), and a brief annotation are provided. The grayed text in the KB-1® culture represents transcripts from a minor Cornell-type strain.

The Francis Crick Institute

Gene-gene edges modified by a discrete variable in the D2 consensus network with high frequencies (f > 0.85).

Author: Annette R. Rowe (1969924)
Boris Hayete (25337)
Bruce W. Church (3339357)
Cresten B. Mansfeldt (695827)
Gretchen W. Heavner (3339360)
Ruth E. Richardson (695829)

Gene-gene edges modified by a discrete variable in the D2 consensus network with high frequencies (f > 0.85).

The Francis Crick Institute
