2,511 research outputs found

Predicting gene ontology from a global meta-analysis of 1-color microarray experiments

Author: Dozmorov Mikhail G
Giles Cory B
Wren Jonathan D
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Global meta-analysis (GMA) of microarray data to identify genes with highly similar co-expression profiles is emerging as an accurate method to predict gene function and phenotype, even in the absence of published data on the gene(s) being analyzed. With a third of human genes still uncharacterized, this approach is a promising way to direct experiments and rapidly understand the biological roles of genes. To predict function for genes of interest, GMA relies on a guilt-by-association approach to identify sets of genes with known functions that are consistently co-expressed with them across different experimental conditions, suggesting coordinated regulation for a specific biological purpose. Our goal here is to define how sample, dataset size and ranking parameters affect prediction performance. Results: 13,000 human 1-color microarrays were downloaded from GEO for GMA analysis. Prediction performance was benchmarked by calculating the distance within the Gene Ontology (GO) tree between predicted function and annotated function for sets of 100 randomly selected genes. We find the number of new predicted functions rises as more datasets are added, but begins to saturate at a sample size of approximately 2,000 experiments. For the gene set used to predict function, we find precision to be higher with smaller set sizes, yet with correspondingly poor recall and, as set size is increased, recall and F-measure also tend to increase but at the cost of precision. Conclusions: Of the 20,813 genes expressed in 50 or more experiments, at least one predicted GO category was found for 72.5% of them. Of the 5,720 genes without GO annotation, 4,189 had at least one predicted ontology using the top 40 co-expressed genes for prediction analysis. For the remaining 1,531 genes without GO predictions or annotations, ~17% (257 genes) had sufficient co-expression data yet no statistically significantly overrepresented ontologies, suggesting their regulation may be more complex.
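
The guilt-by-association step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Pearson correlation as the co-expression measure and replaces the statistical overrepresentation test with a simple vote count over the top-ranked neighbours. The names `pearson`, `predict_terms`, `expression` and `annotations` are hypothetical; only the default of 40 top co-expressed genes comes from the abstract.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def predict_terms(query, expression, annotations, top_k=40):
    """Rank annotation terms by how often they occur among the top_k
    genes most strongly co-expressed with the query gene.

    expression:  dict mapping gene -> list of expression values
    annotations: dict mapping gene -> set of known terms
    Vote counting is an assumption standing in for an enrichment test.
    """
    scored = [
        (pearson(expression[query], profile), gene)
        for gene, profile in expression.items()
        if gene != query
    ]
    scored.sort(reverse=True)  # highest correlation first
    neighbours = [gene for _, gene in scored[:top_k]]
    votes = Counter(
        term for gene in neighbours for term in annotations.get(gene, ())
    )
    return votes.most_common()
```

Calling `predict_terms("Q", expression, annotations, top_k=2)` returns a list of `(term, count)` pairs, most frequent first. Raising `top_k` widens the neighbour set, which is the knob the abstract ties to the precision/recall trade-off.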

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

High-throughput processing and normalization of one-color microarrays for transcriptional meta-analyses

Author: A Brazma
A Campain
BM Bolstad
D Ghosh
DR Rhodes
F Hong
GP Srivastava
HK Lee
I Dozmorov
J Hubble
JC Newman
JD Wren
JE Larkin
Jonathan D Wren
L Shi
M Kapushesky
M Severgnini
MG Dozmorov
Mikhail G Dozmorov
P Cahan
PK Tan
RA Irizarry
T Bammler
T Barrett
T Konishi
W Fujibuchi
WC Cheng
X Yang
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Microarray experiments are becoming increasingly common in biomedical research, as is their deposition in publicly accessible repositories, such as Gene Expression Omnibus (GEO). As such, there has been a surge in interest in using these microarray data for meta-analytic approaches, whether to increase sample size for a more powerful analysis of a specific disease (e.g. lung cancer) or to re-examine experiments for reasons different from those examined in the initial, publishing study that generated them. For the average biomedical researcher, there are a number of practical barriers to conducting such meta-analyses, such as manually aggregating, filtering and formatting the data. Methods to automatically process large repositories of microarray data into a standardized, directly comparable format will enable easier and more reliable access to microarray data to conduct meta-analyses. Methods: We present a straightforward, simple method, robust against potential outliers, for automatic quality control and pre-processing of tens of thousands of single-channel microarray data files. GEO GDS files are quality checked by comparing parametric distributions and quantile normalized to enable direct comparison of expression level for subsequent meta-analyses. Results: 13,000 human 1-color experiments were processed to create a single gene expression matrix from which subsets can be extracted to conduct meta-analyses. Interestingly, we found that when conducting a global meta-analysis of gene-gene co-expression patterns across all 13,000 experiments to predict gene function, normalization yielded minimal improvement over using the raw data. Conclusions: Normalization of microarray data appears to be of minimal importance for analyses based on co-expression patterns when the sample size is on the order of thousands of microarray datasets. Smaller subsets, however, are more prone to aberrations and artefacts, and effective means of automating normalization procedures not only empower meta-analytic approaches, but aid in reproducibility by providing a standard way of approaching the problem. Data availability: matrix containing normalized expression of 20,813 genes across 13,000 experiments is available for download at . Source code for GDS files pre-processing is available from the authors upon request.
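
Quantile normalization, the step named in the Methods above, can be sketched in a few lines. This is a generic textbook version, not the authors' pipeline: it assumes every sample reports the same genes in the same order, has no missing values, and breaks ties by position rather than averaging them. The function name `quantile_normalize` is hypothetical.

```python
def quantile_normalize(samples):
    """Force all samples onto a shared distribution.

    samples: list of equal-length lists, one list per microarray,
             values in the same gene order.
    Each value is replaced by the mean, across samples, of the values
    holding the same rank. Ties are broken by position (an assumption;
    many implementations average tied ranks instead).
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must cover the same genes")

    # Mean of the i-th smallest value across all samples.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[i] for s in sorted_samples) / len(samples)
        for i in range(n_genes)
    ]

    normalized = []
    for sample in samples:
        # Gene indices ordered from lowest to highest expression.
        order = sorted(range(n_genes), key=lambda i: sample[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After normalization every sample contains exactly the same set of values, differing only in which gene holds which value, and that is what makes expression levels directly comparable across experiments.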

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Semi-Supervised Method for Predicting Transcription Factor–Gene Interactions in Escherichia coli

Author: Gary Stormo
Gábor Balázsi
Jason Ernst
Krin A. Kay
Qasim K. Beg
Ziv Bar-Joseph
Zoltán N. Oltvai
Publication venue: Public Library of Science
Publication date: 01/01/2008

While Escherichia coli has one of the most comprehensive datasets of experimentally verified transcriptional regulatory interactions of any organism, it is still far from complete. This presents a problem when trying to combine gene expression and regulatory interactions to model transcriptional regulatory networks. Using the available regulatory interactions to predict new interactions may lead to better coverage and more accurate models. Here, we develop SEREND (SEmi-supervised REgulatory Network Discoverer), a semi-supervised learning method that uses a curated database of verified transcription factor–gene interactions, DNA sequence binding motifs, and a compendium of gene expression data in order to make thousands of new predictions about transcription factor–gene interactions, including whether the transcription factor activates or represses the gene. Using genome-wide binding datasets for several transcription factors, we demonstrate that our semi-supervised classification strategy improves the prediction of targets for a given transcription factor. To further demonstrate the utility of our inferred interactions, we generated a new microarray gene expression dataset for the aerobic to anaerobic shift response in E. coli. We used our inferred interactions with the verified interactions to reconstruct a dynamic regulatory network for this response. The network reconstructed when using our inferred interactions was better able to correctly identify known regulators and suggested additional activators and repressors as having important roles during the aerobic–anaerobic shift interface.

Public Library of Science (PLOS)

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

D-Scholarship@Pitt

Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

Author: Alvord W. G.
Auer P. L.
Chen Y.
Chen Z.
Cohen J.
Fechner G. T.
Guyon I.
Göhlmann H.
Lee J.
Li C.
Schwender H.
Smyth G. K.
Snedecor G. W.
Trevino V.
Vandesompele J.
Welsh B. L.
Wentian Li
Zhao C.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 28/08/2013

A volcano plot displays unstandardized signal (e.g. log-fold-change) against noise-adjusted/standardized signal (e.g. t-statistic or -log10(p-value) from the t test). We review the basic and an interactive use of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines for the "double filtering" criterion. This review attempts to provide a unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of a microarray analysis result. We also discuss the possibility of applying volcano plots to other fields beyond microarray. Comment: 8 figures
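
The two selection criteria contrasted in this abstract can be sketched as follows. The code assumes expression values are already on a log2 scale, uses a Welch-style standard error, and regularizes the statistic by adding a constant `s0` to the denominator. That particular form of regularization is an assumption, since the abstract does not say which one the review uses. All function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def volcano_point(group_a, group_b, s0=0.0):
    """Return (log-fold-change, t-like statistic) for one gene.

    group_a, group_b: log2 expression values, at least two per group.
    s0: constant added to the standard error; s0 = 0 gives the ordinary
        Welch t-statistic, s0 > 0 gives a regularized statistic
        (assumed form, chosen for illustration).
    """
    diff = mean(group_b) - mean(group_a)
    se = sqrt(variance(group_a) / len(group_a)
              + variance(group_b) / len(group_b))
    if se + s0 == 0:
        raise ValueError("zero variance in both groups and s0 == 0")
    return diff, diff / (se + s0)


def double_filter(diff, t, min_diff, min_t):
    """'Double filtering': two perpendicular cut-offs in the plot."""
    return abs(diff) >= min_diff and abs(t) >= min_t


def joint_filter(regularized_t, min_stat):
    """Joint filtering: a single cut-off on the regularized statistic."""
    return abs(regularized_t) >= min_stat
```

Under this assumed form, a single threshold c on the regularized statistic is equivalent to |diff| >= c * (|diff| / |t| + s0). That boundary is a curve in the (diff, t) plane, not a pair of straight lines, which is the geometric point the abstract makes.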

arXiv.org e-Print Archive

Crossref

Proceedings of the 2011 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference

Author: A Rawat
A Zhou
AA Markovets
C Bland
C Di
C Kandoth
CA Bottoms
D Ding
DJ Quest
Doris M Kupfer
DR Koessler
E Tjioe
Edward J Perkins
F Zhang
H Bisgin
H Fang
J Su
J Xu
JD Wren
Jonathan D Wren
JR Daum
KT Diedrich
M Ammari
M Chen
M Mete
MG Dozmorov
Mikhail G Dozmorov
ML Mayo
MS Esfahani
MV Swamy
N Ghaffari
P Ghosh
P Manda
R Kumar
RY Kelley
S Achuthan
S Bai
S Kockara
S Roy
S Suer
S Winters-Hilt
SA Smits
SD Griffith
SJ Matthews
Stephen Winters-Hilt
Susan Bridges
T Halic
U Melcher
U Uzuner
Ulisses Braga-Neto
V Chaitankar
X Nan
X Qian
Y Jia
Y Li
Y Wang
Z Wen
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Identification of transcription factor's targets using tissue-specific transcriptomic data in Arabidopsis thaliana

Author: A de la Fuente
A Wille
AP Bracken
B Mauch-Mani
Dong Xu
E Ramirez-Parra
E Segal
EI Boyle
F Markowetz
GD Bader
GP Srivastava
Gyan Prakash Srivastava
H Toh
J Kilian
J Schafer
JG Sørensen
Jingdong Liu
K Shinozaki
K Vandepoele
K Yugi
L Reiser
M Kasuga
M Schena
M Schmid
M Seki
MJ Buck
N Friedman
P Brazhnik
P Shannon
Ping Li
PT Spellman
R Mittler
R Opgen-Rhein
RJ Marinelli
RL Poole
S Ma
S Wichert
SK Palaniswamy
T Barrett
T Chen
TI Lee
V. SS Filkov
WR Swindell
X Xu
X Yang
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Heterologous Tissue Culture Expression Signature Predicts Human Breast Cancer Prognosis

Author: A Alizadeh
A Budhu
A Degrassi
A Rosenwald
AH Bild
C Sotiriou
CM Perou
D Hanahan
DS Oh
Eun Sung Park
Fenghuang Zhan
HY Chang
Hyun Goo Woo
J. Frederic Mushinski
JD Brenton
Joanna H. Shih
John D. Shaughnessy
JS Lee
Ju-Seog Lee
LJ van't Veer
M Potter
MD Radmacher
MJ van de Vijver
N Oue
NA Bhowmick
Oliver Hofmann
PL Fitzgibbons
R Kalluri
R Simon
SK Gruvberger
T Sorlie
Publication venue: Public Library of Science
Publication date: 01/01/2007

BACKGROUND: Cancer patients have highly variable clinical outcomes owing to many factors, among which are genes that determine the likelihood of invasion and metastasis. This predisposition can be reflected in the gene expression pattern of the primary tumor, which may predict outcomes and guide the choice of treatment better than other clinical predictors. METHODOLOGY/PRINCIPAL FINDINGS: We developed an mRNA expression-based model that can predict prognosis/outcomes of human breast cancer patients regardless of microarray platform and patient group. Our model was developed using genes differentially expressed in mouse plasma cell tumors growing in vivo versus those growing in vitro. The prediction system was validated using published data from three cohorts of patients for whom microarray and clinical data had been compiled. The model stratified patients into four independent survival groups (BEST, GOOD, BAD, and WORST: log-rank test p = 1.7×10^−8). CONCLUSIONS: Our model significantly improved the survival prediction over other expression-based models and permitted recognition of patients with different prognoses within the estrogen receptor-positive group and within a single pathological tumor class. Basing our predictor on a dataset that originated in a different species and a different cell type may have rendered it less sensitive to proliferation differences and endowed it with wide applicability. SIGNIFICANCE: Prognosis prediction for patients with breast cancer is currently based on histopathological typing and estrogen receptor positivity. Yet both assays define groups that are heterogeneous in survival. Gene expression profiling allows subdivision of these groups and recognition of patients whose tumors are very unlikely to be lethal and those with much grimmer outlooks, which can augment the predictive power of conventional tumor analysis and aid the clinician in choosing relaxed vs. aggressive therapy.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

Yonsei University Medical Library Open Access Repository

PubMed Central

Bioinformatics resources for cancer research with an emphasis on gene function and structure prediction tools

Author: Hawkins Troy
Kihara Daisuke
Yang Yifeng David
Publication venue: Libertas Academica
Publication date: 01/01/2006

The immensely popular fields of cancer research and bioinformatics overlap in many different areas, e.g. large data repositories that allow users to analyze data from many experiments (data handling, databases), pattern mining, microarray data analysis, and interpretation of proteomics data. There are many newly available resources in these areas that may be unfamiliar to most cancer researchers wanting to incorporate bioinformatics tools and analyses into their work, and also to bioinformaticians looking for real data to develop and test algorithms. This review reveals the interdependence of cancer research and bioinformatics, and highlights the most appropriate and useful resources available to cancer researchers. These include not only public databases, but also general and specific bioinformatics tools which can be useful to the cancer researcher. The primary foci are function and structure prediction tools of protein genes. The result is a useful reference for cancer researchers and bioinformaticians studying cancer alike.

Directory of Open Access Journals

PubMed Central

Predicting genome-wide redundancy using machine learning

Author: Bandyopadhyay Sunayan
Birnbaum Kenneth D
Chen Huang-Wen
Shasha Dennis E
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data is available, such as Arabidopsis thaliana, the test case used here. Results: Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis showed the signature of predicted redundancy with at least one but typically fewer than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods. Conclusions: Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Exploring Gene Regulatory Interaction Networks and predicting therapeutic molecules for Hypopharyngeal Cancer and EGFR-mutated lung adenocarcinoma

Author: Almoyad Muhammad Ali Abdulllah
Aryal Sunil
Azad AKM
Bhattacharjya Abanti
Islam Md Manowarul
Moni Mohammad Ali
Paul Bikash Kumar
Talukder Md. Alamin
Tasnim Wahia
Uddin Md Ashraf
Publication venue
Publication date: 27/02/2024

With the advent of information technology, the bioinformatics research field is becoming increasingly attractive to researchers and academicians. The recent development of various bioinformatics toolkits has facilitated the rapid processing and analysis of vast quantities of biological data for human perception. Most studies focus on locating two connected diseases and making some observations to construct diverse gene regulatory interaction networks, a forerunner to general drug design for curing illness. For instance, hypopharyngeal cancer is a disease that is associated with EGFR-mutated lung adenocarcinoma. In this study, we select EGFR-mutated lung adenocarcinoma and hypopharyngeal cancer by finding the lung metastases in hypopharyngeal cancer. To conduct this study, we collect microarray datasets from GEO (Gene Expression Omnibus), an online database controlled by NCBI. Differentially expressed genes, common genes, and hub genes between the two selected diseases are detected for the subsequent steps. Our research findings have suggested common therapeutic molecules for the selected diseases based on 10 hub genes with the highest interactions according to the degree topology method and the maximum clique centrality (MCC). Our suggested therapeutic molecules will be fruitful for patients with those two diseases simultaneously. Comment: Accepted In The FEBS OPEN BIO (Q2, SCOPUS, SCIE, IF: 2.6, CS: 4.7), Wiley Journal, On FEB 25, 202
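
Hub-gene selection by the degree topology method can be illustrated with a short sketch. It covers only the degree criterion, not maximum clique centrality, and it assumes the interaction network is available as an undirected edge list. The function name `top_hubs_by_degree` and the gene labels in the usage note are hypothetical.

```python
from collections import Counter


def top_hubs_by_degree(edges, n=10):
    """Return the n genes with the most distinct interaction partners.

    edges: iterable of (gene_a, gene_b) pairs from an undirected network.
    Self-loops and repeated edges (in either direction) are ignored.
    Ties in degree are broken alphabetically so the result is stable.
    """
    degree = Counter()
    seen = set()
    for a, b in edges:
        if a == b:
            continue  # a self-loop adds no interaction partner
        key = frozenset((a, b))
        if key in seen:
            continue  # the same undirected edge listed again
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]
```

For an edge list such as `[("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]`, the function ranks `G1` first because it has three distinct partners. The default `n=10` mirrors the ten hub genes mentioned in the abstract.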

arXiv.org e-Print Archive