The challenge with today's microarray experiments is to infer biological conclusions
from them. Two crucial difficulties must be surmounted: (1) the lack of a suitable
biological repository that can be easily integrated into computational algorithms,
and (2) the inability of contemporary microarray analysis algorithms to draw
consistent biological results from diverse datasets of the same disease.
To deal with the first difficulty, we believe a core database that unifies available
biological repositories is important. Towards this end, we create a unified biological
database from three popular biological repositories (KEGG, Ingenuity and WikiPathways).
This database lets computer scientists integrate biological information easily
through simple API calls or SQL queries.
To deal with the second difficulty of deriving consistent biological results from the
experiments, we first conceptualize the notion of a “subnetwork”, which refers to a
connected portion of a biological pathway. Then we propose a method that identifies
subnetworks that are consistently expressed by patients of the same disease phenotype.
We test our technique on independent datasets of several diseases, including ALL,
DMD and lung cancer. For each of these diseases, we obtain two independent microarray
datasets produced by distinct labs on distinct platforms. In each case, our technique
consistently produces overlapping lists of significant nontrivial subnetworks from two
independent sets of microarray data. The gene-level agreement of these significant
subnetworks is between 66.67% and 91.87%. In contrast, when the same pairs of
microarray datasets were analysed using GSEA and t-test, this percentage fell to between
37% and 55.75% (GSEA) and between 2.55% and 19.23% (t-test). Furthermore, the genes
selected using GSEA and t-test do not form subnetworks of substantial size. Thus,
the subnetworks selected by our technique are more likely to give the researcher
descriptive information on the portions of the pathway that are actually associated
with the disease.
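
To make the two central notions concrete, the following is a minimal Python sketch, not the method evaluated above. It treats a pathway as an undirected gene graph, extracts subnetworks as the connected components induced on a set of selected genes, and scores gene-level agreement between two gene lists. The function names, the `min_size` cut-off, the toy gene labels, and the agreement definition (shared genes divided by the size of the smaller list) are all assumptions made for illustration; the exact definition behind the percentages reported here may differ.

```python
from collections import defaultdict


def subnetworks(edges, selected_genes, min_size=2):
    """Connected components of the pathway graph induced on selected_genes.

    edges: iterable of (gene_a, gene_b) pairs describing one pathway.
    min_size: hypothetical cut-off that drops trivial (single-gene) components.
    """
    selected = set(selected_genes)
    adjacency = defaultdict(set)
    for a, b in edges:
        # Keep an edge only if both endpoints are among the selected genes.
        if a in selected and b in selected:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    components = []
    for start in sorted(selected):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            gene = stack.pop()
            component.add(gene)
            for neighbour in adjacency[gene]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        if len(component) >= min_size:
            components.append(component)
    return components


def gene_agreement(genes_a, genes_b):
    """Percentage of shared genes relative to the smaller list (assumed definition)."""
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


# Toy pathway with hypothetical gene labels.
pathway_edges = [("A", "B"), ("B", "C"), ("D", "E"), ("C", "X")]
found = subnetworks(pathway_edges, {"A", "B", "C", "D", "F"})
agreement = gene_agreement({"A", "B", "C"}, {"B", "C", "D", "E"})
```

In this toy pathway, genes D and F are selected but have no selected neighbour, so they are dropped as trivial components, and only the connected portion formed by A, B and C remains as a subnetwork.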
Keywords: pathway analysis, microarra