External information propagates in the cell mainly through signaling cascades
and transcriptional activation, allowing the cell to react to a wide spectrum of
environmental changes. High-throughput experiments identify numerous molecular
components of such cascades, which may nevertheless interact through unknown
partners. Some of these partners may be detected by integrating
a protein-protein interaction network with mRNA expression profiles. This
inference problem can be mapped onto the problem of finding appropriate optimal
connected subgraphs of a network defined by these datasets. The optimization
procedure turns out to be computationally intractable in general. Here we
present a new distributed algorithm for this task, inspired by statistical
physics, and apply it to alpha-factor and drug-perturbation data in
yeast. We identify the role of the COS8 protein, a member of a gene family of
previously unknown function, and validate the results by genetic experiments.
The algorithm we present is especially well suited to very large datasets, can run
in parallel, and can be adapted to other problems in systems biology. On
renowned benchmarks it outperforms other algorithms in the field.

Comment: 6 pages, 3 figures, 1 table, Supporting Informatio