
Scalable Inference of Gene Regulatory Networks with the Spark Distributed Computing Platform Cristo

Abstract

Inference of Gene Regulatory Networks (GRNs) remains an important open challenge in computational biology. The goal of bio-model inference is to obtain, from time series of gene expression data, the sparse topological structure and the parameters that quantitatively explain and reproduce the dynamics of a biological system. However, inferring a GRN is a complex optimization problem that involves fitting S-System models to large amounts of gene expression data from hundreds (even thousands) of genes across multiple time series (assays). This complexity, together with the volume of data handled, makes GRN inference a computationally expensive task. Parallel algorithms that run efficiently on distributed processing platforms are therefore essential for current GRN reconstruction. In this paper, a parallel multi-objective approach is proposed for the optimal inference of GRNs that simultaneously minimizes the Mean Squared Error of the S-System model and a Topology Regularization value. A flexible and robust multi-objective cellular evolutionary algorithm is adapted to deploy parallel tasks in the form of Spark jobs. The proposed approach has been developed with the jMetal framework, and parallel computation is performed with Spark on a cluster of distributed nodes, which evaluates the candidate solutions that model the interactions of genes in biological networks.

Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
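To make the two objectives concrete, the following is a minimal sketch of how one candidate solution might be scored. It is not the authors' jMetal/Spark implementation. It assumes the standard S-System form dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, a simple forward Euler integrator, and a Topology Regularization defined as the number of non-negligible kinetic orders. The function names, the Euler scheme, and this regularization definition are all assumptions made for illustration.

```python
import math


def s_system_derivative(x, alpha, beta, g, h):
    """Right-hand side of the S-System ODEs for the expression levels x."""
    n = len(x)
    dx = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        dx.append(production - degradation)
    return dx


def simulate(x0, alpha, beta, g, h, steps, dt):
    """Forward Euler integration (an assumption; any ODE solver could be used)."""
    series = [list(x0)]
    x = list(x0)
    for _ in range(steps):
        dx = s_system_derivative(x, alpha, beta, g, h)
        # Keep levels strictly positive so negative exponents stay defined.
        x = [max(xi + dt * dxi, 1e-9) for xi, dxi in zip(x, dx)]
        series.append(x)
    return series


def mean_squared_error(simulated, observed):
    """Average squared difference over all time points and genes."""
    total, count = 0.0, 0
    for sim_row, obs_row in zip(simulated, observed):
        for s, o in zip(sim_row, obs_row):
            total += (s - o) ** 2
            count += 1
    return total / count


def topology_regularization(g, h, threshold=1e-6):
    """Hypothetical sparsity measure: count of non-negligible kinetic orders."""
    return sum(
        1 for matrix in (g, h) for row in matrix for v in row if abs(v) > threshold
    )


def evaluate(candidate, observed, dt):
    """Return the two objectives (MSE, TR) for one candidate parameter set."""
    alpha, beta, g, h = candidate
    simulated = simulate(observed[0], alpha, beta, g, h, len(observed) - 1, dt)
    return mean_squared_error(simulated, observed), topology_regularization(g, h)
```

Each call to `evaluate` is independent of the others, which is what makes the evaluation of a population of candidates a natural unit of work to distribute, for example as one Spark task per candidate.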
