Development of high performance computing cluster for evaluation of sequence alignment algorithms

Abstract

As biological databases grow rapidly, both biologists and computer scientists face the challenge of developing algorithms and databases that can manage the increasing volume of data. Many algorithms have been developed to align the sequences stored in biological databases; some take a long time to process the data, while others fail to produce reasonable results. As more data is generated and time-consuming algorithms are developed to handle it, there is a need for specialized computers to carry out the computations. Researchers are typically limited by the computational power of their computers. The field of High Performance Computing (HPC) addresses this challenge and can do so cost-effectively: instead of expensive equipment, old computers can be combined to form a powerful system. This is the premise of this research, which explores the setup of a low-cost Beowulf cluster and then evaluates its performance in running sequence alignment algorithms. The dissertation uses a mixed-method methodology consisting of a literature study and theoretical and practice-based work. The methodology also includes a proof of concept, in which the Beowulf cluster is designed and implemented to run the sequence alignment algorithms and the performance tests. The dissertation first gives an overview of the sequence alignment algorithms that have already been developed, together with a detailed timeline of their development. It then presents the design and implementation of the Beowulf cluster, followed by experiments on the baseline performance of the cluster. A comparison between the ClustalW-MPI and T-Coffee (Tree-based Consistency Objective Function For alignment Evaluation) algorithms is presented as part of the findings of the research study.
The efficiency of the cluster was observed to be 19.8%, which is unexpectedly low compared with the 83.3% predicted by the theoretical cluster calculator. The theoretical performance of the cluster was considerably higher than the experimental performance; this gap is attributable to the slow 100 Mbps network, the low processor speed of 2.50 GHz, and the low memory of 2 gigabytes.
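The efficiency figures above can be illustrated with a minimal sketch. It assumes that efficiency means the ratio of measured performance to theoretical peak performance, which is the usual HPC definition but is not stated in the abstract. The node count, cores per node, and floating-point operations per cycle below are hypothetical placeholders, not values from the dissertation; only the 2.50 GHz clock speed and the 19.8% and 83.3% figures come from the text.

```python
def theoretical_peak_gflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak performance in GFLOPS for a homogeneous cluster."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle


def efficiency_percent(measured_gflops, peak_gflops):
    """Cluster efficiency: measured performance as a percentage of the peak."""
    return 100.0 * measured_gflops / peak_gflops


# Hypothetical cluster shape (assumptions for illustration only).
NODES = 4
CORES_PER_NODE = 2
FLOPS_PER_CYCLE = 4

# Clock speed reported in the abstract.
CLOCK_GHZ = 2.50

peak = theoretical_peak_gflops(NODES, CORES_PER_NODE, CLOCK_GHZ, FLOPS_PER_CYCLE)

# Measured performance that each reported efficiency would imply
# for this hypothetical peak.
implied_observed = peak * 19.8 / 100.0
implied_predicted = peak * 83.3 / 100.0

print(f"Theoretical peak: {peak:.1f} GFLOPS")
print(f"Implied by 19.8% efficiency: {implied_observed:.2f} GFLOPS")
print(f"Implied by 83.3% efficiency: {implied_predicted:.2f} GFLOPS")
```

Under these assumptions, the gap between the predicted and observed efficiency corresponds to roughly a fourfold difference in delivered performance, whatever the actual cluster size.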
