Bioinformatics Computational Cluster Batch Task Profiling with Machine Learning for Failure Prediction
Motivation: Traditional computational cluster schedulers rely on user
inputs and run-time requests for memory and CPU, not IO. The run times of
heavily IO-bound tasks, like those seen in many big data and bioinformatics
problems, depend on the IO subsystem's scheduling and are problematic for
cluster resource scheduling. Rescheduling IO-intensive and errant tasks
wastes cluster resources. Understanding the conditions in both successful and
failed tasks, and differentiating them, could provide knowledge for enhancing
cluster scheduling and intelligent resource optimization.
Results: We analyze a production computational cluster contributing 6.7
thousand CPU hours to research over two years. Through this analysis we develop
a machine learning task profiling agent for clusters that attempts to predict
failures between tasks with identical provisioning requests