In recent years, machine learning has proven to be an
extremely useful tool for extracting knowledge from data.
It can be applied in numerous research areas, such as
genomics, earth sciences, and astrophysics, to gain valuable
insight. At the same time, Python has become one of the most
popular programming languages among researchers due to its
high productivity and rich ecosystem. Unfortunately, existing
machine learning libraries for Python do not scale to large data
sets, are hard for non-experts to use, and are difficult to set
up on high-performance computing clusters. These limitations
have prevented scientists from exploiting the full potential of
machine learning in their research. In this work, we present
dislib [1], a distributed machine learning library built on top of
the PyCOMPSs programming model [2] that addresses the issues
of existing libraries