Machine Learning with Time Series: A Taxonomy of Learning Tasks, Development of a Unified Framework, and Comparative Benchmarking of Algorithms

Abstract

Time series data is ubiquitous in real-world applications. Such data gives rise to distinct but closely related learning tasks (e.g. time series classification, regression or forecasting). In contrast to the more traditional cross-sectional setting, these tasks are often not fully formalized. As a result, different tasks can become conflated under the same name, algorithms are often applied to the wrong task, and performance estimates are potentially unreliable. In practice, software frameworks such as scikit-learn have become essential tools for data science. However, most existing frameworks focus on cross-sectional data. To our knowledge, no comparable frameworks exist for temporal data. Moreover, despite the importance of these frameworks, their design principles have never been fully understood. Instead, discussions often concentrate on usage and features, while almost completely ignoring design. To address these issues, we develop in this thesis (i) a formal taxonomy of learning tasks, (ii) novel design principles for ML toolboxes and (iii) a new unified framework for ML with time series. The framework has been implemented in an open-source Python package called sktime. The design principles are derived from existing state-of-the-art toolboxes and classical software design practices, using a domain-driven approach and a novel scientific type system. We show that these principles can not only explain key aspects of existing frameworks, but also guide the development of new ones like sktime. Finally, we use sktime to reproduce and extend the M4 competition, one of the major comparative benchmarking studies for forecasting. Reproducing the competition allows us to verify the published results and illustrate sktime’s effectiveness. Extending the competition enables us to explore the potential of previously unstudied ML models.
We find that, on a subset of the M4 data, simple ML models implemented in sktime can match the state-of-the-art performance of the hand-crafted M4 winner models.
