Industry 4.0 and Internet of Things (IoT) technologies unlock unprecedented
amounts of data from factory production, posing big data challenges in volume
and variety. In that context, distributed computing solutions such as cloud
systems are leveraged to parallelise the data processing and reduce computation
time. As cloud systems become increasingly popular, there is growing demand
for users who were originally not cloud experts (such as data scientists and
domain experts) to deploy their solutions on these systems.
However, it is non-trivial to address both the high demand for cloud system
users and the excessive time required to train them. To this end, we propose
SemCloud, a semantics-enhanced cloud system that couples a cloud system with
semantic technologies and machine learning. SemCloud relies on domain
ontologies and mappings for data integration, and parallelises the semantic
data integration and data analysis on distributed computing nodes. Furthermore,
SemCloud adopts adaptive Datalog rules and machine learning for automated
resource configuration, allowing non-cloud experts to use the cloud system. The
system has been evaluated in an industrial use case with millions of data items,
thousands of repeated runs, and domain users, showing promising results.

Comment: Paper accepted at ISWC 2023 In-Use track