ARBUD: A Reusable Architecture for Building User Models from Massive Datasets
Abstract. A single large data source often serves as input to multiple applications, each of which may use a different user model. Each user model is typically created using a different process; however, in many cases it would be more efficient to use a common architecture for building different user models in different application areas. In this paper, we propose a distributed-computing architecture based on MapReduce that allows for the efficient processing of massive datasets using reusable components that compute different features of the final user model. A metamodel specifies the characteristics of the desired user model, which can include both short-term and long-term user models, and the architecture is responsible for building the user model from the specified data and reusable components. We present an instantiation of the architecture in the context of telecommunications applications and empirically evaluate its scalability with a real dataset. Our results indicate that complex user models for millions of users can be obtained in just a few hours on a small computer cluster.
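
To make the idea concrete, the following is a minimal, single-machine sketch of how a metamodel might drive reusable map and reduce components to build per-user features. It is an illustration under stated assumptions, not the ARBUD implementation: the record layout, the feature names, the `METAMODEL` dictionary, and the `build_user_models` function are all hypothetical, and an in-memory grouping step stands in for the shuffle phase of a real MapReduce cluster.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical call records: (user_id, call_duration_in_seconds).
RECORDS = [("u1", 60), ("u2", 30), ("u1", 120), ("u2", 45), ("u1", 20)]

# Hypothetical metamodel: each feature of the user model is declared as a
# pair of reusable components (map function, reduce function).
METAMODEL = {
    "call_count": (lambda rec: 1, lambda a, b: a + b),
    "total_duration": (lambda rec: rec[1], lambda a, b: a + b),
    "longest_call": (lambda rec: rec[1], max),
}


def build_user_models(records, metamodel):
    """Build one feature dictionary per user from the declared components."""
    # Map phase: emit one value per (user_id, feature) key for every record.
    grouped = defaultdict(list)
    for rec in records:
        user_id = rec[0]
        for feature, (map_fn, _) in metamodel.items():
            grouped[(user_id, feature)].append(map_fn(rec))

    # Reduce phase: fold the values of each key with the feature's reducer.
    models = defaultdict(dict)
    for (user_id, feature), values in grouped.items():
        reduce_fn = metamodel[feature][1]
        models[user_id][feature] = reduce(reduce_fn, values)
    return dict(models)


if __name__ == "__main__":
    for user_id, model in sorted(build_user_models(RECORDS, METAMODEL).items()):
        print(user_id, model)
```

In this sketch, adding a feature to the user model only requires a new entry in the metamodel; the map and reduce driver code is shared by all features, which is the kind of reuse the architecture aims for.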