
Colossal Trajectory Mining: A unifying approach to mine behavioral mobility patterns

Authors: Francia M., Gallinucci E., Golfarelli M.
Publication date: 01/01/2024

Spatio-temporal mobility patterns are at the core of strategic applications such as urban planning and monitoring. Depending on the strength of the spatio-temporal constraints, different mobility patterns can be defined. While existing approaches work well for extracting groups of objects that share fine-grained paths, the huge volume of large-scale data calls for coarse-grained solutions. In this paper, we introduce Colossal Trajectory Mining (CTM) to efficiently extract heterogeneous mobility patterns from a multidimensional space that, along with the space and time dimensions, can consider additional trajectory features (e.g., means of transport or activity) to characterize behavioral mobility patterns. The algorithm is natively designed in a distributed fashion, and the experimental evaluation shows its scalability with respect to the involved features and the cardinality of the trajectory dataset.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Data Stream Clustering: A Review

Authors: Atalay Volkan, Zubaroğlu Alaettin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2020

The number of connected devices is steadily increasing, and these devices continuously generate data streams. Real-time processing of data streams is attracting interest despite its many challenges. Clustering is one of the most suitable methods for real-time data stream processing, because it can be applied with less prior information about the data and it does not need labeled instances. However, data stream clustering differs from traditional clustering in many respects and poses several challenges. Here, we provide information regarding the concepts and common characteristics of data streams, such as concept drift, data structures for data streams, time window models and outlier detection. We comprehensively review recent data stream clustering algorithms and analyze them in terms of the base clustering technique, computational complexity and clustering accuracy. A comparison of these algorithms is given, and we indicate popular data stream repositories and datasets, stream processing tools and platforms. Open problems in data stream clustering are also discussed.

Comment: Has been accepted for publication in Artificial Intelligence Review
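
The abstract above lists time window models among the common characteristics of data streams. As a rough illustration only, the following Python sketch shows one widely used variant, a damped (fading) window, in which older instances lose weight exponentially. The class name `DampedMean`, the parameter `decay_rate` and the base-2 fading function are assumptions chosen for this example; they are not taken from the reviewed paper.

```python
class DampedMean:
    """Running mean under a damped (fading) window.

    Hypothetical helper for illustration: each stored contribution is
    multiplied by 2 ** (-decay_rate * elapsed_time) before a new value
    is added, so recent instances dominate the estimate.
    """

    def __init__(self, decay_rate: float) -> None:
        self.decay_rate = decay_rate  # assumed tuning parameter
        self.weight = 0.0             # faded count of instances seen
        self.weighted_sum = 0.0       # faded sum of the values seen
        self.last_time = 0.0          # timestamp of the latest update

    def update(self, value: float, timestamp: float) -> None:
        # Fade the existing summary by the time elapsed since the last update.
        fade = 2.0 ** (-self.decay_rate * (timestamp - self.last_time))
        self.weight = self.weight * fade + 1.0
        self.weighted_sum = self.weighted_sum * fade + value
        self.last_time = timestamp

    def mean(self) -> float:
        # Return 0.0 before any instance has arrived.
        return self.weighted_sum / self.weight if self.weight > 0 else 0.0


stream = [(0.0, 10.0), (1.0, 20.0)]  # (timestamp, value) pairs
dm = DampedMean(decay_rate=1.0)
for t, v in stream:
    dm.update(v, t)
```

With `decay_rate=1.0`, an instance that is one time unit old counts half as much as a fresh one, so the estimate leans toward the newer value. A sliding or landmark window would instead discard or reset instances at fixed boundaries.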

arXiv.org e-Print Archive

OpenMETU (Middle East Technical University)


Real-time pre-processing technique for drift detection, feature tracking, and feature selection using adaptive micro-clusters for data stream classification

Author: Hammoodi Mahmood Shakir
Publication date: 01/01/2018

Data streams are unbounded, sequential data instances that are generated at high velocity. Data streams arrive online (i.e., instance by instance), and there is no control over the order in which data instances arrive, either within a data stream or across data streams. Classifying sequential data instances is a challenging problem in machine learning, with applications in network intrusion detection, financial markets and sensor networks. The automatic labelling of unseen instances from the stream in real time is the main challenge that data stream classification faces. For this, the classifier needs to adapt to concept drifts and can make only a single pass through the data with a limited amount of memory if the stream is generating data instances at high velocity. Nowadays, the focus of Data Stream Mining (DSM) lies in the development of data mining algorithms rather than in pre-processing techniques. To the best of the author's knowledge, at present, there are no developments for truly real-time feature selection in a streaming setting. This research work presents a real-time pre-processing technique, in particular feature tracking in combination with concept drift detection. The feature tracking is designed to improve DSM classification algorithms by enabling real-time feature selection. The pre-processing technique is based on tracking adaptive statistical summaries of the data and class label distributions, known as Micro-Clusters. Thus, the three objectives of this research were to develop a real-time pre-processing technique that can (1) detect a concept drift, (2) identify the features that were involved in the concept drift and thus may have changed their relevance, and (3) build a real-time feature selection method based on the developments mentioned above.
The evaluation of the developed technique is based on artificial data streams with known ground truth and on real datasets with and without artificially induced concept drift (i.e., controlled and uncontrolled real datasets). It was observed that the developed method for concept drift detection detected induced concept drifts very well compared with alternative concept drift detection methods. Overall, the research represents a first attempt to resolve real-time feature selection for DSM tasks. It has been shown that the technique can indeed identify concept drift, track features, and identify features that may have changed their relevance for the DSM task in real time. It has also been shown that the developed method for real-time feature selection can improve the accuracy of data stream classification tasks.
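
The abstract describes Micro-Clusters as adaptive statistical summaries that are updated as instances arrive. The sketch below is a generic illustration of that idea, using the cluster-feature style summary (count, per-feature linear sum and squared sum) common in the stream clustering literature. It is not the thesis's actual method: the class `MicroCluster`, its fields and the per-feature variance calculation are assumptions made for this example.

```python
class MicroCluster:
    """Incremental summary of the instances absorbed so far.

    Hypothetical illustration: only a count, a per-feature linear sum
    and a per-feature squared sum are kept, so memory stays constant
    and each instance is processed in a single pass.
    """

    def __init__(self, dim: int) -> None:
        self.n = 0                       # number of absorbed instances
        self.linear_sum = [0.0] * dim    # sum of values per feature
        self.squared_sum = [0.0] * dim   # sum of squared values per feature

    def add(self, x: list) -> None:
        # Absorb one instance without storing it.
        self.n += 1
        for i, v in enumerate(x):
            self.linear_sum[i] += v
            self.squared_sum[i] += v * v

    def centroid(self) -> list:
        # Assumes at least one instance has been absorbed.
        return [s / self.n for s in self.linear_sum]

    def variance(self) -> list:
        # Per-feature population variance, E[x^2] - E[x]^2, clipped at 0
        # to guard against small negative values from rounding.
        return [
            max(sq / self.n - (ls / self.n) ** 2, 0.0)
            for ls, sq in zip(self.linear_sum, self.squared_sum)
        ]


mc = MicroCluster(dim=2)
for instance in ([1.0, 2.0], [3.0, 6.0]):
    mc.add(instance)
```

Because the per-feature statistics can be read at any time, a change in a feature's centroid or variance between two points in the stream is one plausible signal that the feature was involved in a drift. How the thesis actually derives drift and relevance from its summaries is not stated in the abstract.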

Central Archive at the University of Reading