
    Clustering Customer Shopping Trips With Network Structure

    Moving objects can be tracked with sensors such as RFID tags or GPS devices, and their movement can be represented as sequences of time-stamped locations. Studying such spatio-temporal movement sequences to discover spatial sequential patterns holds promise in many real-world settings; interesting applications include customer shopping traversal pattern discovery, vehicle travel pattern discovery, and route prediction. Traditional spatial data mining algorithms designed for Euclidean space are not directly applicable in these settings. We propose a new algorithm to cluster movement paths, such as shopping trips, for pattern discovery. We represent the spatio-temporal series as sequences of discrete locations following a pre-defined network, and we incorporate a modified version of the Longest Common Subsequence (LCS) algorithm with the network structure to measure the similarity of movement paths. Such spatial networks also implicitly account for the existence of spatial obstacles. Experiments were performed on both hand-collected real-life trips and simulated grocery shopping trips. Initial evaluation results show that our proposed approach, called Net-LCSS, supports effective and efficient clustering for shopping trip pattern discovery.
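The network-aware matching at the heart of this approach can be sketched as a relaxed LCS: two locations count as a match if they coincide or are adjacent on the network. The paper does not spell out Net-LCSS's exact matching rule here, so the sketch below (with a hypothetical adjacency map) illustrates the general idea rather than the paper's algorithm.

```python
from functools import lru_cache

def lcs_similarity(path_a, path_b, adjacency):
    """Normalised longest-common-subsequence similarity of two location
    sequences. Two nodes "match" if they are identical or adjacent on
    the network -- a relaxation in the spirit of Net-LCSS; the paper's
    exact matching rule may differ."""

    def match(u, v):
        return u == v or v in adjacency.get(u, ())

    @lru_cache(maxsize=None)
    def lcs(i, j):
        if i == len(path_a) or j == len(path_b):
            return 0
        if match(path_a[i], path_b[j]):
            return 1 + lcs(i + 1, j + 1)
        return max(lcs(i + 1, j), lcs(i, j + 1))

    # Normalise to [0, 1] so trips of different lengths are comparable.
    return lcs(0, 0) / min(len(path_a), len(path_b))
```

With an empty adjacency map this reduces to the classical LCS; adding network edges lets two trips that pass through neighbouring aisles count as following the same pattern.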

    Estimating Movement from Mobile Telephony Data

    Mobile-enabled devices are ubiquitous in modern society. The information gathered by their normal service operations has become one of the primary data sources used in understanding human mobility, social connection and information transfer. This thesis investigates techniques that can extract useful information from anonymised call detail records (CDRs). CDRs consist of mobile subscriber data held by network operators: the nature of each communication activity (voice, SMS, data, etc.), its duration and starting time, and the servicing cell identification numbers of both sender and receiver when available. The main contributions of the research are a methodology for distance measurements which enables the identification of mobile subscriber travel paths, and a methodology for population density estimation based on significant mobile subscriber regions of interest. In addition, insights are given into how a mobile network operator may use geographically located subscriber data to create new revenue streams and improve network performance. A range of novel algorithms and techniques underpin the development of these methodologies, including techniques for CDR feature extraction, data visualisation and CDR data cleansing. The primary data source used in this body of work was the CDRs of Meteor, a mobile network operator in the Republic of Ireland. The Meteor network under investigation has just over 1 million customers, approximately a quarter of the country's 4.6 million inhabitants, and operates using both 2G and 3G cellular telephony technologies. Results show that steady state vector analysis of modified Markov chain mobility models can return population density estimates comparable to those obtained through a census.
Evaluated on a test dataset, the developed distance measurements classified the routes taken by CDR journey trajectories more accurately than traditional trajectory distance measurements. Results from subscriber segmentation indicate that subscribers with similar relationships to geographical features can be grouped based on weighted steady state mobility vectors. Overall, this thesis proposes novel algorithms and techniques for estimating movement from mobile telephony data, addressing practical issues related to sampling, privacy and spatial uncertainty.
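The steady state vector analysis mentioned above can be illustrated with a minimal sketch: given a row-stochastic transition matrix between cells, power iteration finds the distribution pi satisfying pi = pi P, which can then be scaled to a population estimate. The matrix values below are hypothetical, and the thesis's modified Markov chain mobility models involve more than this plain iteration.

```python
import numpy as np

def steady_state(P, tol=1e-12, max_iter=10_000):
    """Steady-state distribution pi of a row-stochastic transition
    matrix P (the pi with pi = pi P), found by power iteration."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical 3-cell mobility model: entry P[i, j] is the observed
# probability of a subscriber moving from cell i to cell j.
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
pi = steady_state(P)
# Scaling pi by the subscriber count yields a per-cell population estimate.
```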

    Human Motion Trajectory Prediction: A Survey

    With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important. Specifically, predicting the future positions of dynamic agents, and planning that takes such predictions into account, are key tasks for self-driving vehicles, service robots and advanced surveillance systems. This paper provides a survey of human motion trajectory prediction. We review, analyze and structure a large selection of work from different communities and propose a taxonomy that categorizes existing methods based on the motion modeling approach and the level of contextual information used. We provide an overview of existing datasets and performance metrics. We discuss limitations of the state of the art and outline directions for further research. Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 pages.

    Density-based algorithms for active and anytime clustering

    Data-intensive applications in fields like biology, medicine, and neuroscience require effective and efficient data mining technologies, and advanced data acquisition methods produce data of constantly increasing volume and complexity. As a consequence, the need for new data mining technologies to deal with complex data has emerged during the last decades. In this thesis, we focus on the data mining task of clustering, in which objects are separated into groups (clusters) such that objects inside a cluster are more similar to each other than to objects in different clusters. In particular, we consider density-based clustering algorithms and their applications in biomedicine. The core idea of the density-based clustering algorithm DBSCAN is that each object within a cluster must have a certain number of other objects inside its neighborhood. Compared with other clustering algorithms, DBSCAN has many attractive properties: for example, it can detect clusters of arbitrary shape and is robust to outliers. DBSCAN has therefore attracted substantial research interest during the last decades, with many extensions and applications. In the first part of this thesis, we aim to develop new algorithms based on the DBSCAN paradigm to deal with the challenges of complex data, particularly expensive distance measures and incomplete availability of the distance matrix. Like many other clustering algorithms, DBSCAN suffers from poor performance when facing expensive distance measures for complex data. To tackle this problem, we propose a new algorithm based on the DBSCAN paradigm, called Anytime Density-based Clustering (A-DBSCAN), that works in an anytime scheme: in contrast to the original batch scheme of DBSCAN, A-DBSCAN first produces a quick approximation of the clustering result and then continuously refines it as it runs.
Experts can interrupt the algorithm, examine the results, and choose between (1) stopping the algorithm at any time, whenever they are satisfied with the result, to save runtime, and (2) continuing the algorithm to achieve better results. Such an anytime scheme has proven in the literature to be a very useful technique for time-consuming problems. We also introduce an extended version of A-DBSCAN, called A-DBSCAN-XS, which is more efficient and effective than A-DBSCAN when dealing with expensive distance measures. Since DBSCAN relies on the cardinality of the neighborhood of objects, it requires the full distance matrix. For complex data, these distances are usually expensive, time-consuming or even impossible to acquire due to high cost, high time complexity, or noisy and missing data. Motivated by these difficulties of acquiring the distances among objects, we propose another approach for DBSCAN, called Active Density-based Clustering (Act-DBSCAN). Given a budget limitation B, Act-DBSCAN is only allowed to use up to B pairwise distances, ideally producing the same result as if it had the entire distance matrix at hand. The general idea of Act-DBSCAN is to actively select the most promising pairs of objects, calculate the distances between them, and approximate the desired clustering result as closely as possible with each distance calculation. This scheme provides an efficient way to reduce the total cost of performing the clustering, and thus limits the potential weakness of DBSCAN when dealing with the distance sparseness problem of complex data. As a fundamental data clustering algorithm, density-based clustering has many applications in diverse fields. In the second part of this thesis, we focus on an application of density-based clustering in neuroscience: the segmentation of white matter fiber tracts in the human brain acquired from Diffusion Tensor Imaging (DTI).
    We propose a model to evaluate the similarity between two fibers as a combination of structural similarity and connectivity-related similarity of fiber tracts. Various distance measure techniques from fields such as time-sequence mining are adapted to calculate the structural similarity of fibers, and density-based clustering is used as the segmentation algorithm. We show how A-DBSCAN and A-DBSCAN-XS serve as novel solutions for the segmentation of massive fiber datasets and provide unique features to assist experts during the fiber segmentation process.
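For reference, the neighborhood-cardinality criterion that A-DBSCAN and Act-DBSCAN build on is that of plain DBSCAN. The following minimal sketch shows the baseline algorithm only, not the thesis's anytime or active variants:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN: a point is a core point if at least min_pts
    points (itself included) lie within distance eps of it; clusters
    grow by expanding core-point neighbourhoods. Label -1 = noise."""
    n = len(X)
    labels = np.full(n, -1)
    # The full pairwise distance matrix -- exactly the cost that
    # A-DBSCAN refines incrementally and Act-DBSCAN computes under a budget.
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue  # already clustered, or not a core point
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:  # expand the cluster through core points
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])
        cluster += 1
    return labels
```

The `dist` line is the bottleneck for expensive distance measures: A-DBSCAN replaces this single exact pass with a sequence of cheap approximations that are progressively refined, while Act-DBSCAN selects which of these entries to compute within a budget B.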

    Detecting abnormal events on binary sensors in smart home environments

    With a rising ageing population, smart home technologies have been demonstrated as a promising paradigm for technology-driven healthcare delivery. Smart home technologies, composed of advanced sensing, computing, and communication technologies, offer an unprecedented opportunity to keep track of the behaviours and activities of the elderly and to provide context-aware services that enable the elderly to remain active and independent in their own homes. However, experiments with developed prototypes demonstrate that abnormal sensor events hamper the correct identification of critical (and potentially life-threatening) situations, and that existing learning, estimation, and time-based approaches to situation recognition are inaccurate and inflexible when applied to multiple people sharing a living space. We propose a novel technique, called CLEAN, that integrates the semantics of sensor readings with statistical outlier detection. We evaluate the technique against four real-world datasets across different environments, including datasets with multiple residents. The results show that CLEAN can successfully detect sensor anomalies and improve activity recognition accuracy.
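The statistical half of such a pipeline can be sketched with a robust outlier test on activation durations. The modified z-score below is a generic stand-in: CLEAN's actual test, and its integration with the semantics of the sensor readings, are not reproduced here.

```python
from statistics import median

def flag_abnormal_durations(durations, threshold=3.5):
    """Flag binary-sensor activations whose duration is a statistical
    outlier under the modified z-score (median absolute deviation).
    A generic stand-in for the statistical component of a CLEAN-style
    pipeline; not the paper's actual test."""
    med = median(durations)
    mad = median(abs(d - med) for d in durations)
    if mad == 0:
        # Degenerate case: any deviation from the median is abnormal.
        return [d != med for d in durations]
    return [0.6745 * abs(d - med) / mad > threshold for d in durations]
```

The median-based score is preferred over a plain z-score here because a single extreme activation (e.g. a door sensor stuck open) inflates the mean and standard deviation enough to mask itself.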

    Deep Time-Series Clustering: A Review

    We present a comprehensive, detailed review of time-series data analysis, with emphasis on deep time-series clustering (DTSC), together with a case study on movement behavior clustering using a deep clustering method. Specifically, in our prior deep clustering work we modified DCAE architectures to suit time-series data. Several works on deep clustering of time-series data have since been carried out; we review these works, identify the state of the art, and present an outlook on the important field of DTSC from five key perspectives.

    Development of methodologies to analyze and visualize air traffic controllers’ visual scanning strategies

    The Federal Aviation Administration (FAA) estimates an air traffic volume of 60 million by 2040. However, the available workforce of expert air traffic controllers (ATCs) might not be sufficient to manage this anticipated high traffic volume. Thus, to maintain the same safety standard and service level for air travel, more ATCs will need to be trained quickly. Previous research shows that eye tracking technology can be used to enhance the training of ATCs by reducing their false alarm rate, thereby helping to mitigate the impact of increasing demand. Methods need to be developed to better understand experts' eye movement (EM) data so as to incorporate it effectively into the ATC training process. However, analyzing ATCs' EM data is challenging for several reasons: (i) aircraft representations on the radar display (i.e., targets) are dynamic, as their shape and position change with time; (ii) raw EM data is very complex to visualize, even for a meaningful short duration (e.g., a task completion time of 1 min); (iii) in the absence of any predefined order of visual scanning, each ATC employs a variety of scanning strategies to manage traffic, making it challenging to extract relevant patterns that can be taught. To address these issues, a threefold framework was developed: (i) a dynamic network-based approach that maps expert ATCs' EM data to dynamic targets, enabling the representation of visual scanning strategy evolution over time; (ii) a novel density-based clustering method that reduces the inherent complexity of ATCs' raw EM data to enhance its visualization; (iii) a new modified n-gram based similarity analysis method to evaluate the consistency and similarity of visual scanning strategies among experts.
Two experiments were conducted at the FAA Civil Aerospace Medical Institute in Oklahoma City, where EM data of 15 veteran ATCs (>20 years of experience) were collected using eye trackers (Facelab and Tobii) while the ATCs controlled high-fidelity simulated air traffic. The first experiment involved an en-route traffic scenario (aircraft above 18,000 feet) and the second involved airport tower traffic (aircraft within a 30-mile radius of an airport). The dynamic network analysis showed three important results: (i) it can effectively represent which targets are important and how their significance evolves over time; (ii) in dynamic scenarios, where targets spend variable time on display, traditional target importance measures (i.e., the number and duration of eye fixations) can be misleading; and (iii) importance measures derived from the network-based approach (e.g., closeness, betweenness) can be used to understand how ATCs' visual attention moves between targets. The results from the density-based clustering method show that by controlling its two parameter values (i.e., spatial and temporal approximation), the visualization of raw EM data can be substantially simplified. This approximate representation can be used for training, where an expert ATC's visual scanning strategy is visualized with reduced complexity, enhancing novices' understanding while preserving its significant patterns (key for visual pattern mining). Moreover, the model parameters enable the decision-maker to incorporate context-dependent factors by adjusting the spatial (in pixels) and temporal (in milliseconds) thresholds used for the visual scanning approximation.
The modified n-gram approach allows a twofold similarity analysis of EM data: (i) detecting similar EM patterns due to exact sequential matches, in which targets are focused on and/or grouped together visually because of several eye fixation transitions among them; and (ii) unearthing similar visual scanning behaviors that are small perturbed versions of each other, arising from the idiosyncrasies of individual ATCs. This method is thus more robust than other prevalent approaches, which employ strict definitions of similarity that are difficult to observe empirically in real-life scenarios. To summarize, the three methods developed provide a comprehensible framework for understanding the evolving nature of visual scanning strategies in complex environments (e.g., the air traffic control task) by: (i) identifying target importance and its evolution; (ii) simplifying the visualization of complex EM strategies for easier comprehension; (iii) evaluating similarity among visual scanning strategies in dynamic scenarios.
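The n-gram comparison of scanning sequences can be illustrated in a simplified form: represent each strategy by its set of fixation-transition n-grams and compare the sets. The Jaccard overlap below is a generic stand-in; the dissertation's modified method additionally tolerates small perturbations between sequences, which this sketch (requiring exact n-gram matches) does not.

```python
def ngram_similarity(seq_a, seq_b, n=2):
    """Jaccard overlap of the n-gram (fixation-transition) sets of two
    visual-scanning sequences. A simplified stand-in for the modified
    n-gram method: this version requires exact n-gram matches, with no
    tolerance for small perturbations."""
    def ngrams(seq):
        return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
    a, b = ngrams(seq_a), ngrams(seq_b)
    if not (a or b):
        return 1.0
    return len(a & b) / len(a | b)
```

For example, two scan paths that visit the same targets in the same order share all their bigrams and score 1.0, while a single swapped target reduces the overlap proportionally.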