    Towards Mobility Data Science (Vision Paper)

    Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains, including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline with the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, survey the current state of the art, and describe open challenges for the research community in the coming years. Comment: Updated arXiv metadata to include two authors that were missing from the metadata. PDF has not been changed.

    Preference rules for label ranking: Mining patterns in multi-target relations

    In this paper, we investigate two variants of association rules for preference data: Label Ranking Association Rules and Pairwise Association Rules. Label Ranking Association Rules (LRAR) are the equivalent of Class Association Rules (CAR) for the Label Ranking task. In CAR, the consequent is a single class, to which the example is expected to belong. In LRAR, the consequent is a ranking of the labels. The generation of LRAR requires special support and confidence measures to assess the similarity of rankings. In this work, we carry out a sensitivity analysis of these similarity-based measures. We want to understand which datasets benefit more from such measures and which parameters have more influence on the accuracy of the model. Furthermore, we propose an alternative type of rule, the Pairwise Association Rule (PAR), defined as an association rule with a set of pairwise preferences in the consequent. While PAR can be used as both descriptive and predictive models, they are essentially descriptive models. Experimental results show the potential of both approaches. This research has received funding from the ECSEL Joint Undertaking, the framework programme for research and innovation Horizon 2020 (2014-2020) under grant agreement number 662189-MANTIS-2014-1, and from National Funds through the FCT, Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology), as part of project UID/EEA/50014/2013.

    Word-level human interpretable scoring mechanism for novel text detection using Tsetlin Machines

    Recent research in novelty detection focuses mainly on document-level classification, employing deep neural networks (DNN). However, the black-box nature of DNNs makes it difficult to extract an exact explanation of why a document is considered novel. In addition, dealing with novelty at the word level is crucial to provide a more fine-grained analysis than what is available at the document level. In this work, we propose a Tsetlin Machine (TM)-based architecture for scoring individual words according to their contribution to novelty. Our approach encodes a description of the novel documents using the linguistic patterns captured by TM clauses. We then adapt this description to measure how much a word contributes to making documents novel. Our experimental results demonstrate how our approach breaks down novelty into interpretable phrases, successfully measuring novelty.

    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II

    Discovering critical traffic anomalies from GPS trajectories for urban traffic dynamics understanding

    Detecting traffic anomalies (e.g., traffic jams) is essential for the development of intelligent transportation systems in smart cities. In particular, detecting critical traffic anomalies (e.g., rare traffic anomalies, sudden accidents) is far more meaningful than detecting general traffic anomalies and more helpful for understanding urban traffic dynamics. For example, emerging traffic jams are more significant than regular traffic jams caused by common road bottlenecks like traffic lights or toll road entrances, and discovering the original location of traffic chaos in an area is more important than finding roads that are just congested. However, existing traffic indicators that represent traffic conditions, such as traffic flows and speeds, may not be accurate enough for critical traffic anomaly detection. That is, they usually miss some traffic anomalies while wrongly identifying a normal traffic status as an anomaly. Moreover, most existing detection methods detect only general traffic anomalies, not critical ones. In this thesis, we provide two new indicators: frequency of jams (captured by stop-point clusters) and Visible Outlier Indexes (VOIs) (based on the Kolmogorov-Smirnov test of speed) to capture critical traffic anomalies more accurately. The advantage of our proposed indicators is that they help separate critical traffic anomalies from general traffic anomalies. The former can discover rare anomalies with low frequency, and the latter can find unexpected anomalies (i.e., when the difference between the predicted VOI and the real VOI is great). Based on these two indicators, we provide three novel methods for comprehensive traffic anomaly analysis, including traffic anomaly identification, prediction, and root cause discovery. First, we provide a novel analysis of spatial-temporal jam frequencies (ASTJF) method for identifying rare traffic anomalies. 
In the ASTJF method, spatially close stop-points in a time bin are grouped into stop-point clusters (SPCs) using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm; an SPC is an instance of a spatiotemporal jam. Then, we develop a new adapted Hausdorff distance to measure the similarity of two SPCs and place SPCs that relate to the same spatiotemporal jam into a group. Finally, we take the number of SPCs in a group as the frequency of the corresponding traffic jam; traffic anomalies are classified as regular jams with high frequency and emerging jams with low frequency. The ASTJF method can correctly identify critical traffic anomalies (i.e., emerging jams). Second, we propose a novel prediction approach, the Visible Outlier Indices and Meshed Spatiotemporal Neighborhoods (VOIMSN) method. In this method, the trajectory data from the given region's geographic spatial neighbors and its time-series neighbors are both converted to abnormal scores measured by VOIs and quantified by the matrix grid as the input of the prediction model to improve the accuracy. This method provides a comprehensive analysis using all relevant data for building a reliable prediction model. In particular, the proposed meshed spatiotemporal neighborhood of arbitrary shape, which comprises all potential anomalies instead of just past anomalies, is theoretically more accurate than a fixed-size neighborhood for anomaly prediction. Third, we provide an innovative and integrated root cause analysis method using VOI as the probabilistic indicator of traffic anomalies. This method automatically learns spatiotemporal causal relationships from historical data to build an uneven diffusion model for detecting the root cause of anomalies (i.e., the origin of traffic chaos). It is demonstrated to be better than the heat diffusion model. 
Experiments conducted on a real-world massive trajectory dataset demonstrate the accuracy and effectiveness of the proposed methods for discovering critical traffic anomalies for a better understanding of urban traffic dynamics.
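
The grouping and frequency step of the ASTJF method described above might be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the stop-point clusters (SPCs) have already been produced (for example by DBSCAN), it substitutes the standard symmetric Hausdorff distance for the adapted variant that the abstract mentions but does not define, and it uses a simple greedy grouping. The function names, the threshold, the min_regular cutoff, and the sample coordinates are all hypothetical.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]
Cluster = List[Point]  # one stop-point cluster (SPC) from one time bin


def directed_hausdorff(a: Cluster, b: Cluster) -> float:
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(hypot(p[0] - q[0], p[1] - q[1]) for q in b) for p in a)


def hausdorff(a: Cluster, b: Cluster) -> float:
    """Standard symmetric Hausdorff distance (stand-in for the adapted one)."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def group_clusters(clusters: List[Cluster], threshold: float) -> List[List[Cluster]]:
    """Greedy grouping: an SPC joins the first group whose first member
    lies within `threshold`; otherwise it starts a new group."""
    groups: List[List[Cluster]] = []
    for c in clusters:
        for g in groups:
            if hausdorff(c, g[0]) <= threshold:
                g.append(c)
                break
        else:
            groups.append([c])
    return groups


def label_groups(groups: List[List[Cluster]], min_regular: int) -> List[str]:
    """Group size is the jam frequency: frequent -> regular, rare -> emerging."""
    return ["regular" if len(g) >= min_regular else "emerging" for g in groups]


# Hypothetical SPCs: three time bins at the same bottleneck, one isolated jam.
spcs: List[Cluster] = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.5), (1.0, 0.5)],
    [(0.2, 0.0), (1.0, 0.2)],
    [(50.0, 50.0), (51.0, 50.0)],
]

groups = group_clusters(spcs, threshold=1.0)
labels = label_groups(groups, min_regular=2)
print([len(g) for g in groups], labels)
```

In this toy input the three nearby SPCs fall into one group and are labeled as a regular jam, while the isolated SPC forms its own group and is labeled as emerging. A real pipeline would need a principled choice of threshold and frequency cutoff, which the abstract does not specify.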