
    Graph Summarization

    The continuous and rapid growth of highly interconnected datasets, which are both voluminous and complex, calls for adequate processing and analytical techniques. One method for condensing and simplifying such datasets is graph summarization. The term denotes a family of application-specific algorithms designed to transform graphs into more compact representations while preserving structural patterns, query answers, or specific property distributions. As this problem is common to several areas studying graph topologies, different approaches, such as clustering, compression, sampling, or influence detection, have been proposed, primarily based on statistical and optimization methods. Our chapter identifies the main graph summarization methods, with particular attention to the most recent approaches and novel research trends not yet covered by previous surveys. Comment: To appear in the Encyclopedia of Big Data Technologie

    Bayesian Non-Exhaustive Classification A Case Study: Online Name Disambiguation using Temporal Record Streams

    The name disambiguation task aims to partition the records of multiple real-life persons so that each partition contains records pertaining to a unique person. Most existing solutions operate in batch mode, where all records to be disambiguated are available to the algorithm from the start. More realistic settings, however, require that name disambiguation be performed online and that the method be able to identify records of new ambiguous entities that have no preexisting records. In this work, we propose a Bayesian non-exhaustive classification framework for solving the online name disambiguation task. Our proposed method uses a Dirichlet process prior with a Normal × Normal × Inverse Wishart data model, which enables identification of new ambiguous entities that have no records in the training data. For online classification, we use a one-sweep Gibbs sampler, which is both efficient and effective. As a case study, we consider bibliographic data in a temporal stream format and disambiguate authors by partitioning their papers into homogeneous groups. Our experimental results demonstrate that the proposed method outperforms existing methods for online name disambiguation. Comment: to appear in CIKM 201

    Discovering core terms for effective short text clustering

    This thesis addresses current limitations in short text clustering and provides a systematic framework comprising three novel methods: one to measure the similarity of two short texts effectively, one to group short texts efficiently, and one to cluster short text streams dynamically.

    Trip Prediction by Leveraging Trip Histories from Neighboring Users

    We propose a novel approach to trip prediction based on analyzing users' trip histories. We augment each user's own trip history with 'similar' trips from other users, which can be informative for predicting that user's future trips. This also helps to cope with noisy or sparse trip histories, where the user's own history does not by itself provide a reliable prediction of future trips. We show empirically that enriching users' trip histories with additional trips reduces the prediction error by 15%-40%, evaluated on multiple subsets of the Nancy2012 dataset. This real-world dataset was collected from public transportation ticket validations in the city of Nancy, France. Our prediction tool is a central component of a trip simulator system designed to analyze the functionality of public transportation in the city of Nancy.
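The idea of enriching a sparse trip history with trips from neighbouring users can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: trips are represented as (origin, destination) pairs, similarity between users is the Jaccard overlap of their trip sets, and the predictor simply returns the most frequent destination for a given origin. The names `jaccard`, `augmented_history`, and `predict_destination` are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap between two trip lists (assumed similarity measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def augmented_history(user, histories, k=1):
    """Return the user's own trips plus all trips of the k most similar users."""
    own = histories[user]
    others = [u for u in histories if u != user]
    # Rank the other users by similarity to this user's own history.
    neighbours = sorted(others, key=lambda u: jaccard(own, histories[u]),
                        reverse=True)[:k]
    extra = [trip for u in neighbours for trip in histories[u]]
    return own + extra


def predict_destination(user, origin, histories, k=1):
    """Most frequent destination from `origin` in the augmented history."""
    trips = augmented_history(user, histories, k)
    counts = Counter(dest for o, dest in trips if o == origin)
    return counts.most_common(1)[0][0] if counts else None


# Toy data: u1 has a sparse history with no trip starting at stop "C".
histories = {
    "u1": [("A", "B")],
    "u2": [("A", "B"), ("C", "D"), ("C", "D")],
    "u3": [("E", "F")],
}
prediction_alone = predict_destination("u1", "C", histories, k=0)
prediction_augmented = predict_destination("u1", "C", histories, k=1)
```

With `k=0` the user's own history offers no trip from stop "C", so no prediction is possible; with one neighbour the trips of the most similar user fill the gap. A real system would presumably weight neighbour trips and use time-of-day context, which this sketch omits.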

    Analysis of Large-Scale Traffic Dynamics in an Urban Transportation Network Using Non-Negative Tensor Factorization

    In this paper, we present our work on clustering and predicting the temporal evolution of global congestion configurations in a large-scale urban transportation network. Instead of looking at temporal variations in the traffic flow states of individual links, we focus on the temporal evolution of the complete spatial configuration of congestion over the network. We aim to describe the typical temporal patterns of the global traffic states and to achieve long-term prediction of the large-scale traffic evolution in a unified data-mining framework. To this end, we formulate this joint task using regularized Non-negative Tensor Factorization, which has been shown to be a useful analysis tool for spatio-temporal data sequences. Clustering and prediction are performed on the compact tensor factorization results. The validity of the proposed spatio-temporal traffic data analysis method is demonstrated in experiments using realistic simulated traffic data.

    Detecting the community structure and activity patterns of temporal networks: a non-negative tensor factorization approach

    The increasing availability of temporal network data calls for more research on extracting and characterizing mesoscopic structures in temporal networks and on relating such structures to specific functions or properties of the system. An outstanding challenge is extending the results achieved for static networks to time-varying networks, where the topological structure of the system and the temporal activity patterns of its components are intertwined. Here we investigate the use of a latent factor decomposition technique, non-negative tensor factorization, to extract the community-activity structure of temporal networks. The method is intrinsically temporal and makes it possible to identify communities and track their activity over time simultaneously. We represent the time-varying adjacency matrix of a temporal network as a three-way tensor and approximate this tensor as a sum of terms that can be interpreted as communities of nodes with an associated activity time series. We summarize known computational techniques for tensor decomposition and discuss some quality metrics that can be used to tune the complexity of the factorized representation. We subsequently apply tensor factorization to a temporal network for which a ground truth is available for both the community structure and the temporal activity patterns. The data we use describe the social interactions of students in a school, the associations between students and school classes, and the spatio-temporal trajectories of students over time. We show that non-negative tensor factorization is capable of recovering the class structure with high accuracy. In particular, the extracted tensor components can be validated either as known school classes or in terms of correlated activity patterns, i.e., spatial and temporal coincidences that are determined by the known school activity schedule.
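The three-way tensor representation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a rank-R CP (PARAFAC) model fitted with standard multiplicative updates for the squared error, and the toy network, the function names `khatri_rao` and `ntf`, and the parameters `rank`, `n_iter`, `eps`, and `seed` are all hypothetical. The tensor entry `T[i, j, t]` is 1 when nodes `i` and `j` interact at time `t`.

```python
import numpy as np


def khatri_rao(X, Y):
    """Column-wise Kronecker product; row (i, j) holds X[i, r] * Y[j, r]."""
    r = X.shape[1]
    return np.einsum("ir,jr->ijr", X, Y).reshape(-1, r)


def ntf(T, rank, n_iter=500, eps=1e-9, seed=0):
    """Non-negative CP factorization T[i,j,t] ~ sum_r A[i,r] B[j,r] C[t,r]."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    # Mode-wise unfoldings of the tensor (C-ordered).
    T0 = T.reshape(I, J * K)
    T1 = np.moveaxis(T, 1, 0).reshape(J, I * K)
    T2 = np.moveaxis(T, 2, 0).reshape(K, I * J)
    for _ in range(n_iter):
        # Multiplicative updates keep all factors non-negative.
        A *= (T0 @ khatri_rao(B, C)) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= (T1 @ khatri_rao(A, C)) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= (T2 @ khatri_rao(A, B)) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C


# Toy temporal network: nodes 0-2 interact at times 0-1, nodes 3-5 at times 2-3.
T = np.zeros((6, 6, 4))
T[:3, :3, :2] = 1.0
T[3:, 3:, 2:] = 1.0

A, B, C = ntf(T, rank=2)
T_hat = np.einsum("ir,jr,tr->ijt", A, B, C)
rel_error = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
labels = A.argmax(axis=1)  # dominant component per node
```

In this sketch each column of `A` (and `B`) plays the role of a community membership vector, and the matching column of `C` is that community's activity time series. Multiplicative updates can stall in local minima on real data, so several random restarts and a check of `rel_error` across candidate ranks would be a sensible precaution.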