2,674 research outputs found
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. Our chapter pinpoints the main graph
summarization methods, with particular attention to the most recent approaches
and novel research trends on this topic not yet covered by previous surveys.
Comment: To appear in the Encyclopedia of Big Data Technologie
Bayesian Non-Exhaustive Classification A Case Study: Online Name Disambiguation using Temporal Record Streams
The name entity disambiguation task aims to partition the records of multiple
real-life persons so that each partition contains records pertaining to a
unique person. Most of the existing solutions for this task operate in a batch
mode, where all records to be disambiguated are initially available to the
algorithm. However, more realistic settings require that the name
disambiguation task be performed in an online fashion and that the method be
able to identify records of new ambiguous entities having no preexisting
records. In this work, we propose a Bayesian non-exhaustive classification
framework for solving the online name disambiguation task. Our proposed method uses
a Dirichlet process prior with a Normal × Normal × Inverse Wishart data model
which enables identification of new ambiguous entities who have no records in
the training data. For online classification, we use a one-sweep Gibbs sampler,
which is very efficient and effective. As a case study, we consider
bibliographic data in a temporal stream format and disambiguate authors by
partitioning their papers into homogeneous groups. Our experimental results
demonstrate that the proposed method is better than existing methods for
performing the online name disambiguation task.
Comment: to appear in CIKM 201
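As a rough illustration of how a Dirichlet process prior leaves room for entities with no preexisting records, the sketch below computes only the prior assignment probabilities under the standard Chinese restaurant process view. A new record joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to the concentration parameter. The function name `crp_probabilities` and the parameter `alpha` are hypothetical. The abstract's data model (the likelihood term) and its one-sweep Gibbs sampler are not reproduced here.

```python
def crp_probabilities(cluster_sizes, alpha):
    """Prior probabilities for assigning the next record under a
    Chinese restaurant process with concentration `alpha`.

    Returns one probability per existing cluster, followed by the
    probability of opening a new cluster (a previously unseen entity).
    Illustrative only: a full classifier would multiply each term by
    the likelihood of the record under that cluster's data model.
    """
    n = sum(cluster_sizes)
    probs = [size / (n + alpha) for size in cluster_sizes]
    probs.append(alpha / (n + alpha))  # mass reserved for a new entity
    return probs


# Two known authors with 3 and 1 records, concentration alpha = 1.0
probs = crp_probabilities([3, 1], alpha=1.0)
```

A larger `alpha` shifts more prior mass toward the last entry, making the model more willing to declare a record as belonging to a new entity.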
Discovering core terms for effective short text clustering
This thesis aims to address the current limitations in short text clustering and provides a systematic framework that includes three novel methods to effectively measure the similarity of two short texts, efficiently group short texts, and dynamically cluster short text streams.
Trip Prediction by Leveraging Trip Histories from Neighboring Users
We propose a novel approach for trip prediction by analyzing users' trip
histories. We augment users' (self-) trip histories by adding 'similar' trips
from other users, which could be informative and useful for predicting future
trips for a given user. This also helps to cope with noisy or sparse trip
histories, where the self-history by itself does not provide a reliable
prediction of future trips. We show empirical evidence that by enriching the
users' trip histories with additional trips, one can reduce the prediction
error by 15%-40%, evaluated on multiple subsets of the Nancy2012 dataset. This
real-world dataset is collected from public transportation ticket validations
in the city of Nancy, France. Our prediction tool is a central component of a
trip simulator system designed to analyze the functionality of public
transportation in the city of Nancy.
Analysis of Large-Scale Traffic Dynamics in an Urban Transportation Network Using Non-Negative Tensor Factorization
In this paper, we present our work on clustering and prediction of the temporal evolution of global congestion configurations in a large-scale urban transportation network. Instead of looking into temporal variations of the traffic flow states of individual links, we focus on the temporal evolution of the complete spatial configuration of congestion over the network. We aim to describe the typical temporal patterns of the global traffic states and to achieve long-term prediction of the large-scale traffic evolution in a unified data-mining framework. To this end, we formulate this joint task using regularized Non-negative Tensor Factorization, which has been shown to be a useful analysis tool for spatio-temporal data sequences. Clustering and prediction are performed based on the compact tensor factorization results. The validity of the proposed spatio-temporal traffic data analysis method is shown in experiments using simulated realistic traffic data.
Detecting the community structure and activity patterns of temporal networks: a non-negative tensor factorization approach
The increasing availability of temporal network data is calling for more
research on extracting and characterizing mesoscopic structures in temporal
networks and on relating such structure to specific functions or properties of
the system. An outstanding challenge is the extension of the results achieved
for static networks to time-varying networks, where the topological structure
of the system and the temporal activity patterns of its components are
intertwined. Here we investigate the use of a latent factor decomposition
technique, non-negative tensor factorization, to extract the community-activity
structure of temporal networks. The method is intrinsically temporal and allows
us to simultaneously identify communities and to track their activity over time.
We represent the time-varying adjacency matrix of a temporal network as a
three-way tensor and approximate this tensor as a sum of terms that can be
interpreted as communities of nodes with an associated activity time series. We
summarize known computational techniques for tensor decomposition and discuss
some quality metrics that can be used to tune the complexity of the factorized
representation. We subsequently apply tensor factorization to a temporal
network for which a ground truth is available for both the community structure
and the temporal activity patterns. The data we use describe the social
interactions of students in a school, the associations between students and
school classes, and the spatio-temporal trajectories of students over time. We
show that non-negative tensor factorization is capable of recovering the class
structure with high accuracy. In particular, the extracted tensor components
can be validated either as known school classes, or in terms of correlated
activity patterns, i.e., of spatial and temporal coincidences that are
determined by the known school activity schedule.
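The idea of approximating a three-way tensor as a sum of community terms, each with an activity time series, can be sketched as follows. This is a minimal sketch, assuming a plain CP (sum of rank-one terms) model fitted with Lee-Seung style multiplicative updates. The abstract does not say which algorithm the authors use, and the names `nonneg_cp` and `reconstruct`, as well as the toy data, are hypothetical.

```python
import numpy as np


def nonneg_cp(T, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative CP factorization of a 3-way tensor T (nodes x nodes x time).

    Approximates T[i, j, t] by sum_r A[i, r] * B[j, r] * C[t, r] using
    multiplicative updates, which keep all factors non-negative.
    """
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    for _ in range(n_iter):
        # Each update rescales one factor while the other two are held fixed.
        A *= np.einsum('ijk,jr,kr->ir', T, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', T, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', T, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C


def reconstruct(A, B, C):
    """Rebuild the tensor from its factors as a sum of rank-one terms."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)


# Toy temporal network: 6 nodes, 8 time steps, two groups of 3 nodes that are
# active in disjoint time windows (self-loops kept for simplicity).
T = np.zeros((6, 6, 8))
T[:3, :3, :4] = 1.0
T[3:, 3:, 4:] = 1.0

A, B, C = nonneg_cp(T, rank=2)
rel_err = np.linalg.norm(T - reconstruct(A, B, C)) / np.linalg.norm(T)
```

In this reading, the columns of `A` and `B` act as node membership weights for each component, and the matching column of `C` is that component's activity over time. The relative error `rel_err` is one simple quantity to monitor when choosing `rank`.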