Analysis of Large-scale Traffic Dynamics using Non-negative Tensor Factorization
In this paper, we present our work on clustering and prediction of temporal dynamics of global congestion configurations in large-scale road networks. Instead of looking into temporal traffic state variation of individual links, or of small areas, we focus on spatial congestion configurations of the whole network. In our work, we aim at describing the typical temporal dynamic patterns of this network-level traffic state and achieving long-term prediction of the large-scale traffic dynamics, in a unified data-mining framework. To this end, we formulate this joint task using Non-negative Tensor Factorization (NTF), which has been shown to be a useful decomposition tool for multivariate data sequences. Clustering and prediction are performed based on the compact tensor factorization results. Experiments on large-scale simulated data illustrate the value of our method, with promising results for long-term forecasting of traffic evolution.
Analysis of Large-Scale Traffic Dynamics in an Urban Transportation Network Using Non-Negative Tensor Factorization
In this paper, we present our work on clustering and prediction of temporal evolution of global congestion configurations in a large-scale urban transportation network. Instead of looking into temporal variations of traffic flow states of individual links, we focus on temporal evolution of the complete spatial configuration of congestion over the network. In our work, we aim to describe the typical temporal patterns of the global traffic states and achieve long-term prediction of the large-scale traffic evolution in a unified data-mining framework. To this end, we formulate this joint task using regularized Non-negative Tensor Factorization, which has been shown to be a useful analysis tool for spatio-temporal data sequences. Clustering and prediction are performed based on the compact tensor factorization results. The validity of the proposed spatio-temporal traffic data analysis method is shown in experiments using realistic simulated traffic data.
Scalable Tensor Factorizations for Incomplete Data
The problem of incomplete data - i.e., data with missing or unknown values -
in multi-way arrays is ubiquitous in biomedical signal processing, network
traffic analysis, bibliometrics, social network analysis, chemometrics,
computer vision, communication networks, etc. We consider the problem of how to
factorize data sets with missing values with the goal of capturing the
underlying latent structure of the data and possibly reconstructing missing
values (i.e., tensor completion). We focus on one of the most well-known tensor
factorizations that captures multi-linear structure, CANDECOMP/PARAFAC (CP). In
the presence of missing data, CP can be formulated as a weighted least squares
problem that models only the known entries. We develop an algorithm called
CP-WOPT (CP Weighted OPTimization) that uses a first-order optimization
approach to solve the weighted least squares problem. Based on extensive
numerical experiments, our algorithm is shown to successfully factorize tensors
with noise and up to 99% missing data. A unique aspect of our approach is that
it scales to sparse large-scale data, e.g., 1000 x 1000 x 1000 with five
million known entries (0.5% dense). We further demonstrate the usefulness of
CP-WOPT on two real-world applications: a novel EEG (electroencephalogram)
application where missing data is frequently encountered due to disconnections
of electrodes and the problem of modeling computer network traffic where data
may be absent due to the expense of the data collection process.
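The weighted least squares formulation mentioned in this abstract can be illustrated with a small sketch. It assumes a three-way tensor, a binary weight tensor `W` (1 for known entries, 0 for missing ones), and factor matrices named `A`, `B`, `C`; these names, the tensor sizes, and the use of NumPy are illustrative assumptions, not the authors' CP-WOPT code. The sketch only evaluates the objective and its gradients, which is the information a first-order optimizer would need.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rank-R CP model: M[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def weighted_cp_objective(X, W, A, B, C):
    """Return f = 0.5 * ||W * (X - M)||^2 and its gradients.

    W is assumed to be binary, so missing entries (W == 0) contribute
    nothing to the objective or to the gradients.
    """
    R = W * (X - cp_reconstruct(A, B, C))  # residual on known entries only
    f = 0.5 * np.sum(R ** 2)
    grad_A = -np.einsum("ijk,jr,kr->ir", R, B, C)
    grad_B = -np.einsum("ijk,ir,kr->jr", R, A, C)
    grad_C = -np.einsum("ijk,ir,jr->kr", R, A, B)
    return f, (grad_A, grad_B, grad_C)


# Hypothetical toy data: a rank-2 tensor with roughly half the entries known.
rng = np.random.default_rng(0)
A_true, B_true, C_true = (rng.random((n, 2)) for n in (4, 5, 6))
X = cp_reconstruct(A_true, B_true, C_true)
W = (rng.random(X.shape) < 0.5).astype(float)

# At the true factors the known entries are fitted exactly.
f_true, grads_true = weighted_cp_objective(X, W, A_true, B_true, C_true)

# At a random starting point the objective is positive.
A0, B0, C0 = (rng.random((n, 2)) for n in (4, 5, 6))
f0, (gA0, gB0, gC0) = weighted_cp_objective(X, W, A0, B0, C0)
```

Any gradient-based solver could consume the returned value and gradients; the choice of solver is left open here because the abstract does not specify it beyond "first-order".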
Algorithms, applications and systems towards interpretable pattern mining from multi-aspect data
How do humans move around in the urban space and how do they differ when the city undergoes terrorist attacks? How do users behave in Massive Open Online Courses (MOOCs) and how do they differ if some of them achieve certificates while others do not? In what areas of the court do elite players, such as Stephen Curry and LeBron James, like to take their shots in the course of the game? How can we uncover the hidden habits that govern our online purchases? Are there unspoken agendas in how different states pass legislation of certain kinds? At the heart of these seemingly unconnected puzzles is the same mystery of multi-aspect mining, i.e., how can we mine and interpret the hidden pattern from a dataset that simultaneously reveals the associations, or changes of the associations, among various aspects of the data (e.g., a shot could be described with three aspects: player, time of the game, and area in the court)? Solving this problem could open gates to a deep understanding of the underlying mechanisms of many real-world phenomena. While much of the research in multi-aspect mining contributes a broad scope of innovations in the mining part, interpretation of patterns from the perspective of users (or domain experts) is often overlooked. Questions such as what users require of patterns, how good the patterns are, or how to read them have barely been addressed. Without efficient and effective ways of involving users in the process of multi-aspect mining, the results are likely to be difficult for them to comprehend.
This dissertation proposes the M^3 framework, which consists of multiplex pattern discovery, multifaceted pattern evaluation, and multipurpose pattern presentation, to tackle the challenges of multi-aspect pattern discovery. Based on this framework, we develop algorithms, applications, and analytic systems to enable interpretable pattern discovery from multi-aspect data. Following the concept of meaningful multiplex pattern discovery, we propose PairFac to close the gap between human information needs and naive mining optimization. We demonstrate its effectiveness in the context of impact discovery in the aftermath of urban disasters. We develop iDisc to target the crossing of multiplex pattern discovery with multifaceted pattern evaluation. iDisc meets the specific information need of understanding multi-level, contrastive behavior patterns. As an example, we use iDisc to predict student performance outcomes in Massive Open Online Courses given users' latent behaviors. FacIt is an interactive visual analytic system that sits at the intersection of all three components and enables interpretable, fine-tunable, and scrutinizable pattern discovery from multi-aspect data. We demonstrate each work's significance and implications in its respective problem context. As a whole, this series of studies is an effort to instantiate the M^3 framework and push the field of multi-aspect mining towards a more human-centric process in real-world applications.
An Incomplete Tensor Tucker decomposition based Traffic Speed Prediction Method
In intelligent transportation systems, missing data is common and inevitable,
yet complete and valid traffic speed data is of great importance to such
systems. A latent factorization-of-tensors (LFT) model is one of the most
attractive approaches to missing traffic data recovery due to its high
scalability. An LFT model is usually optimized via a stochastic gradient
descent (SGD) solver; however, the SGD-based LFT model suffers from slow
convergence. To deal with this issue, this work integrates the unique
advantages of the proportional-integral-derivative (PID) controller into a
Tucker decomposition based LFT model. It adopts two ideas: a) adopting Tucker
decomposition to build an LFT model that achieves better recovery accuracy,
and b) incorporating the adjusted instance error based on PID control theory
into the SGD solver to effectively improve the convergence rate. Our
experimental studies on two major city traffic road speed datasets show that
the proposed model achieves significant efficiency gain and highly competitive
prediction accuracy.
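As a rough illustration of what a PID-adjusted instance error could look like, the sketch below applies the textbook discrete PID form (proportional, accumulated, and differenced error terms) to a stream of per-instance errors. The abstract does not give the exact update rule, so the class name `PIDError`, the gain names `kp`, `ki`, `kd`, and the way the terms are combined are assumptions for illustration only, not the authors' formulation.

```python
from dataclasses import dataclass


@dataclass
class PIDError:
    """Textbook discrete PID adjustment of a scalar error signal (assumed form)."""

    kp: float = 1.0        # proportional gain
    ki: float = 0.0        # integral gain
    kd: float = 0.0        # derivative gain
    integral: float = 0.0  # running sum of past errors
    previous: float = 0.0  # error seen at the previous step

    def adjust(self, error: float) -> float:
        """Return kp*e + ki*sum(e) + kd*(e - e_prev) and update the state."""
        self.integral += error
        derivative = error - self.previous
        self.previous = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains and raw instance errors, chosen only for the demo.
pid = PIDError(kp=1.0, ki=0.5, kd=0.25)
first = pid.adjust(2.0)   # 1.0*2.0 + 0.5*2.0 + 0.25*(2.0 - 0.0)  -> 3.5
second = pid.adjust(1.0)  # 1.0*1.0 + 0.5*3.0 + 0.25*(1.0 - 2.0)  -> 2.25
```

In an SGD solver, the adjusted value would take the place of the raw instance error when scaling the gradient step; with `ki = kd = 0` and `kp = 1` the adjustment reduces to plain SGD on the raw error.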