SamBaTen: Sampling-based Batch Incremental Tensor Decomposition
Tensor decompositions are invaluable tools in analyzing multimodal datasets.
In many real-world scenarios, such datasets are far from static; on the
contrary, they tend to grow over time. For instance, in an online social network
setting, as we observe new interactions over time, our dataset gets updated in
its "time" mode. How can we maintain a valid and accurate tensor decomposition
of such a dynamically evolving multimodal dataset, without having to re-compute
the entire decomposition after every single update? In this paper we introduce
SamBaTen, a Sampling-based Batch Incremental Tensor Decomposition algorithm,
which incrementally maintains the decomposition given new updates to the tensor
dataset. SamBaTen is able to scale to datasets that the state-of-the-art in
incremental tensor decomposition is unable to operate on, due to its ability to
effectively summarize the existing tensor and the incoming updates, and perform
all computations in the reduced summary space. We extensively evaluate SamBaTen
using synthetic and real datasets. Indicatively, SamBaTen achieves comparable
accuracy to state-of-the-art incremental and non-incremental techniques, while
being 25-30 times faster. Furthermore, SamBaTen scales to very large sparse and
dense dynamically evolving tensors of dimensions up to 100K x 100K x 100K where
state-of-the-art incremental approaches were not able to operate.
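The idea of updating a decomposition in its "time" mode while working in a reduced space can be illustrated with a minimal NumPy sketch. This is not the SamBaTen algorithm itself: it assumes a CP (rank-R) model whose first two factor matrices `A` and `B` are held fixed, and it estimates the time-mode rows for newly arrived slices by least squares on a uniformly sampled sub-tensor. The function names `khatri_rao` and `project_new_slices`, the uniform sampling, and all sizes are assumptions made for illustration.

```python
import numpy as np


def khatri_rao(B, A):
    """Column-wise Kronecker product; row j * I + i equals B[j, :] * A[i, :]."""
    rank = A.shape[1]
    return (B[:, None, :] * A[None, :, :]).reshape(-1, rank)


def project_new_slices(A, B, X_new, rows=None, cols=None):
    """Estimate time-mode factor rows for new slices X_new of shape (I, J, K_new).

    A (I x R) and B (J x R) are held fixed. If `rows` / `cols` are given, only
    that sampled sub-tensor is used, so the least-squares problem is smaller.
    """
    rows = np.arange(A.shape[0]) if rows is None else np.asarray(rows)
    cols = np.arange(B.shape[0]) if cols is None else np.asarray(cols)
    X_s = X_new[np.ix_(rows, cols)]             # (|rows|, |cols|, K_new)
    # Mode-3 unfolding whose column j * |rows| + i matches khatri_rao's row order.
    X3 = X_s.transpose(2, 1, 0).reshape(X_s.shape[2], -1)
    design = khatri_rao(B[cols], A[rows])       # (|rows| * |cols|, R)
    C_new_T, *_ = np.linalg.lstsq(design, X3.T, rcond=None)
    return C_new_T.T                            # (K_new, R)


# Synthetic, noise-free example (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))
B = rng.standard_normal((15, 3))
C_true = rng.standard_normal((4, 3))            # factors of the incoming slices
X_new = np.einsum("ir,jr,kr->ijk", A, B, C_true)

rows = rng.choice(20, size=8, replace=False)    # sampled indices in mode 1
cols = rng.choice(15, size=6, replace=False)    # sampled indices in mode 2
C_est = project_new_slices(A, B, X_new, rows, cols)
print(np.allclose(C_est, C_true))
```

In this sketch the new slices are fitted from 8 x 6 sampled entries per slice instead of all 20 x 15. With noisy data the sampled estimate would only approximate the full one, and the fixed factors `A` and `B` would also need refreshing over time, which this sketch does not attempt.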
Tensor Learning for Recovering Missing Information: Algorithms and Applications on Social Media
Real-time social systems like Facebook, Twitter, and Snapchat have been growing
rapidly, producing exabytes of data in different views or aspects. Coupled with more
and more GPS-enabled sharing of videos, images, blogs, and tweets that provide valuable
information regarding “who”, “where”, “when” and “what”, these real-time human
sensor data promise new research opportunities to uncover models of user behavior, mobility,
and information sharing. These real-time dynamics in social systems usually come
in multiple aspects, which can help us better understand the social interactions of the
underlying network. However, these multi-aspect datasets are often raw and incomplete
owing to various unpredictable or unavoidable reasons; for instance, API limitations and
data sampling policies can lead to an incomplete (and often biased) perspective on these
multi-aspect datasets. Such missing data can raise serious concerns, such as biased estimates
of structural properties of the network and properties of information cascades in
social networks. In order to recover missing values or information in social systems, we
identify “4S” challenges: extreme sparsity of the observed multi-aspect datasets, adoption
of rich side information that is able to describe the similarities of entities, generation of
robust models rather than limiting them to specific applications, and scalability of models
to handle real large-scale datasets (billions of observed entries). With these challenges
in mind, this dissertation aims to develop scalable and interpretable tensor-based frameworks,
algorithms and methods for recovering missing information on social media. In
particular, this dissertation research makes four unique contributions:
- The first research contribution of this dissertation is to propose a scalable
framework based on low-rank tensor learning in the presence of incomplete information.
Concretely, we formally define the problem of recovering the spatio-temporal
dynamics of online memes and tackle this problem by proposing a novel tensor-based
factorization approach based on the alternating direction method of multipliers
(ADMM) with the integration of the latent relationships derived from contextual
information among locations, memes, and times.
- The second research contribution of this dissertation is to evaluate the generalization
of the proposed tensor learning framework and extend it to the recommendation
problem. In particular, we develop a novel tensor-based approach to
solve the personalized expert recommendation problem by integrating both the latent relationships
between homogeneous entities (e.g., users and users, experts and experts)
and the relationships between heterogeneous entities (e.g., users and experts, topics
and experts) from the geo-spatial, topical, and social contexts.
- The third research contribution of this dissertation is to extend the proposed
tensor learning framework to the user topical profiling problem. Specifically,
we propose a tensor-based contextual regularization model embedded into a matrix
factorization framework, which leverages the social, textual, and behavioral contexts
across users, in order to overcome the identified challenges.
- The fourth research contribution of this dissertation is to scale up the proposed
tensor learning framework so that it can handle real large-scale datasets
that are too big to fit in the main memory of a single machine. In particular, we
propose a novel distributed tensor completion algorithm with trace-based regularization
of the auxiliary information based on ADMM under the proposed tensor
learning framework, which is designed to scale up to real large-scale tensors (e.g.,
billions of entries) by efficiently computing auxiliary variables, minimizing intermediate
data, and reducing the workload of updating new tensors.
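The combination of ADMM with a trace-based regularizer mentioned in the contributions above can be sketched on a single factor-matrix subproblem. This is a minimal illustration under stated assumptions, not the dissertation's algorithm: `X` stands in for one unfolding of the data tensor, `W` for the fixed product of the other factors, and `L` for a graph Laplacian built from side information (here a hypothetical path graph). The objective assumed is 0.5 * ||X - A W^T||_F^2 + (alpha / 2) * tr(A^T L A), split as A = Z. The function names and all parameter values are illustrative.

```python
import numpy as np


def chain_laplacian(n):
    """Graph Laplacian of a path graph; stands in for real side information."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return np.diag(adj.sum(axis=1)) - adj


def admm_regularized_factor(X, W, L, alpha=0.5, rho=1.0, n_iter=500):
    """ADMM for min_A 0.5*||X - A W^T||_F^2 + (alpha/2)*tr(A^T L A).

    The variable is split as A = Z, so the data-fit term acts on A and the
    trace regularizer acts on Z; U is the scaled dual variable.
    """
    n, rank = X.shape[0], W.shape[1]
    Z = np.zeros((n, rank))
    U = np.zeros((n, rank))
    gram = W.T @ W + rho * np.eye(rank)         # symmetric, positive definite
    reg = alpha * L + rho * np.eye(n)           # symmetric, positive definite
    XW = X @ W
    for _ in range(n_iter):
        # A-update: solve A (W^T W + rho I) = X W + rho (Z - U)
        A = np.linalg.solve(gram, (XW + rho * (Z - U)).T).T
        # Z-update: solve (alpha L + rho I) Z = rho (A + U)
        Z = np.linalg.solve(reg, rho * (A + U))
        # Scaled dual update for the constraint A = Z
        U = U + A - Z
    return A, Z


# Synthetic example (sizes and alpha = 0.5 are illustrative assumptions).
rng = np.random.default_rng(1)
X = rng.standard_normal((12, 8))
W = rng.standard_normal((8, 3))
L = chain_laplacian(12)
A, Z = admm_regularized_factor(X, W, L, alpha=0.5)

# Stationarity of the original objective: A W^T W + alpha L A - X W = 0.
residual = A @ (W.T @ W) + 0.5 * L @ A - X @ W
print(np.linalg.norm(residual), np.linalg.norm(A - Z))
```

The split keeps each update a small linear solve: the A-update involves only an R x R system and the Z-update only the Laplacian system. A distributed variant would have to partition these computations across machines, which this single-machine sketch does not cover.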