A Constrained Coupled Matrix-Tensor Factorization for Learning Time-evolving and Emerging Topics
Topic discovery has witnessed significant growth as a field of data mining
at large. In particular, time-evolving topic discovery, where the evolution of
a topic is taken into account, has been instrumental in understanding the
historical context of an emerging topic in a dynamic corpus. Traditionally,
time-evolving topic discovery has focused on this notion of time. However,
especially in settings where content is contributed by a community or a crowd,
an orthogonal notion of time is the one that pertains to the level of expertise
of the content creator: the more experienced the creator, the more advanced the
topic. In this paper, we propose a novel time-evolving topic discovery method
which, in addition to extracting topics, is able to identify the evolution
of each topic over time, as well as its level of difficulty, as
inferred from the level of expertise of its main contributors. Our method
is based on a novel formulation of Constrained Coupled Matrix-Tensor
Factorization, which adopts constraints that are well-motivated for and, as we
demonstrate, essential for high-quality topic discovery. We qualitatively
evaluate our approach using real data from the Physics and Programming
Stack Exchange forums, and we are able to identify topics of varying levels of
difficulty which can be linked to external events, such as the announcement of
gravitational waves by the LIGO lab in the Physics forum. We provide a quantitative
evaluation of our method by conducting a user study where experts were asked to
judge the coherence and quality of the extracted topics. Finally, our proposed
method has implications for automatic curriculum design using the extracted
topics, where the notion of the level of difficulty is necessary for the proper
modeling of prerequisites and advanced concepts.
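As a rough illustration of the coupled matrix-tensor idea named in the abstract above, the following Python sketch evaluates an unconstrained coupled objective in which a three-mode tensor and a side matrix share one factor. The names `cp_entry` and `coupled_loss`, the choice of which mode is shared, and the toy data are all assumptions made for illustration; the abstract does not spell out the paper's constraints or optimization procedure, so none are shown here.

```python
from itertools import product


def cp_entry(A, B, C, i, j, k):
    """Entry (i, j, k) of the CP model: sum over r of A[i][r] * B[j][r] * C[k][r]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))


def coupled_loss(X, Y, A, B, C, D):
    """Squared error of X ~ [[A, B, C]] plus Y ~ A D^T, with A shared by both terms."""
    I, J, K, M = len(X), len(X[0]), len(X[0][0]), len(Y[0])
    tensor_err = sum(
        (X[i][j][k] - cp_entry(A, B, C, i, j, k)) ** 2
        for i, j, k in product(range(I), range(J), range(K))
    )
    matrix_err = sum(
        (Y[i][m] - sum(a * d for a, d in zip(A[i], D[m]))) ** 2
        for i, m in product(range(I), range(M))
    )
    return tensor_err + matrix_err


# Toy rank-2 factors (hypothetical data; integers keep the arithmetic exact).
A = [[1, 0], [0, 2], [1, 1]]   # shared mode, 3 rows
B = [[1, 2], [3, 1]]           # second tensor mode, 2 rows
C = [[2, 1], [1, 1]]           # third tensor mode, 2 rows
D = [[1, 1], [0, 3]]           # matrix-only mode, 2 rows

# Build X and Y exactly from the factors, so the true factors give zero loss.
X = [[[cp_entry(A, B, C, i, j, k) for k in range(2)] for j in range(2)] for i in range(3)]
Y = [[sum(a * d for a, d in zip(A[i], D[m])) for m in range(2)] for i in range(3)]

exact = coupled_loss(X, Y, A, B, C, D)
# Changing one entry of the shared factor affects both error terms.
perturbed = coupled_loss(X, Y, [[2, 0], [0, 2], [1, 1]], B, C, D)
```

A constrained solver would minimize this quantity over the factors, subject to whatever constraints the method imposes; the sketch only shows how sharing `A` ties the tensor and matrix terms together.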
MTC: Multiresolution Tensor Completion from Partial and Coarse Observations
Existing tensor completion formulations mostly rely on partial observations
from a single tensor. However, tensors extracted from real-world data are often
more complex due to: (i) Partial observation: Only a small subset (e.g., 5%) of
tensor elements are available. (ii) Coarse observation: Some tensor modes only
present coarse and aggregated patterns (e.g., monthly summary instead of daily
reports). In this paper, we are given a subset of the tensor and some
aggregated/coarse observations (along one or more modes) and seek to recover
the original fine-granular tensor with low-rank factorization. We formulate a
coupled tensor completion problem and propose an efficient Multi-resolution
Tensor Completion model (MTC) to solve the problem. Our MTC model explores
tensor mode properties and leverages the hierarchy of resolutions to
recursively initialize an optimization setup, and optimizes on the coupled
system using alternating least squares. MTC ensures low computational and space
complexity. We evaluate our model on two COVID-19 related spatio-temporal
tensors. The experiments show that MTC achieves 65.20% and 75.79%
percentage of fitness (PoF) in tensor completion with only 5% fine-granular
observations, which is a 27.96% relative improvement over the best baseline. To
evaluate the learned low-rank factors, we also design a tensor prediction task
for daily and cumulative disease case predictions, where MTC achieves 50% in
PoF and 30% relative improvements over the best baseline.
Comment: Accepted in SIGKDD 2021. Code in https://github.com/ycq091044/MT
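To make the "coarse observation" setting and the PoF metric from the MTC abstract concrete, here is a minimal Python sketch. It assumes PoF is defined as 1 - ||X - X_hat||_F / ||X||_F, which the abstract itself does not state, and it uses an order-2 array (locations by days) for brevity. The helper names `aggregate_last_mode` and `pof` and the toy data are hypothetical and are not taken from the MTC code.

```python
import math


def aggregate_last_mode(X, width):
    """Sum consecutive blocks of `width` entries along the last mode (e.g. days to weeks)."""
    return [[sum(row[t:t + width]) for t in range(0, len(row), width)] for row in X]


def pof(X, X_hat):
    """Assumed definition of percentage of fitness: 1 - ||X - X_hat||_F / ||X||_F."""
    err = math.sqrt(sum((x - y) ** 2 for rx, ry in zip(X, X_hat) for x, y in zip(rx, ry)))
    norm = math.sqrt(sum(x ** 2 for rx in X for x in rx))
    return 1.0 - err / norm


# Hypothetical fine-granular data: 2 locations x 4 days.
X = [[1, 2, 3, 4], [0, 1, 0, 1]]

coarse = aggregate_last_mode(X, 2)        # 2-day totals per location
perfect = pof(X, X)                       # exact recovery of the fine-granular data
trivial = pof(X, [[0] * 4 for _ in X])    # all-zero estimate
```

Under this assumed definition, an exact reconstruction scores 1 and an all-zero estimate scores 0, so the reported percentages can be read as the fraction of the fine-granular tensor's norm that the recovered tensor accounts for.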