2 research outputs found

    A Constrained Coupled Matrix-Tensor Factorization for Learning Time-evolving and Emerging Topics

    Topic discovery has witnessed significant growth as a field of data mining at large. In particular, time-evolving topic discovery, where the evolution of a topic is taken into account, has been instrumental in understanding the historical context of an emerging topic in a dynamic corpus. Traditionally, time-evolving topic discovery has focused on this notion of time. However, especially in settings where content is contributed by a community or a crowd, an orthogonal notion of time is the one that pertains to the level of expertise of the content creator: the more experienced the creator, the more advanced the topic. In this paper, we propose a novel time-evolving topic discovery method which, in addition to extracting topics, is able to identify the evolution of each topic over time, as well as its level of difficulty, as inferred from the level of expertise of its main contributors. Our method is based on a novel formulation of Constrained Coupled Matrix-Tensor Factorization, which adopts constraints that are well-motivated for and, as we demonstrate, essential for high-quality topic discovery. We qualitatively evaluate our approach using real data from the Physics and Programming Stack Exchange forums, and we were able to identify topics of varying levels of difficulty which can be linked to external events, such as the announcement of gravitational waves by the LIGO lab in the Physics forum. We provide a quantitative evaluation of our method by conducting a user study where experts were asked to judge the coherence and quality of the extracted topics. Finally, our proposed method has implications for automatic curriculum design using the extracted topics, where the notion of the level of difficulty is necessary for the proper modeling of prerequisites and advanced concepts.
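
For orientation, the generic (unconstrained) coupled matrix-tensor factorization that this kind of method builds on can be written as below. This is a sketch of the standard textbook objective only, not the paper's formulation: the symbols $\mathcal{X}$, $Y$, $A$, $B$, $C$, $D$ are assumed here for illustration, and the abstract does not state which constraints the authors impose or which mode is shared.

```latex
% Generic CMTF objective (illustrative; symbols are assumptions, not from the abstract).
% \mathcal{X}: three-way tensor, Y: side-information matrix sharing its first mode with \mathcal{X}.
\min_{A,\,B,\,C,\,D}\;
  \bigl\lVert \mathcal{X} - [\![ A, B, C ]\!] \bigr\rVert_F^2
  \;+\;
  \bigl\lVert Y - A D^{\top} \bigr\rVert_F^2,
\qquad
[\![ A, B, C ]\!] = \sum_{r=1}^{R} a_r \circ b_r \circ c_r .
```

The shared factor $A$ is what couples the two data sets: each of the $R$ components must explain both the tensor and the matrix along the common mode.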

    MTC: Multiresolution Tensor Completion from Partial and Coarse Observations

    Existing tensor completion formulations mostly rely on partial observations from a single tensor. However, tensors extracted from real-world data are often more complex due to: (i) Partial observation: only a small subset (e.g., 5%) of tensor elements is available. (ii) Coarse observation: some tensor modes only present coarse and aggregated patterns (e.g., monthly summaries instead of daily reports). In this paper, we are given a subset of the tensor and some aggregated/coarse observations (along one or more modes) and seek to recover the original fine-granular tensor with low-rank factorization. We formulate a coupled tensor completion problem and propose an efficient Multi-resolution Tensor Completion model (MTC) to solve the problem. Our MTC model explores tensor mode properties and leverages the hierarchy of resolutions to recursively initialize an optimization setup, and optimizes on the coupled system using alternating least squares. MTC ensures low computational and space complexity. We evaluate our model on two COVID-19 related spatio-temporal tensors. The experiments show that MTC achieves 65.20% and 75.79% percentage of fitness (PoF) in tensor completion with only 5% fine-granular observations, which is a 27.96% relative improvement over the best baseline. To evaluate the learned low-rank factors, we also design a tensor prediction task for daily and cumulative disease case predictions, where MTC achieves 50% in PoF and a 30% relative improvement over the best baseline.
    Comment: Accepted in SIGKDD 2021. Code in https://github.com/ycq091044/MT
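
The two observation types and the PoF metric mentioned in the abstract can be illustrated with a small NumPy sketch. It is not the MTC algorithm: it only builds a toy low-rank tensor, derives a partial view and a coarse view from it, and evaluates a fitness score. The helper names `aggregate_mode` and `percentage_of_fitness`, the toy sizes, and the definition PoF = 1 − ‖X − X̂‖_F / ‖X‖_F are assumptions made for illustration; the abstract itself does not define PoF.

```python
import numpy as np


def aggregate_mode(tensor, mode, factor):
    """Sum consecutive groups of `factor` slices along `mode` (e.g. days -> weeks)."""
    size = tensor.shape[mode]
    if size % factor != 0:
        raise ValueError("mode size must be divisible by factor")
    moved = np.moveaxis(tensor, mode, 0)
    coarse = moved.reshape(size // factor, factor, *moved.shape[1:]).sum(axis=1)
    return np.moveaxis(coarse, 0, mode)


def percentage_of_fitness(truth, estimate):
    """Assumed definition: 1 - ||truth - estimate||_F / ||truth||_F."""
    return 1.0 - np.linalg.norm(truth - estimate) / np.linalg.norm(truth)


rng = np.random.default_rng(0)
rank = 3
# Toy rank-3 tensor: 6 locations x 4 features x 28 days (sizes are arbitrary).
A = rng.random((6, rank))
B = rng.random((4, rank))
C = rng.random((28, rank))
X = np.einsum("ir,jr,kr->ijk", A, B, C)

# (i) Partial observation: keep roughly 5% of the entries; zeros are placeholders.
mask = rng.random(X.shape) < 0.05
partial = np.where(mask, X, 0.0)

# (ii) Coarse observation: weekly totals along the time mode instead of daily values.
coarse = aggregate_mode(X, mode=2, factor=7)

print(partial.shape, coarse.shape)
print(percentage_of_fitness(X, X))
```

A completion method would take `partial`, `mask`, and `coarse` as input and return an estimate of `X`, which could then be scored with `percentage_of_fitness`.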