Logical Creations Education and Research Institute
Abstract
With social media being a predominant form of communication, millions of texts, such as news articles, tweets, and snippets, are generated worldwide every hour. Discovering concise and useful knowledge from them has attracted interest from both academia and industry. Because a short text document carries limited contextual information and is sparse and ambiguous, learning topics from it automatically is a significant challenge. To address this problem, this paper proposes a non-negative matrix factorization (NMF) semantics-based model that extracts concise and meaningful topic titles capturing the overall theme of a text. The model efficiently integrates the semantic correlations between words and their context, which are learned through skip-gram. The resulting NMF problem is solved with a block coordinate algorithm. Extensive quantitative evaluations on a variety of real-world text datasets show that the proposed models outperform several state-of-the-art methods in terms of topic coherence. Their interpretability is demonstrated by qualitative semantic analysis, which identifies significant and consistent topics. Owing to its superior performance and simple construction, the model is an effective standard topic model for unstructured, sparse text.
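To make the factorization step concrete, the following is a minimal NMF sketch in Python using standard multiplicative updates. This is an illustrative assumption only: the abstract names a block coordinate algorithm and skip-gram semantic regularization, whose exact update rules are not given here, so this sketch shows only the plain factorization X ≈ WH with non-negativity.

```python
import numpy as np

def nmf(X, k, iters=200, eps=1e-9):
    """Factor a non-negative matrix X (m x n) as W (m x k) @ H (k x n).

    Uses Lee-Seung multiplicative updates (an assumption; the paper's
    actual solver is a block coordinate algorithm with semantic terms).
    In topic modeling, rows of H are topics over words and rows of W
    are document-topic weights.
    """
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, k)) + eps   # non-negative init
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        # Alternate updates of the two factor blocks; eps avoids 0-division.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Applied to a document-term matrix, the top-weighted words in each row of H can serve as candidate topic titles.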