    Analyze the Trend of Post Replies Based on Linear Regression Model: Take Tianya Website as Examples

    In recent years, users have spent more time on social networking sites than ever before. How can information spread rapidly when users face vast amounts of it? Scholars have studied information dissemination in social networks. Building on previous research, the authors divide posts into popular posts and ordinary posts and then use a linear regression model to predict the number of replies at a specified time. After comparing the two types of posts, the authors conclude that ordinary posts can become popular posts if they maintain a large number of replies within the first five hours and increase their replies by making use of community mechanisms. This conclusion provides a reasonable proposal for enterprises and administrators to identify and recommend popular posts.
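
The abstract does not state the exact form of the model. A minimal sketch, assuming a simple linear regression of a post's cumulative reply count on elapsed hours, fitted by ordinary least squares in plain Python. The reply counts and the helper names `fit_line` and `predict` are hypothetical, not taken from the paper.

```python
# Sketch (assumption): simple linear regression of cumulative replies on hours,
# fitted by ordinary least squares. The data below are invented.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted cumulative replies at time x (hours since posting)."""
    return intercept + slope * x


hours = [1, 2, 3, 4, 5]
replies = [12, 25, 33, 47, 58]  # hypothetical cumulative reply counts

a, b = fit_line(hours, replies)
print(f"replies ~ {a:.1f} + {b:.1f} * hour")           # replies ~ 0.8 + 11.4 * hour
print(f"predicted at hour 10: {predict(a, b, 10):.1f}")  # predicted at hour 10: 114.8
```

Fitting popular and ordinary posts separately and comparing their slopes over the first hours is one way to contrast the two classes; this is an illustration, not necessarily the authors' procedure.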

    Analysing Longitudinal Social Science Questionnaires: Topic modelling with BERT-based Embeddings

    Unsupervised topic modelling is a useful unbiased mechanism for topic labelling of complex longitudinal questionnaires covering multiple domains such as social science and medicine. Manual tagging of such complex datasets increases the propensity for incorrect or inconsistent labels and is a barrier to scaling the processing of longitudinal questionnaires for the provision of question banks for data collection agencies. Towards this end, we propose a tailored BERTopic framework that takes advantage of its novel sentence embedding for creating interpretable topics, and we extend it with an enhanced visualisation for comparing the topic model labels with the tags manually assigned to the question literals. The resulting topic clusters uncover instances of mislabelled question tags, while also enabling us to showcase the semantic shifts and evolution of the topics across the time span of the longitudinal questionnaires. The tailored BERTopic framework outperforms existing topic modelling baselines on the quantitative evaluation metrics of topic coherence and diversity, while also being 18 times faster than the next best-performing baseline.
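
The abstract reports topic diversity without giving its formula. One common definition is the fraction of unique words among the top-k words of all topics; the sketch below assumes that definition. The example topics and the function name `topic_diversity` are hypothetical.

```python
# Sketch (assumption): topic diversity as the share of unique words among the
# top-k words of all topics. The example topics are invented.

def topic_diversity(topics, top_k=25):
    """Fraction of unique words among the top_k words of every topic.

    1.0 means no two topics share a top word; values near 0 mean heavy overlap.
    Dividing by the number of collected words (rather than top_k * len(topics))
    also handles topics that have fewer than top_k words.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


example_topics = [
    ["health", "doctor", "hospital"],
    ["income", "job", "salary"],
    ["health", "income", "pension"],
]
print(round(topic_diversity(example_topics, top_k=3), 3))  # 7 unique of 9 -> 0.778
```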

    A Momentum Theory for Hot Topic Life-cycle: A Case Study of Hot Hashtag Emerging in Twitter

    Existing work on mining hot topics is mainly based on topic multiplicity and user attention per unit time. With the advent of social networking, more emphasis has been placed on hot topics, as they effectively describe the importance and hotness of a topic. However, research on the influence of accumulated attention on hot topics, and on the alternation between hot topics and outdated ones, remains relatively weak. In this paper, a novel algorithm for calculating topic hotness based on momentum is proposed. Not only the number of participants but also the long-tail effect of historically accumulated attention on the topic is taken into consideration. With this algorithm, we can accurately model hot topics during their emerging and growing periods and effectively describe the whole life cycle of a topic. Additionally, the transition between hot topics and outdated ones can be distinguished efficiently. Our experiments show that the process of a topic growing into a hot topic can be detected explicitly. Potential hot topics can thus be explored, and outdated ones rejected.
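
The abstract names momentum and a long-tail effect but gives no formula. As an illustration only, assume hotness follows an exponentially decaying accumulation H_t = decay * H_(t-1) + n_t, where n_t is the number of participants in time step t and `decay` controls how long past attention lingers. This recurrence is a commonly used stand-in, not the authors' algorithm; `hotness_series` and `lifecycle_phases` are hypothetical names.

```python
# Sketch (assumption): momentum-style topic hotness with exponential decay.
# This is an illustrative stand-in, not the algorithm proposed in the paper.

def hotness_series(participants_per_step, decay=0.8):
    """Return H_t = decay * H_(t-1) + n_t for each step t, starting from H = 0.

    n_t is the number of participants in step t. A decay close to 1 keeps a long
    tail of accumulated attention; a decay close to 0 forgets it quickly.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    hotness, h = [], 0.0
    for n in participants_per_step:
        h = decay * h + n
        hotness.append(h)
    return hotness


def lifecycle_phases(hotness):
    """Label each step after the first: 'rising' if hotness increased, else 'falling'."""
    return ["rising" if cur > prev else "falling"
            for prev, cur in zip(hotness, hotness[1:])]


series = hotness_series([1, 4, 10, 6, 2, 0], decay=0.5)
print(series)                    # [1.0, 4.5, 12.25, 12.125, 8.0625, 4.03125]
print(lifecycle_phases(series))  # ['rising', 'rising', 'falling', 'falling', 'falling']
```

Note how step 4 keeps hotness near its peak (12.125) even though participation drops from 10 to 6: accumulated attention carries over, which is the kind of long-tail behaviour the abstract describes.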

    Discovery Of Strain Support On Community Relatives In Social Networks

    We offer a variety of algorithms to solve this new mining problem in three stages: pre-processing to extract relevant topics and identify sessions for multiple users; generating all STP candidates with (expected) support values for each user through pattern growth; and selecting URSTPs from the derived STPs. Critical and sensitive information: a detailed study is available. Support is the standard measure for evaluating the interestingness of a pattern and is understood as the number or proportion of records in the underlying database that contain the pattern. Patterns acquired this way are not always interesting for this purpose, because rare but important patterns that exhibit the personal and abnormal behaviors of individuals are filtered out by low support. We propose a framework for solving this problem in practice and design appropriate algorithms to support it. First, we provide pre-processing procedures with heuristic methods for extracting topics and identifying sessions. This method can be considered a match between the items endorsed by an STP and the topics that occur in the documents of a particular session. The results suggest that our approach is able to capture and reveal the personal behavior of Internet users in a transparent way.
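
To make the role of support concrete, here is a minimal sketch assuming each session is a plain ordered list of topic labels, so it computes ordinary support rather than the expected support over probabilistic topics mentioned above. The sessions, topic names, threshold, and helper names are all hypothetical.

```python
# Sketch (assumption): support of a sequential topic pattern over sessions.
# Sessions are plain topic lists, so this is ordinary, not expected, support.

def contains_subsequence(session, pattern):
    """True if the pattern's topics occur in the session in order (gaps allowed)."""
    remaining = iter(session)
    # `topic in remaining` consumes the iterator up to the match, enforcing order.
    return all(topic in remaining for topic in pattern)


def support(sessions, pattern):
    """Fraction of sessions that contain the pattern as a subsequence."""
    if not sessions:
        return 0.0
    return sum(contains_subsequence(s, pattern) for s in sessions) / len(sessions)


sessions = [
    ["sports", "weather", "travel"],
    ["sports", "travel"],
    ["finance", "weather"],
    ["health", "finance", "sports", "travel"],
    ["health", "rare_disease", "insurance"],
]
candidates = [["sports", "travel"], ["rare_disease", "insurance"]]
min_support = 0.3  # hypothetical global threshold
frequent = [p for p in candidates if support(sessions, p) >= min_support]
print(frequent)  # [['sports', 'travel']]: the rare pattern (support 0.2) is dropped
```

The rare pattern is discarded by the global threshold even though it may characterize one user well, which is exactly the limitation of plain support that the abstract points out.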

    Hot Topic Discovery in Online Community using Topic Labels and Hot Features

    With huge volumes of information on the Internet, extracting the hot topics users care about quickly and effectively has become a fundamental information-processing task. Generally, hot topic detection involves two tasks: topic discovery and hotness evaluation. In this paper, we propose a hot topic detection method. For topic discovery, topics are identified by clustering based on extracted topic labels. For hotness evaluation, the proposed model fully considers both internal and external features and combines them. The experimental results on TianYa BBS demonstrate the efficiency of the proposed method: compared with topic discovery based on latent semantic indexing, the improved vector space model based on topic labels gives better results, and the identified topics are more accurate. Moreover, the proposed hotness features can reflect the popularity of a topic and hence yield better hot topic results.
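
The abstract does not detail its improved vector space model. As a rough illustration of "clustering based on extracted topic labels", the sketch below swaps in single-pass clustering (a common technique in topic detection) over bag-of-labels vectors compared by cosine similarity. The posts, labels, threshold, and function names are hypothetical.

```python
# Sketch (assumption): single-pass clustering of posts represented as
# bag-of-topic-label vectors, compared by cosine similarity.
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse label-count vectors."""
    dot = sum(weight * b.get(label, 0) for label, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(posts_labels, threshold=0.5):
    """Add each post to its most similar cluster centroid, or start a new
    cluster when no similarity reaches the threshold. Returns post indices."""
    centroids, clusters = [], []
    for i, labels in enumerate(posts_labels):
        vec = Counter(labels)
        best, best_sim = None, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            centroids[best].update(vec)  # Counter.update adds label counts
            clusters[best].append(i)
        else:
            centroids.append(Counter(vec))
            clusters.append([i])
    return clusters


posts = [
    ["housing", "price", "beijing"],
    ["housing", "price", "loan"],
    ["football", "worldcup"],
    ["worldcup", "football", "final"],
    ["housing", "loan"],
]
print(single_pass_cluster(posts))  # [[0, 1, 4], [2, 3]]
```

Single-pass clustering needs no preset number of topics, which suits a stream of forum posts; the trade-off is sensitivity to post order and to the threshold.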

    Does deep learning help topic extraction? A kernel k-means clustering method with word embedding

    © 2018 All rights reserved. Topic extraction presents challenges for the bibliometric community, and its performance still depends on human intervention and on the application area. This paper proposes a novel kernel k-means clustering method incorporating a word embedding model to effectively extract topics from bibliometric data. The experimental results of a comparison of this method with four clustering baselines (i.e., k-means, fuzzy c-means, principal component analysis, and topic models) on two bibliometric datasets demonstrate its effectiveness across either a relatively broad range of disciplines or a single domain. An empirical study on bibliometric topic extraction from articles published by three top-tier bibliometric journals between 2000 and 2017, supported by expert knowledge-based evaluations, provides supplemental evidence of the method's ability to extract topics. Additionally, this empirical analysis reveals both overlapping and diverse research interests among the three journals, insights that would benefit journal publishers, editorial boards, and research communities.
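
A minimal kernel k-means sketch in plain Python, assuming documents have already been mapped to vectors (for example, by averaging word embeddings). The Gaussian (RBF) kernel, the seeding from the first k points, and the toy 2-D vectors are assumptions for illustration; the paper's kernel and initialisation are not given in the abstract.

```python
# Sketch (assumption): kernel k-means with an RBF kernel on document vectors.
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_kmeans(points, k, gamma=1.0, max_iter=100):
    """Cluster points with kernel k-means; returns one label per point (needs n >= k)."""
    n = len(points)
    K = [[rbf_kernel(points[i], points[j], gamma) for j in range(n)]
         for i in range(n)]
    # Initialisation (an assumption): the first k points act as seeds, and each
    # point takes the label of its nearest seed in the kernel feature space.
    labels = [min(range(k), key=lambda s: K[i][i] - 2 * K[i][s] + K[s][s])
              for i in range(n)]
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # ||mu_c||^2 in feature space: mean kernel value over all pairs in c.
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            # ||phi(x_i) - mu_c||^2 = K_ii - 2 * mean_{j in c} K_ij + ||mu_c||^2
            dists = [K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + within[c]
                     if m else math.inf
                     for c, m in enumerate(members)]
            new_labels.append(min(range(k), key=dists.__getitem__))
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy "document embeddings": two well-separated groups.
points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
print(kernel_kmeans(points, k=2, gamma=0.5))
```

Working only through the kernel matrix lets clusters follow non-linear boundaries in the embedding space, at the cost of O(n^2) memory for K.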

    No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling

    Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), as may exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. Depending on the machine learning algorithm, initialization can introduce variability into the results. Furthermore, distortions in cluster geometry can be misleading. Amongst the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology, since similar procedures go by different names. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, factorization, and clustering algorithms that are directly or indirectly related to the reviewed works.
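
To illustrate the initialization issue, the sketch below runs a simple 1-D k-means whose starting centres come from a seeded private random generator, so a fixed seed reproduces the same partition. It also compares partitions with the Rand index, which ignores how clusters are numbered. The data, seeds, and function names are hypothetical; the survey itself does not prescribe this code.

```python
# Sketch (assumption): seeded k-means initialisation for reproducible runs, and
# the Rand index for comparing partitions from different runs.
import random
from itertools import combinations


def kmeans_1d(values, k, seed, iters=50):
    """Lloyd's k-means on 1-D data with seeded random initial centres."""
    rng = random.Random(seed)  # private generator: same seed, same run
    centers = rng.sample(values, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old centre if a cluster empties out.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def rand_index(a, b):
    """Share of point pairs on which two partitions agree (same vs different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.4, 8.8]
run_a = kmeans_1d(values, k=3, seed=42)
run_b = kmeans_1d(values, k=3, seed=42)
print(run_a == run_b)              # True: a fixed seed reproduces the run
print(rand_index([0, 0, 1], [1, 1, 0]))  # 1.0: same partition, different label names
```

Running the same data with several seeds and reporting the spread of pairwise Rand indices is one way to quantify the initialization variability the survey discusses.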

    INTRODUCTION CONSUMER RESPONSIVE RARE CHRONOLOGICAL TOPIC PATTERNS IN FILE STREAMS

    We offer several algorithms to solve this innovative mining problem in three stages: pre-processing to extract probabilistic topics and identify sessions for multiple users; generating all STP candidates with (expected) support values per user through pattern growth; and deciding on URSTPs through a user-aware rarity analysis of the derived STPs. Little information is inevitable; an extensive survey is available. Support is the most common metric for evaluating sequential patterns and is understood as the number or proportion of sequences in the target database that contain the pattern. The acquired patterns are not always interesting for this purpose, because rare but important patterns of individual and personal behaviors are filtered out by low support. We recommend a framework to solve this problem pragmatically and design corresponding algorithms to support it. First, we offer pre-processing procedures with heuristic methods for topic extraction and session identification. This method can be considered a sequential match between the items identified by an STP and the probabilistic topics that occur within the documents belonging to a particular session. The results indicate that our approach can capture the personal behaviors of online users and express them in an understandable way.
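
The abstract does not define its user-aware rarity measure. A minimal sketch, assuming a pattern counts as user-aware rare when its support across all sessions is below a global ceiling while its support within one user's own sessions reaches a per-user floor. Both thresholds, the data, and the function names are hypothetical, and sessions are plain topic lists rather than probabilistic topics.

```python
# Sketch (assumption): flag (user, pattern) pairs that are rare globally but
# frequent for that user. Thresholds and data are invented for illustration.

def contains_subsequence(session, pattern):
    """True if the pattern's topics occur in the session in order (gaps allowed)."""
    remaining = iter(session)
    return all(topic in remaining for topic in pattern)


def support(sessions, pattern):
    """Fraction of sessions that contain the pattern as a subsequence."""
    if not sessions:
        return 0.0
    return sum(contains_subsequence(s, pattern) for s in sessions) / len(sessions)


def user_aware_rare_patterns(sessions_by_user, patterns,
                             min_user_support=0.5, max_global_support=0.3):
    """Return (user, pattern) pairs: globally rare, but frequent for that user."""
    all_sessions = [s for sessions in sessions_by_user.values() for s in sessions]
    result = []
    for pattern in patterns:
        if support(all_sessions, pattern) >= max_global_support:
            continue  # common across all users, so not rare
        for user, sessions in sessions_by_user.items():
            if support(sessions, pattern) >= min_user_support:
                result.append((user, tuple(pattern)))
    return result


sessions_by_user = {
    "alice": [["sports", "travel"], ["sports", "weather", "travel"]],
    "bob": [["finance", "weather"], ["finance", "sports"]],
    "carol": [["health", "rare_disease", "insurance"],
              ["rare_disease", "insurance"], ["weather"]],
    "dave": [["sports", "travel"], ["travel"]],
}
patterns = [["sports", "travel"], ["rare_disease", "insurance"]]
print(user_aware_rare_patterns(sessions_by_user, patterns))
# [('carol', ('rare_disease', 'insurance'))]
```

Here "sports, travel" has global support 3/9 and is skipped as common, while "rare_disease, insurance" has global support 2/9 yet appears in two of carol's three sessions, so it survives as a pattern specific to her.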

    CONFISCATION USER PERCEPTION OF SERIES PATTERNS IS RARE IN DOCUMENT STREAMS

    We provide several algorithms to solve this innovative mining problem in three stages: pre-processing to extract probabilistic topics and identify sessions for multiple users; generating all STP candidates with (expected) support values for each user through pattern growth; and deciding on URSTPs through a user-aware rarity analysis of the derived STPs. Little information is inevitable; an extensive survey is available. Support, the most popular measure for evaluating sequential patterns, is defined as the number or proportion of sequences in the target database that contain the pattern. The acquired patterns are not always interesting for this purpose, because rare but meaningful patterns representing personalized and abnormal individual behaviors are filtered out due to low support. We propose a framework for solving this issue in a practical way and design algorithms to support it. First, we offer pre-processing procedures with heuristic methods for topic extraction and session identification. This method can be considered a sequential match between the items selected by an STP and the probabilistic topics that occur within the documents belonging to a particular session. The results indicate that our approach can capture the personal behaviors of online users and express them in an understandable way.