    Integrating Document Clustering and Topic Modeling

    Document clustering and topic modeling are two closely related tasks that can mutually benefit each other. Topic modeling can project documents into a topic space, which facilitates effective document clustering. Cluster labels discovered by document clustering can be incorporated into topic models to extract local topics specific to each cluster and global topics shared by all clusters. In this paper, we propose a multi-grain clustering topic model (MGCTM) that integrates document clustering and topic modeling into a unified framework and performs the two tasks jointly to achieve the best overall performance. Our model tightly couples two components: a mixture component used to discover latent groups in a document collection, and a topic model component used to mine multi-grain topics, including local topics specific to each cluster and global topics shared across clusters. We employ variational inference to approximate the posterior of the hidden variables and to learn the model parameters. Experiments on two datasets demonstrate the effectiveness of our model. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)
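The abstract's first point, that a topic model can project documents into a topic space where clustering becomes easier, can be illustrated with a minimal two-stage sketch. This is not MGCTM, which performs both tasks jointly using variational inference. The sketch simply assumes each document already has a topic-proportion vector (the values below are hypothetical, as if produced by some topic model) and groups those vectors with plain k-means (Lloyd's algorithm), a swapped-in technique chosen only for brevity.

```python
import math
import random


def kmeans(vectors, k, iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per vector."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct documents picked at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each document joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its member documents.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical topic proportions (3 topics) for 6 documents; illustrative values only.
theta = [
    [0.80, 0.15, 0.05],
    [0.75, 0.20, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.15, 0.80],
    [0.10, 0.10, 0.80],
    [0.05, 0.20, 0.75],
]
labels = kmeans(theta, k=2)
print(labels)
```

With these toy vectors, documents 0 to 2 (dominated by the first topic) and documents 3 to 5 (dominated by the third) end up in separate clusters; the numeric label assigned to each group depends on the random initialisation.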

    Benchmarking in cluster analysis: A white paper

    To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have been seen frequently in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, of which clustering is an important subdomain. To address this problem, the theoretical and conceptual underpinnings of benchmarking in cluster analysis are discussed, using simulated as well as empirical data. Subsequently, the practicalities of addressing benchmarking questions in clustering are dealt with, and foundational recommendations are made.

    Semi-Supervised Time Point Clustering for Multivariate Time Series

    Unsupervised Human Activity Recognition Using the Clustering Approach: A Review

    Currently, many applications have emerged from advances in software development and hardware use, collectively known as the Internet of Things. One of the most important application areas of this technology is health care. New applications appear daily that aim to improve quality of life and to improve the at-home treatment of patients who suffer from different pathologies. As a result, a line of work of great interest has emerged that focuses on the study and analysis of activities of daily living, using different data analysis techniques to identify and help manage this type of patient. This article presents the results of a systematic literature review on the use of clustering, one of the most widely used techniques in unsupervised data analysis, applied to activities of daily living. It also describes variables of high importance, such as year of publication, type of article, most used algorithms, types of datasets used, and metrics implemented. These data will allow the reader to locate recent results of applying this technique to a particular area of knowledge.

    Scientometric Analysis of Optimisation and Machine Learning Publications

    Introduction: Optimisation is an important aspect of machine learning because it helps improve accuracy and reduce errors in a model's predictions. Purpose: The purpose of this research is to identify the global structure of optimisation and machine learning research. Specifically, the work examines the collaboration network of countries in these fields, the 20 most productive authors from 2015 to 2021, and the co-citation network of articles. Methodology: This descriptive study, based on the scientometric approach and the content analysis method, used co-word analysis and social network analysis. Around 17,500 articles on optimisation and machine learning published between 2015 and 2021 were extracted. An ANOVA was performed to evaluate whether there was a significant difference among betweenness, closeness, and PageRank. The Dimensions database was used for the investigation without language constraints, and Bibliometrix was used for calculation and visualisation. Findings: The results revealed a substantial difference among betweenness, closeness, and PageRank, indicating that this research has the potential to provide vital insights for future optimisation and machine learning research.
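The ANOVA step compares the means of several groups of scores (here, the three centrality measures). As a minimal sketch of what that test computes, the function below returns the one-way ANOVA F statistic: between-group variance divided by within-group variance, each scaled by its degrees of freedom. The abstract does not say how the scores were grouped or which software ran the test, so the numbers below are hypothetical toy values, not data from the study.

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of each group mean around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1        # degrees of freedom between groups
    df_within = n_total - k   # degrees of freedom within groups
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores for three measures (toy values, not the study's data).
f_stat = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f_stat)  # 27.0
```

A large F indicates that the group means differ more than within-group noise would suggest; the significance (p-value) comes from the F distribution with (k - 1, N - k) degrees of freedom, which `scipy.stats.f_oneway` computes alongside the statistic.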

    Replication issues in syntax-based aspect extraction for opinion mining

    Reproducing experiments is an important instrument for validating previous work and building upon existing approaches, and it has been tackled numerous times in different areas of science. In this paper, we introduce an empirical replicability study of three well-known algorithms for syntax-centric aspect-based opinion mining. We show that reproducing results continues to be difficult, mainly because of missing details regarding preprocessing and parameter settings, and because of the absence of available implementations that would clarify these details. We consider these to be important threats to the validity of research in the field, especially when compared to other problems in NLP where public datasets and code availability are critical validity components. We conclude by encouraging code-based research, which we believe plays a key role in helping researchers better understand the state of the art and make continuous advances. Comment: Accepted in the EACL 2017 SR