6,586 research outputs found
Integrating Document Clustering and Topic Modeling
Document clustering and topic modeling are two closely related tasks which
can mutually benefit each other. Topic modeling can project documents into a
topic space which facilitates effective document clustering. Cluster labels
discovered by document clustering can be incorporated into topic models to
extract local topics specific to each cluster and global topics shared by all
clusters. In this paper, we propose a multi-grain clustering topic model
(MGCTM) which integrates document clustering and topic modeling into a unified
framework and jointly performs the two tasks to achieve the overall best
performance. Our model tightly couples two components: a mixture component used
for discovering latent groups in document collection and a topic model
component used for mining multi-grain topics including local topics specific to
each cluster and global topics shared across clusters.We employ variational
inference to approximate the posterior of hidden variables and learn model
parameters. Experiments on two datasets demonstrate the effectiveness of our
model.Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI2013
Benchmarking in cluster analysis: A white paper
To achieve scientific progress in terms of building a cumulative body of
knowledge, careful attention to benchmarking is of the utmost importance. This
means that proposals of new methods of data pre-processing, new data-analytic
techniques, and new methods of output post-processing, should be extensively
and carefully compared with existing alternatives, and that existing methods
should be subjected to neutral comparison studies. To date, benchmarking and
recommendations for benchmarking have been frequently seen in the context of
supervised learning. Unfortunately, there has been a dearth of guidelines for
benchmarking in an unsupervised setting, with the area of clustering as an
important subdomain. To address this problem, discussion is given to the
theoretical conceptual underpinnings of benchmarking in the field of cluster
analysis by means of simulated as well as empirical data. Subsequently, the
practicalities of how to address benchmarking questions in clustering are dealt
with, and foundational recommendations are made
Unsupervised Human Activity Recognition Using the Clustering Approach: A Review
Currently, many applications have emerged from the implementation of softwaredevelopment and hardware use, known as the Internet of things. One of the most importantapplication areas of this type of technology is in health care. Various applications arise daily inorder to improve the quality of life and to promote an improvement in the treatments of patients athome that suffer from different pathologies. That is why there has emerged a line of work of greatinterest, focused on the study and analysis of daily life activities, on the use of different data analysistechniques to identify and to help manage this type of patient. This article shows the result of thesystematic review of the literature on the use of the Clustering method, which is one of the mostused techniques in the analysis of unsupervised data applied to activities of daily living, as well asthe description of variables of high importance as a year of publication, type of article, most usedalgorithms, types of dataset used, and metrics implemented. These data will allow the reader tolocate the recent results of the application of this technique to a particular area of knowledg
Scientometric Analysis of Optimisation and Machine Learning Publications
Introduction: Optimisation is an important aspect of machine learning because it helps improve accuracy and reduce errors in the model's predictions.
Purpose: The purpose of this research is to identify the global structure of optimization and machine learning. The work specifically looks at the collaborative network of countries in these fields, the top 20 authors in terms of production from 2015–2021, and the co-citation network of articles.
Methodology: In this study, co-word analysis and social network analysis were used to conduct a descriptive study based on the scientometric approach and the content analysis method. In this research, around 17,500 articles on optimization and machine learning published between 2015 and 2021 were extracted. An ANOVA was performed to evaluate whether there was a significant difference between betweenness, closeness, and pagerank. The Dimensions database was utilised for the investigation without language constraints. Moreover, Bibliometrix was used for calculation and visualization.
Findings: The results revealed a substantial difference between betweenness, proximity, and pagerank, indicating that this research has the potential to bring vital insights into future optimization and machine learning research
Replication issues in syntax-based aspect extraction for opinion mining
Reproducing experiments is an important instrument to validate previous work
and build upon existing approaches. It has been tackled numerous times in
different areas of science. In this paper, we introduce an empirical
replicability study of three well-known algorithms for syntactic centric
aspect-based opinion mining. We show that reproducing results continues to be a
difficult endeavor, mainly due to the lack of details regarding preprocessing
and parameter setting, as well as due to the absence of available
implementations that clarify these details. We consider these are important
threats to validity of the research on the field, specifically when compared to
other problems in NLP where public datasets and code availability are critical
validity components. We conclude by encouraging code-based research, which we
think has a key role in helping researchers to understand the meaning of the
state-of-the-art better and to generate continuous advances.Comment: Accepted in the EACL 2017 SR
- …