Search CORE

6,586 research outputs found

Integrating Document Clustering and Topic Modeling

Author: Xie Pengtao
Xing Eric P.
Publication date: 26/09/2013

Document clustering and topic modeling are two closely related tasks that can mutually benefit each other. Topic modeling can project documents into a topic space, which facilitates effective document clustering. Cluster labels discovered by document clustering can be incorporated into topic models to extract local topics specific to each cluster and global topics shared by all clusters. In this paper, we propose a multi-grain clustering topic model (MGCTM), which integrates document clustering and topic modeling into a unified framework and performs the two tasks jointly to achieve the best overall performance. Our model tightly couples two components: a mixture component used for discovering latent groups in a document collection, and a topic model component used for mining multi-grain topics, including local topics specific to each cluster and global topics shared across clusters. We employ variational inference to approximate the posterior of the hidden variables and to learn the model parameters. Experiments on two datasets demonstrate the effectiveness of our model.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

arXiv.org e-Print Archive

CiteSeerX
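The abstract's first point, that projecting documents into a topic space makes clustering easier, can be illustrated with a generic sketch: plain k-means run over per-document topic-proportion vectors. This is not MGCTM, which couples a mixture component with a topic model and is fit by variational inference. The topic vectors, the helper name `kmeans`, and the choice to initialize centroids from selected documents are hypothetical assumptions made only for illustration.

```python
# Generic sketch (NOT MGCTM): k-means over documents' topic proportions.
# The topic vectors below are made-up toy values, not data from the paper.
from math import dist


def kmeans(points, init_centroids, iters=20):
    """Hypothetical helper: return (labels, centroids) after fixed iterations."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each document goes to its nearest centroid (Euclidean).
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned documents;
        # a centroid with no members keeps its previous position.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids


# Each row is a toy document's proportions over three topics.
docs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
]
labels, centroids = kmeans(docs, init_centroids=[docs[0], docs[2]])
print(labels)  # [0, 0, 1, 1]
```

Documents dominated by the same topic land in the same cluster; a joint model such as the one described above would instead let cluster assignments and topics inform each other during inference.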

Benchmarking in cluster analysis: A white paper

Author: Boulesteix Anne-Laure
Dangl Rainer
Dean Nema
Guyon Isabelle
Hennig Christian
Leisch Friedrich
Steinley Douglas
Van Mechelen Iven
Publication date: 01/10/2018

To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have mostly been seen in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, with the area of clustering as an important subdomain. To address this problem, discussion is given to the theoretical and conceptual underpinnings of benchmarking in the field of cluster analysis by means of simulated as well as empirical data. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made.

arXiv.org e-Print Archive

Proceedings - University of Groningen

ARTS repository - University of Groningen

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Enlighten

Dissertations of the University of Groningen

Semi-Supervised Time Point Clustering for Multivariate Time Series

Author: Ertl Benjamin
Meyer Jörg
Schneider Matthias
Streit Achim
Publication venue: Canadian Artificial Intelligence Association
Publication date: 14/06/2021

KITopen

NERD: Evaluating Named Entity Recognition Tools in the Web of Data

Author: Rizzo G.
Troncy R.
Publication date: 01/01/2011

EURECOM Repository

PORTO Publications Open Repository TOrino

Unsupervised Human Activity Recognition Using the Clustering Approach: A Review

Author: Ariza Colpas Paola Patricia
De-La-Hoz-Franco Emiro
Oviedo Carrascal Ana Isabel
Patara Fulvio
Pineres-Melo Marlon
Vicario Enrico
Publication venue: MDPI AG
Publication date: 01/01/2020

Currently, many applications have emerged from the implementation of software development and hardware use, known as the Internet of Things. One of the most important application areas of this type of technology is health care. Various applications arise daily in order to improve quality of life and to promote better treatment of patients at home who suffer from different pathologies. That is why a line of work of great interest has emerged, focused on the study and analysis of activities of daily living and on the use of different data analysis techniques to identify and help manage this type of patient. This article presents the results of a systematic review of the literature on the use of clustering, one of the most widely used techniques in unsupervised data analysis, applied to activities of daily living. It also describes variables of high importance, such as year of publication, type of article, most used algorithms, types of datasets used, and metrics implemented. These data will allow the reader to locate recent results of the application of this technique to a particular area of knowledge.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Digital CUC

Scientometric Analysis of Optimisation and Machine Learning Publications

Author: David Opeoluwa Oyewola
E. E Daniel
Emmanuel Gbenga Dada
K. A. Al-Mustapha
Rowland Ogunrinde
Publication venue: Covenant University, Ota, Nigeria
Publication date: 16/12/2022

Introduction: Optimisation is an important aspect of machine learning because it helps improve accuracy and reduce errors in a model's predictions. Purpose: The purpose of this research is to identify the global structure of research on optimization and machine learning. The work specifically looks at the collaborative network of countries in these fields, the top 20 authors by output from 2015 to 2021, and the co-citation network of articles. Methodology: In this study, co-word analysis and social network analysis were used to conduct a descriptive study based on the scientometric approach and the content analysis method. Around 17,500 articles on optimization and machine learning published between 2015 and 2021 were extracted. An ANOVA was performed to evaluate whether there was a significant difference between betweenness, closeness, and PageRank. The Dimensions database was used for the investigation without language constraints, and Bibliometrix was used for calculation and visualization. Findings: The results revealed a substantial difference between betweenness, closeness, and PageRank, indicating that this research has the potential to bring vital insights into future optimization and machine learning research.

Covenant Journals (Covenant University)
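The study reports a one-way ANOVA across three network measures (betweenness, closeness, and PageRank). As a minimal sketch of what such a test computes, the function below returns the F statistic, the ratio of the between-group mean square to the within-group mean square. The helper name `one_way_anova_f` and the group values are hypothetical toy assumptions, not the study's code or data; turning F into a p-value additionally requires the F distribution, which library routines such as SciPy's `scipy.stats.f_oneway` provide.

```python
# Hypothetical helper: one-way ANOVA F statistic in pure Python.
def one_way_anova_f(groups):
    """Return F = MS_between / MS_within for a list of numeric groups."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: size-weighted spread of the group means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)  # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # n - k degrees of freedom
    return ms_between / ms_within


# Toy values only (NOT from the study): one list per centrality measure.
betweenness = [1.0, 2.0, 3.0]
closeness = [4.0, 5.0, 6.0]
pagerank = [7.0, 8.0, 9.0]
f_stat = one_way_anova_f([betweenness, closeness, pagerank])
print(f_stat)  # 27.0
```

A large F means the group means differ by much more than the variation inside each group would suggest; in practice, centrality measures on different scales would usually be normalized before being compared this way.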

Replication issues in syntax-based aspect extraction for opinion mining

Author: Marrese-Taylor Edison
Matsuo Yutaka
Publication date: 01/01/2017

Reproducing experiments is an important instrument for validating previous work and building upon existing approaches. It has been tackled numerous times in different areas of science. In this paper, we introduce an empirical replicability study of three well-known algorithms for syntax-centric, aspect-based opinion mining. We show that reproducing results continues to be a difficult endeavor, mainly due to the lack of details regarding preprocessing and parameter settings, as well as the absence of available implementations that clarify these details. We consider these to be important threats to the validity of research in the field, especially when compared to other problems in NLP where public datasets and code availability are critical validity components. We conclude by encouraging code-based research, which we think plays a key role in helping researchers better understand the state of the art and generate continuous advances.
Comment: Accepted in the EACL 2017 SR

arXiv.org e-Print Archive

Crossref