Statistical topic models for multi-label document classification

Author: A. C. P. L. F. de Carvalho
A. K. McCallum
America Chambers
D. Blei
D. D. Lewis
D. M. Blei
D. Mimno
D. Ramage
E. L. Allwein
E. Loza Mencía
F. Sebastiani
G. Druck
G. Forman
G. Tsoumakas
J. Davis
J. Fürnkranz
J. Read
J. Zhu
K. Crammer
K.-M. Schneider
L. Cao
M. Ioannou
M. Rosen-Zvi
M.-L. Zhang
Mark Steyvers
N. Ghamrawi
N. Japkowicz
N. Ueda
O. Dekel
Padhraic Smyth
R. Rak
R. Rifkin
R.-E. Fan
S. Ji
S. Lacoste-Julien
T. L. Griffiths
T.-Y. Liu
Timothy N. Rubin
W. Hersh
Y. W. Teh
Y. Wang
Y. Yang
Publication venue: Springer Science and Business Media LLC

Gibbs Max-margin Topic Models with Data Augmentation

Author: Chen Ning
Perkins Hugh
Zhang Bo
Zhu Jun
Publication date: 10/10/2013

Max-margin learning is a powerful approach to building classifiers and structured output predictors. Recent work on max-margin supervised topic models has successfully integrated it with Bayesian topic models to discover discriminative latent semantic structures and make accurate predictions for unseen test data. However, the resulting learning problems are usually hard to solve because of the non-smoothness of the margin loss. Existing approaches to building max-margin supervised topic models rely on an iterative procedure to solve multiple latent SVM subproblems with additional mean-field assumptions on the desired posterior distributions. This paper presents an alternative approach by defining a new max-margin loss. Namely, we present Gibbs max-margin supervised topic models, a latent variable Gibbs classifier to discover hidden topic representations for various tasks, including classification, regression and multi-task learning. Gibbs max-margin supervised topic models minimize an expected margin loss, which is an upper bound on the existing margin loss derived from an expected prediction rule. By introducing augmented variables and integrating out the Dirichlet variables analytically by conjugacy, we develop simple Gibbs sampling algorithms with no restrictive assumptions and no need to solve SVM subproblems. Furthermore, each step of the "augment-and-collapse" Gibbs sampling algorithms has an analytical conditional distribution, from which samples can be easily drawn. Experimental results demonstrate significant improvements in time efficiency. The classification performance is also significantly improved over competitors on binary, multi-class and multi-label classification tasks. Comment: 35 pages
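
The abstract's claim that the expected margin loss upper-bounds the margin loss of an expected prediction rule follows from the convexity of the hinge function (Jensen's inequality). The snippet below is only a numerical illustration of that inequality, not the paper's model or sampler: the Gaussian scores stand in for discriminant values under a posterior, and the names `hinge`, `compare_losses` and the margin target of 1.0 are assumptions made for the example.

```python
import random


def hinge(margin_target, y, score):
    """Hinge loss max(0, margin_target - y * score) for a label y in {-1, +1}."""
    return max(0.0, margin_target - y * score)


def compare_losses(scores, y, margin_target=1.0):
    """Return (expected hinge loss, hinge loss of the expected score).

    `scores` plays the role of discriminant values drawn from a posterior
    (a hypothetical stand-in; no topic model is fitted here).
    """
    n = len(scores)
    # Loss averaged over individual draws (Gibbs-classifier style).
    expected_loss = sum(hinge(margin_target, y, s) for s in scores) / n
    # Loss of the averaged prediction (expected-prediction-rule style).
    mean_score = sum(scores) / n
    loss_of_expectation = hinge(margin_target, y, mean_score)
    return expected_loss, loss_of_expectation


rng = random.Random(0)  # fixed seed so the run is reproducible
scores = [rng.gauss(0.5, 1.0) for _ in range(10_000)]
gibbs_loss, averaged_loss = compare_losses(scores, y=+1)
print(gibbs_loss >= averaged_loss)
```

Because the hinge is convex, the first value can never fall below the second for any set of draws, which is the sense in which the expected loss acts as an upper bound.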

arXiv.org e-Print Archive

CiteSeerX

Large-Scale Online Semantic Indexing of Biomedical Articles via an Ensemble of Multi-Label Classification Models

Author: Laliotis Manos
Markantonatos Nikos
Papanikolaou Yannis
Tsoumakas Grigorios
Vlahavas Ioannis
Publication date: 18/04/2017

Background: In this paper we present the approaches and methods employed to deal with a large-scale multi-label semantic indexing task of biomedical papers. This work was mainly implemented within the context of the BioASQ challenge of 2014. Methods: The main contribution of this work is a multi-label ensemble method that incorporates a McNemar statistical significance test to validate the combination of the constituent machine learning algorithms. Secondary contributions include a study of the temporal aspects of the BioASQ corpus (the observations also apply to BioASQ's superset, the PubMed article collection) and the proper adaptation of the algorithms used to deal with this challenging classification task. Results: The ensemble method we developed is compared to other approaches in experimental scenarios with subsets of the BioASQ corpus, giving positive results. During the BioASQ 2014 challenge we placed first in the first batch and third in the two following batches. Our success in the BioASQ challenge showed that a fully automated machine-learning approach, which does not implement any heuristics or rule-based approaches, can be highly competitive and outperform other approaches in similarly challenging contexts.
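
For readers unfamiliar with the McNemar test mentioned in the abstract, the following is a minimal sketch of the exact (binomial) form of the test applied to two classifiers' per-example correctness. It shows the test only; how the authors embed it in their ensemble is not described in the abstract, and the function name `mcnemar_exact` and the toy data are assumptions made for illustration.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired correctness indicators.

    correct_a[i] / correct_b[i] say whether classifier A / B got example i right.
    Returns (b, c, p_value), where b counts examples only A got right and
    c counts examples only B got right.
    """
    pairs = list(zip(correct_a, correct_b))
    b = sum(1 for a_ok, b_ok in pairs if a_ok and not b_ok)
    c = sum(1 for a_ok, b_ok in pairs if not a_ok and b_ok)
    n = b + c  # only discordant pairs carry information
    if n == 0:
        return b, c, 1.0
    # Under the null hypothesis, b follows Binomial(n, 0.5); double the smaller tail.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Toy data: A is right on 8 examples where B is wrong; B is right on 2 where A is wrong.
a_hits = [True] * 8 + [False] * 2
b_hits = [False] * 8 + [True] * 2
b_count, c_count, p_value = mcnemar_exact(a_hits, b_hits)
print(b_count, c_count, p_value)
```

A small p-value suggests that the two classifiers' error patterns differ by more than chance; examples both classifiers get right or both get wrong do not affect the result.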

arXiv.org e-Print Archive

Directory of Open Access Journals