Gibbs Max-margin Topic Models with Data Augmentation
Max-margin learning is a powerful approach to building classifiers and
structured output predictors. Recent work on max-margin supervised topic models
has successfully integrated it with Bayesian topic models to discover
discriminative latent semantic structures and make accurate predictions for
unseen testing data. However, the resulting learning problems are usually hard
to solve because of the non-smoothness of the margin loss. Existing approaches
to building max-margin supervised topic models rely on an iterative procedure
to solve multiple latent SVM subproblems with additional mean-field assumptions
on the desired posterior distributions. This paper presents an alternative
approach by defining a new max-margin loss. Namely, we present Gibbs max-margin
supervised topic models, a latent variable Gibbs classifier to discover hidden
topic representations for various tasks, including classification, regression
and multi-task learning. Gibbs max-margin supervised topic models minimize an
expected margin loss, which is an upper bound of the existing margin loss
derived from an expected prediction rule. By introducing augmented variables
and integrating out the Dirichlet variables analytically by conjugacy, we
develop simple Gibbs sampling algorithms with no restricting assumptions and no
need to solve SVM subproblems. Furthermore, each step of the
"augment-and-collapse" Gibbs sampling algorithms has an analytical conditional
distribution, from which samples can be easily drawn. Experimental results
demonstrate significant improvements in time efficiency. Classification
performance is also significantly better than that of competing methods on
binary, multi-class and multi-label classification tasks. Comment: 35 pages
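The upper-bound claim in this abstract follows from Jensen's inequality, because the hinge loss is convex in the discriminant value. A sketch for the binary case, under assumed notation that does not appear in the abstract: q is the posterior over the latent variables (\eta, z), f is a latent discriminant function, y \in \{-1, +1\} is the label, and \ell is the margin cost.

```latex
\mathcal{R}(q)
  = \max\bigl(0,\ \ell - y\,\mathbb{E}_{q}[f(\eta, z)]\bigr)
  \;\le\;
  \mathbb{E}_{q}\bigl[\max\bigl(0,\ \ell - y\,f(\eta, z)\bigr)\bigr]
  = \mathcal{R}'(q)
```

Here \mathcal{R} is the margin loss of the expected prediction rule and \mathcal{R}' is the expected margin loss that the Gibbs classifier minimizes.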
Automation of motor dexterity assessment
Motor dexterity assessment is performed regularly in rehabilitation wards to establish patient status, and automation of this routine task is sought. A system for automating the assessment of motor dexterity, based on the Fugl-Meyer scale and with loose restrictions on sensing technologies, is presented. The system consists of two main elements: 1) a data representation that abstracts the low-level information obtained from a variety of sensors into a highly separable, low-dimensional encoding using t-distributed Stochastic Neighbor Embedding, and 2) central to this communication, a multi-label classifier that boosts classification rates by exploiting the fact that the classes corresponding to the individual exercises are naturally organized as a network. Depending on the targeted therapeutic movement, class labels (i.e. exercise scores) are highly correlated: patients who perform well in one exercise tend to perform well in related exercises. Critically, no node can be used as a proxy for the others, because an exercise does not encode the information of other exercises. On data from a cohort of 20 patients, the novel classifier outperforms classical Naive Bayes, random forest and variants of support vector machines (ANOVA: p < 0.001). The novel multi-label classification strategy completes an automatic system for motor dexterity assessment, with implications for lessening therapists' workloads, reducing healthcare costs and supporting home-based virtual rehabilitation and telerehabilitation alternatives.
Simpler is better: a novel genetic algorithm to induce compact multi-label chain classifiers
Multi-label classification (MLC) is the task of assigning multiple class labels to an object based on the features that describe the object. One of the most effective MLC methods is known as Classifier Chains (CC). This approach consists of training q binary classifiers linked in a chain, y1 → y2 → ... → yq, with each responsible for classifying a specific label in {l1, l2, ..., lq}. The chaining mechanism allows each individual classifier to incorporate the predictions of the previous ones as additional information at classification time. Thus, possible correlations among labels can be automatically exploited. Nevertheless, CC suffers from two important drawbacks: (i) the label ordering is decided at random, although it usually has a strong effect on predictive accuracy; (ii) all labels are inserted into the chain, although some of them might carry information that is irrelevant for discriminating the others. In this paper we tackle both problems at once by proposing a novel genetic algorithm capable of searching for a single optimized label ordering while also considering the use of partial chains. Experiments on benchmark datasets demonstrate that our approach is able to produce models that are both simpler and more accurate.
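The chaining mechanism described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses a full chain with a fixed, caller-supplied label order (no genetic search, no partial chains), and the `NearestNeighbour` base learner, the class names and the toy data are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


class NearestNeighbour:
    """Hypothetical stand-in base learner: 1-nearest-neighbour binary classifier."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "NearestNeighbour":
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, x: Sequence[float]) -> int:
        # Squared Euclidean distance to every training row; ties go to the first row.
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]


class ClassifierChain:
    """One binary classifier per label, linked in the given order."""

    def __init__(self, order: Sequence[int]):
        # Assumed to be a permutation of range(q), where q is the number of labels.
        self.order = list(order)
        self.models: List[NearestNeighbour] = []

    def fit(self, X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]]) -> "ClassifierChain":
        self.models = []
        for pos, label in enumerate(self.order):
            earlier = self.order[:pos]
            # Training uses the TRUE values of the earlier labels as extra features.
            X_aug = [list(x) + [float(y[p]) for p in earlier] for x, y in zip(X, Y)]
            target = [y[label] for y in Y]
            self.models.append(NearestNeighbour().fit(X_aug, target))
        return self

    def predict_one(self, x: Sequence[float]) -> List[int]:
        predicted = {}
        x_aug = list(x)
        for label, model in zip(self.order, self.models):
            predicted[label] = model.predict_one(x_aug)
            # At classification time the PREDICTED label is passed down the chain.
            x_aug.append(float(predicted[label]))
        # Report predictions in the original label indexing, not chain order.
        return [predicted[j] for j in range(len(self.order))]


# Toy data: label 0 copies the first feature, label 1 is the XOR of both features.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1 ^ 1]]

chain = ClassifierChain(order=[0, 1]).fit(X, Y)
print(chain.predict_one([0.1, 0.9]))
```

Changing `order` changes which labels each classifier sees as inputs, which is the degree of freedom that an ordering search would explore.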