993 research outputs found

Modeling text with generalizable Gaussian mixtures

Author: Hansen Lars Kai
Kjems Ulrik
Kolenda Thomas
Larsen Jan
Nielsen Finn Årup
Sigurdsson Sigurdur
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1999

We apply and discuss generalizable Gaussian mixture (GGM) models for textmining. The model automatically adapts model complexity for a given text representation. We show that the generalizability of these models depends on the dimensionality of the representation and the sample size. We discuss the relation between supervised and unsupervised learning in text data. Finally, we implement a novelty detector based on the density model.

1. INTRODUCTION

Information retrieval is a very active research field which is starting to adapt advanced machine learning techniques for solving hard real-world problems [17, 18]. Textmining, or pattern recognition in text data, is used to categorize text according to topic, to spot new topics, and in a broader sense to create more intelligent searches, e.g., by WWW search engines [12, ?, 14]. Textmining proceeds by pattern recognition based on text features, typically document summary statistics. While there are numerous high-level language models for extr..

CiteSeerX

Online Research Database In Technology

Hierarchical Clustering for Datamining

Author: Hansen Lars Kai
Have Anna Szynkowiak
Larsen Jan
Publication date: 01/01/2001

Online Research Database In Technology

Probabilistic Hierarchical Clustering with Labeled and Unlabeled Data

Author: Hansen Lars Kai
Have Anna Szynkowiak
Larsen Jan
Publication date: 01/01/2001

This paper presents hierarchical probabilistic clustering methods for unsupervised and supervised learning in datamining applications, where supervised learning is performed using both labeled and unlabeled examples. The probabilistic clustering is based on the previously suggested Generalizable Gaussian Mixture model and is extended using a modified Expectation Maximization procedure for learning with both unlabeled and labeled examples. The proposed hierarchical scheme is agglomerative and based on probabilistic similarity measures. Here, we compare an L2 dissimilarity measure, error confusion similarity, and an accumulated posterior cluster probability measure. The unsupervised and supervised schemes are successfully tested on artificial data and on e-mail segmentation.

CiteSeerX

Online Research Database In Technology

Cognitive Component Analysis

Author: Feng Ling
Publication date: 01/11/2008

Online Research Database In Technology

Generative Adversarial Positive-Unlabelled Learning

Author: Chaib-draa Brahim
Hou Ming
Li Chao
Zhao Qibin
Publication date: 04/04/2018

In this work, we consider the task of classifying binary positive-unlabeled (PU) data. The existing discriminative learning based PU models attempt to seek an optimal reweighting strategy for U data, so that a decent decision boundary can be found. However, given limited P data, the conventional PU models tend to suffer from overfitting when adapted to very flexible deep neural networks. In contrast, we are the first to innovate a totally new paradigm to attack the binary PU task, from the perspective of generative learning, by leveraging powerful generative adversarial networks (GANs). Our generative positive-unlabeled (GenPU) framework incorporates an array of discriminators and generators that are endowed with different roles in simultaneously producing positive and negative realistic samples. We provide theoretical analysis to justify that, at equilibrium, GenPU is capable of recovering both positive and negative data distributions. Moreover, we show GenPU is generalizable and closely related to semi-supervised classification. Given rather limited P data, experiments on both synthetic and real-world datasets demonstrate the effectiveness of our proposed framework. With infinite realistic and diverse sample streams generated from GenPU, a very flexible classifier can then be trained using deep neural networks. Comment: 8 pages

arXiv.org e-Print Archive

Crossref

Knowledge Resources in Automatic Speech Recognition and Understanding for Romanian Language

Author: Corneliu Octavian Dumitru
Diana Mihaela Militaru
Inge Gavat
Publication venue: 'IntechOpen'
Publication date: 01/11/2008

IntechOpen

Crossref

Phonemes as short time cognitive components

Author: Feng Ling
Hansen Lars Kai
Publication date: 01/01/2006

Online Research Database In Technology