Search CORE

127,313 research outputs found

Multi-Domain Long-Tailed Learning by Augmenting Disentangled Representations

Author: Finn Chelsea
Yang Xinyu
Yao Huaxiu
Zhou Allan
Publication venue
Publication date: 06/10/2023
Field of study

There is an inescapable long-tailed class-imbalance issue in many real-world classification problems. Current methods for addressing this problem only consider scenarios where all examples come from the same distribution. However, in many cases, there are multiple domains with distinct class imbalance. We study this multi-domain long-tailed learning problem and aim to produce a model that generalizes well across all classes and domains. Towards that goal, we introduce TALLY, a method that addresses this multi-domain long-tailed learning problem. Built upon a proposed selective balanced sampling strategy, TALLY achieves this by mixing the semantic representation of one example with the domain-associated nuisances of another, producing a new representation for use as data augmentation. To improve the disentanglement of semantic representations, TALLY further utilizes a domain-invariant class prototype that averages out domain-specific effects. We evaluate TALLY on several benchmarks and real-world datasets and find that it consistently outperforms other state-of-the-art methods in both subpopulation and domain shift. Our code and data have been released at https://github.com/huaxiuyao/TALLY.Comment: Accepted by TML

arXiv.org e-Print Archive

Selective sampling for combined learning from labelled and unlabelled data

Author: Gabrys Bogdan
Petrakieva L.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2004
Field of study

This paper examines the problem of selecting a suitable subset of data to be labelled when building pattern classifiers from labelled and unlabelled data. The selection of representative set is guided by a clustering information and various options of allocating a number of samples within clusters and their distributions are investigated. The experimental results show that hybrid methods like Semi-supervised clustering with selective sampling can result in building a classifier which requires much less labelled data in order to achieve a comparable classification performance to classifiers built only on the basis of labelled data

Bournemouth University Research Online

Selective Sampling for Example-based Word Sense Disambiguation

Author: Fujii Atsushi
Inui Kentaro
Tanaka Hozumi
Tokunaga Takenobu
Publication venue
Publication date: 01/01/1998
Field of study

This paper proposes an efficient example sampling method for example-based word sense disambiguation systems. To construct a database of practical size, a considerable overhead for manual sense disambiguation (overhead for supervision) is required. In addition, the time complexity of searching a large-sized database poses a considerable problem (overhead for search). To counter these problems, our method selectively samples a smaller-sized effective subset from a given example set for use in word sense disambiguation. Our method is characterized by the reliance on the notion of training utility: the degree to which each example is informative for future example sampling when used for the training of the system. The system progressively collects examples by selecting those with greatest utility. The paper reports the effectiveness of our method through experiments on about one thousand sentences. Compared to experiments with other example sampling methods, our method reduced both the overhead for supervision and the overhead for search, without the degeneration of the performance of the system.Comment: 25 pages, 14 Postscript figure

arXiv.org e-Print Archive

CiteSeerX

Simulation techniques for estimating error in the classification of normal patterns

Author: Landgrebe D. A.
Whitsitt S. J.
Publication venue
Publication date
Field of study

Methods of efficiently generating and classifying samples with specified multivariate normal distributions were discussed. Conservative confidence tables for sample sizes are given for selective sampling. Simulation results are compared with classified training data. Techniques for comparing error and separability measure for two normal patterns are investigated and used to display the relationship between the error and the Chernoff bound

NASA Technical Reports Server