
    F-measure Maximization in Multi-Label Classification with Conditionally Independent Label Subsets

    We discuss a method to improve GFM, the exact F-measure maximization algorithm proposed in (Dembczynski et al. 2011) for multi-label classification, assuming the label set can be partitioned into conditionally independent subsets given the input features. If the labels were all independent, estimating only m parameters (m denoting the number of labels) would suffice to derive Bayes-optimal predictions in O(m^2) operations. In the general case, GFM requires m^2+1 parameters to solve the problem in O(m^3) operations. In this work, we show that the number of parameters can be reduced further to m^2/n, in the best case, assuming the label set can be partitioned into n conditionally independent subsets. As this label partition needs to be estimated from the data beforehand, we first use the procedure proposed in (Gasse et al. 2015) to find such a partition and then infer the required parameters locally in each label subset. The latter are aggregated and serve as input to GFM to form the Bayes-optimal prediction. We show on a synthetic experiment that the reduction in the number of parameters brings about significant benefits in terms of performance.
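
    To make the m^2+1 parameter count concrete, the following is a minimal sketch of the general (unpartitioned) GFM step, written from the standard description of the algorithm rather than taken from the paper. It assumes the parameters are P[i][s-1] = P(y_i = 1, sum(y) = s) for s = 1..m, plus p0 = P(y = 0). All function names are hypothetical, and the brute-force helper exists only to check the result on tiny label sets. The label-partitioning step of (Gasse et al. 2015) is not shown.

```python
from itertools import product


def joint_to_gfm_params(joint, m):
    """Build the m*m + 1 GFM parameters from an explicit joint distribution.

    joint maps 0/1 label tuples of length m to probabilities (illustrative
    input only; in practice these parameters are estimated from data).
    """
    P = [[0.0] * m for _ in range(m)]  # P[i][s-1] = P(y_i = 1, sum(y) = s)
    p0 = 0.0                           # P(y = all zeros)
    for y, p in joint.items():
        s = sum(y)
        if s == 0:
            p0 += p
            continue
        for i in range(m):
            if y[i]:
                P[i][s - 1] += p
    return P, p0


def gfm(P, p0):
    """Return (prediction, expected F1) maximizing the expected F-measure."""
    m = len(P)
    best_h, best_f = [0] * m, p0  # empty prediction scores F = 1 only if y = 0
    for k in range(1, m + 1):
        # delta[i] = sum_s P(y_i = 1, sum(y) = s) / (s + k): O(m) per label
        delta = [sum(P[i][s - 1] / (s + k) for s in range(1, m + 1))
                 for i in range(m)]
        top = sorted(range(m), key=lambda i: delta[i], reverse=True)[:k]
        f = 2.0 * sum(delta[i] for i in top)
        if f > best_f:
            chosen = set(top)
            best_f = f
            best_h = [1 if i in chosen else 0 for i in range(m)]
    return best_h, best_f


def expected_f(h, joint):
    """Expected F1 of prediction h under the joint, with F(0, 0) = 1."""
    total = 0.0
    for y, p in joint.items():
        denom = sum(y) + sum(h)
        f = 1.0 if denom == 0 else 2.0 * sum(a * b for a, b in zip(y, h)) / denom
        total += p * f
    return total


def brute_force_best(joint, m):
    """Exhaustive maximum of the expected F1 over all 2^m predictions."""
    return max(expected_f(h, joint) for h in product((0, 1), repeat=m))
```

    The outer loop over k and the inner sums over s and i give the O(m^3) cost quoted above. When the labels split into independent subsets, the entries of P no longer need to be estimated jointly over all m labels, which is where the parameter reduction described in the abstract comes from.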

    Information-Theoretic Active Learning for Content-Based Image Retrieval

    We propose Information-Theoretic Active Learning (ITAL), a novel batch-mode active learning method for binary classification, and apply it to acquire meaningful user feedback in the context of content-based image retrieval. Instead of combining different heuristics such as uncertainty, diversity, or density, our method is based on maximizing the mutual information between the predicted relevance of the images and the expected user feedback regarding the selected batch. We propose suitable approximations to this computationally demanding problem and also integrate an explicit model of user behavior that accounts for possible incorrect labels and unnameable instances. Furthermore, our approach takes into account not only the structure of the data but also the expected model output change caused by the user feedback. In contrast to other methods, ITAL turns out to be highly flexible and provides state-of-the-art performance across various datasets, such as MIRFLICKR and ImageNet.
    Comment: GCPR 2018 paper (14 pages text + 2 pages references + 6 pages appendix)