Towards Effective Multi-Label Recognition Attacks via Knowledge Graph Consistency
Many real-world applications of image recognition require multi-label
learning, whose goal is to find all labels in an image. Thus, robustness of
such systems to adversarial image perturbations is extremely important.
However, despite a large body of recent research on adversarial attacks, the
scope of the existing works is mainly limited to the multi-class setting, where
each image contains a single label. We show that the naive extensions of
multi-class attacks to the multi-label setting lead to violating label
relationships, modeled by a knowledge graph, and can be detected using a
consistency verification scheme. Therefore, we propose a graph-consistent
multi-label attack framework, which searches for small image perturbations that
lead to misclassifying a desired target set while respecting label hierarchies.
By extensive experiments on two datasets and using several multi-label
recognition models, we show that our method generates extremely successful
attacks that, unlike naive multi-label perturbations, can produce model
predictions consistent with the knowledge graph.
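The abstract above does not spell out its consistency verification scheme. As a minimal sketch only, assume the knowledge graph is reduced to "is-a" edges and that a prediction is inconsistent when a label is present but one of its direct parents is absent. The function name, the `parents` mapping, and the example labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def find_hierarchy_violations(
    predicted: Set[str], parents: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return (child, parent) pairs where the child label is predicted
    but its direct parent label is missing from the prediction.

    Assumption: `parents` maps each label to its direct parents only;
    ancestors further up the hierarchy are not followed here.
    """
    violations = []
    for label in sorted(predicted):
        for parent in sorted(parents.get(label, set())):
            if parent not in predicted:
                violations.append((label, parent))
    return violations


# Hypothetical toy hierarchy: "dog" and "cat" are both kinds of "animal".
toy_parents = {"dog": {"animal"}, "cat": {"animal"}}

# A naive attack that removes "animal" but leaves "dog" is flagged.
flagged = find_hierarchy_violations({"dog"}, toy_parents)

# A prediction that keeps both labels passes the check.
clean = find_hierarchy_violations({"dog", "animal"}, toy_parents)
```

Under this assumed rule, a perturbation that switches off a parent label while leaving a child label on would be detectable, which is the kind of inconsistency the abstract attributes to naive multi-label attacks.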
Learning to recognize occluded and small objects with partial inputs
Recognizing multiple objects in an image is challenging due to occlusions,
and becomes even more so when the objects are small. While promising, existing
multi-label image recognition models do not explicitly learn context-based
representations, and hence struggle to correctly recognize small and occluded
objects. Intuitively, recognizing occluded objects requires knowledge of
partial input, and hence context. Motivated by this intuition, we propose
Masked Supervised Learning (MSL), a single-stage, model-agnostic learning
paradigm for multi-label image recognition. The key idea is to learn
context-based representations using a masked branch and to model label
co-occurrence using label consistency. Experimental results demonstrate the
simplicity, applicability and more importantly the competitive performance of
MSL against previous state-of-the-art methods on standard multi-label image
recognition benchmarks. In addition, we show that MSL is robust to random
masking and demonstrate its effectiveness in recognizing non-masked objects.
Code and pretrained models are available on GitHub.
Visual Understanding via Multi-Feature Shared Learning with Global Consistency
Image/video data is usually represented with multiple visual features. Fusion
of multi-source information for establishing the attributes has been widely
recognized. Multi-feature visual recognition has recently received much
attention in multimedia applications. This paper studies visual understanding
via a newly proposed l_2-norm based multi-feature shared learning framework,
which can simultaneously learn a global label matrix and multiple
sub-classifiers with the labeled multi-feature data. Additionally, a group
graph manifold regularizer composed of the Laplacian and Hessian graph is
proposed for better preserving the manifold structure of each feature, such
that the label prediction power is much improved through the semi-supervised
learning with global label consistency. For convenience, we call the proposed
approach Global-Label-Consistent Classifier (GLCC). The merits of the proposed
method include: 1) the manifold structure information of each feature is
exploited in learning, resulting in a more faithful classification owing to the
global label consistency; 2) a group graph manifold regularizer based on the
Laplacian and Hessian regularization is constructed; 3) an efficient
alternative optimization method is introduced as a fast solver owing to the
convex sub-problems. Experiments on several benchmark visual datasets for
multimedia understanding, such as the 17-category Oxford Flower dataset, the
challenging 101-category Caltech dataset, the YouTube & Consumer Videos dataset
and the large-scale NUS-WIDE dataset, demonstrate that the proposed approach
compares favorably with the state-of-the-art algorithms. An extensive
experiment on the deep convolutional activation features also shows the
effectiveness of the proposed approach. The code is available at
http://www.escience.cn/people/lei/index.html
Comment: 13 pages, 6 figures; this paper is accepted for publication in IEEE
Transactions on Multimedia
Multi-label prediction method for lithology, lithofacies and fluid classes based on data augmentation by cascade forest
Predicting the lithology, lithofacies and reservoir fluid classes of igneous rocks holds significant value in the domains of CO2 storage and reservoir evaluation. However, no precedent exists for research on the multi-label identification of igneous rocks. This study proposes a multi-label data augmented cascade forest method for the prediction of multi-label lithology, lithofacies and fluid using 9 conventional logging data features of cores collected from the eastern depression of the Liaohe Basin in northeastern China. Data augmentation is performed on an unbalanced multi-label training set using the multi-label synthetic minority over-sampling technique. Sample training is achieved by a multi-label cascade forest consisting of predictive clustering trees. These cascade structures possess adaptive feature selection and layer growth mechanisms. Given the necessity to focus on all possible outcomes and the generalization ability of the method, a simulated well model is built and then compared with 6 typical multi-label learning methods. The method outperforms these baselines on the evaluation metrics, which supports its accuracy and generalization ability. The consistency of the predicted results and geological data of actual wells verifies the reliability of our method. Furthermore, the results show that it can be used as a reliable means of multi-label prediction of igneous lithology, lithofacies and reservoir fluids.
Document Type: Original article
Cited as: Han, R., Wang, Z., Guo, Y., Wang, X., A, R., Zhong, G. Multi-label prediction method for lithology, lithofacies and fluid classes based on data augmentation by cascade forest. Advances in Geo-Energy Research, 2023, 9(1): 25-37. https://doi.org/10.46690/ager.2023.07.0
Surrogate Functions for Maximizing Precision at the Top
The problem of maximizing precision at the top of a ranked list, often dubbed
Precision@k (prec@k), finds relevance in myriad learning applications such as
ranking, multi-label classification, and learning with severe label imbalance.
However, despite its popularity, there exist significant gaps in our
understanding of this problem and its associated performance measure.
The most notable of these is the lack of a convex upper bounding surrogate
for prec@k. We also lack scalable perceptron and stochastic gradient descent
algorithms for optimizing this performance measure. In this paper we make key
contributions in these directions. At the heart of our results is a family of
truly upper bounding surrogates for prec@k. These surrogates are motivated in a
principled manner and enjoy attractive properties such as consistency to prec@k
under various natural margin/noise conditions.
These surrogates are then used to design a class of novel perceptron
algorithms for optimizing prec@k with provable mistake bounds. We also devise
scalable stochastic gradient descent style methods for this problem with
provable convergence bounds. Our proofs rely on novel uniform convergence
bounds which require an in-depth analysis of the structural properties of
prec@k and its surrogates. We conclude with experimental results comparing our
algorithms with state-of-the-art cutting plane and stochastic gradient
algorithms for maximizing prec@k.
Comment: To appear in the proceedings of the 32nd International Conference
on Machine Learning (ICML 2015)
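For reference, the performance measure itself (not the surrogates or algorithms proposed in the paper) can be sketched as follows: prec@k is the fraction of the k highest-scored items that are relevant. The function name and the toy scores and labels below are illustrative assumptions.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of the k highest-scored items whose label is 1 (relevant).

    Ties in score are broken by original position, because Python's
    sort is stable.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Indices ordered from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sum(labels[i] for i in ranked[:k]) / k


# Toy example: the two top-scored items are index 0 (relevant)
# and index 2 (not relevant).
toy_scores = [0.9, 0.1, 0.8, 0.4]
toy_labels = [1, 1, 0, 0]
p_at_2 = precision_at_k(toy_scores, toy_labels, k=2)
```

Because the measure depends on the scores only through the sorted order, it is piecewise constant in the scores, which is one reason surrogate functions such as those described above are used for optimization.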
Weakly-supervised Micro- and Macro-expression Spotting Based on Multi-level Consistency
Most micro- and macro-expression spotting methods in untrimmed videos suffer
from the burden of video-wise collection and frame-wise annotation.
Weakly-supervised expression spotting (WES) based on video-level labels can
potentially mitigate the complexity of frame-level annotation while achieving
fine-grained frame-level spotting. However, we argue that existing
weakly-supervised methods are based on multiple instance learning (MIL)
involving inter-modality, inter-sample, and inter-task gaps. The inter-sample
gap is primarily from the sample distribution and duration. Therefore, we
propose a novel and simple WES framework, MC-WES, using multi-consistency
collaborative mechanisms that include modal-level saliency, video-level
distribution, label-level duration and segment-level feature consistency
strategies to implement fine frame-level spotting with only video-level labels
to alleviate the above gaps and merge prior knowledge. The modal-level saliency
consistency strategy focuses on capturing key correlations between raw images
and optical flow. The video-level distribution consistency strategy utilizes
the difference of sparsity in temporal distribution. The label-level duration
consistency strategy exploits the difference in the duration of facial muscles.
The segment-level feature consistency strategy emphasizes that features under
the same labels maintain similarity. Experimental results on three challenging
datasets -- CAS(ME)^2, CAS(ME)^3, and SAMM-LV -- demonstrate that MC-WES is
comparable to state-of-the-art fully-supervised methods.