Light-weight Deep Extreme Multilabel Classification
Extreme multi-label (XML) classification refers to the task of supervised
multi-label learning that involves a large number of labels. Hence, scalability
of the classifier with increasing label dimension is an important
consideration. In this paper, we develop a method called LightDXML, which
modifies a recently developed deep-learning-based XML framework by using
label embeddings instead of feature embeddings for negative sampling and
iterating cyclically through three major phases: (1) proxy training of label
embeddings, (2) shortlisting of labels for negative sampling, and (3) final
classifier training using the negative samples. Consequently, LightDXML also
removes the need for a re-ranker module, leading to further savings in time
and memory. The proposed method achieves the best of both worlds: its
training time, model size, and prediction time are on par with or better than
those of tree-based methods, while its prediction accuracy is much higher and
on par with deep-learning-based methods. Moreover, the proposed approach
achieves the best tail-label prediction accuracy over most state-of-the-art
XML methods on some of the large
datasets.\footnote{Accepted at IJCNN 2023; partial funding from the MAPG grant and
the IIIT Seed grant at IIIT Hyderabad, India. Code:
\url{https://github.com/misterpawan/LightDXML}}
Comment: 9 pages, 2 figures, 5 tables
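The cyclic three-phase procedure the abstract describes can be sketched in miniature. Everything below is an illustrative assumption, not the paper's implementation: the random-projection proxy for label embeddings, the nearest-neighbour shortlist, the one-vs-all logistic classifiers, and all function names are stand-ins chosen only to show how the phases feed each other.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_label_embeddings(X, Y, dim):
    # Phase 1 (proxy): represent each label by the mean of a random
    # projection of the features of its positive instances.
    P = rng.standard_normal((X.shape[1], dim))
    Z = X @ P
    return np.vstack([Z[Y[:, l] == 1].mean(axis=0) for l in range(Y.shape[1])])

def shortlist_negatives(emb, Y, k):
    # Phase 2: for each instance, shortlist the k negative labels whose
    # embeddings lie closest to the mean embedding of its positive labels.
    shortlists = []
    for y in Y:
        anchor = emb[y == 1].mean(axis=0)
        d = np.linalg.norm(emb - anchor, axis=1)
        negs = [l for l in np.argsort(d) if y[l] == 0][:k]
        shortlists.append(negs)
    return shortlists

def train_classifier(X, Y, shortlists, epochs=50, lr=0.1):
    # Phase 3: one-vs-all logistic classifiers trained only on positives
    # plus the shortlisted hard negatives (no re-ranker afterwards).
    n, d = X.shape
    W = np.zeros((Y.shape[1], d))
    for l in range(Y.shape[1]):
        rows = [i for i in range(n) if Y[i, l] == 1 or l in shortlists[i]]
        Xi, yi = X[rows], Y[rows, l]
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-Xi @ W[l]))
            W[l] -= lr * Xi.T @ (p - yi) / len(rows)
    return W

# Toy data: 40 instances, 8-dim features, 6 labels.
X = rng.standard_normal((40, 8))
Y = (rng.random((40, 6)) < 0.3).astype(float)
Y[Y.sum(axis=1) == 0, 0] = 1              # every instance gets a label
for l in np.where(Y.sum(axis=0) == 0)[0]:
    Y[0, l] = 1                           # every label gets an instance

for _ in range(2):  # cycle through the three phases
    emb = train_label_embeddings(X, Y, dim=4)
    shortlists = shortlist_negatives(emb, Y, k=2)
    W = train_classifier(X, Y, shortlists)

scores = X @ W.T  # predict by ranking label scores directly
```

The point of the cycle is that the classifier never sees the full label space during training: each instance touches only its positives plus a short list of embedding-nearest negatives, which is where the scalability claim comes from.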
STNet: Selective Tuning of Convolutional Networks for Object Localization
Visual attention modeling has recently gained momentum in developing visual
hierarchies provided by Convolutional Neural Networks. Despite recent successes
of feedforward processing on the abstraction of concepts from raw images, the
inherent nature of feedback processing has remained computationally
controversial. Inspired by the computational models of covert visual attention,
we propose the Selective Tuning of Convolutional Networks (STNet). It is
composed of both streams of Bottom-Up and Top-Down information processing to
selectively tune the visual representation of Convolutional networks. We
experimentally evaluate the performance of STNet for the weakly-supervised
localization task on the ImageNet benchmark dataset. We demonstrate that STNet
not only surpasses the state-of-the-art results but also generates
attention-driven class hypothesis maps
- …
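The bottom-up/top-down pairing the STNet abstract describes can be sketched on a toy feedforward network. This is a hedged illustration, not the paper's architecture: the two-layer network, the top-k winner selection, and all names are assumptions, standing in for the general pattern of a forward pass followed by a class-driven backward selection that yields an attention map.

```python
import numpy as np

rng = np.random.default_rng(1)

def bottom_up(x, W1, W2):
    # Bottom-Up stream: a plain feedforward pass, keeping the hidden
    # activations so the top-down pass can revisit them.
    h = np.maximum(0, W1 @ x)   # ReLU hidden layer
    scores = W2 @ h             # class scores
    return h, scores

def top_down(x, h, W2, cls, keep=0.25):
    # Top-Down stream: starting from the chosen class, keep only the
    # hidden units contributing most to its score, then project the
    # selected evidence back onto the input as an attention map.
    contrib = W2[cls] * h                  # per-unit contribution to the class
    k = max(1, int(keep * len(contrib)))
    winners = np.argsort(contrib)[-k:]     # selective tuning: prune the rest
    gate = np.zeros_like(h)
    gate[winners] = 1.0
    attn = np.maximum(0, W1.T @ (gate * h)) * np.abs(x)
    return attn / (attn.max() + 1e-12)     # normalized class hypothesis map

x = rng.random(16)                  # toy "image" as a flat feature vector
W1 = rng.standard_normal((32, 16))
W2 = rng.standard_normal((10, 32))

h, scores = bottom_up(x, W1, W2)
cls = int(np.argmax(scores))        # predicted class drives the top-down pass
attn = top_down(x, h, W2, cls)      # attention-driven class hypothesis map
```

For weakly-supervised localization, a map like `attn` (computed per spatial location in a real CNN) would be thresholded to propose a bounding box, with no box-level supervision at training time.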