Bag-Level Aggregation for Multiple Instance Active Learning in Instance Classification Problems
A growing number of applications, e.g. video surveillance and medical image
analysis, require training recognition systems from large amounts of weakly
annotated data while some targeted interactions with a domain expert are
allowed to improve the training process. In such cases, active learning (AL)
can reduce the labeling cost of training a classifier by querying the expert
for the labels of the most informative instances. This paper focuses on AL
methods for instance classification problems in multiple instance learning
(MIL), where data is arranged into sets, called bags, that are weakly labeled.
Most AL methods focus on single instance learning problems. These methods are
not suitable for MIL problems because they cannot account for the bag structure
of data. In this paper, new methods for bag-level aggregation of instance
informativeness are proposed for multiple instance active learning (MIAL). The
"aggregated informativeness" method identifies the most informative
instances based on classifier uncertainty and queries the bags that carry the
most information. The other proposed method, called "cluster-based
aggregative sampling", clusters data hierarchically in the instance space. The
informativeness of instances is assessed by considering bag labels, inferred
instance labels, and the proportion of labels that remain to be discovered in
clusters. Both proposed methods significantly outperform reference methods in
extensive experiments using benchmark data from several application domains.
Results indicate that using an appropriate strategy to address MIAL problems
yields a significant reduction in the number of queries needed to achieve the
same level of performance as single instance AL methods.
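To make the first strategy concrete, here is a minimal sketch of bag-level aggregation of instance uncertainty, assuming a scikit-learn-style classifier exposing predict_proba; the entropy measure, the sum aggregation rule, and all function names are illustrative choices, not the authors' exact formulation.

```python
# Minimal sketch: score each unlabeled bag by aggregating the uncertainty of
# its instances, then query the highest-scoring bag. Illustrative only.
import numpy as np

def instance_uncertainty(probs):
    """Entropy of each instance's predicted class distribution."""
    eps = 1e-12
    return -np.sum(probs * np.log(probs + eps), axis=1)

def query_most_informative_bag(unlabeled_bags, classifier):
    """Return the index of the bag whose instances carry the most information."""
    scores = []
    for bag in unlabeled_bags:                 # bag: (n_instances, n_features)
        probs = classifier.predict_proba(bag)  # instance class probabilities
        scores.append(instance_uncertainty(probs).sum())
    return int(np.argmax(scores))
```

Max or mean aggregation over instances fits the same skeleton; the choice trades off querying bags with one very ambiguous instance against bags that are ambiguous overall.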
Data-Driven Approach to Image Classification
Image classification has been a core topic in the computer vision community. Its recent success with convolutional neural network (CNN) algorithms has led to various real-world applications such as large-scale management of photos/videos on cloud/social media, image-based search for online retailers, self-driving cars, robotics, and healthcare. Image classification can be broadly categorized into binary, multi-class, and multi-label classification problems. Binary classification involves assigning one of two class labels to an instance. In a multi-class classification problem, an instance must be categorized into one of more than two classes. Multi-label classification is a generalized version of the multi-class problem in which each image is assigned multiple labels as opposed to a single one.
In this work, we first present various methods that take advantage of deep representations (the fully connected layer of a CNN pre-trained on the ImageNet dataset) and yield better performance on multi-label classification than methods that use over a dozen conventional visual features.

Following the success of deep representations, we intend to build a generic end-to-end deep learning framework that addresses all three problem categories of image classification. However, there are still no well-established guidelines (in terms of the number of layers, the number and size of kernels, the type of regularizer, the choice of non-linearity, etc.) for building an efficient deep neural network, and network architecture design is often specific to a problem or dataset. Hence, we present some initial efforts in building a completely data-driven computational framework called Deep Decision Network (DDN). DDN is a tree-like structure built stage-wise. During the learning phase, starting from the root network node, DDN automatically builds a network that splits the data into disjoint clusters of classes, which are then handled by subsequent expert networks. This results in a tree-structured network driven by the data. The proposed approach provides insight into the data by identifying the groups of classes that are hard to classify and require more attention than others. This feature is crucial for people trying to solve a problem with little or no domain knowledge, especially for applications in the medical domain. Initially, we evaluate DDN on a binary classification problem and later extend it to the more challenging multi-class and multi-label classification problems. The extensions of DDN to multi-class and multi-label classification involve some changes, but they still operate under the same underlying principle. In all three cases, the proposed approach is tested for its recognition performance and scalability on publicly available datasets, with comparisons to other methods.
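The stage-wise splitting at the heart of DDN can be pictured with a short sketch: the confusion matrix of the current stage's classifier is treated as an affinity between classes, and mutually confused classes are grouped into disjoint clusters that the next-stage expert networks take over. The use of spectral clustering and a two-way split here is an assumption for illustration, not the thesis's exact procedure.

```python
# Minimal sketch of a data-driven class split: group classes that the current
# stage's classifier confuses with each other, so each group can be handed to
# its own expert network. Spectral clustering is an illustrative choice.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import confusion_matrix

def split_classes(y_true, y_pred, n_groups=2):
    """Partition class labels into disjoint groups of mutually confused classes."""
    cm = confusion_matrix(y_true, y_pred).astype(float)
    affinity = cm + cm.T                 # symmetric class-confusion affinity
    np.fill_diagonal(affinity, 0.0)      # ignore correct predictions
    labels = SpectralClustering(
        n_clusters=n_groups, affinity="precomputed"
    ).fit_predict(affinity + 1e-6)       # small offset keeps the graph connected
    classes = np.unique(y_true)
    return [classes[labels == g] for g in range(n_groups)]
```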
Food Image Classification and Segmentation with Attention-based Multiple Instance Learning
The demand for accurate food quantification has increased in recent years,
driven by the needs of applications in dietary monitoring. At the same
time, computer vision approaches have exhibited great potential in automating
tasks within the food domain. Traditionally, the development of machine
learning models for these problems relies on training data sets with
pixel-level class annotations. However, this approach introduces challenges
arising from data collection and ground truth generation that quickly become
costly and error-prone since they must be performed in multiple settings and
for thousands of classes. To overcome these challenges, the paper presents a
weakly supervised methodology for training food image classification and
semantic segmentation models without relying on pixel-level annotations. The
proposed methodology is based on a multiple instance learning approach in
combination with an attention-based mechanism. At test time, the models are
used for classification and, concurrently, the attention mechanism generates
semantic heat maps which are used for food class segmentation. In the paper, we
conduct experiments on two meta-classes within the FoodSeg103 data set to
verify the feasibility of the proposed approach and we explore the functioning
properties of the attention mechanism.
Comment: Accepted for presentation at the 18th International Workshop on Semantic and Social Media Adaptation & Personalization (SMAP 2023).
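A minimal PyTorch sketch of attention-based MIL pooling of the kind the paper builds on: attention weights over instance embeddings (e.g., one embedding per image patch) yield a bag-level prediction, and the weights themselves serve as the per-instance heat map used for segmentation. The layer sizes and the simple, non-gated attention form are assumptions.

```python
# Minimal sketch of attention-based MIL pooling: the same attention weights
# that pool instances into a bag embedding double as a segmentation heat map.
import torch
import torch.nn as nn

class AttentionMILPool(nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1))
        self.classifier = nn.Linear(dim, 1)

    def forward(self, instances):              # instances: (n_instances, dim)
        scores = self.attn(instances)           # (n_instances, 1)
        weights = torch.softmax(scores, dim=0)  # attention over the bag
        bag = (weights * instances).sum(dim=0)  # weighted bag embedding
        logit = self.classifier(bag)            # bag-level class logit
        return logit, weights.squeeze(-1)       # weights act as the heat map
```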
Lung nodules identification in CT scans using multiple instance learning.
Computer Aided Diagnosis (CAD) systems for lung nodule diagnosis aim to classify nodules as benign or malignant based on images obtained from diverse imaging modalities such as Computed Tomography (CT). Automated CAD systems are important in medical applications because they assist radiologists in the time-consuming and labor-intensive diagnosis process. However, most available methods require a large collection of nodules that are segmented and annotated by radiologists. This process is labor-intensive and hard to scale to very large datasets. More recently, some CAD systems based on deep learning have emerged. These algorithms do not require the nodules to be segmented; radiologists need only provide the center of mass of each nodule. Training image patches are then extracted from fixed-size volumes centered at the provided nodule's center. However, since the size of nodules can vary significantly, one fixed-size volume may not represent all nodules effectively.

This thesis proposes a Multiple Instance Learning (MIL) approach to address the above limitations. In MIL, each nodule is represented by a nested sequence of volumes centered at the identified center of the nodule. We extract one feature vector from each volume, and the set of features for each nodule is combined and represented as a bag. Next, we investigate and adapt some existing algorithms and develop new ones for this application. We start by applying benchmark MIL algorithms to traditional Gray Level Co-occurrence Matrix (GLCM) engineered features. Then, we design and train simple Convolutional Neural Networks (CNNs) to learn and extract features that characterize lung nodules. These extracted features are then fed to a benchmark MIL algorithm to learn a classification model. Finally, we develop new algorithms (MIL-CNN) that combine feature learning and multiple instance classification in a single network; these algorithms generalize the CNN architecture to multiple instance data.

We design and report the results of three experiments applied to both engineered (GLCM) and learned (CNN) features using two datasets: the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) [Armato et al., 2011] and the National Lung Screening Trial (NLST) [National Lung Screening Trial Research Team, 2011]. Two of these experiments perform five-fold cross-validation on the same dataset (NLST or LIDC); the third trains the algorithms on one collection (NLST) and tests them on the other (LIDC). We designed our experiments to compare the different features, to compare MIL against Single Instance Learning (SIL), where a single feature vector represents a nodule, and to compare our proposed end-to-end MIL approaches to existing benchmark MIL methods. We demonstrate that our proposed MIL-CNN frameworks are more accurate for the lung nodule diagnosis task. We also show that the MIL representation achieves better results than SIL applied to the ground-truth region of each nodule.
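The bag construction described above can be sketched as concentric crops around the annotated nodule center, one feature vector per crop. The crop sizes, the placeholder feature, and the omission of boundary handling are illustrative assumptions.

```python
# Minimal sketch: represent one nodule as a bag of features, one per volume in
# a nested sequence of crops centered at the annotated nodule center.
import numpy as np

def nodule_bag(ct_volume, center, sizes=(16, 24, 32, 48), extract=None):
    """Crop concentric cubes around `center`; return one feature row per crop.

    Boundary handling is omitted: `center` is assumed far enough from the
    volume edges for every crop size."""
    z, y, x = center
    bag = []
    for s in sizes:
        h = s // 2
        crop = ct_volume[z - h:z + h, y - h:y + h, x - h:x + h]
        # Placeholder feature; a GLCM descriptor or CNN embedding would go here.
        feat = extract(crop) if extract else [crop.mean(), crop.std()]
        bag.append(np.asarray(feat).ravel())
    return np.stack(bag)                 # bag shape: (n_volumes, feature_dim)
```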
MMFL-Net: Multi-scale and Multi-granularity Feature Learning for Cross-domain Fashion Retrieval
Instance-level image retrieval in fashion is a challenging problem owing to its
increasing importance in real-scenario visual fashion search. Cross-domain
fashion retrieval aims to match unconstrained customer images, used as queries,
against photographs provided by retailers; it is a difficult task because of
the wide range of consumer-to-shop (C2S) domain discrepancies and because
clothing images are vulnerable to various non-rigid deformations. To this
end, we propose a novel multi-scale and multi-granularity feature learning
network (MMFL-Net), which can jointly learn global-local aggregation feature
representations of clothing images in a unified framework, aiming to train a
cross-domain model for C2S fashion visual similarity. First, a new
semantic-spatial feature fusion part is designed to bridge the semantic-spatial
gap by applying top-down and bottom-up bidirectional multi-scale feature
fusion. Next, a multi-branch deep network architecture is introduced to capture
global salient, part-informed, and local detailed information, and to extract
robust and discriminative feature embeddings by integrating coarse-to-fine
similarity learning across multiple granularities. Finally,
the improved trihard loss, center loss, and multi-task classification loss are
adopted for our MMFL-Net, which can jointly optimize intra-class and
inter-class distance and thus explicitly improve intra-class compactness and
inter-class discriminability between its visual representations for feature
learning. Furthermore, our proposed model also combines the multi-task
attribute recognition and classification module with multi-label semantic
attributes and product ID labels. Experimental results demonstrate that our
proposed MMFL-Net achieves significant improvement over the state-of-the-art
methods on the two datasets, DeepFashion-C2S and Street2Shop.
Comment: 27 pages, 12 figures. Published in Multimedia Tools and Applications.
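The joint objective can be sketched as a weighted sum of a batch-hard triplet ("trihard") term, a center-loss term, and a classification cross-entropy. The margin and loss weights below are assumed values for illustration, not the paper's settings.

```python
# Minimal sketch of the joint loss: trihard (batch-hard triplet) + center loss
# + classification cross-entropy, with illustrative weights.
import torch
import torch.nn.functional as F

def joint_loss(embeddings, logits, labels, centers,
               margin=0.3, w_tri=1.0, w_center=0.005, w_cls=1.0):
    # Batch-hard triplet: hardest positive and hardest negative per anchor.
    d = torch.cdist(embeddings, embeddings)           # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = (d * same.float()).max(dim=1).values
    hardest_neg = d.masked_fill(same, float("inf")).min(dim=1).values
    tri = F.relu(hardest_pos - hardest_neg + margin).mean()

    # Center loss: pull each embedding toward its class center.
    center = ((embeddings - centers[labels]) ** 2).sum(dim=1).mean()

    # Classification term (e.g., product-ID cross-entropy).
    cls = F.cross_entropy(logits, labels)
    return w_tri * tri + w_center * center + w_cls * cls
```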