Human-Centered Tools for Coping with Imperfect Algorithms during Medical Decision-Making
Machine learning (ML) is increasingly being used in image retrieval systems
for medical decision making. One application of ML is to retrieve visually
similar medical images from past patients (e.g. tissue from biopsies) to
reference when making a medical decision with a new patient. However, no
algorithm can perfectly capture an expert's ideal notion of similarity for
every case: an image that is algorithmically determined to be similar may not
be medically relevant to a doctor's specific diagnostic needs. In this paper,
we identified the needs of pathologists when searching for similar images
retrieved using a deep learning algorithm, and developed tools that empower
users to cope with the search algorithm on-the-fly, communicating what types of
similarity are most important at different moments in time. In two evaluations
with pathologists, we found that these refinement tools increased the
diagnostic utility of images found and increased user trust in the algorithm.
The tools were preferred over a traditional interface, without a loss in
diagnostic accuracy. We also observed that users adopted new strategies when
using refinement tools, re-purposing them to test and understand the underlying
algorithm and to disambiguate ML errors from their own errors. Taken together,
these findings inform future human-ML collaborative systems for expert
decision-making.
Fuzzy Shape Classification Exploiting Geometrical and Moments Descriptors
In the era of data-intensive management and discovery, the volume of image repositories requires effective means for mining and classifying digital image collections. Recent studies have shown great interest in image processing that "mines" visual information for object recognition and retrieval. In particular, image disambiguation based on shape produces better results than traditional features such as color or texture. On the other hand, the classification of objects extracted from images is more intuitively formulated as a shape classification task. This work introduces an approach for 2D shape classification based on the combined use of geometrical and moment features extracted from a given collection of images. It achieves a shape-based classification by exploiting fuzzy clustering techniques, which also enable query-by-image retrieval.
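The moment descriptors mentioned above can be made concrete with a small sketch. The following is a minimal, illustrative computation of central moments of a binary shape (the function name `central_moments` and the normalization shown are our own illustration, not the paper's exact descriptor set, which also includes geometrical features and fuzzy clustering):

```python
import numpy as np

def central_moments(shape, p_max=2):
    """Central moments mu_pq of a binary shape image (H x W).
    These are translation-invariant because they are taken about the centroid."""
    ys, xs = np.nonzero(shape)
    m00 = len(xs)                      # area: number of foreground pixels
    cx, cy = xs.mean(), ys.mean()      # centroid
    mu = {}
    for p in range(p_max + 1):
        for q in range(p_max + 1):
            mu[(p, q)] = (((xs - cx) ** p) * ((ys - cy) ** q)).sum()
    return mu, m00

# Toy shape: a 5x5 filled square inside a 7x7 image.
square = np.zeros((7, 7), dtype=int)
square[1:6, 1:6] = 1
mu, area = central_moments(square)
# Scale-normalized moment eta_20 = mu_20 / m00^2 is also scale-invariant,
# so it can serve as one entry of a shape descriptor vector.
eta20 = mu[(2, 0)] / area ** 2
```

Descriptor vectors built from such moments (plus geometrical features like eccentricity or compactness) are what a fuzzy clustering step would then group into shape classes.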
Semantic image retrieval using relevance feedback and transaction logs
Due to recent improvements in digital photography and storage capacity, storing large amounts of images has become possible, and efficient means to retrieve images matching a user's query are needed. Content-Based Image Retrieval (CBIR) systems automatically extract image contents based on image features, i.e. color, texture, and shape. Relevance feedback methods are applied to CBIR to integrate users' perceptions and reduce the gap between high-level image semantics and low-level image features. This dissertation improves the precision of a CBIR system in retrieving semantically rich (complex) images by making advancements in three areas of a CBIR system: input, process, and output. The input of the system includes a mechanism that provides the user with the tools required to build and modify her query through feedback. User behavior in CBIR environments is studied, and a new feedback methodology is presented to efficiently capture users' image perceptions. The process element includes image learning and retrieval algorithms. A long-term image retrieval algorithm (LTL), which learns image semantics from prior search results available in the system's transaction history, is developed using Factor Analysis. Another algorithm, a short-term learner (STL) that captures the user's image perceptions based on image features and the user's feedback in the ongoing transaction, is developed based on Linear Discriminant Analysis. Then, a mechanism is introduced to integrate these two algorithms into one retrieval procedure. Finally, a retrieval strategy that includes learning and searching phases is defined for arranging images in the output of the system. The developed relevance feedback methodology proved to reduce the effect of human subjectivity in providing feedback for complex images. The retrieval algorithms were applied to images with different degrees of complexity.
LTL is efficient in extracting the semantics of complex images that have a history in the system. STL is suitable for queries and images that can be effectively represented by their image features. Therefore, the performance of the system in retrieving images with visual and conceptual complexities was improved when both algorithms were applied simultaneously. Finally, the strategy of retrieval phases demonstrated promising results as query complexity increases.
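The dissertation's short-term learner uses Linear Discriminant Analysis, but the basic idea of relevance feedback can be illustrated with the classic Rocchio update, which moves the query vector toward images the user marked relevant and away from ones marked non-relevant. This is a generic sketch of the feedback mechanism, not the dissertation's LTL/STL algorithms:

```python
import numpy as np

def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio relevance-feedback update on feature vectors:
    new_q = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    q = alpha * query
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)
    return q

# Toy 2-D image features: the user marks images near (1, 0) as relevant.
query = np.array([0.0, 0.0])
relevant = np.array([[1.0, 0.1], [0.9, -0.1]])
nonrelevant = np.array([[-1.0, 0.0]])
new_q = rocchio_update(query, relevant, nonrelevant)
```

After one feedback round the refined query has shifted toward the relevant region of feature space, so the next retrieval pass ranks similar images higher.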
Enhancing Automatic Annotation for Optimal Image Retrieval
Image search and retrieval based on content is a very cumbersome task, particularly when the image database is large. The accuracy of the retrieval as well as the processing speed are two important measures used for assessing and comparing the effectiveness of various systems.
Text retrieval is more mature and advanced than image content retrieval. In this dissertation, the focus is on converting image content into text tags that can be easily searched using standard search engines, where the size and speed issues of the database have already been dealt with.
Therefore, image tagging becomes an essential tool for image retrieval from large image databases. Automation of image tagging has received considerable attention from many researchers in recent years. The ultimate goal of image description is to automatically annotate images with tags that semantically represent the image content. The speed and accuracy of image retrieval from large databases are among the important areas that can benefit from automatic tagging.
In this work, several state-of-the-art image classification and image tagging techniques are reviewed. We propose a new self-learning multilayered tagging framework that can address the limitations of current approaches and provide mutual accuracy improvement between the recognition layer and the annotation layer. Our results indicate that the proposed framework can improve the overall accuracy of information retrieval in a variety of image databases.
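The reason text tags make image search fast is that tags plug into standard text-retrieval machinery such as an inverted index. This minimal sketch (the tag sets and identifiers are invented for illustration; the dissertation's framework concerns how to *produce* the tags, not this lookup step) shows how tagged images become searchable in constant-ish time per tag:

```python
from collections import defaultdict

def build_inverted_index(image_tags):
    """Map each tag to the set of image ids annotated with it."""
    index = defaultdict(set)
    for image_id, tags in image_tags.items():
        for tag in tags:
            index[tag].add(image_id)
    return index

def search(index, query_tags):
    """Return images matching ALL query tags (boolean AND)."""
    sets = [index.get(t, set()) for t in query_tags]
    return set.intersection(*sets) if sets else set()

# Hypothetical auto-generated annotations.
tags = {
    "img1": {"beach", "sunset", "people"},
    "img2": {"beach", "boat"},
    "img3": {"sunset", "mountain"},
}
idx = build_inverted_index(tags)
hits = search(idx, ["beach", "sunset"])
```

Because the index is keyed by tag, query cost depends on the number of query tags and matching images, not on the total database size scanned linearly.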
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and makes it possible to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas is described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking, and promising avenues for research.
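The bag formulation described above can be sketched in a few lines. Under the *standard* MIL assumption (one of several the survey covers), a bag is positive if at least one of its instances is positive, so a bag-level score is simply the maximum over instance-level scores. The function names and toy scores below are illustrative:

```python
import numpy as np

def bag_score(instance_scores):
    """Standard MIL assumption: a bag is positive iff at least one
    instance is positive, so the bag score is the max instance score."""
    return np.max(instance_scores)

def predict_bag(instance_scores, threshold=0.5):
    """Bag-level decision from instance-level probabilities."""
    return bag_score(instance_scores) >= threshold

# Two toy bags of instance-level probabilities from some classifier.
positive_bag = np.array([0.1, 0.2, 0.9])   # one strongly positive instance
negative_bag = np.array([0.1, 0.3, 0.2])   # no positive instance
```

Other problem characteristics the survey identifies (bag composition, label ambiguity, instance-level vs. bag-level tasks) change how this aggregation step should be designed; max-pooling is only the canonical case.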
Training deep retrieval models with noisy datasets
In this thesis we study loss functions that allow training Convolutional Neural
Networks (CNNs) under noisy datasets for the particular task of Content-
Based Image Retrieval (CBIR). In particular, we propose two novel losses to fit
models that generate global image representations. First, a Soft-Matching (SM)
loss, exploiting both image content and metadata, is used to specialize general
CNNs to particular cities or regions using weakly annotated datasets. Second,
a Bag Exponential (BE) loss inspired by the Multiple Instance Learning (MIL)
framework is employed to train CNNs for CBIR under noisy datasets.
The first part of the thesis introduces a novel training framework that, relying
on image content and metadata, learns location-adapted deep models that
provide fine-tuned image descriptors for specific visual contents. Our networks,
which start from a baseline model originally learned for a different task, are specialized
using a custom pairwise loss function, our proposed SM loss, that uses
weak labels based on image content and metadata.
The experimental results show that the proposed location-adapted CNNs
achieve an improvement of up to 55% over the baseline networks on a landmark
discovery task. This implies that the models successfully learn the visual
cues and peculiarities of the region for which they are trained, and generate
image descriptors that are better location-adapted. In addition, for landmarks
that are not present in the training set, or that even belong to other cities, our
proposed models perform at least as well as the baseline network, which indicates
good resilience against overfitting.
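The Soft-Matching loss is a custom pairwise loss, so as background it helps to see the generic pairwise (contrastive) formulation such frameworks build on: pull descriptors of matching pairs together, push non-matching pairs apart up to a margin. This is a common baseline sketch, not the thesis's SM loss, and `same` here would come from the weak content/metadata labels the thesis describes:

```python
def contrastive_pair_loss(d, same, margin=1.0):
    """Generic pairwise contrastive loss on the distance d between two
    image descriptors. Matching pairs (same=True) are penalized by their
    squared distance; non-matching pairs only if closer than the margin."""
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

# A pair at distance 0.2: cheap if it is a true match,
# expensive if the weak labels say it should not match.
loss_match = contrastive_pair_loss(0.2, same=True)
loss_mismatch = contrastive_pair_loss(0.2, same=False)
```

The SM loss presumably softens the hard `same`/not-`same` decision using the weak labels, which is exactly where noisy annotations would otherwise hurt a plain contrastive loss.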
The second part of the thesis introduces the BE Loss function to train CNNs
for image retrieval borrowing inspiration from the MIL framework. The loss
combines the use of an exponential function acting as a soft margin, and a
MIL-based mechanism working with bags of positive and negative pairs of images.
The method allows training deep retrieval networks on noisy datasets by
weighting the influence of the different samples at the loss level, which increases
the performance of the generated global descriptors. The rationale behind the improvement
is that we are handling noise in an end-to-end manner and, therefore,
avoiding its negative influence as well as the unintentional biases due to fixed
pre-processing cleaning procedures. In addition, our method is general enough
to suit other scenarios requiring different weights for the training instances (e.g.
boosting the influence of hard positives during training). The proposed bag
exponential function can be seen as a back door to guide the learning process
according to a certain objective in an end-to-end manner, allowing the model to
approach that objective smoothly and progressively.
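The sample-weighting idea behind the BE loss can be illustrated with a toy sketch. The exact Bag Exponential formulation is not reproduced here; this only shows the mechanism the text describes, i.e. exponential weights inside a bag that shrink the influence of likely-noisy (here, high-loss) pairs. The function name and the direction of the weighting are our assumptions for illustration:

```python
import numpy as np

def bag_weighted_loss(pair_losses, tau=1.0):
    """Toy MIL-style bag loss: per-pair losses inside a bag are combined
    with normalized exponential weights so that low-loss pairs dominate,
    down-weighting outlier (likely mislabeled) pairs end-to-end."""
    losses = np.asarray(pair_losses, dtype=float)
    w = np.exp(-losses / tau)        # smaller loss -> larger weight
    w = w / w.sum()                  # normalize weights within the bag
    return float((w * losses).sum())

# A bag containing one outlier pair (probably a noisy label):
clean_bag = [0.1, 0.2, 0.15]
noisy_bag = [0.1, 0.2, 5.0]
```

Because the weights are computed from the losses themselves, no fixed pre-cleaning of the dataset is needed, which matches the thesis's argument for handling noise inside the training loop rather than in pre-processing.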
Our results show that our loss allows CNN-based retrieval systems to be
trained with noisy training sets and achieve state-of-the-art performance. Furthermore,
we have found that it is better to use training sets that are highly
correlated with the final task, even if they are noisy, than to train with a clean set that is only weakly related to the topic at hand. From our point of view,
this result represents a big leap in the applicability of retrieval systems and helps
reduce the effort needed to set up new CBIR applications, e.g. by allowing
fast automatic generation of noisy training datasets and then using our bag
exponential loss to deal with the noise. Moreover, we also consider that this result
opens a new line of research for CNN-based image retrieval: let the models decide
not only on the best features to solve the task but also on the most relevant
samples to do it.
Doctoral Program in Multimedia and Communications, Universidad Carlos III de Madrid and Universidad Rey Juan Carlos. President: Luis Salgado Álvarez de Sotomayor. Secretary: Pablo Martínez Olmos. Committee member: Ernest Valveny Llobe