Bridging the gap between Natural and Medical Images through Deep Colorization
Deep learning has thrived by training on large-scale datasets. However, in many applications, such as medical image diagnosis, obtaining massive amounts of data is still prohibitive due to privacy, lack of acquisition homogeneity, and annotation cost. In this scenario, transfer learning from natural image collections is a standard practice that attempts to tackle shape, texture, and color discrepancies all at once through pretrained model fine-tuning. In this work, we propose to disentangle those challenges and design a dedicated network module that focuses on color adaptation. We combine learning the color module from scratch with transfer learning of different classification backbones, obtaining an end-to-end, easy-to-train architecture for diagnostic image recognition on X-ray images. Extensive experiments show that our approach is particularly effective in cases of data scarcity and provides a new path for further transferring the learned color information across multiple medical datasets.
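To make the described pipeline concrete, here is a minimal sketch in PyTorch, assuming the color module is a small convolutional network that maps a 1-channel X-ray to a 3-channel image consumed by a pretrained classification backbone; the module layout, layer sizes, and backbone choice are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class ColorModule(nn.Module):
    """Small convnet, trained from scratch, mapping a 1-channel
    X-ray to a 3-channel 'colorized' image (hypothetical layout)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 1),  # project to RGB-like channels
        )

    def forward(self, x):
        return self.net(x)

class ColorizedClassifier(nn.Module):
    """Color module learned from scratch + pretrained backbone,
    fine-tuned end-to-end as described in the abstract."""
    def __init__(self, num_classes):
        super().__init__()
        self.color = ColorModule()
        self.backbone = models.resnet18(weights="IMAGENET1K_V1")
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_classes)

    def forward(self, x):
        return self.backbone(self.color(x))

model = ColorizedClassifier(num_classes=2)
logits = model(torch.randn(4, 1, 224, 224))  # batch of grayscale X-rays
```

Training such a model end-to-end lets the randomly initialized color module adapt the grayscale input to the color statistics the pretrained backbone expects.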
Self-supervised learning methods and applications in medical imaging analysis: A survey
The scarcity of high-quality annotated medical imaging datasets is a major obstacle to machine learning applications in medical imaging analysis and impedes the field's advancement. Self-supervised learning is a recent training paradigm that enables learning robust representations without the need for human annotation, which makes it an effective solution to the scarcity of annotated medical data. This article reviews the state-of-the-art research directions in self-supervised learning approaches for image data, with a focus on their applications in medical imaging analysis. The article covers a set of the most recent self-supervised learning methods from the computer vision field, as they are applicable to medical imaging analysis, and categorizes them as predictive, generative, and contrastive approaches. Moreover, the article covers 40 of the most recent research papers on self-supervised learning in medical imaging analysis, aiming to shed light on recent innovation in the field. Finally, the article concludes with possible future research directions in the field.
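As a concrete instance of the contrastive category mentioned above, the following is a generic SimCLR-style NT-Xent loss in PyTorch; it is a minimal sketch of the contrastive approach in general, not code from any surveyed paper, and the temperature and embedding sizes are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss: two augmented views of the
    same image (z1[i], z2[i]) are positives; every other sample in
    the batch serves as a negative."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2n, d)
    sim = z @ z.t() / temperature                       # cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    # the positive of sample i is sample i + n (and vice versa)
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Example: embeddings of two augmented views of an unlabeled batch
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent_loss(z1, z2)
```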
IST Austria Thesis
Deep neural networks have established a new standard for data-dependent feature extraction pipelines in the Computer Vision literature. Despite their remarkable performance in the standard supervised learning scenario, i.e. when models are trained with labeled data and tested on samples that follow a similar distribution, neural networks have been shown to struggle with more advanced generalization abilities, such as transferring knowledge across visually different domains, or generalizing to new unseen combinations of known concepts. In this thesis we argue that, in contrast to the usual black-box behavior of neural networks, leveraging more structured internal representations is a promising direction
for tackling such problems. In particular, we focus on two forms of structure. First, we tackle modularity: we show that (i) compositional architectures are a natural tool for modeling reasoning tasks, in that they efficiently capture their combinatorial nature, which is key for generalizing beyond the compositions seen during training. We investigate how to learn such models, both formally and experimentally, for the task of abstract visual reasoning. Then, we show that (ii) in some settings, modularity allows us to efficiently break down complex tasks into smaller, easier modules, thereby improving computational efficiency; we study this behavior in the context of generative models for colorization, as well as for small object detection. Second, we investigate the inherently layered structure of representations learned by neural networks, and analyze its role in the context of transfer learning and domain adaptation across visually dissimilar domains.
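The modularity argument can be illustrated with a toy compositional model, sketched below in PyTorch: a fixed library of small modules is composed according to a per-example symbolic program, so unseen programs recombine known parts. The module names, program format, and sizes are hypothetical illustrations, not the thesis's architecture.

```python
import torch
import torch.nn as nn

class CompositionalModel(nn.Module):
    """A library of small reusable modules composed according to a
    symbolic 'program'; unseen programs recombine known modules,
    which is what enables compositional generalization."""
    def __init__(self, dim=64):
        super().__init__()
        self.library = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
            for name in ["count", "compare", "locate", "filter"]
        })

    def forward(self, x, program):
        # Apply the named modules in sequence, e.g. ["filter", "count"]
        for name in program:
            x = self.library[name](x)
        return x

model = CompositionalModel()
features = torch.randn(4, 64)
out = model(features, program=["filter", "compare", "count"])
```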
Learning to see across domains and modalities
Deep learning has recently raised hopes and expectations as a general solution for many applications (computer vision, natural language processing, speech recognition, etc.); it has indeed proven effective, but it has also shown a strong dependence on large quantities of data. Generally speaking, deep learning models are especially susceptible to overfitting, due to their large number of internal parameters.
Luckily, it has also been shown that, even when data is scarce, a successful model can be trained by reusing prior knowledge. Thus, developing techniques for transfer learning (as this process is known), in its broadest definition, is a crucial element towards the deployment of effective and accurate intelligent systems in the real world.
This thesis will focus on a family of transfer learning methods applied to the task of visual object recognition, specifically image classification. The visual recognition problem is central to computer vision research: many desired applications, from robotics to information retrieval, demand the ability to correctly identify categories, places, and objects.
Transfer learning is a general term, and specific settings have been given specific names: when the learner has access only to unlabeled data from the target domain (where the model should perform) and labeled data from a different domain (the source), the problem is called unsupervised domain adaptation (DA). The first part of this thesis focuses on three methods for this setting.
The three presented techniques for domain adaptation are fully distinct: the first proposes the use of Domain Alignment layers to structurally align the distributions of the source and target domains in feature space. While the general idea of aligning feature distributions is not novel, our method is distinguished as one of the very few that does so without adding losses. The second method is based on GANs: we propose a bidirectional architecture that jointly learns how to map source images into the target visual style and vice versa, thus alleviating the domain shift at the pixel level. The third method features an adversarial learning process that transforms both the images and the features of both domains in order to map them into a common, domain-agnostic space.
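Domain alignment layers of this kind are commonly realized with per-domain batch normalization, where each domain's activations are standardized with its own batch statistics and no auxiliary loss is introduced. The sketch below is a minimal PyTorch illustration of that general idea, not the thesis's exact layer.

```python
import torch
import torch.nn as nn

class DomainAlignLayer(nn.Module):
    """Per-domain batch normalization: source and target activations
    are standardized with their own batch statistics, aligning the
    two distributions in feature space without any extra loss."""
    def __init__(self, num_features):
        super().__init__()
        self.bn = nn.ModuleDict({
            "source": nn.BatchNorm2d(num_features),
            "target": nn.BatchNorm2d(num_features),
        })

    def forward(self, x, domain):
        return self.bn[domain](x)

layer = DomainAlignLayer(64)
src = layer(torch.randn(8, 64, 14, 14), domain="source")
tgt = layer(torch.randn(8, 64, 14, 14), domain="target")
```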
While the first part of the thesis presents general-purpose DA methods, the second part focuses on the real-life issues of robotic perception, specifically RGB-D recognition.
Robotic platforms are usually not limited to color perception; very often they also carry a depth camera.
Unfortunately, the depth modality is rarely used for visual recognition, due to the lack of pretrained models to transfer from and the scarcity of data on which to train one from scratch.
We first explore the use of synthetic data as a proxy for real images by training a Convolutional Neural Network (CNN) on virtual depth maps, rendered from 3D CAD models, and then testing it on real robotic datasets. The second approach leverages the existence of RGB pretrained models by learning how to map the depth data into the most discriminative RGB representation and then using existing models for recognition. This second technique is in fact a fairly general transfer learning method that can be applied to share knowledge across modalities.
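A minimal version of that second, depth-to-RGB idea follows: a small trainable network maps a 1-channel depth map into a 3-channel representation that a frozen RGB-pretrained backbone can consume. The mapper's shape and the frozen/trainable split are our assumptions for illustration, not the thesis's exact design.

```python
import torch
import torch.nn as nn
from torchvision import models

class DepthToRGBMapper(nn.Module):
    """Trainable network mapping 1-channel depth maps into a
    3-channel representation suited to an RGB-pretrained model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, depth):
        return self.net(depth)

mapper = DepthToRGBMapper()
backbone = models.resnet18(weights="IMAGENET1K_V1")
for p in backbone.parameters():       # reuse RGB knowledge as-is
    p.requires_grad_(False)

depth = torch.randn(4, 1, 224, 224)   # batch of depth maps
logits = backbone(mapper(depth))      # only the mapper is trained
```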
Unifying Image Processing as Visual Prompting Question Answering
Image processing is a fundamental task in computer vision, which aims at
enhancing image quality and extracting essential features for subsequent vision
applications. Traditionally, task-specific models are developed for individual tasks, and designing such models requires distinct expertise. Building upon the
success of large language models (LLMs) in natural language processing (NLP),
there is a similar trend in computer vision, which focuses on developing
large-scale models through pretraining and in-context learning. This paradigm
shift reduces the reliance on task-specific models, yielding a powerful unified
model to deal with various tasks. However, these advances have predominantly
concentrated on high-level vision tasks, with less attention paid to low-level
vision tasks. To address this issue, we propose a universal model for general image processing that covers image restoration, image enhancement, image feature extraction, and related tasks. Our proposed framework, named PromptGIP, unifies
these diverse image processing tasks within a universal framework. Inspired by
NLP question answering (QA) techniques, we employ a visual prompting question
answering paradigm. Specifically, we treat the input-output image pair as a
structured question-answer sentence, thereby reprogramming the image processing
task as a prompting QA problem. PromptGIP can undertake diverse cross-domain
tasks using provided visual prompts, eliminating the need for task-specific
finetuning. Our methodology offers a universal and adaptive solution to general
image processing. While PromptGIP has demonstrated a certain degree of out-of-domain task generalization, further research is expected to fully explore its emergent generalization capabilities.
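The visual prompting QA formulation can be sketched generically: patch tokens of a prompt input-output pair and a query image are concatenated into one "sentence", and a transformer regresses the query's output image. Everything below (patch size, dimensions, model depth) is a simplified assumption for illustration, not the PromptGIP implementation.

```python
import torch
import torch.nn as nn

class VisualPromptQA(nn.Module):
    """Treats (prompt_in, prompt_out, query_in) as a 'question' and
    regresses the query output: image processing as prompted QA."""
    def __init__(self, patch=16, dim=256):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(3, dim, patch, stride=patch)  # patchify
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True),
            num_layers=4)
        self.head = nn.Linear(dim, 3 * patch * patch)        # unpatchify

    def forward(self, prompt_in, prompt_out, query_in):
        toks = [self.embed(x).flatten(2).transpose(1, 2)
                for x in (prompt_in, prompt_out, query_in)]
        seq = torch.cat(toks, dim=1)                     # one QA 'sentence'
        out = self.encoder(seq)[:, -toks[2].shape[1]:]   # query positions
        b, n, _ = out.shape
        side = query_in.shape[-1] // self.patch
        img = self.head(out).view(b, side, side, 3, self.patch, self.patch)
        return img.permute(0, 3, 1, 4, 2, 5).reshape_as(query_in)

model = VisualPromptQA()
x = torch.randn(2, 3, 64, 64)
restored = model(x, x, x)  # the prompt pair defines the task; the query follows
```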
Conditional Invertible Generative Models for Supervised Problems
Invertible neural networks (INNs), in the setting of normalizing flows, are a type of unconditional generative likelihood model. Despite various attractive properties compared to other common generative model types, they are rarely useful for supervised tasks or real applications due to their unguided outputs. In this work, we therefore present three new methods that extend the standard INN setting, falling under a broader category we term generative invertible models. These new methods allow leveraging the theoretical and practical benefits of INNs to solve supervised problems in new ways, including real-world applications from different branches of science. The key finding is that our approaches enhance many aspects of trustworthiness in comparison to conventional feed-forward networks, such as uncertainty estimation and quantification, explainability, and the proper handling of outlier data.
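A standard way to extend INNs to supervised, guided settings is to condition the flow's coupling layers on side information. The block below is a generic conditional affine coupling layer in PyTorch, offered as an illustration of this family of methods rather than the thesis's specific models.

```python
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    """Affine coupling block whose scale/shift depend on both one
    half of the input and a conditioning vector; the inverse is
    available in closed form, as required for a normalizing flow."""
    def __init__(self, dim, cond_dim, hidden=128):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x, cond):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                  # keep scales well-behaved
        y2 = x2 * torch.exp(s) + t
        log_det = s.sum(dim=1)             # for the likelihood term
        return torch.cat([x1, y2], dim=1), log_det

    def inverse(self, y, cond):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(torch.cat([y1, cond], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=1)

block = ConditionalCoupling(dim=8, cond_dim=4)
z, log_det = block(torch.randn(16, 8), torch.randn(16, 4))
```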