Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, substantial local mechanisms have been
designed to boost the development of computer vision. They not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve efficiency. Local mechanisms exhibit
different characteristics across application scenarios and paradigms. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
A categorization of local mechanisms in each field is summarized. Then, the
advantages and disadvantages of every category are analyzed in depth, leaving
room for further exploration. Finally, future research directions on local
mechanisms are discussed that may benefit subsequent work. To the best of
our knowledge, this is the first survey of local mechanisms in computer
vision. We hope that this survey can shed light on future research in the
computer vision field.
Explainable artificial intelligence (XAI) in deep learning-based medical image analysis
With an increase in deep learning-based methods, the call for explainability
of such methods grows, especially in high-stakes decision making areas such as
medical image analysis. This survey presents an overview of eXplainable
Artificial Intelligence (XAI) used in deep learning-based medical image
analysis. A framework of XAI criteria is introduced to classify deep
learning-based medical image analysis methods. Papers on XAI techniques in
medical image analysis are then surveyed and categorized according to the
framework and according to anatomical location. The paper concludes with an
outlook of future opportunities for XAI in medical image analysis.
Robust Machine Learning by Integrating Context
Intelligent software has the potential to transform our society. It is becoming the building block for many systems in the real world. However, despite the excellent performance of machine learning models on benchmarks, state-of-the-art methods like neural networks often fail once they encounter realistic settings. Because neural networks often learn correlations without reasoning over the right signals and knowledge, they fail when facing shifting distributions, unforeseen corruptions, and worst-case scenarios. Moreover, as black-box models, neural networks are neither interpretable nor trusted by users. We need to build robust models for machine learning to be confidently and responsibly deployed in the most critical applications and systems.
In this dissertation, I introduce our advances in robust machine learning systems, achieved by tightly integrating context into algorithms. Context has two aspects: the intrinsic structure of natural data and the extrinsic structure from domain knowledge. Both are crucial: by capitalizing on the intrinsic structure of natural data, my work shows that we can create robust machine learning systems even in the worst case, an analytical result that also enjoys strong empirical gains.
Through integrating external knowledge, such as the association between tasks and causal structure, my framework can instruct models to use the right signals for inference, enabling new opportunities for controllable and interpretable models.
This thesis consists of three parts. In the first part, I aim to cover three works that use the intrinsic structure as a constraint to achieve robust inference. I present our framework that performs test-time optimization to respect the natural constraint, which is captured by self-supervised tasks. I illustrate that test-time optimization improves out-of-distribution generalization and adversarial robustness. Besides the inference algorithm, I show that intrinsic structure through discrete representations also improves out-of-distribution robustness.
In the second part of the thesis, I then detail my work using external domain knowledge. I first introduce using causal structure from external domain knowledge to improve domain generalization robustness. I then show how the association of multiple tasks and regularization objectives helps robustness.
In the final part of this dissertation, I show three works on trustworthy and reliable foundation models: general-purpose models that will be the foundation for many AI applications. I show a framework that uses context to secure, interpret, and control foundation models.
Multimodal Adversarial Learning
Deep Convolutional Neural Networks (DCNNs) have proven to be exceptional tools for object recognition, generative modelling, and multi-modal learning in various computer vision applications. However, recent findings have shown that such state-of-the-art models can be easily deceived by inserting slight, imperceptible perturbations at key pixels in the input. A good target detection system can accurately identify targets by localizing their coordinates on the input image of interest, ideally by labeling each pixel as either background or a potential target. However, prior research confirms that such state-of-the-art target models remain susceptible to adversarial attacks.
In the case of generative models, the facial sketches drawn by artists and used mostly by law enforcement agencies depend on the artist's ability to clearly replicate all the key facial features that capture the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. Synthesizing photo-realistic images from sketches, however, proves to be an even more challenging task, especially for sensitive applications such as suspect identification. The incorporation of hybrid discriminators, which perform attribute classification over multiple target attributes, together with a quality-guided encoder that minimizes the perceptual dissimilarity between the latent-space embeddings of the synthesized and real images at different layers of the network, has been shown to be a powerful tool for better multi-modal learning. In general, our overall approach aimed to improve target detection systems and the visual appeal of synthesized images while incorporating multiple attribute assignments into the generator without compromising the identity of the synthesized image.
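The "slight imperceptible perturbations" mentioned above can be made concrete with the classic fast gradient sign method (FGSM). The sketch below is a generic illustration on a toy linear logistic model, not the specific attacks studied in this work: the input is nudged by a small amount in the sign of the loss gradient, which is guaranteed to increase the loss.

```python
import numpy as np

def logistic_loss(w, x, y):
    """Binary logistic loss for label y in {-1, +1}."""
    return float(np.log1p(np.exp(-y * (w @ x))))

def fgsm(w, x, y, eps):
    """Perturb x by eps in the sign of the loss gradient w.r.t. the input."""
    margin = y * (w @ x)
    grad_x = -y * (1.0 / (1.0 + np.exp(margin))) * w  # d loss / d x
    return x + eps * np.sign(grad_x)

w = np.array([0.5, -1.2, 0.3])  # toy linear classifier
x = np.array([1.0, 2.0, -1.0])  # clean input
y = 1.0
x_adv = fgsm(w, x, y, eps=0.1)
# x_adv differs from x by at most 0.1 per pixel, yet incurs a higher loss
```

For deep networks the same recipe applies with the gradient obtained by backpropagation, which is why visually indistinguishable inputs can flip a model's prediction.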
We synthesized sketches using the XDoG filter for the CelebA, Multi-Modal and CelebA-HQ datasets, and from an auxiliary generator trained on sketches from the CUHK, IIT-D and FERET datasets. Overall, our results across different model applications are impressive compared to the current state of the art.
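The XDoG (eXtended Difference-of-Gaussians) filter used above is straightforward to sketch: subtract two Gaussian blurs of the image and pass the difference through a soft threshold. The parameter values below are common illustrative defaults, not necessarily the settings used in this work.

```python
import numpy as np

def _gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (pure NumPy)."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kern = np.exp(-t**2 / (2 * sigma**2))
    kern /= kern.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kern, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kern, mode="valid")

def xdog(img, sigma=1.0, k=1.6, gamma=0.98, eps=-0.1, phi=10.0):
    """Soft-thresholded difference of two Gaussian blurs -> sketch-like edges."""
    d = _gaussian_blur(img, sigma) - gamma * _gaussian_blur(img, k * sigma)
    return np.where(d >= eps, 1.0, 1.0 + np.tanh(phi * (d - eps)))

# grayscale image in [0, 1]; a simple gradient stands in for a face photo
img = np.linspace(0.0, 1.0, 32 * 32).reshape(32, 32)
sketch = xdog(img)  # same shape as img, values in (0, 1]
```

Because XDoG is a deterministic filter, it provides cheap, paired sketch-photo training data, which is why it is a convenient way to generate sketches for datasets such as CelebA that lack hand-drawn ones.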