1,150 research outputs found
ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domain
Transformer design is the de facto standard for natural language processing
tasks. The success of the transformer design in natural language processing has
lately piqued the interest of researchers in the domain of computer vision.
When compared to Convolutional Neural Networks (CNNs), Vision Transformers
(ViTs) are becoming more popular and dominant solutions for many vision
problems. Transformer-based models outperform other types of networks, such as
convolutional and recurrent neural networks, in a range of visual benchmarks.
We evaluate various vision transformer models in this work by grouping them
by task and examining their benefits and drawbacks. ViTs can
overcome several possible difficulties with convolutional neural networks
(CNNs). The goal of this survey is to show the first use of ViTs in CV. In the
first phase, we categorize various CV applications where ViTs are appropriate.
These applications include image classification, object identification, image
segmentation, video transformers, image denoising, and neural architecture
search (NAS). We then analyze the state of the art in each area and identify
the models that are currently available. In addition, we outline numerous open research
difficulties as well as prospective research possibilities.
Comment: ICCD-2023. arXiv admin note: substantial text overlap with
arXiv:2208.04309 by other author
Industrial X-ray Image Analysis with Deep Neural Networks Robust to Unexpected Input Data
X-ray inspection is often an essential part of quality control within quality-critical manufacturing industries. Within such industries, X-ray image interpretation is resource intensive and typically conducted by humans. An increased level of automation would be preferable, and recent advances in artificial intelligence (e.g., deep learning) have been proposed as solutions. However, such solutions are typically overconfident when subjected to new data far from the training data, so-called out-of-distribution (OOD) data; we claim that safe automatic interpretation of industrial X-ray images, as part of quality control of critical products, requires a robust confidence estimation with respect to OOD data. We explored whether such a confidence estimation, an OOD detector, can be achieved by explicit modeling of the training data distribution, and the accepted images. For this, we derived an autoencoder model trained unsupervised on a public dataset with X-ray images of metal fusion welds and synthetic data. We explicitly demonstrate the dangers of a conventional supervised learning-based approach and compare it to the OOD detector. We achieve true positive rates of around 90% at false positive rates of around 0.1% on samples similar to the training data and correctly detect some example OOD data.
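The operating point quoted in the abstract above (a true positive rate at a fixed false positive rate) can be illustrated with a minimal sketch. It assumes each image has already been reduced to a single anomaly score, for example an autoencoder reconstruction error, and that higher means more suspicious. The function names, the synthetic score distributions, and the choice of which class counts as positive are all assumptions made for illustration; none of it is taken from the paper.

```python
import math
import random


def threshold_at_fpr(negative_scores, target_fpr):
    """Pick a threshold so that at most target_fpr of the negatives exceed it.

    A sample is flagged when its score is strictly greater than the threshold.
    """
    ordered = sorted(negative_scores)
    n = len(ordered)
    # Number of negatives we may flag; the epsilon guards against float rounding.
    allowed = min(math.floor(target_fpr * n + 1e-9), n - 1)
    # At most `allowed` scores lie strictly above this element.
    return ordered[n - allowed - 1]


def flag_rate(scores, threshold):
    """Fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Synthetic scores (hypothetical): negatives cluster low, positives higher.
rng = random.Random(0)
negatives = [rng.gauss(1.0, 0.2) for _ in range(2000)]
positives = [rng.gauss(2.5, 0.5) for _ in range(500)]

threshold = threshold_at_fpr(negatives, target_fpr=0.001)
fpr = flag_rate(negatives, threshold)
tpr = flag_rate(positives, threshold)
print(f"threshold={threshold:.3f}  FPR={fpr:.4f}  TPR={tpr:.4f}")
```

The threshold is chosen on negative samples only, so the false positive rate is bounded by construction and the true positive rate is whatever the score separation allows.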
Autoencoder with recurrent neural networks for video forgery detection
Video forgery detection has become an important issue in recent years
because modern editing software provides powerful and easy-to-use tools to
manipulate videos. In this paper we propose to perform detection by means of
deep learning, with an architecture based on autoencoders and recurrent neural
networks. A training phase on a few pristine frames allows the autoencoder to
learn an intrinsic model of the source. Then, forged material is singled out as
anomalous, as it does not fit the learned model, and is encoded with a large
reconstruction error. Recurrent networks, implemented with the long short-term
memory model, are used to exploit temporal dependencies. Preliminary results on
forged videos show the potential of this approach.
Comment: Presented at IS&T Electronic Imaging: Media Watermarking, Security,
and Forensics, January 201
Diffusion Models for Medical Image Analysis: A Comprehensive Survey
Denoising diffusion models, a class of generative models, have garnered
immense interest lately in various deep-learning problems. A diffusion
probabilistic model defines a forward diffusion stage where the input data is
gradually perturbed over several steps by adding Gaussian noise and then learns
to reverse the diffusion process to retrieve the desired noise-free data from
noisy data samples. Diffusion models are widely appreciated for their strong
mode coverage and quality of the generated samples despite their known
computational burdens. Capitalizing on the advances in computer vision, the
field of medical imaging has also observed a growing interest in diffusion
models. To help the researcher navigate this profusion, this survey intends to
provide a comprehensive overview of diffusion models in the discipline of
medical image analysis. Specifically, we introduce the solid theoretical
foundation and fundamental concepts behind diffusion models and the three
generic diffusion modelling frameworks: diffusion probabilistic models,
noise-conditioned score networks, and stochastic differential equations. Then,
we provide a systematic taxonomy of diffusion models in the medical domain and
propose a multi-perspective categorization based on their application, imaging
modality, organ of interest, and algorithms. To this end, we cover extensive
applications of diffusion models in the medical domain. Furthermore, we
emphasize the practical use case of some selected approaches, and then we
discuss the limitations of the diffusion models in the medical domain and
propose several directions to fulfill the demands of this field. Finally, we
gather the overviewed studies with their available open-source implementations
at
https://github.com/amirhossein-kz/Awesome-Diffusion-Models-in-Medical-Imaging.
Comment: Second revision: including more papers and further discussion
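The forward diffusion stage described in the abstract above (gradually perturbing the input with Gaussian noise over several steps) can be sketched as follows. This is a minimal illustration that assumes the common closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with a linear variance schedule. The schedule endpoints, the step count, and all function names are assumptions, not values taken from the survey. The learned reverse process is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, evenly spaced (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for all s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0 using the closed-form Gaussian."""
    a = alpha_bars[t]
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

x0 = [1.0, -1.0, 0.5, 0.0]  # toy "image" with four pixels
x_early = forward_diffuse(x0, 0, alpha_bars, rng)   # almost unchanged
x_late = forward_diffuse(x0, 999, alpha_bars, rng)  # close to pure noise
print(x_early)
print(x_late)
```

Because alpha_bar_t shrinks toward zero as t grows, early steps keep most of the signal while late steps are dominated by noise, which is the behaviour a reverse model is then trained to undo.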