1,150 research outputs found

    ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domains

    The transformer design is the de facto standard for natural language processing tasks. Its success in natural language processing has lately piqued the interest of researchers in the domain of computer vision. Compared with Convolutional Neural Networks (CNNs), Vision Transformers (ViTs) are becoming increasingly popular and dominant solutions for many vision problems. Transformer-based models outperform other types of networks, such as convolutional and recurrent neural networks, on a range of visual benchmarks. In this work, we evaluate various vision transformer models by dividing them into distinct tasks and examining their benefits and drawbacks. ViTs can overcome several difficulties that affect CNNs. The goal of this survey is to show the first uses of ViTs in computer vision (CV). In the first phase, we categorize the CV applications where ViTs are appropriate, including image classification, object detection, image segmentation, video transformers, image denoising, and neural architecture search (NAS). We then analyze the state of the art in each area and identify the models that are currently available. In addition, we outline numerous open research challenges as well as prospective research directions. Comment: ICCD-2023. arXiv admin note: substantial text overlap with arXiv:2208.04309 by other authors

    Industrial X-ray Image Analysis with Deep Neural Networks Robust to Unexpected Input Data

    X-ray inspection is often an essential part of quality control within quality-critical manufacturing industries. Within such industries, X-ray image interpretation is resource intensive and typically conducted by humans. An increased level of automation would be preferable, and recent advances in artificial intelligence (e.g., deep learning) have been proposed as solutions. However, such solutions are typically overconfident when subjected to new data far from the training data, so-called out-of-distribution (OOD) data. We claim that safe automatic interpretation of industrial X-ray images, as part of quality control of critical products, requires robust confidence estimation with respect to OOD data. We explored whether such a confidence estimation, an OOD detector, can be achieved by explicitly modeling the training data distribution, i.e., the accepted images. To this end, we derived an autoencoder model trained without supervision on a public dataset of X-ray images of metal fusion welds and on synthetic data. We explicitly demonstrate the dangers of a conventional supervised-learning-based approach and compare it to the OOD detector. We achieve true positive rates of around 90% at false positive rates of around 0.1% on samples similar to the training data and correctly detect some example OOD data.
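
    A minimal sketch of the reconstruction-error idea summarized above, not the authors' actual model: an autoencoder is trained only on accepted (in-distribution) images, a threshold is calibrated on held-out in-distribution samples, and images that reconstruct poorly are flagged as OOD. The architecture, image size, and quantile rule below are illustrative assumptions (PyTorch).

    # Sketch: autoencoder-based OOD detection by thresholding reconstruction error.
    # Layer sizes, image size, and the threshold quantile are assumptions, not the paper's values.
    import torch
    import torch.nn as nn

    class ConvAutoencoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
                nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    def reconstruction_error(model, batch):
        # Per-image mean squared error between input and reconstruction.
        with torch.no_grad():
            recon = model(batch)
        return ((batch - recon) ** 2).flatten(1).mean(dim=1)

    # Calibrate a threshold on held-out in-distribution images, then flag incoming images.
    model = ConvAutoencoder()  # in practice, trained on accepted X-ray images
    in_dist = torch.rand(8, 1, 64, 64)   # stand-in for accepted X-ray crops
    test = torch.rand(4, 1, 64, 64)      # stand-in for incoming images
    threshold = reconstruction_error(model, in_dist).quantile(0.999)
    is_ood = reconstruction_error(model, test) > threshold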

    Autoencoder with recurrent neural networks for video forgery detection

    Video forgery detection has become an important issue in recent years because modern editing software provides powerful and easy-to-use tools to manipulate videos. In this paper we propose to perform detection by means of deep learning, with an architecture based on autoencoders and recurrent neural networks. A training phase on a few pristine frames allows the autoencoder to learn an intrinsic model of the source. Forged material is then singled out as anomalous: it does not fit the learned model and is encoded with a large reconstruction error. Recurrent networks, implemented with the long short-term memory (LSTM) model, are used to exploit temporal dependencies. Preliminary results on forged videos show the potential of this approach. Comment: Presented at IS&T Electronic Imaging: Media Watermarking, Security, and Forensics, January 201
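
    A minimal sketch in the spirit of the architecture described above, not the paper's exact network: per-frame features are encoded, an LSTM models temporal dependencies, and frames whose reconstruction error is unusually large are flagged as potentially forged. The frame size, layer widths, and thresholding rule are assumptions.

    # Sketch: autoencoder + LSTM for per-frame anomaly (forgery) scoring.
    import torch
    import torch.nn as nn

    class RecurrentAutoencoder(nn.Module):
        def __init__(self, feat_dim=128, hidden_dim=128):
            super().__init__()
            self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, feat_dim), nn.ReLU())
            self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
            self.decoder = nn.Sequential(nn.Linear(hidden_dim, 32 * 32), nn.Sigmoid())

        def forward(self, frames):              # frames: (batch, time, 32, 32)
            b, t = frames.shape[:2]
            z = self.encoder(frames.reshape(b * t, 1, 32, 32)).reshape(b, t, -1)
            h, _ = self.lstm(z)                 # temporal dependencies across frames
            return self.decoder(h).reshape(b, t, 32, 32)

    model = RecurrentAutoencoder()              # in practice, trained on pristine frames
    clip = torch.rand(1, 16, 32, 32)            # stand-in for a short video clip
    recon = model(clip)
    per_frame_error = ((clip - recon) ** 2).flatten(2).mean(dim=2)     # (batch, time)
    # Flag frames whose error is far above the clip's typical reconstruction error.
    suspect = per_frame_error > per_frame_error.mean() + 3 * per_frame_error.std()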

    Diffusion Models for Medical Image Analysis: A Comprehensive Survey

    Denoising diffusion models, a class of generative models, have garnered immense interest lately in various deep-learning problems. A diffusion probabilistic model defines a forward diffusion stage in which the input data are gradually perturbed over several steps by adding Gaussian noise, and then learns to reverse the diffusion process to retrieve the desired noise-free data from noisy samples. Diffusion models are widely appreciated for their strong mode coverage and the quality of the generated samples despite their known computational burden. Capitalizing on advances in computer vision, the field of medical imaging has also shown growing interest in diffusion models. To help researchers navigate this profusion, this survey intends to provide a comprehensive overview of diffusion models in the discipline of medical image analysis. Specifically, we introduce the solid theoretical foundation and fundamental concepts behind diffusion models and the three generic diffusion modelling frameworks: diffusion probabilistic models, noise-conditioned score networks, and stochastic differential equations. Then, we provide a systematic taxonomy of diffusion models in the medical domain and propose a multi-perspective categorization based on their application, imaging modality, organ of interest, and algorithms. We also cover extensive applications of diffusion models in the medical domain. Furthermore, we emphasize the practical use cases of selected approaches, discuss the limitations of diffusion models in the medical domain, and propose several directions to fulfill the demands of this field. Finally, we gather the overviewed studies with their available open-source implementations at https://github.com/amirhossein-kz/Awesome-Diffusion-Models-in-Medical-Imaging. Comment: Second revision: including more papers and further discussion
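
    A minimal sketch of the forward diffusion step and the noise-prediction training objective that the survey summarizes; the linear schedule, number of steps, and the eps_model interface are illustrative assumptions, not the survey's prescription.

    # Sketch: DDPM-style forward noising and the denoising training loss.
    import torch

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)            # assumed linear noise schedule
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product, \bar{alpha}_t

    def q_sample(x0, t, noise):
        # Closed-form forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
        a_bar = alphas_bar[t].view(-1, 1, 1, 1)
        return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

    def training_loss(eps_model, x0):
        # The network is trained to predict the Gaussian noise added at a random timestep.
        t = torch.randint(0, T, (x0.shape[0],))
        noise = torch.randn_like(x0)
        x_t = q_sample(x0, t, noise)
        return ((eps_model(x_t, t) - noise) ** 2).mean()

    # Usage: pass any eps_model(x_t, t) -> predicted noise, e.g. a small U-Net.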