Is the Reign of Interactive Search Eternal? Findings from the Video Browser Showdown 2020

Author: Bailer Werner
Gurrin Cathal
Jónsson Björn Thór
Kovalčík Gregor
Lokoč Jakub
Mejzlík František
Rossetto Luca
Sauter Loris
Schoeffmann Klaus
Song Jaeyub
Souček Tomáš
Veselý Patrik
Vrochidis Stefanos
Wu Jiaxin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2021

The IT University of Copenhagen's Repository

ZORA

Whether and When does Endoscopy Domain Pretraining Make Sense?

Author: Batić Dominik
Czempiel Tobias
Holm Felix
Navab Nassir
Özsoy Ege
Publication date: 30/03/2023

Automated endoscopy video analysis is a challenging task in medical computer vision, with the primary objective of assisting surgeons during procedures. The difficulty arises from the complexity of surgical scenes and the lack of a sufficient amount of annotated data. In recent years, large-scale pretraining has shown great success in natural language processing and computer vision communities. These approaches reduce the need for annotated data, which is always a concern in the medical domain. However, most works on endoscopic video understanding use models pretrained on natural images, creating a domain gap between pretraining and finetuning. In this work, we investigate the need for endoscopy domain-specific pretraining based on downstream objectives. To this end, we first collect Endo700k, the largest publicly available corpus of endoscopic images, extracted from nine public Minimally Invasive Surgery (MIS) datasets. Endo700k comprises more than 700,000 unannotated raw images. Next, we introduce EndoViT, an endoscopy pretrained Vision Transformer (ViT). Through ablations, we demonstrate that domain-specific pretraining is particularly beneficial for more complex downstream tasks, such as Action Triplet Detection, and less effective and even unnecessary for simpler tasks, such as Surgical Phase Recognition. We will release both our code and pretrained models upon acceptance to facilitate further research in this direction

arXiv.org e-Print Archive

FaceAtt: Enhancing Image Captioning with Facial Attributes for Portrait Images

Author: Akter Sadia
Haque Naimul
Labiba Iffat
Publication date: 24/09/2023

Automated image caption generation is a critical area of research that enhances accessibility and understanding of visual content for diverse audiences. In this study, we propose the FaceAtt model, a novel approach to attribute-focused image captioning that emphasizes the accurate depiction of facial attributes within images. FaceAtt automatically detects and describes a wide range of attributes, including emotions, expressions, pointed noses, fair skin tones, hair textures, attractiveness, and approximate age ranges. Leveraging deep learning techniques, we explore the impact of different image feature extraction methods on caption quality and evaluate our model's performance using metrics such as BLEU and METEOR. Our FaceAtt model leverages annotated attributes of portraits as supplementary prior knowledge for our portrait images before captioning. This innovative addition yields a subtle yet discernible enhancement in the resulting scores, exemplifying the potency of incorporating additional attribute vectors during training. Furthermore, our research contributes to the broader discourse on ethical considerations in automated captioning. This study sets the stage for future research in refining attribute-focused captioning techniques, with a focus on enhancing linguistic coherence, addressing biases, and accommodating diverse user needs

arXiv.org e-Print Archive

ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models

Author: Dou Qi
Du Yuhao
Jiang Yuncheng
Li Guanbin
Li Zhen
Tan Shuangyi
Wan Xiang
Wu Xusheng
Publication date: 03/09/2023

Colonoscopy analysis, particularly automatic polyp segmentation and detection, is essential for assisting clinical diagnosis and treatment. However, as medical image annotation is labour- and resource-intensive, the scarcity of annotated data limits the effectiveness and generalization of existing methods. Although recent research has focused on data generation and augmentation to address this issue, the quality of the generated data remains a challenge, which limits the contribution to the performance of subsequent tasks. Inspired by the superiority of diffusion models in fitting data distributions and generating high-quality data, in this paper, we propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks. Specifically, ArSDM utilizes the ground-truth segmentation mask as a prior condition during training and adjusts the diffusion loss for each input according to the polyp/background size ratio. Furthermore, ArSDM incorporates a pre-trained segmentation model to refine the training process by reducing the difference between the ground-truth mask and the prediction mask. Extensive experiments on segmentation and detection tasks demonstrate that the data generated by ArSDM could significantly boost the performance of baseline methods. Comment: Accepted by MICCAI-202

arXiv.org e-Print Archive

S$^2$ME: Spatial-Spectral Mutual Teaching and Ensemble Learning for Scribble-supervised Polyp Segmentation

Author: Islam Mobarakol
Ren Hongliang
Wang An
Xu Mengya
Zhang Yang
Publication date: 01/06/2023

Fully-supervised polyp segmentation has achieved significant success over the years in advancing the early diagnosis of colorectal cancer. However, label-efficient solutions based on weak supervision such as scribbles are rarely explored, although they are valuable and in demand in medical practice because densely annotated polyp data are expensive and scarce. Besides, various deployment issues, including data shifts and corruption, place further demands on model generalization and robustness. To address these concerns, we design a framework of Spatial-Spectral Dual-branch Mutual Teaching and Entropy-guided Pseudo Label Ensemble Learning (S$^2$ME). Concretely, for the first time in weakly-supervised medical image segmentation, we promote the dual-branch co-teaching framework by leveraging the intrinsic complementarity of features extracted from the spatial and spectral domains and encouraging cross-space consistency through collaborative optimization. Furthermore, to produce reliable mixed pseudo labels, which enhance the effectiveness of ensemble learning, we introduce a novel adaptive pixel-wise fusion technique based on the entropy guidance from the spatial and spectral branches. Our strategy efficiently mitigates the deleterious effects of uncertainty and noise present in pseudo labels and surpasses previous alternatives in terms of efficacy. Ultimately, we formulate a holistic optimization objective to learn from the hybrid supervision of scribbles and pseudo labels. Extensive experiments and evaluation on four public datasets demonstrate the superiority of our method regarding in-distribution accuracy, out-of-distribution generalization, and robustness, highlighting its promising clinical significance. Our code is available at https://github.com/lofrienger/S2ME. Comment: MICCAI 2023 Early Acceptance
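
The abstract above mentions an adaptive pixel-wise fusion guided by entropy but does not give its formula. The following is only a minimal sketch of one plausible form, not the authors' formulation: each branch's foreground probability is weighted by exp(-H), where H is the binary entropy of that prediction, so the more confident branch dominates at each pixel. The names `binary_entropy`, `fuse_pixel`, and `fuse_maps`, and the exp(-H) weighting itself, are assumptions made for illustration.

```python
import math
from typing import List

ProbMap = List[List[float]]  # per-pixel foreground probabilities in [0, 1]


def binary_entropy(p: float, eps: float = 1e-7) -> float:
    """Entropy (in nats) of a Bernoulli prediction; high when p is near 0.5."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def fuse_pixel(p_spatial: float, p_spectral: float) -> float:
    """Assumed fusion rule: weight each branch by exp(-entropy), then normalize."""
    w_spatial = math.exp(-binary_entropy(p_spatial))
    w_spectral = math.exp(-binary_entropy(p_spectral))
    return (w_spatial * p_spatial + w_spectral * p_spectral) / (w_spatial + w_spectral)


def fuse_maps(spatial: ProbMap, spectral: ProbMap) -> ProbMap:
    """Apply the pixel-wise rule to two probability maps of equal shape."""
    return [
        [fuse_pixel(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(spatial, spectral)
    ]
```

With this rule, a confident prediction (for example 0.99) paired with an uncertain one (0.5) yields a fused value closer to the confident branch than a plain average would give.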

arXiv.org e-Print Archive

Mask-conditioned latent diffusion for generating gastrointestinal polyp images

Author: Halvorsen Pål
Macháček Roman
Mozaffari Leila
Parasa Sravanthi
Riegler Michael A.
Sepasdar Zahra
Thambawita Vajira
Publication date: 11/04/2023

In order to take advantage of AI solutions in endoscopy diagnostics, we must overcome the issue of limited annotations. These limitations are caused by the high privacy concerns in the medical field and the requirement of getting aid from experts for the time-consuming and costly medical data annotation process. In computer vision, image synthesis has made a significant contribution in recent years as a result of the progress of generative adversarial networks (GANs) and diffusion probabilistic models (DPM). Novel DPMs have outperformed GANs in text, image, and video generation tasks. Therefore, this study proposes a conditional DPM framework to generate synthetic GI polyp images conditioned on given generated segmentation masks. Our experimental results show that our system can generate an unlimited number of high-fidelity synthetic polyp images with the corresponding ground truth masks of polyps. To test the usefulness of the generated data, we trained binary image segmentation models to study the effect of using synthetic data. Results show that the best micro-imagewise IOU of 0.7751 was achieved from DeepLabv3+ when the training data consists of both real data and synthetic data. However, the results reflect that achieving good segmentation performance with synthetic data heavily depends on model architectures
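
As a rough illustration of the metric reported above, the sketch below computes an intersection-over-union (IoU) score per image for binary masks and averages it over images. It assumes that "micro-imagewise" means exactly this (pixel counts pooled within each image, then a mean over images); the abstract does not define the term. The function names and the convention for two empty masks are hypothetical choices made here.

```python
from typing import List, Sequence

Mask = List[List[int]]  # binary mask: 1 = polyp, 0 = background


def image_iou(pred: Mask, target: Mask) -> float:
    """IoU of one predicted mask against its ground-truth mask."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # predicted polyp, truly polyp
            elif p:
                fp += 1  # predicted polyp, truly background
            elif t:
                fn += 1  # missed polyp pixel
    union = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return tp / union if union else 1.0


def mean_imagewise_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average the per-image IoU over a non-empty set of images."""
    scores = [image_iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)
```

Averaging per image gives every image equal weight, so a small polyp that is segmented badly lowers the score as much as a large one would.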

arXiv.org e-Print Archive

DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation

Author: Kong Weikun
Ma Jianhua
Nguyen Le-Minh
Pan Yizhi
Racharak Teeradaj
Sun Guanqun
Xin Junyi
Xu Zichang
Publication date: 14/11/2023

Accurate medical image segmentation is critical for disease quantification and treatment evaluation. While traditional U-Net architectures and their transformer-integrated variants excel in automated segmentation tasks, they lack the ability to harness the intrinsic position and channel features of images. Existing models also struggle with parameter efficiency and computational complexity, often due to the extensive use of Transformers. To address these issues, this study proposes a novel deep medical image segmentation framework, called DA-TransUNet, aiming to integrate the Transformer and dual attention block (DA-Block) into the traditional U-shaped architecture. Unlike earlier transformer-based U-Net models, DA-TransUNet utilizes Transformers and DA-Block to integrate not only global and local features, but also image-specific positional and channel features, improving the performance of medical image segmentation. By incorporating a DA-Block at the embedding layer and within each skip connection layer, we substantially enhance feature extraction capabilities and improve the efficiency of the encoder-decoder structure. DA-TransUNet demonstrates superior performance in medical image segmentation tasks, consistently outperforming state-of-the-art techniques across multiple datasets. In summary, DA-TransUNet offers a significant advancement in medical image segmentation, providing an effective and powerful alternative to existing techniques. Our architecture stands out for its ability to improve segmentation accuracy, thereby advancing the field of automated medical image diagnostics. The codes and parameters of our model will be publicly available at https://github.com/SUN-1024/DA-TransUnet

arXiv.org e-Print Archive

Multi-level feature fusion network combining attention mechanisms for polyp segmentation

Author: Chen Qiaosong
Liu Junzhuo
Wang Jin
Wang Zhixiang
Xin Deng
Zhang Ye
Publication date: 24/09/2023

Clinically, automated polyp segmentation techniques have the potential to significantly improve the efficiency and accuracy of medical diagnosis, thereby reducing the risk of colorectal cancer in patients. Unfortunately, existing methods suffer from two significant weaknesses that can impact the accuracy of segmentation. Firstly, features extracted by encoders are not adequately filtered and utilized. Secondly, semantic conflicts and information redundancy caused by feature fusion are not attended to. To overcome these limitations, we propose a novel approach for polyp segmentation, named MLFF-Net, which leverages multi-level feature fusion and attention mechanisms. Specifically, MLFF-Net comprises three modules: Multi-scale Attention Module (MAM), High-level Feature Enhancement Module (HFEM), and Global Attention Module (GAM). Among these, MAM is used to extract multi-scale information and polyp details from the shallow output of the encoder. In HFEM, the deep features of the encoders complement each other by aggregation. Meanwhile, the attention mechanism redistributes the weight of the aggregated features, weakening the conflicting redundant parts and highlighting the information useful to the task. GAM combines features from the encoder and the decoder and computes global dependencies to prevent receptive field locality. Experimental results on five public datasets show that the proposed method not only can segment multiple types of polyps but also has advantages over current state-of-the-art methods in both accuracy and generalization ability

arXiv.org e-Print Archive

Supervised contrastive learning with identity-label embeddings for facial action unit recognition

Author: Adama D
Lian T
Machado P
Vinkemeier D
Publication venue: 'British Machine Vision Association and Society for Pattern Recognition'
Publication date: 20/11/2023

Facial expression analysis is a crucial area of research for understanding human emotions. One important approach to this is the automatic detection of facial action units (AUs), which are small, visible changes in facial appearance. Despite extensive research, automatic AU detection remains a challenging computer vision problem. This paper addresses two central difficulties: the first is the inherent differences in facial behaviour and appearance across individuals, which lead current AU recognition models to overfit subjects in the training set and generalize poorly to unseen subjects; the second is representing the complex interactions among different AUs. In this paper, we propose a novel two-stage training framework, called CL-ILE, to address these long-standing challenges. In the first stage of CL-ILE, we introduce identity-label embeddings (ILEs) to train an ID feature encoder capable of generating person-specific feature embeddings for input face images. In the second stage, we present a data-driven method that implicitly models the relationships among AUs using contrastive loss in a supervised setting while eliminating the person-specific features generated by the first stage to enhance generalizability. By removing the ID feature encoder and ILEs from the first stage after training, CL-ILE becomes more lightweight and readily applicable to real-world applications than models using graph-based structures. We evaluate our approach on two widely-used AU recognition datasets, BP4D and DISFA, demonstrating that CL-ILE can achieve state-of-the-art performance on the F1 score

Nottingham Trent Institutional Repository (IRep)

Ambient Assisted Living: Scoping Review of Artificial Intelligence Models, Domains, Technology, and Concerns

Author: Colantonio Sara
Flórez-Revuelta Francisco
Jovanovic Mladjan
Kampel Martin
Lameski Petre
Mitrov Goran
Tellioglu Hilda
Zdravevski Eftim
Publication venue: 'JMIR Publications Inc.'
Publication date: 01/11/2022

Background: Ambient assisted living (AAL) is a common name for various artificial intelligence (AI)-infused applications and platforms that support their users in need in multiple activities, from health to daily living. These systems use different approaches to learn about their users and make automated decisions, known as AI models, for personalizing their services and increasing outcomes. Given the numerous systems developed and deployed for people with different needs, health conditions, and dispositions toward the technology, it is critical to obtain clear and comprehensive insights concerning AI models used, along with their domains, technology, and concerns, to identify promising directions for future work. Objective: This study aimed to provide a scoping review of the literature on AI models in AAL. In particular, we analyzed specific AI models used in AAL systems, the target domains of the models, the technology using the models, and the major concerns from the end-user perspective. Our goal was to consolidate research on this topic and inform end users, health care professionals and providers, researchers, and practitioners in developing, deploying, and evaluating future intelligent AAL systems. Methods: This study was conducted as a scoping review to identify, analyze, and extract the relevant literature. It used a natural language processing toolkit to retrieve the article corpus for an efficient and comprehensive automated literature search. Relevant articles were then extracted from the corpus and analyzed manually. This review included 5 digital libraries: IEEE, PubMed, Springer, Elsevier, and MDPI. Results: We included a total of 108 articles. The annual distribution of relevant articles showed a growing trend for all categories from January 2010 to July 2022. The AI models mainly used unsupervised and semisupervised approaches. The leading models are deep learning, natural language processing, instance-based learning, and clustering. Activity assistance and recognition were the most common target domains of the models. Ambient sensing, mobile technology, and robotic devices mainly implemented the models. Older adults were the primary beneficiaries, followed by patients and frail persons of various ages. Availability was a top beneficiary concern. Conclusions: This study presents the analytical evidence of AI models in AAL and their domains, technologies, beneficiaries, and concerns. Future research on intelligent AAL should involve health care professionals and caregivers as designers and users, comply with health-related regulations, improve transparency and privacy, integrate with health care technological infrastructure, explain their decisions to the users, and establish evaluation metrics and design guidelines. Trial Registration: PROSPERO (International Prospective Register of Systematic Reviews) CRD42022347590; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42022347590 This work was part of and supported by GoodBrother, COST Action 19121 (Network on Privacy-Aware Audio- and Video-Based Applications for Active and Assisted Living)

Repositorio Institucional de la Universidad de Alicante

Directory of Open Access Journals

PubMed Central