TelsNet: temporal lesion network embedding in a transformer model to detect cervical cancer through colposcope images
Cervical cancer is the fourth most prevalent malignancy among women globally. Timely identification and intervention can lead to complete remission and cure. In this study, we built a deep learning model based on the self-attention mechanism of the transformer architecture to classify cervix images and aid the diagnosis of cervical cancer. The colposcope images are first segmented with an enhanced multivariate Gaussian mixture model optimized with the Mexican axolotl algorithm, and the Temporal Lesion Convolution Neural Network (TelsNet) then classifies them. TelsNet is a transformer-based neural network that uses temporal convolutional neural networks to identify cancerous regions in colposcope images. Our experiments show that TelsNet achieved an accuracy of 92.7%, with a sensitivity of 73.4% and a specificity of 82.1%. We compared our model with various state-of-the-art methods, and TelsNet outperformed them. The findings have the potential to significantly simplify the detection and accurate classification of cervical cancers at an early stage, leading to improved rates of remission and better overall outcomes for patients globally.
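The accuracy, sensitivity, and specificity quoted in this abstract are standard confusion-matrix metrics. The following is a minimal sketch of how such figures are computed, assuming binary labels (cancerous vs. normal). The function name and the counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn count the truly cancerous images, tn/fp the truly normal ones.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total      # fraction of all images classified correctly
    sensitivity = tp / (tp + fn)      # true positive rate (recall on cancerous images)
    specificity = tn / (tn + fp)      # true negative rate (recall on normal images)
    return accuracy, sensitivity, specificity


# Hypothetical counts, for illustration only (not results from the study).
acc, sens, spec = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```

Because accuracy is a weighted average of sensitivity and specificity, reading the three numbers together says more about a classifier than any one of them alone.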
Artificial Intelligence-Based Methods for Fusion of Electronic Health Records and Imaging Data
Healthcare data are inherently multimodal, including electronic health
records (EHR), medical images, and multi-omics data. Combining these multimodal
data sources contributes to a better understanding of human health and provides
optimal personalized healthcare. Advances in artificial intelligence (AI)
technologies, particularly machine learning (ML), enable the fusion of these
different data modalities to provide multimodal insights. To this end, in this
scoping review, we focus on synthesizing and analyzing the literature that uses
AI techniques to fuse multimodal medical data for different clinical
applications. More specifically, we focus on studies that fused only EHR with
medical imaging data to develop various AI methods for clinical applications.
We present a comprehensive analysis of the various fusion strategies, the
diseases and clinical outcomes for which multimodal fusion was used, the ML
algorithms used to perform multimodal fusion for each clinical application, and
the available multimodal medical datasets. We followed the PRISMA-ScR
guidelines. We searched Embase, PubMed, Scopus, and Google Scholar to retrieve
relevant studies. We extracted data from 34 studies that fulfilled the
inclusion criteria. In our analysis, a typical workflow was observed: feeding
raw data, fusing different data modalities by applying conventional machine
learning (ML) or deep learning (DL) algorithms, and finally, evaluating the
multimodal fusion through clinical outcome predictions. Early fusion was the
most commonly used technique for multimodal learning (22 out of 34 studies).
We found that multimodality fusion models outperformed traditional
single-modality models for the same task. Disease diagnosis and prediction were
the most common clinical outcomes (reported in 20 and 10 studies,
respectively).
Comment: Accepted in Nature Scientific Reports. 20 page
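The abstract names early fusion without defining it. In the usual sense of the term, early fusion joins the features of each modality into one input before any model is trained, whereas late fusion combines the outputs of separately trained models. The sketch below illustrates that distinction under those assumptions; the function names, feature values, and the weighted-average rule are hypothetical and do not come from the review.

```python
def early_fusion(ehr_features, image_features):
    """Concatenate per-patient feature vectors into a single model input."""
    return list(ehr_features) + list(image_features)


def late_fusion(ehr_score, image_score, ehr_weight=0.5):
    """Combine the scores of two separately trained models (weighted average)."""
    return ehr_weight * ehr_score + (1.0 - ehr_weight) * image_score


# Hypothetical inputs: three EHR variables and a four-dimensional image embedding.
ehr = [63.0, 1.0, 7.2]
img = [0.12, 0.88, 0.40, 0.05]

fused_input = early_fusion(ehr, img)   # one 7-dimensional vector for a single model
combined_score = late_fusion(0.7, 0.9) # average of two model outputs
```

With early fusion a single downstream model can learn interactions between EHR and imaging features. Late fusion keeps the modalities independent until the final decision.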
TandemNet: Distilling Knowledge from Medical Images Using Diagnostic Reports as Optional Semantic References
In this paper, we use the semantic knowledge contained in the diagnostic
reports of medical images to guide network training and to provide an
interpretable prediction mechanism through our proposed multimodal neural
network, TandemNet. Inside TandemNet, a language model is used to
represent report text, which cooperates with the image model in a tandem
scheme. We propose a novel dual-attention model that facilitates high-level
interactions between visual and semantic information and effectively distills
useful features for prediction. In the testing stage, TandemNet can make
accurate image prediction with an optional report text input. It also
interprets its prediction by producing attention over informative image regions
and text features, and by generating diagnostic report paragraphs. Experiments
on a dataset of pathological bladder cancer images and their diagnostic reports
(BCIDR) demonstrate that our method effectively learns and integrates knowledge
from multiple modalities and significantly outperforms the compared baselines.
Comment: MICCAI2017 Ora
SwinCross: Cross-modal Swin Transformer for Head-and-Neck Tumor Segmentation in PET/CT Images
Radiotherapy (RT) combined with cetuximab is the standard treatment for
patients with inoperable head and neck cancers. Segmentation of head and neck
(H&N) tumors is a prerequisite for radiotherapy planning but a time-consuming
process. In recent years, deep convolutional neural networks (DCNNs) have become the de
facto standard for automated image segmentation. However, due to the expensive
computational cost associated with enlarging the field of view in DCNNs, their
ability to model long-range dependency is still limited, and this can result in
sub-optimal segmentation performance for objects with background context
spanning over long distances. On the other hand, Transformer models have
demonstrated excellent capabilities in capturing such long-range information in
several semantic segmentation tasks performed on medical images. Inspired by
the recent success of Vision Transformers and advances in multi-modal image
analysis, we propose a novel segmentation model, dubbed Cross-Modal Swin
Transformer (SwinCross), with a cross-modal attention (CMA) module to incorporate
cross-modal feature extraction at multiple resolutions. To validate the
effectiveness of the proposed method, we performed experiments on the HECKTOR
2021 challenge dataset and compared it with the nnU-Net (the backbone of the
top-5 methods in HECKTOR 2021) and other state-of-the-art transformer-based
methods such as UNETR and Swin UNETR. The proposed method is experimentally
shown to outperform these competing methods thanks to the ability of the CMA
module to better capture complementary inter-modality feature representations
between PET and CT for the task of head-and-neck tumor segmentation.
Comment: 9 pages, 3 figures. Med Phys. 202
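The abstract does not describe the internals of the CMA module. As a rough illustration of the general idea of cross-modal attention, the sketch below applies plain scaled dot-product attention with queries taken from one modality and keys and values from the other. It is not the SwinCross implementation: there are no learned projections, windows, or multiple heads, and the token values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys/values."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query token to every key token of the other modality.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Hypothetical 2-dimensional tokens: two from PET, three from CT.
pet_tokens = [[1.0, 0.0], [0.0, 1.0]]
ct_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# PET tokens query the CT tokens, so each output mixes CT features.
pet_attending_ct = cross_attention(pet_tokens, ct_tokens, ct_tokens)
```

Swapping the roles of the two token lists gives the CT-attending-PET direction. A cross-modal design could use either direction or both.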