6 research outputs found
Cancer-Net PCa-Data: An Open-Source Benchmark Dataset for Prostate Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data
The recent introduction of synthetic correlated diffusion imaging (CDI)
has demonstrated significant potential in the realm of clinical decision
support for prostate cancer (PCa). CDI is a new form of magnetic resonance
imaging (MRI) designed to characterize tissue properties through the joint
correlation of diffusion signal attenuation across different Brownian motion
sensitivities. Despite this potential, CDI data for PCa
have not previously been made publicly available. To advance
research efforts for PCa, we introduce Cancer-Net PCa-Data, an open-source
benchmark dataset of volumetric CDI data from PCa patients.
Cancer-Net PCa-Data consists of CDI volumetric images from a cohort
of 200 patient cases, along with full annotations (gland masks, tumor masks,
and PCa diagnosis for each tumor). We also analyze the demographic and label
region diversity of Cancer-Net PCa-Data for potential biases. Cancer-Net
PCa-Data is the first-ever public dataset of CDI data for PCa, and
is part of the global open-source initiative dedicated to advancing
machine learning and imaging research to aid clinicians in the global fight
against cancer.
COVIDx CXR-4: An Expanded Multi-Institutional Open-Source Benchmark Dataset for Chest X-ray Image-Based Computer-Aided COVID-19 Diagnostics
The global ramifications of the COVID-19 pandemic remain significant,
exerting persistent pressure on nations even three years after its initial
outbreak. Deep learning models have shown promise in improving COVID-19
diagnostics but require diverse and larger-scale datasets to improve
performance. In this paper, we introduce COVIDx CXR-4, an expanded
multi-institutional open-source benchmark dataset for chest X-ray image-based
computer-aided COVID-19 diagnostics. COVIDx CXR-4 expands significantly on the
previous COVIDx CXR-3 dataset by increasing the total patient cohort size by
more than 2.66 times, resulting in 84,818 images from 45,342 patients across
multiple institutions. We provide extensive analysis of the diversity of the
patient demographics, imaging metadata, and disease distributions to highlight
potential dataset biases. To the best of the authors' knowledge, COVIDx CXR-4
is the largest and most diverse open-source COVID-19 CXR dataset and is made
publicly available as part of an open initiative to advance research to aid
clinicians against the COVID-19 disease.
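The cohort figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes that "2.66 times" refers to the ratio of the new cohort size to the COVIDx CXR-3 cohort size, and the implied earlier cohort size is a derived bound, not a number reported in the abstract.

```python
# Figures quoted in the COVIDx CXR-4 abstract.
total_images = 84_818
total_patients = 45_342
growth_factor = 2.66  # assumed to mean new cohort / previous cohort

# Average number of chest X-ray images contributed per patient.
images_per_patient = total_images / total_patients

# Upper bound on the COVIDx CXR-3 cohort size implied by the growth factor
# (a derived estimate under the assumption above, not a reported value).
implied_previous_cohort_max = total_patients / growth_factor

print(f"images per patient: {images_per_patient:.2f}")
print(f"implied previous cohort size: below {implied_previous_cohort_max:.0f}")
```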
Double-Condensing Attention Condenser: Leveraging Attention in Deep Learning to Detect Skin Cancer from Skin Lesion Images
Skin cancer is the most common type of cancer in the United States and is
estimated to affect one in five Americans. Recent advances have demonstrated
strong performance on skin cancer detection, as exemplified by state-of-the-art
performance in the SIIM-ISIC Melanoma Classification Challenge; however, these
solutions leverage ensembles of complex deep neural architectures requiring
immense storage and compute costs, and therefore may not be tractable. A recent
movement for TinyML applications is integrating Double-Condensing Attention
Condensers (DC-AC) into a self-attention neural network backbone architecture
to allow for faster and more efficient computation. This paper explores
leveraging an efficient self-attention structure to detect skin cancer in skin
lesion images and introduces a deep neural network design with DC-AC customized
for skin cancer detection from skin lesion images. The final model is publicly
available as a part of a global open-source initiative dedicated to
accelerating advancement in machine learning to aid clinicians in the fight
against cancer.
Cancer-Net PCa-Gen: Synthesis of Realistic Prostate Diffusion Weighted Imaging Data via Anatomic-Conditional Controlled Latent Diffusion
In Canada, prostate cancer is the most common form of cancer in men and
accounted for 20% of new cancer cases for this demographic in 2022. Due to
recent successes in leveraging machine learning for clinical decision support,
there has been significant interest in the development of deep neural networks
for prostate cancer diagnosis, prognosis, and treatment planning using
diffusion weighted imaging (DWI) data. A major challenge hindering widespread
adoption in clinical use is poor generalization of such networks due to
scarcity of large-scale, diverse, balanced prostate imaging datasets for
training such networks. In this study, we explore the efficacy of latent
diffusion for generating realistic prostate DWI data through the introduction
of an anatomic-conditional controlled latent diffusion strategy. To the best of
the authors' knowledge, this is the first study to leverage conditioning for
synthesis of prostate cancer imaging. Experimental results show that the
proposed strategy, which we call Cancer-Net PCa-Gen, enhances synthesis of
diverse prostate images through controllable tumour locations and better
anatomical and textural fidelity. These crucial features make it well-suited
for augmenting real patient data, enabling neural networks to be trained on a
more diverse and comprehensive data distribution. The Cancer-Net PCa-Gen
framework and sample images have been made publicly available at
https://www.kaggle.com/datasets/deetsadi/cancer-net-pca-gen-dataset as a part
of a global open-source initiative dedicated to accelerating advancement in
machine learning to aid clinicians in the fight against cancer.
NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches
Accurate dietary intake estimation is critical for informing policies and
programs to support healthy eating, as malnutrition has been directly linked to
decreased quality of life. However, self-reporting methods such as food diaries
suffer from substantial bias. Other conventional dietary assessment techniques
and emerging alternative approaches such as mobile applications incur high time
costs and may necessitate trained personnel. Recent work has focused on using
computer vision and machine learning to automatically estimate dietary intake
from food images, but the lack of comprehensive datasets with diverse
viewpoints, modalities and food annotations hinders the accuracy and realism of
such methods. To address this limitation, we introduce NutritionVerse-Synth,
the first large-scale dataset of 84,984 photorealistic synthetic 2D food images
with associated dietary information and multimodal annotations (including depth
images, instance masks, and semantic masks). Additionally, we collect a real
image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to
evaluate realism. Leveraging these novel datasets, we develop and benchmark
NutritionVerse, an empirical study of various dietary intake estimation
approaches, including indirect segmentation-based and direct prediction
networks. We further fine-tune models pretrained on synthetic data with real
images to provide insights into the fusion of synthetic and real data. Finally,
we release both datasets (NutritionVerse-Synth, NutritionVerse-Real) on
https://www.kaggle.com/nutritionverse/datasets as part of an open initiative to
accelerate machine learning for dietary sensing.
Foodverse: A Dataset of 3D Food Models for Nutritional Intake Estimation
77% of adults over 50 want to age in place, which presents a major challenge in ensuring adequate nutritional intake. Recent advancements in machine learning and computer vision show promise for automated tracking methods, but these methods require a large, high-quality dataset to perform accurately. Existing datasets consist of 2D images with discretely sampled camera views, which are unrepresentative of the varied angles and image quality of photos taken by older individuals. By leveraging view synthesis for 3D models, an unlimited number of 2D images can be generated for any given viewpoint or camera angle. In this paper, we develop a methodology for collecting high-quality 3D models of food items with a particular focus on speed and consistency, and introduce Foodverse, a large-scale, high-quality, high-resolution multimodal dataset of 52 3D food models, together with their associated weight, food name, language description, and nutritional value. We also demonstrate 2D view synthesis using these 3D food models.
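To illustrate why a 3D model permits views from arbitrary camera angles, the following minimal sketch samples camera poses on a sphere around a model placed at the origin. It is not the Foodverse pipeline: the function names (`camera_position`, `look_at`, `sample_views`), the z-up convention, the OpenGL-style camera looking down its negative z-axis, and the elevation range are all assumptions made for this example. Each resulting matrix could be handed to a renderer of choice to produce one 2D image.

```python
import numpy as np

def camera_position(azimuth_deg, elevation_deg, radius=1.0):
    """Camera centre on a sphere around the origin (assumed model location)."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return radius * np.array([np.cos(el) * np.cos(az),
                              np.cos(el) * np.sin(az),
                              np.sin(el)])

def look_at(eye, target=(0.0, 0.0, 0.0), up=(0.0, 0.0, 1.0)):
    """4x4 world-to-camera matrix for a camera at `eye` facing `target`."""
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    view = np.eye(4)
    # Rows of the rotation are the camera axes; the camera looks down -z.
    view[0, :3], view[1, :3], view[2, :3] = right, true_up, -forward
    view[:3, 3] = -view[:3, :3] @ eye
    return view

def sample_views(n_views, radius=1.0, seed=0):
    """Draw random viewpoints; any azimuth/elevation pair yields a valid pose."""
    rng = np.random.default_rng(seed)
    azimuths = rng.uniform(0.0, 360.0, n_views)
    # Stay away from 90 degrees, where the view direction is parallel to `up`.
    elevations = rng.uniform(10.0, 80.0, n_views)
    return [look_at(camera_position(a, e, radius))
            for a, e in zip(azimuths, elevations)]

views = sample_views(5, radius=2.0)
print(len(views), views[0].shape)
```

Because azimuth and elevation are continuous parameters, the set of possible poses is not limited to the discretely sampled camera views of a conventional 2D image dataset.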