Leveraging Historical Medical Records as a Proxy via Multimodal Modeling and Visualization to Enrich Medical Diagnostic Learning
Simulation-based Medical Education (SBME) has been developed as a
cost-effective means of enhancing the diagnostic skills of novice physicians
and interns, thereby mitigating the need for resource-intensive
mentor-apprentice training. However, feedback provided in most SBME is often
directed towards improving the operational proficiency of learners, rather than
providing summative medical diagnoses that result from experience and time.
Additionally, the multimodal nature of medical data during diagnosis poses
significant challenges for interns and novice physicians, including the
tendency to overlook or over-rely on data from certain modalities, and
difficulties in comprehending potential associations between modalities. To
address these challenges, we present DiagnosisAssistant, a visual analytics
system that leverages historical medical records as a proxy for multimodal
modeling and visualization to enhance the learning experience of interns and
novice physicians. The system employs elaborately designed visualizations to
explore different modality data, offer diagnostic interpretive hints based on
the constructed model, and enable comparative analyses of specific patients.
Our approach is validated through two case studies and expert interviews,
demonstrating its effectiveness in enhancing medical training. Comment: Accepted by IEEE VIS 202
Autonomous Learning of Speaker Identity and WiFi Geofence From Noisy Sensor Data
A fundamental building block towards intelligent environments is the ability to understand who is present in a certain area. A ubiquitous way of detecting this is to exploit unique vocal characteristics as people interact with one another in common spaces. However, manually enrolling users into a biometric database is time-consuming and not robust to vocal deviations over time. Instead, consider audio features sampled during a meeting, yielding a noisy set of possible voiceprints. With a number of meetings and knowledge of participation, e.g., sniffed wireless Media Access Control (MAC) addresses, can we learn to associate a specific identity with a particular voiceprint? To address this problem, this paper advocates an Internet of Things (IoT) solution and proposes to use co-located WiFi as supervisory weak labels to automatically bootstrap the labelling process. In particular, a novel cross-modality labelling algorithm is proposed that jointly optimises the clustering and association process, which solves the inherent mismatching issues arising from heterogeneous sensor data. At the same time, we further propose to reuse the labelled data to iteratively update wireless geofence models and curate device specific thresholds. Extensive experimental results from two different scenarios demonstrate that our proposed method is able to achieve 2-fold improvement in labelling compared with conventional methods and can achieve reliable speaker recognition in the wild
Image based approach for early assessment of heart failure.
In diagnosing heart diseases, the estimation of cardiac performance indices requires accurate segmentation of the left ventricle (LV) wall from cine cardiac magnetic resonance (CMR) images. MR imaging is noninvasive and generates clear images; however, it is impractical to manually process the huge number of images generated to calculate the performance indices. In this dissertation, we introduce novel, fast, robust, bi-directional coupled parametric deformable models that are capable of segmenting the LV wall borders using first- and second-order visual appearance features. These features are embedded in a new stochastic external force that preserves the topology of the LV wall to track the evolution of the parametric deformable models' control points. We tested the proposed segmentation approach on 15 data sets in 6 infarction patients using the Dice similarity coefficient (DSC) and the average distance (AD) between the ground truth and automated segmentation contours. Our approach achieves a mean DSC value of 0.926±0.022 and mean AD value of 2.16±0.60 mm compared to two other level set methods that achieve mean DSC values of 0.904±0.033 and 0.885±0.02; and mean AD values of 2.86±1.35 mm and 5.72±4.70 mm, respectively. Also, a novel framework for assessing both 3D functional strain and wall thickening from 4D cine cardiac magnetic resonance imaging (CCMR) is introduced. The introduced approach is primarily based on using geometrical features to track the LV wall during the cardiac cycle.
The 4D tracking approach consists of the following two main steps: (i) first, the surface points on the LV wall are tracked by solving a 3D Laplace equation between two subsequent LV surfaces; and (ii) second, the locations of the tracked LV surface points are iteratively adjusted through an energy minimization cost function using a generalized Gauss-Markov random field (GGMRF) image model in order to remove inconsistencies and preserve the anatomy of the heart wall during the tracking process. The circumferential strains are then calculated directly from the locations of the tracked LV surface points. In addition, myocardial wall thickening is estimated by co-allocation of the corresponding points, or matches, between the endocardium and epicardium surfaces of the LV wall using the solution of the 3D Laplace equation. Experimental results on in vivo data confirm the accuracy and robustness of our method. Moreover, the comparison results demonstrate that our approach outperforms 2D wall thickening estimation approaches
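For readers unfamiliar with the Dice similarity coefficient (DSC) reported in the abstract above, a minimal Python sketch follows. It is an illustration only, not the dissertation's code: the function name `dice_coefficient`, the nested-list mask format, and the toy masks are all assumptions made here. The formula itself is the standard one, DSC = 2|A ∩ B| / (|A| + |B|), where A and B are the sets of foreground pixels in the two segmentations.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks.

    Masks are nested lists of 0/1 values (an assumed, illustrative format).
    """
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels marked as foreground in both masks.
    intersection = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * intersection / total


# Toy example (hypothetical data): 4 foreground pixels vs. 3, with 3 shared.
ground_truth = [[0, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 0, 0, 0]]
automated = [[0, 1, 1, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]

print(round(dice_coefficient(ground_truth, automated), 3))  # 2*3 / (4+3) = 0.857
```

A DSC of 1 means the two masks coincide exactly and 0 means they share no foreground pixels, which is why values closer to 1 in the abstract indicate better agreement with the ground truth.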
Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field
that aims to design computer agents with intelligent capabilities such as
understanding, reasoning, and learning through integrating multiple
communicative modalities, including linguistic, acoustic, visual, tactile, and
physiological messages. With the recent interest in video understanding,
embodied autonomous agents, text-to-image generation, and multisensor fusion in
application domains such as healthcare and robotics, multimodal machine
learning has brought unique computational and theoretical challenges to the
machine learning community given the heterogeneity of data sources and the
interconnections often found between modalities. However, the breadth of
progress in multimodal research has made it difficult to identify the common
themes and open questions in the field. By synthesizing a broad range of
application domains and theoretical frameworks from both historical and recent
perspectives, this paper is designed to provide an overview of the
computational and theoretical foundations of multimodal machine learning. We
start by defining two key principles of modality heterogeneity and
interconnections that have driven subsequent innovations, and propose a
taxonomy of 6 core technical challenges: representation, alignment, reasoning,
generation, transference, and quantification covering historical and recent
trends. Recent technical achievements will be presented through the lens of
this taxonomy, allowing researchers to understand the similarities and
differences across new approaches. We end by motivating several open problems
for future research as identified by our taxonomy
Uncertainty Minimization in Robotic 3D Mapping Systems Operating in Dynamic Large-Scale Environments
This dissertation research is motivated by the potential and promise of 3D sensing technologies in safety and security applications. With specific focus on unmanned robotic mapping to aid clean-up of hazardous environments, under-vehicle inspection, automatic runway/pavement inspection and modeling of urban environments, we develop modular, multi-sensor, multi-modality robotic 3D imaging prototypes using localization/navigation hardware, laser range scanners and video cameras.
While deploying our multi-modality complementary approach to pose and structure recovery in dynamic real-world operating conditions, we observe several data fusion issues that state-of-the-art methodologies are not able to handle. Different bounds on the noise model of heterogeneous sensors, the dynamism of the operating conditions and the interaction of the sensing mechanisms with the environment introduce situations where sensors can intermittently degenerate to accuracy levels lower than their design specification. This observation necessitates the derivation of methods to integrate multi-sensor data considering sensor conflict, performance degradation and potential failure during operation.
Our work in this dissertation contributes the derivation of a fault-diagnosis framework inspired by information complexity theory to the data fusion literature. We implement the framework as opportunistic sensing intelligence that is able to evolve a belief policy on the sensors within the multi-agent 3D mapping systems to survive and counter concerns of failure in challenging operating conditions. The implementation of the information-theoretic framework, in addition to eliminating failed/non-functional sensors and avoiding catastrophic fusion, is able to minimize uncertainty during autonomous operation by adaptively deciding to fuse or choose believable sensors. We demonstrate our framework through experiments in multi-sensor robot state localization in large scale dynamic environments and vision-based 3D inference. Our modular hardware and software design of robotic imaging prototypes along with the opportunistic sensing intelligence provides significant improvements towards autonomous accurate photo-realistic 3D mapping and remote visualization of scenes for the motivating applications
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching
Obtaining large pre-trained models that can be fine-tuned to new tasks with
limited annotated samples has remained an open challenge for medical imaging
data. While pre-trained deep networks on ImageNet and vision-language
foundation models trained on web-scale data are prevailing approaches, their
effectiveness on medical tasks is limited due to the significant domain shift
between natural and medical images. To bridge this gap, we introduce LVM-Med,
the first family of deep networks trained on large-scale medical datasets. We
have collected approximately 1.3 million medical images from 55 publicly
available datasets, covering a large number of organs and modalities such as
CT, MRI, X-ray, and Ultrasound. We benchmark several state-of-the-art
self-supervised algorithms on this dataset and propose a novel self-supervised
contrastive learning algorithm using a graph-matching formulation. The proposed
approach makes three contributions: (i) it integrates prior pair-wise image
similarity metrics based on local and global information; (ii) it captures the
structural constraints of feature embeddings through a loss function
constructed via a combinatorial graph-matching objective; and (iii) it can be
trained efficiently end-to-end using modern gradient-estimation techniques for
black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream
medical tasks ranging from segmentation and classification to object detection,
in both in-distribution and out-of-distribution settings. LVM-Med empirically
outperforms a number of state-of-the-art supervised, self-supervised, and
foundation models. For challenging tasks such as Brain Tumor Classification or
Diabetic Retinopathy Grading, LVM-Med improves previous vision-language models
trained on 1 billion masks by 6-7% while using only a ResNet-50. Comment: Update Appendi
Cloud-Based Benchmarking of Medical Image Analysis
Medical imagin
Implicit Statistical Learning Across Modalities and its Relationship with Reading in Childhood.
Implicit statistical learning (ISL) describes our ability to tacitly pick up regularities from our environment, thereby shaping our behavior. A broad understanding of ISL incorporates a great range of possible computations, which render it highly relevant to reading. In light of this hypothesized relationship, ISL performance was explored in young (M = 8.47 years) typical readers (N = 31) across three different modalities (i.e., visual, auditory, and tactile) using the Artificial Grammar Learning (AGL) paradigm. Adopting repeated measures and correlational designs, the obtained data revealed modality constraints: (1) above-chance performance was observed on the visual and tactile tasks but not on the auditory task, (2) there was no significant correlation of ISL performance across modalities, and (3) split-half reliability of visual and auditory tasks was reasonably high, yet for the tactile task it was close to zero. Evaluating the relation between ISL ability and language skills, we observed a positive correlation between visual ISL performance and phonological awareness. We discuss these findings in view of current perspectives on the nature of ISL and its potential involvement in mastering successful (i.e., accurate and fluent) reading