
    Leveraging Historical Medical Records as a Proxy via Multimodal Modeling and Visualization to Enrich Medical Diagnostic Learning

    Simulation-based Medical Education (SBME) has been developed as a cost-effective means of enhancing the diagnostic skills of novice physicians and interns, thereby mitigating the need for resource-intensive mentor-apprentice training. However, the feedback provided in most SBME is directed towards improving learners' operational proficiency rather than towards the summative medical diagnoses that come only with experience and time. Additionally, the multimodal nature of medical data during diagnosis poses significant challenges for interns and novice physicians, including the tendency to overlook or over-rely on data from certain modalities and difficulty comprehending potential associations between modalities. To address these challenges, we present DiagnosisAssistant, a visual analytics system that leverages historical medical records as a proxy for multimodal modeling and visualization to enhance the learning experience of interns and novice physicians. The system employs carefully designed visualizations to explore data from different modalities, offer diagnostic interpretive hints based on the constructed model, and enable comparative analyses of specific patients. Our approach is validated through two case studies and expert interviews, demonstrating its effectiveness in enhancing medical training. Comment: Accepted by IEEE VIS 202

    Autonomous Learning of Speaker Identity and WiFi Geofence From Noisy Sensor Data

    A fundamental building block towards intelligent environments is the ability to understand who is present in a certain area. A ubiquitous way of detecting this is to exploit the unique vocal characteristics of people as they interact with one another in common spaces. However, manually enrolling users into a biometric database is time-consuming and not robust to vocal deviations over time. Instead, consider audio features sampled during a meeting, yielding a noisy set of possible voiceprints. Given a number of meetings and knowledge of participation, e.g., sniffed wireless Media Access Control (MAC) addresses, can we learn to associate a specific identity with a particular voiceprint? To address this problem, this paper advocates an Internet of Things (IoT) solution and proposes to use co-located WiFi as a source of weak supervisory labels to automatically bootstrap the labelling process. In particular, a novel cross-modality labelling algorithm is proposed that jointly optimises the clustering and association process, resolving the inherent mismatching issues arising from heterogeneous sensor data. At the same time, we further propose to reuse the labelled data to iteratively update wireless geofence models and curate device-specific thresholds. Extensive experimental results from two different scenarios demonstrate that our proposed method achieves a 2-fold improvement in labelling compared with conventional methods and can achieve reliable speaker recognition in the wild.
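    A minimal two-stage sketch of the weak-label idea described above (not the paper's joint optimisation), assuming speech-segment embeddings, a per-segment meeting index, and a meeting-by-person attendance matrix derived from sniffed MAC addresses; the function name, the use of KMeans, and the linear-assignment step are illustrative assumptions.

        # Illustrative sketch only: cluster voiceprints, then associate clusters with
        # identities by maximising co-occurrence with WiFi-derived attendance.
        import numpy as np
        from sklearn.cluster import KMeans
        from scipy.optimize import linear_sum_assignment

        def associate_voiceprints(embeddings, meeting_of_segment, attendance, n_people):
            """embeddings: (n_segments, d) audio features, one per speech segment.
            meeting_of_segment: (n_segments,) meeting index of each segment.
            attendance: (n_meetings, n_people) binary matrix from sniffed MAC addresses.
            Returns a dict mapping voiceprint cluster -> person index."""
            # Step 1: group segments into candidate voiceprints.
            clusters = KMeans(n_clusters=n_people, n_init=10).fit_predict(embeddings)

            # Step 2: count how often each cluster occurs per meeting, then score
            # cluster-person pairs by co-occurrence across meetings.
            n_meetings = attendance.shape[0]
            cluster_meeting = np.zeros((n_people, n_meetings))
            for c, m in zip(clusters, meeting_of_segment):
                cluster_meeting[c, m] += 1
            score = cluster_meeting @ attendance        # (n_clusters, n_people)

            # Step 3: one-to-one assignment maximising total co-occurrence.
            rows, cols = linear_sum_assignment(-score)
            return dict(zip(rows.tolist(), cols.tolist()))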

    Image based approach for early assessment of heart failure.

    In diagnosing heart diseases, the estimation of cardiac performance indices requires accurate segmentation of the left ventricle (LV) wall from cine cardiac magnetic resonance (CMR) images. MR imaging is noninvasive and generates clear images; however, it is impractical to manually process the huge number of images generated to calculate the performance indices. In this dissertation, we introduce novel, fast, robust, bi-directional coupled parametric deformable models that are capable of segmenting the LV wall borders using first- and second-order visual appearance features. These features are embedded in a new stochastic external force that preserves the topology of the LV wall while tracking the evolution of the parametric deformable models' control points. We tested the proposed segmentation approach on 15 data sets from 6 infarction patients using the Dice similarity coefficient (DSC) and the average distance (AD) between the ground truth and automated segmentation contours. Our approach achieves a mean DSC value of 0.926±0.022 and a mean AD value of 2.16±0.60 mm, compared to two other level set methods that achieve mean DSC values of 0.904±0.033 and 0.885±0.02, and mean AD values of 2.86±1.35 mm and 5.72±4.70 mm, respectively. We also introduce a novel framework for assessing both 3D functional strain and wall thickening from 4D cine cardiac magnetic resonance imaging (CCMR). The introduced approach is primarily based on using geometrical features to track the LV wall during the cardiac cycle. The 4D tracking approach consists of two main steps: (i) the surface points on the LV wall are tracked by solving a 3D Laplace equation between two subsequent LV surfaces; and (ii) the locations of the tracked LV surface points are iteratively adjusted through an energy-minimization cost function using a generalized Gauss-Markov random field (GGMRF) image model in order to remove inconsistencies and preserve the anatomy of the heart wall during tracking. The circumferential strains are then calculated directly from the locations of the tracked LV surface points. In addition, myocardial wall thickening is estimated by co-locating corresponding points, or matches, between the endocardium and epicardium surfaces of the LV wall using the solution of the 3D Laplace equation. Experimental results on in vivo data confirm the accuracy and robustness of our method. Moreover, the comparison results demonstrate that our approach outperforms 2D wall thickening estimation approaches.
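    For reference, a minimal sketch of the two reported metrics under their standard definitions (not the dissertation's implementation): the Dice similarity coefficient between binary masks and the average symmetric distance between contour point sets, assumed here to be given as (N, 2) arrays in consistent units.

        import numpy as np
        from scipy.spatial.distance import cdist

        def dice(seg, gt):
            """DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
            seg, gt = seg.astype(bool), gt.astype(bool)
            inter = np.logical_and(seg, gt).sum()
            return 2.0 * inter / (seg.sum() + gt.sum())

        def average_distance(contour_a, contour_b):
            """Mean symmetric point-to-contour distance between two (N, 2) point sets,
            in the same units (pixels or mm) as the input coordinates."""
            d = cdist(contour_a, contour_b)             # pairwise Euclidean distances
            return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())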

    Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

    Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning by integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community, given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper provides an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles, modality heterogeneity and interconnection, that have driven subsequent innovations, and propose a taxonomy of six core technical challenges covering historical and recent trends: representation, alignment, reasoning, generation, transference, and quantification. Recent technical achievements are presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.

    Uncertainty Minimization in Robotic 3D Mapping Systems Operating in Dynamic Large-Scale Environments

    This dissertation research is motivated by the potential and promise of 3D sensing technologies in safety and security applications. With a specific focus on unmanned robotic mapping to aid the clean-up of hazardous environments, under-vehicle inspection, automatic runway/pavement inspection, and modeling of urban environments, we develop modular, multi-sensor, multi-modality robotic 3D imaging prototypes using localization/navigation hardware, laser range scanners, and video cameras. While deploying our multi-modality, complementary approach to pose and structure recovery in dynamic real-world operating conditions, we observe several data fusion issues that state-of-the-art methodologies are not able to handle. Different bounds on the noise models of heterogeneous sensors, the dynamism of the operating conditions, and the interaction of the sensing mechanisms with the environment introduce situations where sensors can intermittently degrade to accuracy levels below their design specifications. This observation necessitates methods that integrate multi-sensor data while accounting for sensor conflict, performance degradation, and potential failure during operation. This dissertation contributes to the data fusion literature a fault-diagnosis framework inspired by information complexity theory. We implement the framework as opportunistic sensing intelligence that evolves a belief policy over the sensors within the multi-agent 3D mapping systems, allowing them to survive and counter failures in challenging operating conditions. In addition to eliminating failed or non-functional sensors and avoiding catastrophic fusion, the information-theoretic framework minimizes uncertainty during autonomous operation by adaptively deciding whether to fuse all sensors or to choose only the believable ones. We demonstrate our framework through experiments in multi-sensor robot state localization in large-scale dynamic environments and vision-based 3D inference. Our modular hardware and software design of the robotic imaging prototypes, together with the opportunistic sensing intelligence, provides significant improvements towards autonomous, accurate, photo-realistic 3D mapping and remote visualization of scenes for the motivating applications.
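    A toy sketch of the adaptive "fuse or choose" behaviour mentioned above: redundant scalar estimates are combined by inverse-variance weighting, and any sensor that conflicts strongly with the provisional consensus is dropped before re-fusing. This illustrates the down-weighting idea only; it is not the dissertation's information-theoretic fault-diagnosis framework, and the rejection threshold k is an assumption.

        import numpy as np

        def fuse_estimates(values, variances, k=3.0):
            """values, variances: one scalar estimate and noise variance per sensor."""
            values = np.asarray(values, dtype=float)
            variances = np.asarray(variances, dtype=float)
            w = 1.0 / variances
            fused = np.sum(w * values) / np.sum(w)      # provisional consensus

            # Reject sensors whose readings conflict with the consensus.
            ok = np.abs(values - fused) <= k * np.sqrt(variances)
            if not ok.any() or ok.all():
                return fused
            w = w[ok]
            return np.sum(w * values[ok]) / np.sum(w)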

    LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching

    Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained deep networks on ImageNet and vision-language foundation models trained on web-scale data are the prevailing approaches, their effectiveness on medical tasks is limited by the significant domain shift between natural and medical images. To bridge this gap, we introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets, covering a large number of organs and modalities such as CT, MRI, X-ray, and ultrasound. We benchmark several state-of-the-art self-supervised algorithms on this dataset and propose a novel self-supervised contrastive learning algorithm using a graph-matching formulation. The proposed approach makes three contributions: (i) it integrates prior pair-wise image similarity metrics based on local and global information; (ii) it captures the structural constraints of feature embeddings through a loss function constructed via a combinatorial graph-matching objective; and (iii) it can be trained efficiently end-to-end using modern gradient-estimation techniques for black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream medical tasks ranging from segmentation and classification to object detection, in both in-distribution and out-of-distribution settings. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models. For challenging tasks such as Brain Tumor Classification or Diabetic Retinopathy Grading, LVM-Med improves on previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50. Comment: Update Appendix
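    A first-order sketch of the matching intuition behind the graph-matching formulation: embeddings of two augmented views of the same batch are matched by a global linear assignment on their cosine-similarity matrix, so each image should be paired with its own augmentation. LVM-Med's actual objective is a second-order (pairwise) graph-matching loss trained end-to-end through a black-box solver, which this simplification does not capture; the function name is an assumption.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def match_views(z1, z2):
            """z1, z2: (batch, dim) embeddings of two augmentations of the same images.
            Returns the predicted correspondence and its accuracy against identity."""
            z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
            z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
            sim = z1 @ z2.T                             # cosine similarities
            rows, cols = linear_sum_assignment(-sim)    # maximise total similarity
            accuracy = float(np.mean(cols == rows))     # correct pairs lie on the diagonal
            return cols, accuracy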

    Cloud-Based Benchmarking of Medical Image Analysis

    Medical imaging

    Implicit Statistical Learning Across Modalities and its Relationship with Reading in Childhood.

    Implicit statistical learning (ISL) describes our ability to tacitly pick up regularities from our environment, thereby shaping our behavior. A broad understanding of ISL incorporates a great range of possible computations, which renders it highly relevant to reading. In light of this hypothesized relationship, ISL performance was explored in young (M = 8.47 years) typical readers (N = 31) across three different modalities (i.e., visual, auditory, and tactile) using the Artificial Grammar Learning (AGL) paradigm. Adopting repeated-measures and correlational designs, the obtained data revealed modality constraints: (1) above-chance performance was observed on the visual and tactile tasks but not on the auditory task, (2) there was no significant correlation of ISL performance across modalities, and (3) split-half reliability of the visual and auditory tasks was reasonably high, yet for the tactile task it was close to zero. Evaluating the relationship between ISL ability and language skills, we observed a positive correlation between visual ISL performance and phonological awareness. We discuss these findings in view of current perspectives on the nature of ISL and its potential involvement in mastering successful (i.e., accurate and fluent) reading.
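    A toy illustration of the Artificial Grammar Learning paradigm: exposure strings are generated by a random walk over a small finite-state grammar, and learning is later probed with grammaticality judgements on novel strings. The grammar and symbols below are invented for illustration and are not the stimuli used in the study.

        import random

        # Invented finite-state grammar: state -> list of (symbol, next_state);
        # a (None, None) entry means the string may terminate in that state.
        GRAMMAR = {
            0: [("M", 1), ("V", 2)],
            1: [("T", 1), ("V", 2)],
            2: [("R", 3), ("X", 1)],
            3: [("R", 3), (None, None)],
        }

        def generate(max_len=8):
            """Produce one grammatical exposure string."""
            state, out = 0, []
            while len(out) < max_len:
                symbol, state = random.choice(GRAMMAR[state])
                if symbol is None:
                    break
                out.append(symbol)
            return "".join(out)

        exposure_strings = [generate() for _ in range(20)]
        print(exposure_strings)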