753 research outputs found

    Multimodal Machine Learning for Automated ICD Coding

    Full text link
    This study presents a multimodal machine learning model to predict ICD-10 diagnostic codes. We developed separate machine learning models that can handle data from different modalities, including unstructured text, semi-structured text and structured tabular data. We further employed an ensemble method to integrate all modality-specific models to generate ICD-10 codes. Key evidence was also extracted to make our prediction more convincing and explainable. We used the Medical Information Mart for Intensive Care III (MIMIC -III) dataset to validate our approach. For ICD code prediction, our best-performing model (micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms other baseline models including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and Text-CNN model (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability, our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780 and 0.5002 respectively.Comment: Machine Learning for Healthcare 201

    Multimodal machine learning for intelligent mobility

    Get PDF
    Scientific problems are solved by finding the optimal solution for a specific task. Some problems can be solved analytically while other problems are solved using data driven methods. The use of digital technologies to improve the transportation of people and goods, which is referred to as intelligent mobility, is one of the principal beneficiaries of data driven solutions. Autonomous vehicles are at the heart of the developments that propel Intelligent Mobility. Due to the high dimensionality and complexities involved in real-world environments, it needs to become commonplace for intelligent mobility to use data-driven solutions. As it is near impossible to program decision making logic for every eventuality manually. While recent developments of data-driven solutions such as deep learning facilitate machines to learn effectively from large datasets, the application of techniques within safety-critical systems such as driverless cars remain scarce.Autonomous vehicles need to be able to make context-driven decisions autonomously in different environments in which they operate. The recent literature on driverless vehicle research is heavily focused only on road or highway environments but have discounted pedestrianized areas and indoor environments. These unstructured environments tend to have more clutter and change rapidly over time. Therefore, for intelligent mobility to make a significant impact on human life, it is vital to extend the application beyond the structured environments. To further advance intelligent mobility, researchers need to take cues from multiple sensor streams, and multiple machine learning algorithms so that decisions can be robust and reliable. Only then will machines indeed be able to operate in unstructured and dynamic environments safely. Towards addressing these limitations, this thesis investigates data driven solutions towards crucial building blocks in intelligent mobility. Specifically, the thesis investigates multimodal sensor data fusion, machine learning, multimodal deep representation learning and its application of intelligent mobility. This work demonstrates that mobile robots can use multimodal machine learning to derive driver policy and therefore make autonomous decisions.To facilitate autonomous decisions necessary to derive safe driving algorithms, we present an algorithm for free space detection and human activity recognition. Driving these decision-making algorithms are specific datasets collected throughout this study. They include the Loughborough London Autonomous Vehicle dataset, and the Loughborough London Human Activity Recognition dataset. The datasets were collected using an autonomous platform design and developed in house as part of this research activity. The proposed framework for Free-Space Detection is based on an active learning paradigm that leverages the relative uncertainty of multimodal sensor data streams (ultrasound and camera). It utilizes an online learning methodology to continuously update the learnt model whenever the vehicle experiences new environments. The proposed Free Space Detection algorithm enables an autonomous vehicle to self-learn, evolve and adapt to new environments never encountered before. The results illustrate that online learning mechanism is superior to one-off training of deep neural networks that require large datasets to generalize to unfamiliar surroundings. The thesis takes the view that human should be at the centre of any technological development related to artificial intelligence. It is imperative within the spectrum of intelligent mobility where an autonomous vehicle should be aware of what humans are doing in its vicinity. Towards improving the robustness of human activity recognition, this thesis proposes a novel algorithm that classifies point-cloud data originated from Light Detection and Ranging sensors. The proposed algorithm leverages multimodality by using the camera data to identify humans and segment the region of interest in point cloud data. The corresponding 3-dimensional data was converted to a Fisher Vector Representation before being classified by a deep Convolutional Neural Network. The proposed algorithm classifies the indoor activities performed by a human subject with an average precision of 90.3%. When compared to an alternative point cloud classifier, PointNet[1], [2], the proposed framework out preformed on all classes. The developed autonomous testbed for data collection and algorithm validation, as well as the multimodal data-driven solutions for driverless cars, is the major contributions of this thesis. It is anticipated that these results and the testbed will have significant implications on the future of intelligent mobility by amplifying the developments of intelligent driverless vehicles.</div

    Multimodal Sparse Coding for Event Detection

    Full text link
    Unsupervised feature learning methods have proven effective for classification tasks based on a single modality. We present multimodal sparse coding for learning feature representations shared across multiple modalities. The shared representations are applied to multimedia event detection (MED) and evaluated in comparison to unimodal counterparts, as well as other feature learning methods such as GMM supervectors and sparse RBM. We report the cross-validated classification accuracy and mean average precision of the MED system trained on features learned from our unimodal and multimodal settings for a subset of the TRECVID MED 2014 dataset.Comment: Multimodal Machine Learning Workshop at NIPS 201

    Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

    Full text link
    Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles of modality heterogeneity and interconnections that have driven subsequent innovations, and propose a taxonomy of 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy

    Multi-agent system for multimodal machine learning object detection

    Get PDF
    Multi-agent systems have shown great promise in addressing complex problems that traditional single-agent approaches are not be able to handle. In this article, we propose a multi-agent system for the conception of a multimodal machine learning problem on edge devices. Our architecture leverages docker containers to encapsulate knowledge in the form of models and processes, enabling easy management of the system. Communication between agents is facilitated by Message Queuing Telemetry Transport, a lightweight messaging protocol ideal for Internet of Things and edge computing environments. Additionally, we highlight the significance of object detection in our proposed system, which is a crucial component of many multimodal machine learning tasks, by enabling the identification and localization of objects within diverse data modalities. In this manuscript an overall architecture description is performed, discussing the role of each agent and the communication protocol between them. The proposed system offers a general approach to multimodal machine learning problems on edge devices, demonstrating the advantages of multi-agent systems in handling complex and dynamic environments.This work has been supported by FCT (Fundação para a Ciência e Tecnologia) within the R&D Units Project Scope: UIDB/00319/2020

    Music emotion recognition: a multimodal machine learning approach

    Get PDF
    Music emotion recognition (MER) is an emerging domain of the Music Information Retrieval (MIR) scientific community, and besides, music searches through emotions are one of the major selection preferred by web users. As the world goes to digital, the musical contents in online databases, such as Last.fm have expanded exponentially, which require substantial manual efforts for managing them and also keeping them updated. Therefore, the demand for innovative and adaptable search mechanisms, which can be personalized according to users’ emotional state, has gained increasing consideration in recent years. This thesis concentrates on addressing music emotion recognition problem by presenting several classification models, which were fed by textual features, as well as audio attributes extracted from the music. In this study, we build both supervised and semisupervised classification designs under four research experiments, that addresses the emotional role of audio features, such as tempo, acousticness, and energy, and also the impact of textual features extracted by two different approaches, which are TF-IDF and Word2Vec. Furthermore, we proposed a multi-modal approach by using a combined feature-set consisting of the features from the audio content, as well as from context-aware data. For this purpose, we generated a ground truth dataset containing over 1500 labeled song lyrics and also unlabeled big data, which stands for more than 2.5 million Turkish documents, for achieving to generate an accurate automatic emotion classification system. The analytical models were conducted by adopting several algorithms on the crossvalidated data by using Python. As a conclusion of the experiments, the best-attained performance was 44.2% when employing only audio features, whereas, with the usage of textual features, better performances were observed with 46.3% and 51.3% accuracy scores considering supervised and semi-supervised learning paradigms, respectively. As of last, even though we created a comprehensive feature set with the combination of audio and textual features, this approach did not display any significant improvement for classification performanc

    Multimodal Machine Learning-based Knee Osteoarthritis Progression Prediction from Plain Radiographs and Clinical Data

    Get PDF
    Knee osteoarthritis (OA) is the most common musculoskeletal disease without a cure, and current treatment options are limited to symptomatic relief. Prediction of OA progression is a very challenging and timely issue, and it could, if resolved, accelerate the disease modifying drug development and ultimately help to prevent millions of total joint replacement surgeries performed annually. Here, we present a multi-modal machine learning-based OA progression prediction model that utilizes raw radiographic data, clinical examination results and previous medical history of the patient. We validated this approach on an independent test set of 3,918 knee images from 2,129 subjects. Our method yielded area under the ROC curve (AUC) of 0.79 (0.78-0.81) and Average Precision (AP) of 0.68 (0.66-0.70). In contrast, a reference approach, based on logistic regression, yielded AUC of 0.75 (0.74-0.77) and AP of 0.62 (0.60-0.64). The proposed method could significantly improve the subject selection process for OA drug-development trials and help the development of personalized therapeutic plans

    Hurricane Forecasting: A Novel Multimodal Machine Learning Framework

    Full text link
    This paper describes a machine learning (ML) framework for tropical cyclone intensity and track forecasting, combining multiple distinct ML techniques and utilizing diverse data sources. Our framework, which we refer to as Hurricast (HURR), is built upon the combination of distinct data processing techniques using gradient-boosted trees and novel encoder-decoder architectures, including CNN, GRU and Transformers components. We propose a deep-feature extractor methodology to mix spatial-temporal data with statistical data efficiently. Our multimodal framework unleashes the potential of making forecasts based on a wide range of data sources, including historical storm data, and visual data such as reanalysis atmospheric images. We evaluate our models with current operational forecasts in North Atlantic and Eastern Pacific basins on 2016-2019 for 24-hour lead time, and show our models consistently outperform statistical-dynamical models and compete with the best dynamical models, while computing forecasts in seconds. Furthermore, the inclusion of Hurricast into an operational forecast consensus model leads to a significant improvement of 5% - 15% over NHC's official forecast, thus highlighting the complementary properties with existing approaches. In summary, our work demonstrates that combining different data sources and distinct machine learning methodologies can lead to superior tropical cyclone forecasting. We hope that this work opens the door for further use of machine learning in meteorological forecasting.Comment: Under revision by the AMS' Weather and Forecasting journa
    • …
    corecore