
    Societal issues concerning the application of artificial intelligence in medicine

    Medicine is becoming an increasingly data-centred discipline and, beyond classical statistical approaches, artificial intelligence (AI) and, in particular, machine learning (ML) are attracting much interest for the analysis of medical data. It has been argued that AI is undergoing a fast process of commodification, a characterization that accurately reflects the current industrialization of AI and its reach into society. Therefore, societal issues related to the use of AI and ML should no longer be ignored, least of all in the medical domain. These issues take many forms, but they all entail designing models from a human-centred perspective, incorporating human-relevant requirements and constraints. In this brief paper, we discuss a number of specific issues affecting the use of AI and ML in medicine, such as fairness, privacy and anonymity, and explainability and interpretability, as well as broader societal issues such as ethics and legislation. We argue that all of these aspects must be considered in order to foster acceptance of AI- and ML-based technologies and to comply with evolving legislation concerning the impact of digital technologies on ethically and privacy-sensitive matters. Our specific goal here is to reflect on how all these topics affect medical applications of AI and ML. This paper includes some of the contents of the “2nd Meeting of Science and Dialysis: Artificial Intelligence,” organized at the Bellvitge University Hospital, Barcelona, Spain.
    Peer Reviewed. Postprint (author's final draft)

    The importance of interpretability and visualization in ML for medical applications

    Many areas of science have made a sharp transition towards data-dependent methods, enabled by simultaneous advances in data acquisition and in networked system technologies. This is particularly clear in the life sciences, which offer an ideal setting for applying machine learning to problems where more traditional data analysis approaches may struggle. But this setting also poses serious challenges. One of them is the lack of interpretability and explainability of complex nonlinear models. In medicine and health care, failing to address this challenge may seriously limit the adoption of these methods. In this summary paper, we focus on one way of addressing interpretability and explainability in this context: data and model visualization.
    Peer Reviewed. Postprint (published version)

    Robust cartogram visualization of outliers in manifold learning

    Most real data sets contain atypical observations, often referred to as outliers. Their presence may have a negative impact on data modeling with machine learning, particularly in data density estimation approaches. Manifold learning techniques provide low-dimensional data representations, often oriented towards visualization, and the visualizations provided by density-estimation manifold learning methods can be compromised by the presence of outliers. Recently, a cartogram-based representation of model-generated distortion was presented for nonlinear dimensionality reduction. Here, we investigate the impact of outliers on this visualization when using manifold learning techniques that behave robustly in their presence.
    Postprint (published version)

    The effect of noise and sample size on an unsupervised feature selection method for manifold learning

    Research on unsupervised feature selection is scarce compared with that for supervised models, even though feature selection is an important issue for many clustering problems. An unsupervised feature selection method for general Finite Mixture Models was recently proposed and subsequently extended to Generative Topographic Mapping (GTM), a manifold learning constrained mixture model that provides data visualization. Results of a previous partial assessment of this unsupervised feature selection method for GTM suggested that its performance may be affected by insufficient sample size and by noisy data. In this brief study, we test these limitations of the method in some detail.
    Postprint (published version)

    Using random forests for assistance in the curation of G-protein coupled receptor databases

    Background: Biology is undergoing a gradual but fast transformation from a laboratory-centred science into a data-centred one. As such, it requires robust data engineering and the use of quantitative data analysis methods as part of database curation. This paper focuses on G protein-coupled receptors (GPCRs), a large and heterogeneous super-family of cell membrane proteins of interest to biology in general. One of its families, Class C, is of particular interest to pharmacology and drug design. This family is itself quite heterogeneous, and discriminating between its several sub-families is a challenging problem. In the absence of known crystal structures, such discrimination must rely on the receptors' primary amino acid sequences. Methods: We are less interested in achieving maximum sub-family discrimination accuracy with quantitative methods than in exploring sequence misclassification behavior. Specifically, we aim to isolate those sequences showing consistent misclassification, that is, sequences that are very often misclassified and almost always to the same wrong sub-family. Random forests are used for this analysis because their ensemble nature makes them naturally suited to gauging the consistency of misclassification, which is defined here through the voting scheme of their base tree classifiers. Results: Detailed consistency results for the random forest ensemble classification were obtained for all receptors and for all data transformations of their unaligned primary sequences. Shortlists of the most consistently misclassified receptors were obtained for each sub-family and transformation, together with an overall shortlist of the cases that were consistently misclassified across transformations. The latter should be referred to experts for further investigation as a data curation task.
    Conclusion: The automatic discrimination of the Class C sub-families of G protein-coupled receptors from their unaligned primary sequences shows clear limits. This study has investigated in some detail the consistency of their misclassification using random forest ensemble classifiers. Different sub-families have been shown to display very different discrimination consistency behaviors. The individual identification of consistently misclassified sequences should provide GPCR database curators with a quality control tool.
    Peer Reviewed. Postprint (published version)
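    The abstract defines misclassification consistency through the trees' votes but does not give the formula. As a minimal sketch, assuming per-tree (for instance out-of-bag) votes are available for each sequence, one plausible reading scores a sequence by how often the trees get it wrong and by how concentrated those wrong votes are on a single sub-family. The function names, the thresholds and the labels in the example are hypothetical, not taken from the study.

```python
from collections import Counter


def misclassification_consistency(true_label, tree_votes):
    """Summarise how the trees of a forest classify one sequence.

    Returns (error_rate, modal_wrong_label, consistency): error_rate is the
    fraction of tree votes that disagree with the true sub-family, and
    consistency is the fraction of those wrong votes that go to the single
    most frequent wrong sub-family (0.0 when no vote is wrong).
    """
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    wrong = [v for v in tree_votes if v != true_label]
    error_rate = len(wrong) / len(tree_votes)
    if not wrong:
        return error_rate, None, 0.0
    modal_label, count = Counter(wrong).most_common(1)[0]
    return error_rate, modal_label, count / len(wrong)


def shortlist(samples, min_error=0.8, min_consistency=0.9):
    """Flag sequences that are often misclassified, almost always to one class.

    samples maps a sequence id to (true_label, tree_votes). The default
    thresholds are illustrative assumptions, not values from the paper.
    """
    flagged = []
    for seq_id, (true_label, votes) in samples.items():
        err, wrong_label, cons = misclassification_consistency(true_label, votes)
        if err >= min_error and cons >= min_consistency:
            flagged.append((seq_id, wrong_label, err, cons))
    # Most consistent cases first, ties broken by the higher error rate.
    return sorted(flagged, key=lambda t: (t[3], t[2]), reverse=True)


samples = {
    "seq_A": ("mGluR", ["CaSR"] * 9 + ["mGluR"]),  # hypothetical votes
    "seq_B": ("GABA_B", ["GABA_B"] * 8 + ["mGluR", "CaSR"]),
}
print(shortlist(samples))  # [('seq_A', 'CaSR', 0.9, 1.0)]
```

    Requiring both a high error rate and a high concentration of wrong votes separates "hard but scattered" sequences from those that look like systematic labelling or curation issues.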

    Preliminary theoretical results on a feature relevance determination method for Generative Topographic Mapping

    Feature selection (FS) has long been studied in classification and regression problems, following diverse approaches and resulting in a wide variety of methods, usually grouped as either filters or wrappers. In comparison, FS for unsupervised learning has received far less attention. For many real problems concerning unsupervised multivariate data clustering, FS becomes an issue of paramount importance, as results have to meet interpretability and actionability requirements. An FS method for Gaussian mixture models was recently defined in Law et al. (2004). Mixture models are well established as clustering methods, but their multivariate data visualization capabilities are limited. The Generative Topographic Mapping (Bishop et al. 1998a), a constrained mixture of distributions, was originally defined to overcome this limitation. In this brief report, we provide the theoretical development of a feature relevance determination method for Generative Topographic Mapping, based on that defined in Law et al. (2004); with this method, the clustering results can be visualized in a low-dimensional latent space and interpreted in terms of a reduced subset of selected relevant features. [This document has been revised (8/11/2006)]
    Postprint (published version)
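    The idea behind the Law et al. (2004) method is a per-feature saliency rho_l: feature l is drawn either from a component-specific density (relevant, with probability rho_l) or from a common density q_l shared by all components (irrelevant). As a simplified sketch for a plain Gaussian mixture, and not of the GTM extension developed in this report, the code below performs one saliency update with the component parameters held fixed and without the model-selection penalty term of the original method. All names are hypothetical.

```python
import math


def gauss(y, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def update_saliencies(data, weights, comp_params, background, saliency):
    """One unpenalised EM-style saliency update for a feature-saliency mixture.

    data:        N points, each a list of D feature values
    weights:     K mixing proportions alpha_j
    comp_params: comp_params[j][l] = (mean, var) of feature l in component j
    background:  background[l] = (mean, var) of the common density q_l
    saliency:    current D saliencies rho_l in (0, 1)
    """
    n, d, k = len(data), len(saliency), len(weights)
    totals = [0.0] * d
    for y in data:
        # a[j][l]: "relevant" term; b[l]: "irrelevant" term of feature l.
        a = [[saliency[l] * gauss(y[l], *comp_params[j][l]) for l in range(d)]
             for j in range(k)]
        b = [(1.0 - saliency[l]) * gauss(y[l], *background[l]) for l in range(d)]
        joint = [weights[j] * math.prod(a[j][l] + b[l] for l in range(d))
                 for j in range(k)]
        z = sum(joint)
        for j in range(k):
            w = joint[j] / z  # responsibility of component j for point y
            for l in range(d):
                u = a[j][l] / (a[j][l] + b[l])  # P(feature l relevant | y, j)
                totals[l] += w * u
    # New rho_l: average posterior probability that feature l is relevant.
    return [t / n for t in totals]
```

    A feature whose component densities explain the data better than the common density drifts towards rho_l = 1; a feature that the common density explains equally well stays put or decays, which is what makes the selected subset interpretable.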

    Missing data imputation through generative topographic mapping as a mixture of t-distributions: Theoretical developments

    The Generative Topographic Mapping (GTM) was originally conceived as a probabilistic alternative to the well-known, neural network-inspired, Self-Organizing Map (SOM). The GTM can also be interpreted as a constrained mixture of distributions model. In recent years, much attention has been directed towards Student t-distributions as an alternative to Gaussians in mixture models due to their robustness towards outliers. In this report, the GTM is redefined as a constrained mixture of t-distributions, the t-GTM, and the Expectation-Maximization algorithm used to fit the model to the data is modified to provide missing data imputation.
    Postprint (published version)
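    The robustness of t-mixtures comes from the E-step: each point receives an expected latent scale u = (nu + D) / (nu + beta * ||x - y||^2), so points far from a centre y get small weights when that centre is re-estimated, and the Gaussian case (u = 1) is recovered as nu grows. The sketch below assumes an isotropic inverse variance beta, as in the GTM, and uses hypothetical helper names; it illustrates only this weighting, not the t-GTM mapping or its missing-data imputation.

```python
def t_scale_weight(x, center, beta, nu):
    """Expected latent scale of a multivariate Student t with isotropic precision.

    u = (nu + D) / (nu + beta * ||x - center||^2), where beta is the inverse
    variance (as in GTM) and nu the degrees of freedom.
    """
    d = len(x)
    sq_dist = beta * sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return (nu + d) / (nu + sq_dist)


def robust_center(points, resp, beta, nu, center):
    """Re-estimate a centre as sum(r * u * x) / sum(r * u).

    resp are the component responsibilities r_n; with every u fixed at 1 this
    reduces to the ordinary Gaussian-mixture weighted mean.
    """
    ws = [r * t_scale_weight(p, center, beta, nu) for p, r in zip(points, resp)]
    total = sum(ws)
    return [sum(w * p[i] for w, p in zip(ws, points)) / total
            for i in range(len(center))]


pts = [[0.0, 0.0], [0.1, 0.0], [-0.1, 0.0], [10.0, 0.0]]  # last point is an outlier
print(robust_center(pts, [1, 1, 1, 1], beta=1.0, nu=3.0, center=[0.0, 0.0]))
```

    With these points the plain mean of the first coordinate is 2.5, while the t-weighted estimate stays close to 0 because the outlier's weight drops to about 5/103.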

    Blood pressure assessment with differential pulse transit time and deep learning: a proof of concept

    Modern clinical environments are laden with technological devices that continuously gather physiological data from patients. This is especially true in critical care, where life-saving decisions may have to be made on the basis of signals from monitoring devices. Hemodynamic monitoring is essential in dialysis, in surgery, and for critically ill patients. For the most severe patients, blood pressure is normally assessed through a catheter, an invasive procedure that may result in adverse effects. Blood pressure can also be monitored noninvasively through different methods, and these data can be used for continuous pressure assessment with machine learning methods. Previous studies have found pulse transit time to be related to blood pressure. In this short paper, we examine the feasibility of implementing a data-driven model based on restricted Boltzmann machine artificial neural networks, delivering a first proof of concept for the validity and viability of a blood pressure prediction method based on these models.
    Peer Reviewed. Postprint (author's final draft)

    Bayesian semi non-negative matrix factorisation

    Non-negative Matrix Factorisation (NMF) has become a standard method for source identification when data, sources and mixing coefficients are constrained to be non-negative. The method has recently been extended to allow for negative-valued data and sources in the form of Semi- and Convex-NMF. In this paper, we re-elaborate Semi-NMF within a full Bayesian framework. This provides solid foundations for parameter estimation and, importantly, a principled method for choosing the number of sources that best describes the observed data. The proposed Bayesian Semi-NMF is given a preliminary evaluation on a real neuro-oncology problem.
    Peer Reviewed. Postprint (published version)
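    For reference, the classical (non-Bayesian) Semi-NMF of Ding, Li and Jordan factorises a mixed-sign matrix as X ≈ F G^T, solving for the unconstrained F in closed form and updating the non-negative G multiplicatively. The pure-Python sketch below implements those two steps for small matrices; it is not the Bayesian formulation proposed in the paper and does not choose the number of sources k, which must be supplied. Helper names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Product of two list-of-lists matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small k x k matrices only)."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pos(a):  # element-wise positive part (|a| + a) / 2
    return [[(abs(v) + v) / 2.0 for v in row] for row in a]


def neg(a):  # element-wise negative part (|a| - a) / 2
    return [[(abs(v) - v) / 2.0 for v in row] for row in a]


def sq_error(x, f, g):
    """Squared Frobenius norm ||X - F G^T||^2."""
    r = matmul(f, transpose(g))
    return sum((xv - rv) ** 2 for xr, rr in zip(x, r) for xv, rv in zip(xr, rr))


def semi_nmf(x, k, iters=200, seed=0, eps=1e-12):
    """X (p x n, mixed sign) ~ F G^T with F (p x k) free and G (n x k) >= 0."""
    rng = random.Random(seed)
    n = len(x[0])
    g = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    errors = []
    for _ in range(iters):
        # F-step: least-squares solution F = X G (G^T G)^-1.
        f = matmul(matmul(x, g), inverse(matmul(transpose(g), g)))
        errors.append(sq_error(x, f, g))
        # G-step: multiplicative update, which keeps every entry of G >= 0.
        xtf = matmul(transpose(x), f)  # n x k
        ftf = matmul(transpose(f), f)  # k x k
        top = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(pos(xtf), matmul(g, neg(ftf)))]
        bot = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(neg(xtf), matmul(g, pos(ftf)))]
        g = [[g[i][j] * math.sqrt((top[i][j] + eps) / (bot[i][j] + eps))
              for j in range(k)] for i in range(n)]
    f = matmul(matmul(x, g), inverse(matmul(transpose(g), g)))  # F for the final G
    return f, g, errors
```

    Under these updates the recorded reconstruction error should not increase from one iteration to the next. A Bayesian treatment replaces the fixed k and point estimates with priors, which is what enables the principled choice of the number of sources.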

    Capturing the dynamics of multivariate time series through visualization using generative topographic mapping through time

    Most of the existing research on time series concerns supervised forecasting problems. In comparison, little research has been devoted to unsupervised methods for the visual exploration of multivariate time series. In this paper, the capabilities of the Generative Topographic Mapping Through Time, a model with solid foundations in probability theory that performs simultaneous time series data clustering and visualization, are assessed in detail in several experiments. The focus is placed on the detection of atypical data, the visualization of the evolution of signal regimes, and the exploration of sudden transitions, for which a novel identification index is defined.
    Postprint (published version)
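    The GTM Through Time can be read as a hidden Markov model whose states are the GTM latent points, each emitting an isotropic Gaussian centred on its image in data space. Assuming those images (here called centers), the inverse variance beta, and the initial and transition probabilities are already fitted, a scaled forward pass gives the log-likelihood of a series together with a per-step term log p(x_t | x_1..x_{t-1}). A sharp drop in that term is one generic way to flag atypical observations; it is not the transition index defined in the paper, and all names are hypothetical.

```python
import math


def isotropic_gauss(x, center, beta):
    """Density of N(x; center, beta^-1 I) in D dimensions."""
    d = len(x)
    sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return (beta / (2.0 * math.pi)) ** (d / 2.0) * math.exp(-0.5 * beta * sq)


def forward_loglik(series, centers, beta, init, trans):
    """Scaled forward pass of an HMM whose states are GTM latent points.

    centers[k] stands in for the GTM image y_k of latent point k (assumed
    already computed). Returns (total log-likelihood, per-step log terms).
    """
    n_states = len(centers)
    step_logs = []
    alpha = None
    for t, x in enumerate(series):
        emis = [isotropic_gauss(x, c, beta) for c in centers]
        if t == 0:
            a = [init[k] * emis[k] for k in range(n_states)]
        else:
            a = [emis[j] * sum(alpha[i] * trans[i][j] for i in range(n_states))
                 for j in range(n_states)]
        c_t = sum(a)                  # p(x_t | x_1..x_{t-1})
        alpha = [v / c_t for v in a]  # normalised filtering distribution
        step_logs.append(math.log(c_t))
    return sum(step_logs), step_logs
```

    Normalising alpha at every step keeps the recursion from underflowing on long series, and the normalising constants are exactly the one-step predictive likelihoods, so the same pass yields both the model fit and a per-step atypicality signal.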