449 research outputs found

    The importance of interpretability and visualization in ML for medical applications

    Many areas of science have made a sharp transition towards data-dependent methods, enabled by simultaneous advances in data acquisition and the development of networked system technologies. This is particularly clear in the life sciences, which can be seen as a perfect scenario for the use of machine learning to address problems in which more traditional data analysis approaches might struggle. But this scenario also poses some serious challenges. One of them is the lack of interpretability and explainability of complex nonlinear models. In medicine and health care, not addressing such a challenge might seriously limit the chances of adoption of these methods. In this summary paper, we pay specific attention to one of the ways in which interpretability and explainability can be addressed in this context: data and model visualization.

    Societal issues concerning the application of artificial intelligence in medicine

    Medicine is becoming an increasingly data-centred discipline and, beyond classical statistical approaches, artificial intelligence (AI) and, in particular, machine learning (ML) are attracting much interest for the analysis of medical data. It has been argued that AI is experiencing a fast process of commodification. This characterization correctly reflects the current process of industrialization of AI and its reach into society. Therefore, societal issues related to the use of AI and ML should not be ignored any longer and certainly not in the medical domain. These societal issues may take many forms, but they all entail the design of models from a human-centred perspective, incorporating human-relevant requirements and constraints. In this brief paper, we discuss a number of specific issues affecting the use of AI and ML in medicine, such as fairness, privacy and anonymity, explainability and interpretability, but also some broader societal issues, such as ethics and legislation. We reckon that all of these are relevant aspects to consider in order to achieve the objective of fostering acceptance of AI- and ML-based technologies, as well as to comply with an evolving legislation concerning the impact of digital technologies on ethically and privacy sensitive matters. Our specific goal here is to reflect on how all these topics affect medical applications of AI and ML. This paper includes some of the contents of the “2nd Meeting of Science and Dialysis: Artificial Intelligence,” organized in the Bellvitge University Hospital, Barcelona, Spain.

    Bayesian semi non-negative matrix factorisation

    Non-negative Matrix Factorisation (NMF) has become a standard method for source identification when data, sources and mixing coefficients are constrained to be positive-valued. The method has recently been extended to allow for negative-valued data and sources in the form of Semi- and Convex-NMF. In this paper, we re-elaborate Semi-NMF within a full Bayesian framework. This provides solid foundations for parameter estimation and, importantly, a principled method to address the problem of choosing the most adequate number of sources to describe the observed data. The proposed Bayesian Semi-NMF is preliminarily evaluated here in a real neuro-oncology problem.
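    The classical (non-Bayesian) Semi-NMF that this work re-elaborates can be sketched with the multiplicative update rules of Ding et al.: X is approximated as F Gᵀ where only G is constrained to be non-negative. The NumPy illustration below is a minimal sketch of that baseline, not the Bayesian formulation proposed in the paper:

```python
import numpy as np

def semi_nmf(X, k, n_iter=200, seed=0):
    """Classical Semi-NMF: X (n x m) ~ F @ G.T with G >= 0, F unconstrained."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    G = rng.random((m, k)) + 0.1                 # non-negative mixing matrix

    def pos(A):                                   # elementwise positive part
        return (np.abs(A) + A) / 2

    def neg(A):                                   # elementwise negative part
        return (np.abs(A) - A) / 2

    for _ in range(n_iter):
        # F-step: exact least-squares solution given G
        F = X @ G @ np.linalg.pinv(G.T @ G)
        # G-step: multiplicative update that preserves non-negativity
        XtF, FtF = X.T @ F, F.T @ F
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                     (neg(XtF) + G @ pos(FtF) + 1e-12))
    return F, G
```

    The sources are read off the columns of F, and model selection over k is exactly the problem the Bayesian treatment addresses in a principled way.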

    Intelligent data analysis approaches to churn as a business problem: a survey

    Globalization processes and market deregulation policies are rapidly changing the competitive environments of many economic sectors. The appearance of new competitors and technologies leads to an increase in competition and, with it, a growing preoccupation among service-providing companies with creating stronger customer bonds. In this context, anticipating the customer’s intention to abandon the provider, a phenomenon known as churn, becomes a competitive advantage. Such anticipation can be the result of the correct application of information-based knowledge extraction in the form of business analytics. In particular, the use of intelligent data analysis, or data mining, for the analysis of market surveyed information can be of great assistance to churn management. In this paper, we provide a detailed survey of recent applications of business analytics to churn, with a focus on computational intelligence methods. This is preceded by an in-depth discussion of churn within the context of customer continuity management. The survey is structured according to the stages identified as basic for the building of the predictive models of churn, as well as according to the different types of predictive methods employed and the business areas of their application.

    Systematic analysis of primary sequence domain segments for the discrimination between class C GPCR subtypes

    G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signals. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge about the complete 3D crystal structure of Class C receptors makes it necessary to use their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments with regard to their ability to differentiate between seven Class C GPCR subtypes according to their topological location in the extracellular, transmembrane, or intracellular domains. We build on results from previous research that provided preliminary evidence of the potential use of separated domains of complete Class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments.
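    The general shape of such a pipeline is: turn each primary sequence segment into a fixed-length feature vector, then train a max-margin classifier per segment and compare accuracies. The sketch below is illustrative only; the amino acid composition features and the tiny subgradient-descent linear SVM are stand-in assumptions, not the authors' exact transformations or solver:

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def aa_composition(seq):
    """Map a primary sequence to its 20-dim amino acid composition vector."""
    seq = seq.upper()
    return np.array([seq.count(a) for a in AA]) / max(len(seq), 1)

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1, seed=0):
    """Binary linear SVM via hinge-loss subgradient descent; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (X[i] @ w + b) < 1:        # margin violated
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                                 # only shrink (regularise)
                w = (1 - lr * lam) * w
    return w, b
```

    Running this separately on, say, the N-terminus segment versus the full sequence gives the per-segment discrimination comparison the paper describes, here reduced to a binary toy case.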

    Making nonlinear manifold learning models interpretable: The manifold grand tour

    Dimensionality reduction is required to produce visualisations of high dimensional data. In this framework, one of the most straightforward approaches to visualising high dimensional data is based on reducing complexity and applying linear projections while tumbling the projection axes in a defined sequence, which generates a Grand Tour of the data. We propose using smooth nonlinear topographic maps of the data distribution to guide the Grand Tour, increasing the effectiveness of this approach by prioritising the linear views of the data that are most consistent with the global data structure in these maps. A further consequence of this approach is to enable direct visualisation of the topographic map onto projective spaces that discern structure in the data. The experimental results on standard databases reported in this paper, using self-organising maps and generative topographic mapping, illustrate the practical value of the proposed approach. The main novelty of our proposal is the definition of a systematic way to guide the search of data views in the grand tour, selecting and prioritizing some of them, based on nonlinear manifold models.
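    At its core, a grand tour displays a sequence of 2-D linear projections of the data. The bare-bones sketch below just samples random projection planes; the actual method interpolates smoothly between planes, and the paper's contribution is to select and prioritise frames using a nonlinear manifold model rather than visiting them blindly:

```python
import numpy as np

def grand_tour_frames(d, n_frames, seed=0):
    """Yield random d x 2 orthonormal bases, i.e. planes to project onto."""
    rng = np.random.default_rng(seed)
    for _ in range(n_frames):
        # QR factorisation orthonormalises the columns of a random matrix
        Q, _ = np.linalg.qr(rng.standard_normal((d, 2)))
        yield Q

def project(X, Q):
    """Project n x d data onto the 2-D view defined by basis Q."""
    return X @ Q
```

    Plotting `project(X, Q)` for successive frames animates the tour; guiding the choice of Q by a fitted SOM or GTM manifold is what prioritises structure-revealing views.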

    Automated quality control for proton magnetic resonance spectroscopy data using convex non-negative matrix factorization

    Proton Magnetic Resonance Spectroscopy (1H MRS) has proven its diagnostic potential in a variety of conditions. However, MRS is not yet widely used in clinical routine because of the lack of experts on its diagnostic interpretation. Although data-based decision support systems exist to aid diagnosis, they often take for granted that the data is of good quality, which is not always the case in a real application context. Systems based on models built with bad quality data are likely to underperform in their decision support tasks. In this study, we propose a system to filter out such bad quality data. It is based on convex Non-Negative Matrix Factorization models, used as a dimensionality reduction procedure, and on the use of several classifiers to discriminate between good and bad quality data.
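    Convex NMF constrains each basis vector to be a non-negative combination of the data points themselves, X ≈ X W Gᵀ with W, G ≥ 0, which keeps the learned "prototype spectra" interpretable. Below is a minimal sketch of the multiplicative updates of Ding et al. with random initialisation; the paper's system additionally feeds the reduced representation to several quality classifiers, which is not shown here:

```python
import numpy as np

def convex_nmf(X, k, n_iter=300, seed=0):
    """Convex NMF: X ~ X @ W @ G.T with W, G >= 0, so each basis
    vector X @ W[:, i] is a non-negative mixture of data columns."""
    rng = np.random.default_rng(seed)
    n_cols = X.shape[1]
    W = rng.random((n_cols, k)) + 0.1
    G = rng.random((n_cols, k)) + 0.1
    Y = X.T @ X
    Yp, Yn = (np.abs(Y) + Y) / 2, (np.abs(Y) - Y) / 2  # pos./neg. parts
    eps = 1e-12
    for _ in range(n_iter):
        G *= np.sqrt((Yp @ W + G @ (W.T @ Yn @ W)) /
                     (Yn @ W + G @ (W.T @ Yp @ W) + eps))
        W *= np.sqrt((Yp @ G + Yn @ W @ (G.T @ G)) /
                     (Yn @ G + Yp @ W @ (G.T @ G) + eps))
    return W, G
```

    The rows of G then serve as the low-dimensional representation of each spectrum that a downstream good/bad-quality classifier would consume.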

    Using random forests for assistance in the curation of G-protein coupled receptor databases

    Background: Biology is experiencing a gradual but fast transformation from a laboratory-centred science towards a data-centred one. As such, it requires robust data engineering and the use of quantitative data analysis methods as part of database curation. This paper focuses on G protein-coupled receptors, a large and heterogeneous super-family of cell membrane proteins of interest to biology in general. One of its families, Class C, is of particular interest to pharmacology and drug design. This family is quite heterogeneous on its own, and the discrimination of its several sub-families is a challenging problem. In the absence of known crystal structure, such discrimination must rely on their primary amino acid sequences. Methods: We are interested not as much in achieving maximum sub-family discrimination accuracy using quantitative methods, but in exploring sequence misclassification behavior. Specifically, we are interested in isolating those sequences showing consistent misclassification, that is, sequences that are very often misclassified and almost always to the same wrong sub-family. Random forests are used for this analysis due to their ensemble nature, which makes them naturally suited to gauge the consistency of misclassification. This consistency is here defined through the voting scheme of their base tree classifiers. Results: Detailed consistency results for the random forest ensemble classification were obtained for all receptors and for all data transformations of their unaligned primary sequences. Shortlists of the most consistently misclassified receptors for each subfamily and transformation, as well as an overall shortlist including those cases that were consistently misclassified across transformations, were obtained. The latter should be referred to experts for further investigation as a data curation task. 
Conclusion: The automatic discrimination of the Class C sub-families of G protein-coupled receptors from their unaligned primary sequences shows clear limits. This study has investigated in some detail the consistency of their misclassification using random forest ensemble classifiers. Different sub-families have been shown to display very different discrimination consistency behaviors. The individual identification of consistently misclassified sequences should provide a tool for quality control to GPCR database curators.
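    The notion of consistent misclassification via the tree-level voting scheme can be made concrete: given the label each base tree predicts for each sequence, score how strongly the ensemble concentrates its votes on a single wrong sub-family. A minimal sketch (the vote matrix would come from the trained forest's individual trees; all names here are illustrative):

```python
import numpy as np

def misclassification_consistency(votes, y_true):
    """votes: (n_trees, n_samples) array of per-tree predicted labels.
    For each sample, return the fraction of trees agreeing on the single
    most-voted wrong label; values near 1 flag consistent misclassification."""
    n_trees, n_samples = votes.shape
    score = np.zeros(n_samples)
    for j in range(n_samples):
        wrong = votes[votes[:, j] != y_true[j], j]   # votes for wrong labels
        if wrong.size:
            _, counts = np.unique(wrong, return_counts=True)
            score[j] = counts.max() / n_trees
    return score
```

    Sorting sequences by this score yields exactly the kind of shortlist of consistently misclassified receptors that the paper proposes referring to expert curators.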

    The effect of noise and sample size on an unsupervised feature selection method for manifold learning

    The research on unsupervised feature selection is scarce in comparison to that for supervised models, despite the fact that this is an important issue for many clustering problems. An unsupervised feature selection method for general Finite Mixture Models was recently proposed and subsequently extended to Generative Topographic Mapping (GTM), a manifold learning constrained mixture model that provides data visualization. Some of the results of a previous partial assessment of this unsupervised feature selection method for GTM suggested that its performance may be affected by insufficient sample size and by noisy data. In this brief study, we test in some detail such limitations of the method.

    Artificial Intelligence and Dialysis

    Contemporary medical science heavily relies on the use of technology. Part of this technology strives to improve examination and measurement of the human body, with some of the most impressive technical breakthroughs to be found in the development of non-invasive procedures. Another part focuses on the development of devices that support therapies for specific pathologies. An example of this is the artificial kidney, which has become the target of intensive research from many directions and generates great expectations for dialysis patients. Research on the artificial kidney is still incipient though, and there are many challenges that must be overcome before it becomes a reality and part of clinical practice in nephrology. One of these non-trivial challenges concerns the safety of users of these new dialysis devices. Safety risks make effective monitoring systems mandatory.