
    Assessment of Stroke Risk Based on Morphological Ultrasound Image Analysis With Conformal Prediction

    Non-invasive ultrasound imaging of carotid plaques allows plaque image analysis to be developed for assessing the risk of stroke. In our work, we provide reliable confidence measures for the assessment of stroke risk, using the Conformal Prediction framework. This framework provides a way of assigning valid confidence measures to the predictions of classical machine learning algorithms. We conduct experiments on a dataset which contains morphological features derived from ultrasound images of atherosclerotic carotid plaques, and we evaluate the results of four different Conformal Predictors (CPs). The four CPs are based on Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Naive Bayes classification (NBC), and k-Nearest Neighbours (k-NN). The results given by all CPs demonstrate the reliability and usefulness of the obtained confidence measures on the problem of stroke risk assessment.
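
    The abstract describes the Conformal Prediction framework, which turns the predictions of an underlying classifier into p-values and associated confidence measures. Below is a minimal sketch of a transductive Conformal Predictor built on the k-NN underlying algorithm mentioned above; the synthetic two-feature data stands in for the morphological plaque features, which are not reproduced here, and the code is an illustration rather than the authors' implementation.

```python
# Transductive Conformal Predictor with a k-NN strangeness (nonconformity)
# measure.  The data below is a synthetic stand-in for the plaque features.
import numpy as np

def knn_strangeness(X, y, i, k=3):
    """Strangeness of example i: distance to its k nearest same-class
    neighbours divided by distance to its k nearest other-class neighbours."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                                  # exclude the example itself
    same = np.sort(d[y == y[i]])[:k]
    other = np.sort(d[y != y[i]])[:k]
    return same.sum() / (other.sum() + 1e-12)

def conformal_predict(X_train, y_train, x_test, labels, k=3):
    """Return a p-value for each candidate label of x_test."""
    p_values = {}
    for lab in labels:
        X = np.vstack([X_train, x_test])
        y = np.append(y_train, lab)                # tentatively assign the label
        scores = np.array([knn_strangeness(X, y, i, k) for i in range(len(y))])
        # p-value: fraction of examples at least as strange as the test example
        p_values[lab] = np.mean(scores >= scores[-1])
    return p_values

# Toy usage: two morphological features, two risk classes (0 = low, 1 = high).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
p = conformal_predict(X_train, y_train, rng.normal(3, 1, 2), labels=[0, 1])
print(p)   # prediction = label with the largest p-value;
           # confidence  = 1 - second-largest p-value
```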

    Conformal predictors in early diagnostics of ovarian and breast cancers

    The paper describes an application of a recently developed machine learning technique called Mondrian predictors to risk assessment of ovarian and breast cancers. The analysis is based on mass spectrometry profiling of human serum samples that were collected in the United Kingdom Collaborative Trial of Ovarian Cancer Screening. The paper describes the technique and presents the results of classification (diagnosis) together with the corresponding confidence measures. The main advantage of this approach is the proven validity of its predictions. The paper also describes an approach to improving early diagnosis of ovarian and breast cancers: the data in the United Kingdom Collaborative Trial of Ovarian Cancer Screening were collected over a period of seven years and therefore allow observation of changes in human serum over that period. The significance of the improvement is confirmed statistically (for up to 11 months for ovarian cancer and 9 months for breast cancer). In addition, the methodology allowed us to pinpoint the same mass spectrometry peaks as previously detected as carrying statistically significant information for discriminating between healthy and diseased patients. The results are discussed.
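
    Mondrian predictors are category-based conformal predictors: validity is guaranteed within each category (here, each diagnostic class) rather than only on average. The sketch below shows the class-conditional p-value computation in an inductive setting; the strangeness scores and labels are made-up placeholders rather than the UKCTOCS mass-spectrometry data, and the strangeness function used in the paper is not reproduced.

```python
# Mondrian (class-conditional) conformal p-values: the p-value for a
# candidate label is computed only against calibration examples of that
# same class, giving per-class validity.  Scores below are placeholders.
import numpy as np

def mondrian_p_values(cal_scores, cal_labels, test_scores_per_label):
    """cal_scores: strangeness of calibration examples.
    test_scores_per_label: {label: strangeness of the test example when
    it is tentatively assigned that label}."""
    p = {}
    for lab, s in test_scores_per_label.items():
        same = cal_scores[cal_labels == lab]       # Mondrian category = class
        p[lab] = (np.sum(same >= s) + 1) / (len(same) + 1)
    return p

# Toy usage with made-up strangeness values (0 = healthy, 1 = diseased).
cal_scores = np.array([0.2, 0.5, 0.9, 0.1, 0.7, 0.3])
cal_labels = np.array([0, 0, 0, 1, 1, 1])
print(mondrian_p_values(cal_scores, cal_labels, {0: 0.6, 1: 0.25}))
```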

    TCM-RF: Hedging the predictions of Random Forest

    A traditional classifier outputs a point prediction without any associated confidence. In contrast, the Transductive Confidence Machine (TCM) is a framework that couples each prediction with an accurate confidence measure. It can also hedge predictions, so that the prediction accuracy is controlled by a predefined confidence level. Within the TCM framework, the efficiency of prediction depends on the strangeness function used to score samples. This paper incorporates Random Forests (RF) into the TCM framework and proposes a new TCM algorithm named TCM-RF, in which the strangeness obtained from the RF is used to produce confident predictions. Compared with traditional TCM algorithms, our method benefits from a more precise and robust strangeness measure and takes advantage of the random forest. Experiments indicate its effectiveness and robustness. In addition, our study demonstrates that using ensemble strategies to define sample strangeness may be more principled than relying on a single classifier, and it shows that the paradigm of hedged prediction can be applied to an ensemble classifier.
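
    The abstract does not spell out the strangeness measure, so the sketch below assumes one plausible choice: the fraction of trees in the forest that vote against an example's (tentative) label. It also uses an inductive calibration set for brevity, whereas TCM proper is transductive; treat it as an illustration of the idea rather than the TCM-RF algorithm itself.

```python
# Confidence-machine-style prediction driven by a Random Forest.
# Strangeness here = fraction of trees voting against the given label
# (an assumed, illustrative choice); an inductive calibration set is used.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, y_train = X[:200], y[:200]
X_cal, y_cal = X[200:280], y[200:280]        # calibration set
X_test = X[280:]

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

def strangeness(probs, labels):
    """Fraction of forest votes that disagree with the given labels."""
    return 1.0 - probs[np.arange(len(labels)), labels]

cal_alpha = strangeness(rf.predict_proba(X_cal), y_cal)

def p_values(x):
    probs = rf.predict_proba(x.reshape(1, -1))[0]
    return {lab: (np.sum(cal_alpha >= 1.0 - probs[lab]) + 1) / (len(cal_alpha) + 1)
            for lab in range(len(probs))}

for x in X_test[:3]:
    p = p_values(x)
    prediction = max(p, key=p.get)               # label with the largest p-value
    confidence = 1.0 - sorted(p.values())[-2]    # 1 - second-largest p-value
    print(prediction, round(confidence, 3), p)
```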

    Applications Of Machine Learning In Biology And Medicine

    Machine learning as a field is defined as the set of computational algorithms that improve their performance by assimilating data. As such, the field has found applications in many diverse disciplines, from robotics and communication in engineering to economics and finance, as well as biology and medicine. It should not come as a surprise that many popular methods in use today have completely different origins. Despite this heterogeneity, different methods can be divided into standard tasks, such as supervised, unsupervised, semi-supervised and reinforcement learning. Although machine learning as a field can be formalized as methods trying to solve certain standard tasks, applying these tasks to datasets from different fields comes with certain caveats and is sometimes fraught with challenges. In this thesis, we develop general procedures and novel solutions dealing with practical problems that arise when modeling biological and medical data. Cost-sensitive learning is an important area of research in machine learning which addresses the widespread and practical problem of dealing with different costs during the learning and deployment of classification algorithms. In many applications, such as credit fraud detection, network intrusion and especially medical diagnosis domains, prior class distributions are highly skewed, which makes the training examples highly unbalanced. Combining this with uneven misclassification costs renders standard machine learning approaches useless for learning an acceptable decision function. We experimentally show the benefits and shortcomings of various methods that convert cost-blind learning algorithms into cost-sensitive ones. Using the results and best practices found for cost-sensitive learning, we design and develop a machine learning approach to ontology mapping. Next, we present a novel approach for dealing with uncertainty in classification when costs are unknown or otherwise hard to assign. Support Vector Machines (SVMs) are considered to be among the most successful approaches for classification. However, the prediction of instances near the decision boundary depends more on the specific parameter selection or noise in the data than on a clear difference in features. In many applications, such as medical diagnosis, these regions should be labeled as uncertain rather than assigned to any particular class. Furthermore, instances may belong to novel disease subtypes that are not from any previously known class. In such applications, declining to make a prediction can be beneficial when more powerful but expensive tests are available. We develop a novel approach for optimal selection of the threshold and show its successful application on three biological and medical datasets. The last part of this thesis provides novel solutions for handling high-dimensional data. Although high-dimensional data are ubiquitous in many disciplines, current life science research almost always involves high-dimensional genomics/proteomics data. The "omics" data provide a wealth of information and have changed the research landscape in biology and medicine. However, these data are plagued with noise, redundancy and collinearity, which makes the discovery process difficult and costly. Any method that can accurately detect irrelevant and noisy variables in omics data would be highly valuable. We present Robust Feature Selection (RFS), a randomized feature selection approach dedicated to low-sample, high-dimensional data. RFS combines an embedded feature selection method with a randomization procedure for stability. Recent advances in sparse recovery and estimation methods have provided efficient and asymptotically consistent feature selection algorithms. However, these methods lack finite-sample error control due to instability. Furthermore, the chances of correct recovery diminish with more collinearity among features. To overcome these difficulties, RFS uses a randomization procedure to provide an accurate and stable feature selection method. We thoroughly evaluate RFS by comparing it to a number of popular univariate and multivariate feature selection methods and show a marked improvement in the prediction accuracy of a diagnostic signature, while preserving good stability.
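
    The abstract summarizes RFS only at a high level, so the following is a hedged sketch of the general recipe it alludes to (an embedded selector wrapped in randomization, in the spirit of stability selection): repeatedly subsample the data, fit an L1-penalised model, and keep the features that are selected in a large fraction of runs. The dataset, parameters and threshold are illustrative assumptions, not those used in the thesis.

```python
# Randomized feature selection in the spirit of stability selection:
# subsample, fit an L1-penalised logistic regression (the embedded selector),
# and keep features selected in a large fraction of the runs.
# All parameters below are illustrative, not the thesis's RFS settings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic low-sample, high-dimensional data (80 samples, 500 features).
X, y = make_classification(n_samples=80, n_features=500, n_informative=10,
                           random_state=0)

def randomized_selection(X, y, n_runs=100, frac=0.5, threshold=0.6, C=0.1):
    rng = np.random.default_rng(0)
    counts = np.zeros(X.shape[1])
    for _ in range(n_runs):
        idx = rng.choice(len(y), size=int(frac * len(y)), replace=False)
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        model.fit(X[idx], y[idx])
        counts += (model.coef_[0] != 0)            # features kept in this run
    return np.where(counts / n_runs >= threshold)[0]

selected = randomized_selection(X, y)
print(len(selected), selected[:10])                # indices of stable features
```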

    Regulating Black-Box Medicine

    Data drive modern medicine. And our tools to analyze those data are growing ever more powerful. As health data are collected in greater and greater amounts, sophisticated algorithms based on those data can drive medical innovation, improve the process of care, and increase efficiency. Those algorithms, however, vary widely in quality. Some are accurate and powerful, while others may be riddled with errors or based on faulty science. When an opaque algorithm recommends an insulin dose to a diabetic patient, how do we know that dose is correct? Patients, providers, and insurers face substantial difficulties in identifying high-quality algorithms; they lack both expertise and proprietary information. How should we ensure that medical algorithms are safe and effective? Medical algorithms need regulatory oversight, but that oversight must be appropriately tailored. Unfortunately, the Food and Drug Administration (FDA) has suggested that it will regulate algorithms under its traditional framework, a relatively rigid system that is likely to stifle innovation and to block the development of more flexible, current algorithms. This Article draws upon ideas from the new governance movement to suggest a different path. FDA should pursue a more adaptive regulatory approach with requirements that developers disclose information underlying their algorithms. Disclosure would allow FDA oversight to be supplemented with evaluation by providers, hospitals, and insurers. This collaborative approach would supplement the agency's review with ongoing real-world feedback from sophisticated market actors. Medical algorithms have tremendous potential, but ensuring that such potential is developed in high-quality ways demands a careful balancing between public and private oversight, and a role for FDA that mediates, but does not dominate, the rapidly developing industry.

    Confidence and Venn Machines and Their Applications to Proteomics

    When a prediction is made in a classification or regression problem, it is useful to have additional information on how reliable this individual prediction is. Predictions complemented with such additional information are also expected to be valid, i.e., to come with a guarantee on the outcome. The recently developed frameworks of confidence machines, category-based confidence machines and Venn machines allow us to address these problems: confidence machines complement each prediction with its confidence and output region predictions with a guaranteed asymptotic error rate; Venn machines output multiprobability predictions which are valid with respect to observed frequencies. Another advantage of these frameworks is that they rely only on the i.i.d. assumption and do not depend on the particular probability distribution of the examples. This thesis is devoted to further development of these frameworks. Firstly, novel designs and implementations of confidence machines and Venn machines are proposed. These implementations are based on random forest and support vector machine classifiers and inherit their ability to predict with high accuracy on certain types of data. Experimental testing is carried out. Secondly, several algorithms with online validity are designed for proteomic data analysis. These algorithms take into account the nature of mass spectrometry experiments and special features of the data analysed. They also allow us to address medical problems: to make early diagnosis of diseases and to identify potential biomarkers. An extensive experimental study is performed on the UK Collaborative Trial of Ovarian Cancer Screening data sets. Finally, in theoretical research we extend the class of algorithms which output valid predictions in the online mode: we develop a new method of constructing valid prediction intervals for a statistical model different from the standard i.i.d. assumption used in confidence and Venn machines.
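
    To illustrate the Venn machine idea described above, the sketch below uses a simple nearest-centroid taxonomy: for each tentative label of the test example, examples are grouped by the class centroid they are closest to, and the label frequencies observed in the test example's group form one probability distribution; collecting these across tentative labels yields the multiprobability (lower/upper) prediction. The taxonomy and the synthetic data are illustrative assumptions, not the proteomic designs developed in the thesis.

```python
# Minimal Venn machine with a nearest-centroid taxonomy.  For each tentative
# label of the test example, category = label of the nearest class centroid;
# the label frequencies inside the test example's category give one
# probability distribution, and the set of these distributions (one per
# tentative label) is the multiprobability prediction.  Data is synthetic.
import numpy as np

def venn_predict(X_train, y_train, x_test, labels):
    distributions = []
    for lab in labels:
        X = np.vstack([X_train, x_test])
        y = np.append(y_train, lab)                # tentatively assign the label
        centroids = {c: X[y == c].mean(axis=0) for c in labels}
        nearest = lambda x: min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
        cats = np.array([nearest(x) for x in X])
        in_cat = y[cats == cats[-1]]               # examples sharing the test category
        distributions.append({c: float(np.mean(in_cat == c)) for c in labels})
    return distributions

# Toy usage: two classes, four features; the spread across the per-label
# distributions gives the lower/upper probability estimates.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(2, 1, (30, 4))])
y_train = np.array([0] * 30 + [1] * 30)
print(venn_predict(X_train, y_train, rng.normal(2, 1, 4), labels=[0, 1]))
```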