334 research outputs found

    Personalised information modelling technologies for personalised medicine

    Personalised modelling offers a new and effective approach to pattern recognition and knowledge discovery, especially for biomedical applications. The resulting models are more useful and informative for analysing and evaluating an individual data object for a given problem. Such models are also expected to achieve higher accuracy in outcome prediction or classification than conventional systems and methodologies. Motivated by the concept of personalised medicine and utilising transductive reasoning, personalised modelling was recently proposed as a new method for knowledge discovery in biomedical applications. Personalised modelling aims to create a unique computational diagnostic or prognostic model for an individual. Here we introduce an integrated method for personalised modelling that applies global optimisation of variables (features) and of the neighbourhood size to create an accurate personalised model for an individual. This method creates an integrated computational system that combines different information processing techniques, applied at different stages of data analysis, e.g. feature selection, classification, discovering the interaction of genes, outcome prediction, personalised profiling and visualisation. It allows for adaptation, monitoring and improvement of an individual’s model and leads to improved accuracy and unique personalised profiling that could be used for personalised treatment and personalised drug design.

    Using random forest for reliable classification and cost-sensitive learning for medical diagnosis

    Background: Most machine-learning classifiers output label predictions for new instances without indicating how reliable the predictions are. The applicability of these classifiers is limited in critical domains where incorrect predictions have serious consequences, such as medical diagnosis. Further, the default assumption of equal misclassification costs is most likely violated in medical diagnosis. Results: In this paper, we present a modified random forest classifier which is incorporated into the conformal predictor scheme. A conformal predictor is a transductive learning scheme, using Kolmogorov complexity to test the randomness of a particular sample with respect to the training sets. Our method is well calibrated: the target performance can be set prior to classification, and the accuracy rate is exactly equal to the predefined confidence level. Further, to address the cost-sensitive problem, we extend our method to a label-conditional predictor which takes into account different misclassification costs for different classes and allows a different confidence level to be specified for each class. Intensive experiments on benchmark datasets and real-world applications show that the resultant classifier is well calibrated and able to control the specific risk of each class. Conclusion: Using the RF outlier measure to design a nonconformity measure benefits the resultant predictor. Further, a label-conditional classifier is developed and turns out to be an alternative approach to the cost-sensitive learning problem that relies on label-wise predefined confidence levels. The target of minimizing the risk of misclassification is achieved by specifying a different confidence level for each class.
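    The label-conditional idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a split (inductive) calibration set rather than the transductive scheme, and it swaps the random forest outlier measure for a hypothetical nonconformity score, the distance of a one-dimensional feature to the class mean. The data, the function names and the per-class confidence levels are all assumptions made for illustration.

```python
from collections import defaultdict


def class_means(train):
    """Mean feature value per label, from (x, label) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in train:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}


def nonconformity(x, y, means):
    """Hypothetical score (assumption): distance from x to the mean of class y."""
    return abs(x - means[y])


def label_conditional_p_values(x, calibration, means):
    """p-value per candidate label, using only calibration examples of that label."""
    cal_scores = defaultdict(list)
    for xc, yc in calibration:
        cal_scores[yc].append(nonconformity(xc, yc, means))
    p_values = {}
    for y in means:
        score = nonconformity(x, y, means)
        at_least_as_strange = sum(1 for c in cal_scores[y] if c >= score)
        p_values[y] = (at_least_as_strange + 1) / (len(cal_scores[y]) + 1)
    return p_values


def prediction_set(x, calibration, means, confidence):
    """Keep every label whose p-value exceeds that label's significance level."""
    p_values = label_conditional_p_values(x, calibration, means)
    return {y for y, p in p_values.items() if p > 1 - confidence[y]}


# Toy data (assumption): one feature, two classes.
train = [(0.0, "healthy"), (1.0, "healthy"), (4.0, "sick"), (5.0, "sick")]
calibration = [(0.2, "healthy"), (0.9, "healthy"), (1.5, "healthy"),
               (3.8, "sick"), (4.4, "sick"), (5.6, "sick")]
means = class_means(train)

p = label_conditional_p_values(0.6, calibration, means)
uniform = prediction_set(0.6, calibration, means, {"healthy": 0.7, "sick": 0.7})
cautious = prediction_set(0.6, calibration, means, {"healthy": 0.7, "sick": 0.9})
```

    In this toy run, raising the confidence level for "sick" alone makes the predictor unwilling to rule that label out, so the prediction set grows from one label to two. This is the sense in which class-specific confidence levels can stand in for class-specific misclassification costs.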

    Reliable Probabilistic Classification with Neural Networks

    Venn Prediction (VP) is a new machine learning framework for producing well-calibrated probabilistic predictions. In particular, it provides well-calibrated lower and upper bounds for the conditional probability of an example belonging to each possible class of the problem at hand. This paper proposes five VP methods based on Neural Networks (NNs), which are among the most widely used machine learning techniques. The proposed methods are evaluated experimentally on four benchmark datasets, and the results demonstrate the empirical well-calibratedness of their outputs and their superiority over the outputs of the traditional NN classifier.
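    How a Venn predictor arrives at lower and upper probability bounds can be shown with a small sketch. It does not reproduce any of the five neural-network-based methods from the paper. As an assumption, it uses a much simpler taxonomy: each example is placed in the category named by the label of its nearest neighbour. The one-dimensional toy data and the function names are hypothetical.

```python
def nearest_neighbour_label(i, examples):
    """Label of the example closest to examples[i], excluding itself."""
    xi = examples[i][0]
    others = [(abs(xi - x), y) for j, (x, y) in enumerate(examples) if j != i]
    return min(others, key=lambda pair: pair[0])[1]


def venn_bounds(train, x_new, labels):
    """Lower and upper probability bounds per label for the object x_new."""
    rows = {}
    for hypothesis in labels:
        # Tentatively give the new object each possible label in turn.
        augmented = train + [(x_new, hypothesis)]
        categories = [nearest_neighbour_label(i, augmented)
                      for i in range(len(augmented))]
        new_category = categories[-1]
        # Empirical label frequencies in the category containing the new object.
        members = [augmented[i][1] for i in range(len(augmented))
                   if categories[i] == new_category]
        rows[hypothesis] = {lab: members.count(lab) / len(members)
                            for lab in labels}
    # The bounds are the min and max of each label's frequency over hypotheses.
    return {lab: (min(row[lab] for row in rows.values()),
                  max(row[lab] for row in rows.values()))
            for lab in labels}


# Toy data (assumption): one feature, two classes.
train = [(0.0, "a"), (1.1, "a"), (2.3, "a"),
         (5.0, "b"), (6.2, "b"), (7.5, "b")]
bounds = venn_bounds(train, 2.0, ["a", "b"])
```

    For the new object at 2.0, the sketch gives the interval (0.5, 1.0) for class "a" and (0.0, 0.5) for class "b". The width of each interval reflects how much the answer depends on the tentative label, and it tends to shrink as the categories fill with more examples.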

    Exploiting the interplay between cross-sectional and longitudinal data in Class III malocclusion patients

    The aim of the study was to investigate how to improve the forecasting of craniofacial unbalance risk during growth among patients affected by Class III malocclusion. To this end we used computational methodologies such as Transductive Learning (TL), Boosting (B), and Feature Engineering (FE) instead of the traditional statistical analysis based on classification trees and logistic models. These techniques were applied to cephalometric data from 728 cross-sectional untreated Class III subjects (6–14 years of age) and from 91 untreated Class III subjects followed longitudinally during the growth process. A cephalometric analysis comprising 11 variables was also performed. The subjects followed longitudinally were divided into two subgroups, favourable and unfavourable growth, in comparison with normal craniofacial growth. Compared with traditional statistical predictive analytics, TL increased the accuracy in identifying subjects at risk of unfavourable growth. The TL algorithm was useful for diffusing information from longitudinal to cross-sectional subjects. The accuracy in identifying subjects at high risk of growth worsening increased from 63% to 78%. Finally, a further increase in identification accuracy, up to 83%, was produced by FE. A ranking of the important variables for identifying subjects at risk of growth worsening was therefore obtained.

    Reliable Prediction Intervals with Regression Neural Networks

    This paper proposes an extension to conventional regression Neural Networks (NNs) for replacing the point predictions they produce with prediction intervals that satisfy a required level of confidence. Our approach follows a novel machine learning framework, called Conformal Prediction (CP), for assigning reliable confidence measures to predictions without assuming anything more than that the data are independent and identically distributed (i.i.d.). We evaluate the proposed method on four benchmark datasets and on the problem of predicting Total Electron Content (TEC), which is an important parameter in trans-ionospheric links; for the latter we use a dataset of more than 60,000 TEC measurements collected over a period of 11 years. Our experimental results show that the prediction intervals produced by our method are both well calibrated and tight enough to be useful in practice.
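    The step from point predictions to prediction intervals can be sketched with split conformal prediction. This is a simplified illustration, not the paper's method: the underlying model here is a least-squares line rather than a neural network, the nonconformity score is assumed to be the absolute residual, and the data and function names are hypothetical.

```python
import math


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def conformal_half_width(calibration, predict, confidence):
    """Half-width q such that [prediction - q, prediction + q] is the interval.

    Uses the ceil((n + 1) * confidence)-th smallest absolute residual on the
    calibration set; returns infinity when there are too few calibration points.
    """
    residuals = sorted(abs(y - predict(x)) for x, y in calibration)
    n = len(residuals)
    k = math.ceil((n + 1) * confidence)
    return math.inf if k > n else residuals[k - 1]


# Toy data (assumption): the training points lie exactly on y = 2x + 1.
training = [(0, 1), (1, 3), (2, 5), (3, 7)]
calibration = [(4, 9.5), (5, 10.75), (6, 13.125), (7, 16.0)]

slope, intercept = fit_line(training)


def predict(x):
    return slope * x + intercept


q80 = conformal_half_width(calibration, predict, 0.8)
q60 = conformal_half_width(calibration, predict, 0.6)
interval_80 = (predict(10) - q80, predict(10) + q80)
```

    Lowering the confidence level from 0.8 to 0.6 narrows the interval, which is the usual trade-off between confidence and tightness. With only four calibration points, a confidence level of 0.9 cannot be met by any finite interval, so the function returns infinity.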