12 research outputs found

    Robust Conditional Independence maps of single-voxel Magnetic Resonance Spectra to elucidate associations between brain tumours and metabolites.

    The aim of the paper is two-fold. First, we show that structure finding with the PC algorithm can be inherently unstable and requires further operational constraints in order to consistently obtain models that are faithful to the data. We propose a methodology to stabilise the structure finding process, minimising both false positive and false negative error rates. This is demonstrated with synthetic data. Second, we apply the proposed structure finding methodology to a data set comprising single-voxel Magnetic Resonance Spectra of normal brain and three classes of brain tumours, to elucidate the associations between brain tumour types and a range of observed metabolites that are known to be relevant for their characterisation. The data set is bootstrapped in order to maximise the robustness of feature selection for nominated target variables. Specifically, Conditional Independence maps (CI-maps) built from the data and their derived Bayesian networks have been used. A Directed Acyclic Graph (DAG) is built from the CI-maps, a major challenge being the minimisation of errors in the graph structure. This work presents empirical evidence on how to reduce false positive errors via the False Discovery Rate, and how to identify appropriate parameter settings to improve the False Negative Reduction. In addition, several node ordering policies that transform the graph into a DAG are investigated. The results show that ordering nodes by strength of mutual information can recover a representative DAG in a reasonable time, although a more accurate graph can be recovered using a random order of samples at the expense of increased computation time.
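
The abstract above mentions controlling false positive edges via the False Discovery Rate but does not say which procedure is used. As a minimal sketch, assuming the Benjamini-Hochberg step-up procedure applied to the p-values of the independence tests (the function name and the example p-values are illustrative, not taken from the paper):

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Assumption: in a PC-style search, each p-value comes from one
    independence test, and a rejection means the edge is kept.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0  # largest rank k with p_(k) <= q * k / m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff = rank
    reject = [False] * m
    # Step-up rule: reject every hypothesis ranked at or below the cutoff.
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    p = [0.8, 0.001, 0.5, 0.008]
    print(benjamini_hochberg(p, q=0.05))
```

With these four p-values and q = 0.05, only the second and fourth tests pass their rank-dependent thresholds (0.0125 and 0.025), so only those two edges would be retained under this assumption.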

    Quantum clustering in non-spherical data distributions: Finding a suitable number of clusters

    Quantum Clustering (QC) provides an alternative approach to clustering algorithms, several of which are based on geometric relationships between data points. Instead, QC makes use of quantum mechanics concepts to find structures (clusters) in data sets by finding the minima of a quantum potential. The starting point of QC is a Parzen estimator with a fixed length scale, which significantly affects the final cluster allocation. This dependence on an adjustable parameter is common to other methods. We propose a framework to find suitable values of the length parameter σ by optimising twin measures of cluster separation and consistency for a given cluster number. This is an extension of the Separation and Concordance framework previously introduced for K-means clustering. Experimental results on two synthetic data sets and three challenging real-world data sets show that optimisation of cluster separation identifies QC solutions with consistently high Jaccard score measured against true-cluster labels, while optimisation of cluster consistency provides insights into hierarchical cluster structure. © 2017 Elsevier B.V.
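
The abstract above does not write out the quantum potential. A minimal sketch, assuming the usual Quantum Clustering formulation (Horn and Gottlieb) in which a Gaussian Parzen estimate ψ with length scale σ is treated as a ground-state wave function, and the potential is evaluated only up to an additive constant (which does not move its minima); the function name and the toy data are illustrative:

```python
import math


def quantum_potential(x, data, sigma):
    """Quantum potential at point x, up to an additive constant.

    psi(x) = sum_i exp(-|x - x_i|^2 / (2 sigma^2))
    V(x)   = (1 / (2 sigma^2 psi(x))) * sum_i |x - x_i|^2 exp(-|x - x_i|^2 / (2 sigma^2))

    Note: far from all data points psi underflows to zero, so this
    sketch is only meant for points near the data.
    """
    sq_dists = []
    weights = []
    for point in data:
        d2 = sum((a - b) ** 2 for a, b in zip(x, point))
        sq_dists.append(d2)
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    psi = sum(weights)
    weighted = sum(d2 * w for d2, w in zip(sq_dists, weights))
    return weighted / (2.0 * sigma ** 2 * psi)


if __name__ == "__main__":
    # Two well-separated one-dimensional groups (toy data).
    data = [(0.0,), (0.2,), (-0.2,), (10.0,), (10.2,), (9.8,)]
    for x in [(0.0,), (5.0,), (10.0,)]:
        print(x, quantum_potential(x, data, sigma=1.0))
```

On this toy data the potential is low near each group and high between them, which is the behaviour QC exploits when it assigns points to the nearest minimum. Changing `sigma` changes how many minima appear, which is the dependence the proposed framework aims to tune.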

    How to find simple and accurate rules for viral protease cleavage specificities

    Background: Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases, and some have been applied to extract cleavage rules from data. However, the methods proposed so far for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way.
    Results: A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C virus (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than the previous state of the art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than those previously obtained with rule extraction methods.
    Conclusion: A rule extraction methodology that searches for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data, while being more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.

    Development of a Rule Based Prognostic Tool for HER 2 Positive Breast Cancer Patients

    A three-stage development process for the production of a hierarchical rule-based prognosis tool is described. The application for this tool is specific to breast cancer patients that have a positive expression of the HER 2 gene. The first stage is the development of a Bayesian classification neural network to classify for cancer-specific mortality. Secondly, low-order Boolean rules are extracted from this model using an orthogonal search-based rule extraction (OSRE) algorithm. In addition to these rules, further information is gathered from the Kaplan-Meier survival estimates of the population, stratified by the categorisations of the input variables. Finally, expert knowledge is used to further simplify the rules and to rank them hierarchically in the form of a decision tree. The resulting decision tree groups all observations into specific categories by clinical profile and by event rate. The practical clinical value of this decision support tool will in future be tested by external validation with additional data from other clinical centres.
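
The Kaplan-Meier estimate mentioned in the abstract above is the product-limit estimator S(t) = Π (1 - d_i / n_i), taken over event times t_i ≤ t, where d_i is the number of events at t_i and n_i the number still at risk. A minimal sketch for a single stratum follows; the function name and the toy follow-up data are illustrative and are not taken from the study:

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  : follow-up time for each subject
    events : 1 (or True) if the event was observed, 0 (or False) if censored
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set.
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

To stratify as the abstract describes, one would call the function once per category of an input variable and compare the resulting curves.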