104 research outputs found

    Comparing Multi-objective and Threshold-moving ROC Curve Generation for a Prototype-based Classifier

    Proceedings of: GECCO 2013: 15th International Conference on Genetic and Evolutionary Computation Conference (Amsterdam, The Netherlands, July 06-10, 2013): a recombination of the 22nd International Conference on Genetic Algorithms (ICGA) and the 18th Annual Genetic Programming Conference (GP).
    Receiver Operating Characteristics (ROC) curves represent the performance of a classifier for all possible operating conditions, i.e., for all preferences regarding the tradeoff between false positives and false negatives. The generation of a ROC curve generally involves the training of a single classifier for a given set of operating conditions, with the subsequent use of threshold-moving to obtain a complete ROC curve. Recent work has shown that the generation of ROC curves may also be formulated as a multi-objective optimization problem in ROC space: the goals to be minimized are the false positive and false negative rates. This technique also produces a single ROC curve, but the curve may derive from operating points for a number of different classifiers. This paper aims to provide an empirical comparison of the performance of both of the above approaches, for the specific case of prototype-based classifiers. Results on synthetic and real domains show a performance advantage for the multi-objective approach.
    GECCO 2013 presentation slides. This work has been funded by the Spanish Ministry of Science under contract TIN2011-28336 (MOVES project). En prens
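
    The two ways of obtaining ROC operating points described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `roc_by_threshold_moving` and `pareto_front` and the toy scores are hypothetical, and the Pareto filter only shows what "minimizing false positive and false negative rates" means in ROC space, whereas the paper's multi-objective approach would produce the candidate points from a number of different classifiers.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def roc_by_threshold_moving(scores: Sequence[float],
                            labels: Sequence[int]) -> List[Point]:
    """Sweep a decision threshold over one classifier's scores.

    Returns (FPR, TPR) points, from the strictest threshold (nothing
    predicted positive) to the loosest (everything predicted positive).
    Labels are 1 for positive and 0 for negative (an assumption of this sketch).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Highest scores first: lowering the threshold admits them in this order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after a whole group of tied scores is consumed.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep the (FPR, FNR) points that no other point dominates.

    Both coordinates are minimized; q dominates p when q is no worse in
    both rates and is a different point.
    """
    unique = sorted(set(points))
    return [p for p in unique
            if not any(q != p and q[0] <= p[0] and q[1] <= p[1]
                       for q in unique)]


# Toy data (hypothetical): six examples scored by a single classifier.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

roc = roc_by_threshold_moving(scores, labels)
# Convert TPR to FNR so that both objectives are minimized.
front = pareto_front([(fpr, 1.0 - tpr) for fpr, tpr in roc])
```

    In this sketch all candidate points come from one classifier's threshold sweep; pooling the points of several classifiers before calling `pareto_front` would give a front that may mix operating points from different models, which is the situation the abstract describes for the multi-objective formulation.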

    Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction

    Motivation: Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. We contribute to this ongoing discussion an empirical assessment of several decoy datasets commonly used in experimental studies

    Multiobjective optimization in bioinformatics and computational biology

    On the modelling and impact of negative edges in graph convolutional networks for node classification

    Signed graphs are important data structures to simultaneously express positive and negative relationships. Their application ranges from structural health monitoring to financial models, where the meaning and properties of negative relationships can play a significant role. In this paper, we provide a comprehensive examination of existing approaches for the integration of signed edges into the Graph Convolutional Network (GCN) framework for node classification. Here, we use a combination of theoretical and empirical analysis to gain a deeper understanding of the strengths and limitations of different mechanisms and to identify areas for possible improvement. We compare six different approaches to the integration of negative link information within the framework of the simple GCN. In particular, we analyze sensitivity towards feature noise, negative edge noise and positive edge noise, as well as robustness towards feature scaling and translation, explaining the results obtained on the basis of individual model assumptions and biases. Our findings highlight the importance of capturing the meaning of negative links in a given domain context, and appropriately reflecting it in the choice of GCN model. Our code is available at https://github.com/dinhtrang24/Signed-GCN

    Towards a fairer reimbursement system for burn patients using cost-sensitive classification

    The adoption of the Prospective Payment System (PPS) in the UK National Health Service (NHS) has led to the creation of patient groups called Health Resource Groups (HRG). HRGs aim to identify groups of clinically similar patients that share similar resource usage for reimbursement purposes. These groups are predominantly identified based on expert advice, with homogeneity checked using the length of stay (LOS). However, for complex patients such as those encountered in burn care, LOS is not a perfect proxy of resource usage, leading to incomplete homogeneity checks. To improve homogeneity in resource usage and severity, we propose a data-driven model and the inclusion of patient-level costing. We investigate whether a data-driven approach that considers additional measures of resource usage can lead to a more comprehensive model. In particular, a cost-sensitive decision tree model is adopted to identify features of importance and rules that allow for a focused segmentation on resource usage (LOS and patient-level cost) and clinical similarity (severity of burn). The proposed approach identified groups with increased homogeneity compared to the current HRG groups, allowing for a more equitable reimbursement of hospital care costs if adopted.
    Comment: Joint KDD 2021 Health Day and 2021 KDD Workshop on Applied Data Science for Healthcare: State of XAI and trustworthiness in Healt