59 research outputs found

    Ten quick tips for fuzzy logic modeling of biomedical systems

    Fuzzy logic is a useful tool to describe and represent biological or medical scenarios, where states and outcomes are often neither completely true nor completely false, but rather partially true or partially false. Despite its usefulness and spread, fuzzy logic modeling can easily be done in the wrong way, especially by beginners and inexperienced researchers, who might overlook some important aspects or make common mistakes. Malpractices and pitfalls, in turn, can lead to wrong, overoptimistic, or inflated results, with negative consequences for the biomedical research community trying to comprehend a particular phenomenon, or even for patients suffering from the investigated disease. To avoid common mistakes, we present here a list of quick tips for the fuzzy logic modeling of any biomedical scenario: guidelines that should be taken into account by any fuzzy logic practitioner, including experts. We believe our best practices can have a strong impact on the scientific community, allowing researchers who follow them to obtain better, more reliable results and outcomes in biomedical contexts.
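    The idea of partial truth can be illustrated with a minimal Python sketch of a fuzzy set and the standard Zadeh operators (minimum for AND, maximum for OR, complement for NOT). The linear "ramp" membership function, its breakpoints (a hypothetical "high body temperature" set rising from 37.0 °C to 39.0 °C), and the function names are illustrative assumptions, not values or methods taken from the paper.

```python
def ramp_membership(x: float, a: float, b: float) -> float:
    """Degree in [0, 1] to which x belongs to an increasing linear fuzzy set.

    Membership is 0 at or below a, 1 at or above b, and rises linearly in between.
    """
    if a >= b:
        raise ValueError("expected a < b")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def fuzzy_and(mu1: float, mu2: float) -> float:
    """Zadeh conjunction: minimum of the membership degrees."""
    return min(mu1, mu2)


def fuzzy_or(mu1: float, mu2: float) -> float:
    """Zadeh disjunction: maximum of the membership degrees."""
    return max(mu1, mu2)


def fuzzy_not(mu: float) -> float:
    """Standard fuzzy complement."""
    return 1.0 - mu


# Hypothetical "high temperature" set: 0 at or below 37.0 °C, 1 at or above 39.0 °C.
fever = ramp_membership(38.0, 37.0, 39.0)
other = 0.8  # hypothetical membership degree in a second fuzzy set
print(fever)                    # 0.5: the statement is partially true
print(fuzzy_and(fever, other))  # 0.5
print(fuzzy_or(fever, other))   # 0.8
print(fuzzy_not(fever))         # 0.5
```

    Other membership shapes (triangular, trapezoidal, Gaussian) and other t-norms are common in practice; the ramp and min/max operators are used here only because they are the simplest to read.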

    The Matthews correlation coefficient (MCC) is more informative than Cohen's kappa and Brier score in binary classification assessment

    Even though measuring the outcome of binary classifications is a pivotal task in machine learning and statistics, no consensus has yet been reached about which statistical rate to employ to this end. Over the last century, the computer science and statistics communities have introduced several scores summarizing the correctness of the predictions with respect to the ground truth values. Among these scores, the Matthews correlation coefficient (MCC) was shown to have several advantages over confusion entropy, accuracy, F1 score, balanced accuracy, bookmaker informedness, markedness, and diagnostic odds ratio: MCC, in fact, produces a high score only if the majority of the predicted negative data instances and the majority of the predicted positive data instances are correct, and it is therefore very trustworthy on imbalanced datasets. In this study, we compare MCC with two other popular scores: Cohen's kappa, a metric that originated in the social sciences, and the Brier score, a strictly proper scoring function that emerged in weather forecasting studies. After explaining the mathematical properties of these two rates and their relationships with MCC, we report some use cases in which these scores generate different values that lead to discordant outcomes, and in which MCC provides a more truthful and informative result. We highlight the reasons why it is more advisable to use MCC rather than Cohen's kappa and the Brier score to evaluate binary classifications.
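    The three rates compared above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes labels encoded as 0/1, hard predictions for MCC and Cohen's kappa, and predicted probabilities of the positive class for the Brier score. It also assumes MCC is reported as 0 when its denominator vanishes (a common convention) and kappa as NaN when chance agreement equals 1. The function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for 0/1-encoded ground truth and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, +1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + tn + fp + fn
    p_observed = (tp + tn) / n
    # Chance agreement computed from the marginal totals of predictions and ground truth.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    if p_expected == 1:
        return math.nan  # undefined: predictions and ground truth use one single class
    return (p_observed - p_expected) / (1 - p_expected)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


counts = confusion_counts([1, 1, 0, 0], [1, 0, 0, 0])
print(counts)                          # (1, 2, 0, 1)
print(mcc(*counts))                    # ≈ 0.577
print(cohen_kappa(*counts))            # 0.5
print(brier_score([1, 0], [0.8, 0.3]))  # ≈ 0.065
```

    Even on this four-element toy example, MCC and kappa disagree (≈ 0.577 versus 0.5) although both are computed from the same confusion matrix. The Brier score works on probabilities rather than thresholded predictions, so it is not directly comparable to the other two.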

    The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation

    Regression analysis makes up a large part of supervised machine learning and consists of predicting a continuous dependent target variable from a set of other predictor variables. The difference between binary classification and regression lies in the target range: in binary classification, the target can take only two values (usually encoded as 0 and 1), while in regression it can take any value in a continuous range. Even though regression analysis has been employed in a huge number of machine learning studies, no consensus has been reached on a single, unified, standard metric to assess the results of the regression itself. Many studies employ the mean squared error (MSE) and its square-root variant (RMSE), or the mean absolute error (MAE) and its percentage variant (MAPE). Although useful, these rates share a common drawback: since their values can range from zero to +infinity, a single value does not say much about the performance of the regression with respect to the distribution of the ground truth elements. In this study, we focus on two rates that indicate good performance only if the majority of the elements of a ground truth group have been correctly predicted: the coefficient of determination (also known as R-squared or R²) and the symmetric mean absolute percentage error (SMAPE). After showing their mathematical properties, we report a comparison between R² and SMAPE in several use cases and in two real medical scenarios. Our results demonstrate that the coefficient of determination (R-squared) is more informative and truthful than SMAPE, and does not have the interpretability limitations of MSE, RMSE, MAE, and MAPE. We therefore suggest using R-squared as the standard metric to evaluate regression analyses in any scientific domain.
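    To make the contrast concrete, the following minimal Python sketch computes R² alongside the other rates for paired ground truth and prediction lists. It is an illustration under stated assumptions, not the authors' code. The function name is hypothetical, and the two toy datasets are invented. SMAPE uses the fractional form |t - p| / ((|t| + |p|) / 2) averaged over elements, which ranges from 0 to 2 (percentage variants multiply by 100). A 0/0 SMAPE term is counted as 0, which is also an assumption.

```python
import math


def regression_scores(y_true, y_pred):
    """Compute R², MSE, RMSE, MAE, MAPE, and SMAPE for paired lists of numbers."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    # SMAPE term: |t - p| / ((|t| + |p|) / 2); a 0/0 term is counted as 0 (assumed convention).
    smape_terms = [
        0.0 if t == 0 and p == 0 else abs(t - p) / ((abs(t) + abs(p)) / 2)
        for t, p in zip(y_true, y_pred)
    ]
    return {
        "R2": 1 - ss_res / ss_tot,  # at most 1; undefined if y_true is constant
        "MSE": mse,                  # [0, +inf), in squared units of y
        "RMSE": math.sqrt(mse),      # [0, +inf), in units of y
        "MAE": sum(abs(r) for r in residuals) / n,
        # Raises ZeroDivisionError if any ground truth value is 0.
        "MAPE": sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,
        "SMAPE": sum(smape_terms) / n,  # [0, 2] in this fractional form
    }


small = regression_scores([1, 2, 3, 4], [1, 2, 3, 5])
large = regression_scores([10, 20, 30, 40], [10, 20, 30, 50])
print(small["MSE"], large["MSE"])  # 0.25 25.0
print(small["R2"], large["R2"])    # both ≈ 0.8
```

    Rescaling the same data and errors by a factor of 10 multiplies MSE by 100, while R² stays unchanged. This illustrates why a lone MSE value is hard to interpret without knowing the distribution of the ground truth.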

    Novelty indicator for enhanced prioritization of predicted gene ontology annotations

    Controlled biomolecular annotations have become pivotal in computational biology, because they allow scientists to analyze large amounts of biological data, to better understand test results, and to infer new knowledge. Yet biomolecular annotation databases are incomplete by definition, like our knowledge of biology, and might contain errors and inconsistent information. In this context, machine learning algorithms able to predict and prioritize new annotations are both effective and efficient, especially when compared with time-consuming biological validation trials. To limit the possibility that these techniques predict obvious and trivial high-level features, and to help prioritize their results, we introduce a new element that can improve the accuracy and relevance of the results of an annotation prediction and prioritization pipeline. We propose a novelty indicator able to state the level of “originality” of the Gene Ontology (GO) term annotations predicted for a specific gene. This indicator, combined with our previously introduced prediction steps, helps prioritize the most novel and interesting predicted annotations. We performed a thorough biological functional analysis of the prioritized annotations predicted with high accuracy by our indicator and previously proposed methods. The relevance of our biological findings demonstrates the effectiveness and trustworthiness of our indicator and of its prioritization of predicted annotations.

    Ten quick tips for machine learning in computational biology

    Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experience to run a data mining project effectively, and can therefore follow incorrect practices that may lead to common mistakes or overoptimistic results. With this review, we present ten quick tips to take advantage of machine learning in any computational biology context, by avoiding some common errors that we have observed hundreds of times in multiple bioinformatics projects. We believe our ten suggestions can strongly help any machine learning practitioner to carry out a successful project in computational biology and related sciences.