141 research outputs found

    Top Quark Pair Production beyond NNLO

    We construct an approximate expression for the total cross section for the production of a heavy quark–antiquark pair in hadronic collisions at next-to-next-to-next-to-leading order (N^3LO) in α_s. We use a technique that exploits the analyticity of the Mellin-space cross section and the information on its singularity structure coming from large-N (soft-gluon, Sudakov) and small-N (high-energy, BFKL) all-order resummations, previously introduced and used in the case of Higgs production. We validate our method by comparing to available exact results up to NNLO. We find that N^3LO corrections increase the predicted top pair cross section at the LHC by about 4% over the NNLO. (Comment: 34 pages, 9 figures; final version, to be published in JHEP; reference added, minor improvement.)
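    The structure exploited by this kind of approximation can be sketched in standard Mellin-space notation. The following display is a schematic illustration of the two singular regimes (the coefficients c_nk, d_n are placeholders, not the paper's exact formulas):

```latex
% Mellin transform of the hadronic cross section in the scaling variable \tau
\sigma(N) = \int_0^1 d\tau\, \tau^{N-1}\, \sigma(\tau)

% Large-N (soft-gluon, Sudakov) regime: at most double logarithms per order
\sigma(N) \;\xrightarrow{\;N\to\infty\;}\; \sum_{n}\sum_{k\le 2n} c_{nk}\, \alpha_s^n \ln^k N

% Small-N (high-energy, BFKL) regime: poles at N=1
\sigma(N) \;\xrightarrow{\;N\to 1\;}\; \sum_{n} d_n \left(\frac{\alpha_s}{N-1}\right)^{n}
```

    An approximate fixed-order result is then obtained by matching an interpolation of these two singular behaviours to the exactly known lower orders.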

    Not proper ROC curves as new tool for the analysis of differentially expressed genes in microarray experiments

    Background: Most microarray experiments are carried out with the purpose of identifying genes whose expression varies in relation to specific conditions or in response to environmental stimuli. In such studies, genes showing similar mean expression values between two or more groups are considered not differentially expressed, even if hidden subclasses with different expression values may exist. In this paper we propose a new method for identifying differentially expressed genes, based on the area between the ROC curve and the rising diagonal (ABCR). ABCR represents a more general approach than the standard area under the ROC curve (AUC), because it can identify both proper (i.e., concave) and not proper ROC curves (NPRC). In particular, NPRC may correspond to those genes that tend to escape standard selection methods. Results: We assessed the performance of our method using data from a publicly available database of 4026 genes, including 14 normal B cell samples (NBC) and 20 heterogeneous lymphomas (namely, 9 follicular lymphomas and 11 chronic lymphocytic leukemias). Moreover, NBC also included two subclasses, i.e., 6 heavily stimulated and 8 slightly or not stimulated samples. We identified 1607 differentially expressed genes with an estimated False Discovery Rate of 15%. Among them, 16 corresponded to NPRC, and all escaped standard selection procedures based on AUC and t statistics. Moreover, a simple inspection of the shape of such plots allowed us to identify the two subclasses within either class in 13 cases (81%). Conclusion: NPRC represent a new useful tool for the analysis of microarray data.
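    The key point is that a gene whose ROC curve crosses the diagonal (a "not proper" curve, typical of hidden subclasses) can have AUC near 0.5 and thus escape AUC-based selection, while its ABCR stays clearly positive. A minimal sketch, taking ABCR as the unsigned area between the empirical ROC curve and the diagonal (one natural formalization; the paper's exact estimator may differ):

```python
def roc_points(scores, labels):
    """Empirical ROC curve: sweep a threshold over scores (ties not
    grouped -- a simplification). labels are 1 (case) / 0 (control)."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    P = sum(labels)
    N = len(labels) - P
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for _, y in pairs:
        if y:
            tp += 1
        else:
            fp += 1
        tpr.append(tp / P)
        fpr.append(fp / N)
    return fpr, tpr

def auc_and_abcr(fpr, tpr):
    """Trapezoidal AUC, and ABCR = area between ROC and the diagonal.
    For a proper (concave) curve ABCR = AUC - 0.5; for a curve crossing
    the diagonal, ABCR remains positive even when AUC is close to 0.5."""
    auc = abcr = 0.0
    for i in range(1, len(fpr)):
        dx = fpr[i] - fpr[i - 1]
        ym = 0.5 * (tpr[i] + tpr[i - 1])   # mean ROC height on segment
        xm = 0.5 * (fpr[i] + fpr[i - 1])   # diagonal height on segment
        auc += dx * ym
        abcr += dx * abs(ym - xm)
    return auc, abcr

# Bimodal "hidden subclass" gene: cases split into low/high expressers,
# controls sit in the middle -> ROC crosses the diagonal.
scores = [0.9, 0.95, 0.05, 0.1, 0.4, 0.5, 0.6, 0.45]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
auc, abcr = auc_and_abcr(*roc_points(scores, labels))
# AUC is exactly 0.5 here (the gene looks useless to AUC), yet ABCR = 0.25.
```

    Such a gene would be discarded by an AUC or t-statistic filter but retained by an ABCR-based one.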

    CONFIDERAI: a novel CONFormal Interpretable-by-Design score function for Explainable and Reliable Artificial Intelligence

    Everyday life is increasingly influenced by artificial intelligence, and there is no question that machine learning algorithms must be designed to be reliable and trustworthy for everyone. Specifically, computer scientists consider an artificial intelligence system safe and trustworthy if it fulfills five pillars: explainability, robustness, transparency, fairness, and privacy. In addition to these five, we propose a sixth fundamental aspect: conformity, that is, the probabilistic assurance that the system will behave as the machine learner expects. In this paper, we propose a methodology to link conformal prediction with explainable machine learning by defining CONFIDERAI, a new score function for rule-based models that leverages both the predictive ability of rules and the geometrical position of points within rule boundaries. We also address the problem of defining regions in the feature space where conformal guarantees are satisfied, by exploiting techniques to control the number of non-conformal samples in conformal regions based on support vector data description (SVDD). The overall methodology is tested with promising results on benchmark and real datasets, such as DNS tunneling detection and cardiovascular disease prediction. (Comment: 12 pages, 7 figures, 1 algorithm, international journal.)
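    The two ingredients of such a score, rule quality and the point's position relative to the rule boundary, can be illustrated with a toy nonconformity score for axis-aligned rules. Everything below (the margin definition, the combination formula, the function names) is a hypothetical sketch for illustration, not the paper's CONFIDERAI formula; only the final conformal-quantile step is the standard split-conformal recipe:

```python
import math

def interval_margin(x, rule):
    """Normalized distance of point x from the closest boundary of an
    axis-aligned rule (dict: feature index -> (low, high) interval).
    Positive deep inside the rule, negative outside (illustrative choice)."""
    m = float("inf")
    for i, (lo, hi) in rule.items():
        d = min(x[i] - lo, hi - x[i]) / (hi - lo)
        m = min(m, d)
    return m

def toy_rule_score(x, rule, relevance):
    """Toy nonconformity score: lower when the point sits deep inside a
    highly relevant rule. NOT the paper's exact score function."""
    return 1.0 - relevance * max(0.0, interval_margin(x, rule))

def conformal_threshold(cal_scores, alpha):
    """Standard split-conformal quantile: with probability >= 1 - alpha a
    fresh exchangeable point scores below this threshold."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

rule = {0: (0.0, 1.0)}            # toy one-dimensional rule: 0 <= x0 <= 1
print(toy_rule_score([0.5], rule, 0.8))   # deep inside -> 0.6
print(toy_rule_score([1.2], rule, 0.8))   # outside     -> 1.0
```

    At test time, a point is accepted as conforming to the rule-based model when its score falls below the calibrated threshold.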

    CONFIDERAI: CONFormal Interpretable-by-Design score function for Explainable and Reliable Artificial Intelligence

    The concept of trustworthiness has been articulated in different ways in the field of artificial intelligence, but all its definitions agree on two main pillars: explainability and conformity. In this extended abstract, our aim is to give an idea of how to merge these concepts by defining a new framework for conformal rule-based predictions. In particular, we introduce a new score function for rule-based models that leverages rule relevance and the geometrical position of points relative to rule classification boundaries.

    Weighted Mutual Information for Out-Of-Distribution Detection

    Out-of-distribution detection has become an important theme in the machine learning (ML) field, since failing to recognize whether unseen data are "similar" (in-distribution) or not (out-of-distribution) to the data the ML system has been trained on may lead to potentially fatal consequences. Compliance of operational data with the training data has to be verified by the data analyst, who must also understand, in operation, whether the autonomous decision-making is still safe. In this paper, we study an out-of-distribution (OoD) detection approach based on a rule-based eXplainable Artificial Intelligence (XAI) model. Specifically, the method relies on an innovative metric, the weighted mutual information, able to capture the different ways decision rules are used on in- and out-of-distribution data.
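    The underlying intuition, that the association between which rule fires and which class is predicted changes on OoD data, can be illustrated with a weighted empirical mutual information between rule identifiers and predicted classes. This definition (per-sample weights folded into the empirical joint distribution) is one plausible reading used purely for illustration; the paper's exact metric may differ:

```python
import math
from collections import Counter

def weighted_mutual_information(rule_ids, classes, weights):
    """Empirical mutual information between the rule fired on each sample
    and the predicted class, with per-sample weights (illustrative
    definition, not necessarily the paper's)."""
    total = sum(weights)
    pxy, px, py = Counter(), Counter(), Counter()
    for r, c, w in zip(rule_ids, classes, weights):
        pxy[(r, c)] += w
        px[r] += w
        py[c] += w
    mi = 0.0
    for (r, c), w in pxy.items():
        p = w / total                       # weighted joint probability
        mi += p * math.log(p * total * total / (px[r] * py[c]))
    return mi

# Rule usage perfectly determines the class -> MI = ln 2
print(weighted_mutual_information([0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 1, 1]))
# Rule usage independent of the class -> MI = 0
print(weighted_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 1, 1]))
```

    On in-distribution data the rule-class association (and hence this quantity) stays close to the value measured at training time; a marked shift flags potential OoD inputs.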

    Gene expression modeling through positive Boolean functions

    In the framework of gene expression data analysis, the selection of biologically relevant sets of genes and the discovery of new subclasses of diseases at the bio-molecular level represent two significant problems. Unfortunately, in both cases the correct solution is usually unknown, and evaluating the performance of gene selection and clustering methods is difficult and in many cases unfeasible. A natural approach to this complex issue consists in developing an artificial model for the generation of biologically plausible gene expression data, so that the set of relevant genes and the functional classes involved in the problem are known in advance. In this work we propose a mathematical model, based on positive Boolean functions, for the generation of synthetic gene expression data. Despite its simplicity, this model is sufficiently rich to account for the specific peculiarities of gene expression, including biological variability, viewed as a sort of random source. As an applicative example, we also provide some data simulations and numerical experiments for the analysis of the performance of gene selection methods.
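    A minimal sketch of such a generator, assuming (as an illustration, not the paper's exact model) that a positive Boolean function of a few binarized "relevant" genes determines the class, and Gaussian noise supplies the biological variability:

```python
import random

def positive_boolean_label(bits):
    """A positive (monotone) Boolean function: (g0 AND g1) OR g2.
    Flipping any input 0 -> 1 can never flip the output 1 -> 0."""
    return (bits[0] and bits[1]) or bits[2]

def generate(n_samples, n_genes=20, noise_sd=0.3, seed=0):
    """Synthetic expression matrix: the first 3 genes are relevant (their
    binarized states feed the Boolean function), the rest are noise-only.
    Returns (expression rows, class labels)."""
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n_samples):
        bits = [rng.random() < 0.5 for _ in range(3)]
        labels.append(int(positive_boolean_label(bits)))
        row = [(2.0 if i < 3 and bits[i] else 0.0) + rng.gauss(0.0, noise_sd)
               for i in range(n_genes)]
        data.append(row)
    return data, labels
```

    Because the relevant genes (indices 0-2) and the generating rule are known by construction, a gene selection method can be scored exactly against the ground truth.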

    Clostridium difficile outbreak: epidemiological surveillance, infection prevention and control

    INTRODUCTION: Clostridium difficile infection (CDI) is currently considered the most common cause of healthcare-associated infections. The aim is to describe the trend of CDI in an Italian hospital and to assess the efficacy of the measures adopted to manage the burden. METHODS: We looked at CDI from 2016 to 2018. The incidence rate of CDI was calculated as the number of new infected persons per month divided by the overall length of stay (incidence per 10,000 patient-days). Changes in the CDI rate during the period considered were analysed using a joinpoint regression model. RESULTS: Thanks to the monitoring activity it was possible to adopt a new protocol to manage CDI: CDI episodes decreased from 85 in 2017 to 31 in 2018 (a 63% decrease). The joinpoint regression model identified an important, statistically significant decrement during 2017 (slope = -15.84; p = 0.012). CONCLUSIONS: Reports based on routine laboratory data can accurately measure the population burden of CDI with limited surveillance resources. This activity can help target prevention programs and evaluate their effect.
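    The two quantitative steps above can be sketched in a few lines. The rate computation is exactly as stated in the abstract; the joinpoint step below is a deliberately simplified stand-in (a brute-force single-breakpoint split minimizing least-squares error over two independent lines), whereas true joinpoint regression enforces continuity at the joinpoint and tests its statistical significance:

```python
def incidence_per_10000(new_cases, patient_days):
    """Monthly CDI incidence per 10,000 patient-days."""
    return 10000.0 * new_cases / patient_days

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope, SSE)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def best_joinpoint(xs, ys):
    """Index splitting the series into two segments with minimal total
    SSE -- a toy simplification of joinpoint regression."""
    best = None
    for k in range(2, len(xs) - 2):
        sse = fit_line(xs[:k], ys[:k])[2] + fit_line(xs[k:], ys[k:])[2]
        if best is None or sse < best[1]:
            best = (k, sse)
    return best[0]

print(incidence_per_10000(5, 12500))  # 5 new cases over 12,500 patient-days -> 4.0
```

    On a monthly rate series, the detected split marks the month where the trend changes, the kind of decrement the surveillance identified during 2017.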

    Integrating Machine Learning and Rule-Based Systems for Fraud Detection: A case study based on the Logic Learning Machine

    Money laundering is one of the most relevant global challenges, with significant repercussions on the economy and on international security. Identifying suspicious transactions is a key element in the fight against this phenomenon, but the task is extremely complex due to the constant evolution of the strategies adopted by criminals and the huge amount of data to be analyzed daily. This study proposes a hybrid method that integrates Machine Learning models with heuristic rules, with the aim of identifying fraudulent transactions more effectively. The dataset used, SAML, includes millions of bank transactions and presents a strong imbalance between classes (fraudulent vs. regular transactions). The entire process was carried out through a self-developed platform designed to optimize data management, processing and analysis. The heuristic rules were evaluated using the covering and error metrics and then integrated into the Logic Learning Machine (LLM) task. The effectiveness of the approach was verified by comparing two main configurations: one based exclusively on the LLM and the other combining the LLM with heuristic rules. The results obtained highlight that the integration of heuristic rules improves the performance of the model, confirming the synergy between Machine Learning and expert knowledge. This study confirms the effectiveness of the hybrid approach and emphasizes the importance of uniting automated analysis and human insight to address the challenges posed by money laundering.
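    The covering and error metrics used to vet the heuristic rules can be sketched as follows. This uses one common definition from the rule-based learning literature (covering = fraction of the rule's target-class samples the rule fires on; error = fraction of other-class samples it fires on); the study may use a variant:

```python
def covering_and_error(covers, labels, rule_class):
    """covers[i]: whether the rule fires on sample i.
    Covering: share of rule_class samples covered by the rule.
    Error: share of other-class samples (wrongly) covered by the rule."""
    pos = [c for c, y in zip(covers, labels) if y == rule_class]
    neg = [c for c, y in zip(covers, labels) if y != rule_class]
    covering = sum(pos) / len(pos) if pos else 0.0
    error = sum(neg) / len(neg) if neg else 0.0
    return covering, error

# Toy check: a rule for the "fraud" class (1) firing on 4 transactions
cov, err = covering_and_error(
    covers=[True, True, False, True],
    labels=[1, 1, 1, 0],
    rule_class=1,
)
print(cov, err)  # covers 2 of 3 fraud samples, and the 1 regular sample
```

    Rules with high covering and low error are the natural candidates to inject into the LLM training alongside the learned rules.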