209,447 research outputs found

    The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

    Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence tests from the statistics community. Kernel-based tests, developed from "kernel mean embeddings", are leading methods for two-sample and independence tests from the machine learning community. A fixed-point transformation was previously proposed to connect the distance methods and kernel methods for the population statistics. In this paper, we propose a new bijective transformation between metrics and kernels. It simplifies the fixed-point transformation, inherits similar theoretical properties, allows distance methods to be exactly the same as kernel methods for sample statistics and p-values, and better preserves the data structure upon transformation. Our results further advance the understanding of distance- and kernel-based tests, streamline the code base for implementing these tests, and enable a rich literature of distance-based and kernel-based methodologies to communicate directly with each other. Comment: 24 pages main + 7 pages appendix, 3 figures
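    The paper's new bijective transformation is not reproduced in the abstract, but the classical connection it builds on can be illustrated. The sketch below induces a kernel (Gram) matrix from a Euclidean distance matrix by double centering, a well-known transformation of this kind; the function name `distance_to_kernel` and the toy data are illustrative assumptions, not the paper's construction.

```python
import numpy as np
from scipy.spatial.distance import cdist

def distance_to_kernel(D):
    """Induce a kernel matrix from a pairwise distance matrix via double
    centering, K = -0.5 * H D H with H = I - (1/n) 11^T. This is the classical
    construction relating distance-based statistics to kernel-based ones;
    it is NOT the new bijective transformation proposed in the paper."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * H @ D @ H

# Toy data: two samples whose pooled pairwise distances feed both test styles.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
y = rng.normal(loc=0.5, size=(20, 3))
z = np.vstack([x, y])

D = cdist(z, z)            # Euclidean distance matrix (the "metric" view)
K = distance_to_kernel(D)  # induced Gram matrix (the "kernel" view)

# Sanity check: for a Euclidean metric the induced matrix is symmetric and
# positive semi-definite (up to numerical error), so kernel statistics such
# as MMD or HSIC can be computed directly from it.
eigvals = np.linalg.eigvalsh(K)
print(K.shape, bool(eigvals.min() >= -1e-8))
```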

    The use and application of performance metrics with regional climate models

    This thesis aims to assess and develop objective and robust approaches to evaluate regional climate model (RCM) historical skill using performance metrics, and to provide guidance to relevant groups as to how best to utilise these metrics. Performance metrics are quantitative, scalar measures of the numerical distance, or 'error', between historical model simulations and observations. Model evaluation practice tends to involve ad hoc approaches with little consideration of the underlying sensitivity of the method to small changes in approach. The main question that arises is to what degree the outputs, and subsequent applications, of these performance metrics are robust. ENSEMBLES and CORDEX RCMs covering Europe are used with E-OBS observational data to assess historical and future simulation characteristics using a range of performance metrics. Metric sensitivity is found in some cases to be low, such as for differences between variable types, with extreme indices often producing redundant information. In other cases sensitivity is large, particularly for temporal statistics, but not for spatial pattern statistics. Assessments made over a single decade are found to be robust with respect to the full 40-year time period. Two applications of metrics are considered: metric combinations and exploration of the stationarity of historical RCM bias characteristics. The sensitivity of the metric combination procedure is found to be low with respect to the combination method and potentially high for the type of metric included, but remains uncertain for the number of metrics included. Stationarity of biases appears to be highly dependent on the potential for underlying causes of model bias to change substantially in the future, as in the case of surface albedo in the Alps. It is concluded that performance metrics and their applications can and should be considered more systematically, using a range of redundancy and stationarity tests as indicators of historical and future robustness.
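    As a rough illustration of the kind of scalar performance metric the thesis describes, the sketch below computes a mean bias, an RMSE and a spatial pattern correlation between a model field and an observation field. The function name and the synthetic arrays are assumptions; a real evaluation against E-OBS would additionally handle regridding, masking and area weighting.

```python
import numpy as np

def performance_metrics(model, obs):
    """Illustrative scalar performance metrics: the 'error' between a model
    field and an observed field. `model` and `obs` are arrays of matching
    shape (e.g. time x lat x lon)."""
    diff = model - obs
    bias = diff.mean()                      # mean error
    rmse = np.sqrt((diff ** 2).mean())      # root-mean-square error
    # spatial pattern correlation of the time-mean fields
    m = model.mean(axis=0).ravel()
    o = obs.mean(axis=0).ravel()
    pattern_corr = np.corrcoef(m, o)[0, 1]
    return {"bias": bias, "rmse": rmse, "pattern_corr": pattern_corr}

# Toy example with synthetic fields standing in for RCM output and E-OBS data.
rng = np.random.default_rng(1)
obs = rng.normal(size=(120, 20, 30))                    # 120 months, 20x30 grid
model = obs + 0.3 + rng.normal(scale=0.5, size=obs.shape)
print(performance_metrics(model, obs))
```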

    Improved metrics collection and correlation for the CERN cloud storage test framework

    Storage space is one of the most important ingredients that the European Organization for Nuclear Research (CERN) needs for its experiments and operation. Part of the Data & Storage Services (IT-DSS) group's work at CERN is focused on testing and evaluating the cloud storage system provided by the openlab partner Huawei, the Huawei Universal Disk Storage System (UDS). As a whole, the system consists of both software and hardware. The objective of the Huawei-CERN partnership is to investigate the performance of the cloud storage system. Among the interesting questions are the system's scalability, reliability and ability to store and retrieve files. During the tests, possible bugs and malfunctions can be discovered and corrected. Different versions of the storage software that runs inside the storage system can also be compared to each other. The nature of testing and benchmarking a storage system gives rise to several small tasks that can be done during a short summer internship. In order to test the storage system, a test framework developed by the DSS group is used. The framework consists of various types of file transfer tests, client and server monitoring programs, and log file analysis programs. Part of the work done consisted of additions to the existing framework, and part of developing new tools. Metrics collection was the central theme. Metrics are to be understood as system statistics, such as memory consumption or processor usage. Memory usage and disk reads/writes were added to the existing client real-time monitoring framework. CPU and memory usage, network traffic (bytes received/sent) and the number of processes running are collected from a client computer before and after a daily test. Two other additions are visualization for storage system log files and a new monitoring tool for the storage system. This report is divided into sections, each describing a part of the framework that was improved or added, the problem, and the final solution. A short description of the code and the architecture is also included.
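    A minimal sketch of the kind of before/after client metrics collection described above, using the third-party psutil library. This is not the DSS framework's own code, and the snapshot fields are an assumption about what such a collector might record.

```python
import json
import time

import psutil  # third-party library exposing system statistics


def snapshot():
    """Collect the client-side metrics mentioned in the report: CPU and memory
    usage, network traffic, disk I/O and the number of running processes."""
    net = psutil.net_io_counters()
    disk = psutil.disk_io_counters()
    return {
        "timestamp": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": psutil.virtual_memory().percent,
        "bytes_sent": net.bytes_sent,
        "bytes_recv": net.bytes_recv,
        "disk_read_bytes": disk.read_bytes,
        "disk_write_bytes": disk.write_bytes,
        "num_processes": len(psutil.pids()),
    }


if __name__ == "__main__":
    before = snapshot()
    # ... run the daily file-transfer test here ...
    after = snapshot()
    print(json.dumps({"before": before, "after": after}, indent=2))
```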

    Classifier selection with permutation tests

    This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, a recommendation of which classifier is likely to perform best is made based on classifier performance over similar known data sets. This similarity is measured according to a data set characterization that includes several state-of-the-art metrics taking into account physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes in a data set to predict class labels, and to compare it to the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we have conducted extensive experimentation including 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, leading to over 2331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations.
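    As a hedged illustration of the permutation-test idea described above (does a classifier exploit the attributes to predict class labels better than chance?), the sketch below uses scikit-learn's permutation_test_score on a synthetic binary data set. It is not the authors' exact procedure or data set characterization.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import permutation_test_score

# Synthetic binary data set standing in for one of the 65 data sets;
# the study's own meta-feature characterization is not reproduced here.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Score on the true labels, the null distribution of scores on permuted
# labels, and the resulting p-value: a small p-value indicates the classifier
# genuinely exploits the attributes to predict the class labels.
score, perm_scores, pvalue = permutation_test_score(
    clf, X, y, cv=5, n_permutations=100, scoring="f1", n_jobs=-1)

print(f"F1 on true labels: {score:.3f}  permutation p-value: {pvalue:.3f}")
```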

    An investigation of biases in Patient Safety Indicator score distribution among hospital cohorts

    Denman Research Forum - 2nd Place, Health Professions - Clinical
    The Centers for Medicare and Medicaid Services (CMS) have implemented a hospital reimbursement system that incentivizes payment proportional to the quality of care delivered and performance on certain metrics. One such metric is the Agency for Healthcare Research and Quality's Patient Safety Indicator 90 (PSI-90). It is composed of eight individual indicators designed to flag adverse patient events that are potentially preventable, such as post-operative wound dehiscence and accidental lacerations. CMS publicly reports four of these individual PSI scores (6, 12, 14 and 15) in addition to the composite PSI-90. Previous studies question the PSIs' validity beyond screening purposes and, furthermore, question the underlying administrative data's ability to accurately and reliably flag such events. This study analyzes biases in PSI score distributions across hospitals with respect to teaching status and differences in patient demographics, and examines whether interactions between teaching status and patient demographic factors can account for differences in PSI rates. Significant differences were found between teaching and non-teaching hospitals for PSIs 6, 12, 15 and 90 (p<0.01). Inpatient volume and patient severity were found to be significantly different between teaching status cohorts (p<0.01). Lastly, significant differences in PSI scores were found between patient severity quartiles for PSIs 6, 15 and 90 (p<0.05) and between socio-economic quartiles for PSIs 6, 12, 15 and 90 (p<0.05); however, the interaction between patient severity and teaching status was only significant for PSI 90 (p<0.05), and that between socio-economic status and teaching status only for PSI 6 (p<0.05). These results indicate that current PSI score distributions may be biased against teaching hospitals for 4 out of 5 PSI measures. Further studies will involve assessing the adequacy of the risk-adjustment methodology for PSI metrics. Until then, use of PSI metrics to determine federal reimbursement can lead to bias against teaching hospitals.
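    A minimal sketch of the kind of interaction analysis described above: a two-way model of a PSI score against teaching status and severity quartile with an interaction term, fitted with statsmodels on synthetic data. The column names (psi_90, teaching, severity_quartile) and the generated values are purely illustrative assumptions, not the study's data or methodology.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic hospital-level data standing in for the CMS/AHRQ cohorts.
rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "teaching": rng.integers(0, 2, n),            # 0 = non-teaching, 1 = teaching
    "severity_quartile": rng.integers(1, 5, n),   # patient severity quartile 1-4
})
df["psi_90"] = (0.5 + 0.2 * df["teaching"]
                + 0.05 * df["severity_quartile"]
                + 0.1 * df["teaching"] * (df["severity_quartile"] == 4)
                + rng.normal(scale=0.3, size=n))

# Two-way model with an interaction term: does the effect of teaching status
# on the PSI-90 score depend on the patient-severity quartile?
model = smf.ols("psi_90 ~ C(teaching) * C(severity_quartile)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```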