The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing
Distance-based tests, also called "energy statistics", are leading methods
for two-sample and independence tests from the statistics community.
Kernel-based tests, developed from "kernel mean embeddings", are leading
methods for two-sample and independence tests from the machine learning
community. A fixed-point transformation was previously proposed to connect the
distance methods and kernel methods for the population statistics. In this
paper, we propose a new bijective transformation between metrics and kernels.
It simplifies the fixed-point transformation, inherits similar theoretical
properties, allows distance methods to be exactly the same as kernel methods
for sample statistics and p-values, and better preserves the data structure upon
transformation. Our results further advance the understanding of distance-
and kernel-based tests, streamline the code base for implementing these tests, and
enable a rich literature of distance-based and kernel-based methodologies to
directly communicate with each other.
Comment: 24 pages main + 7 pages appendix, 3 figures
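Although the abstract does not spell out the new bijection, the sample-level equivalence it refers to can be illustrated numerically. Below is a minimal sketch, assuming Euclidean distances and the classical transformation k(x, y) = -d(x, y)/2 followed by double centering (not the paper's new bijection), showing that the biased sample distance covariance and HSIC coincide up to a constant factor:

```python
# Minimal numerical sketch (illustrative, not the paper's exact bijection):
# after double centering, the biased sample distance covariance equals the
# biased sample HSIC up to a factor of 4 under the transformation k = -d/2.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(n, 2))
y = x @ rng.normal(size=(2, 2)) + 0.5 * rng.normal(size=(n, 2))  # dependent on x

def pairwise_dist(z):
    """Euclidean distance matrix."""
    sq = np.sum(z**2, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * z @ z.T, 0.0))

H = np.eye(n) - np.ones((n, n)) / n          # centering matrix

A, B = pairwise_dist(x), pairwise_dist(y)
dcov2 = np.sum((H @ A @ H) * (H @ B @ H)) / n**2   # biased sample dCov^2

K, L = -A / 2, -B / 2                        # induced kernels from the metrics
hsic = np.trace(K @ H @ L @ H) / n**2        # biased sample HSIC

print(np.isclose(dcov2, 4 * hsic))           # True: identical up to factor 4
```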
The use and application of performance metrics with regional climate models
Abstract
This thesis aims to assess and develop objective and robust approaches to evaluating
regional climate model (RCM) historical skill using performance metrics, and to
provide guidance to relevant groups on how best to utilise these metrics. Performance
metrics are quantitative, scalar measures of the numerical distance, or
'error', between historical model simulations and observations. Model evaluation
practice tends to involve ad hoc approaches, with little consideration of the underlying
sensitivity of the method to small changes in approach. The main question
that arises is: to what degree are the outputs, and subsequent applications, of these
performance metrics robust?
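For concreteness, here is a minimal Python sketch of two such scalar performance metrics, RMSE and spatial pattern correlation, computed on synthetic stand-in fields (not the thesis's data):

```python
# Hypothetical illustration of the kind of scalar performance metric described:
# RMSE and spatial pattern correlation between a gridded RCM simulation and
# gridded observations. The arrays below are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(1)
obs = rng.normal(loc=10.0, scale=3.0, size=(60, 80))          # e.g. seasonal-mean field
model = obs + rng.normal(loc=0.5, scale=1.0, size=obs.shape)  # simulated bias + noise

def rmse(sim, ref):
    """Root-mean-square error: scalar distance between simulation and observations."""
    return np.sqrt(np.mean((sim - ref) ** 2))

def pattern_corr(sim, ref):
    """Spatial pattern correlation between two fields."""
    return np.corrcoef(sim.ravel(), ref.ravel())[0, 1]

print(f"RMSE: {rmse(model, obs):.2f}")
print(f"Spatial correlation: {pattern_corr(model, obs):.3f}")
```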
ENSEMBLES and CORDEX RCMs covering Europe are used with E-OBS
observational data to assess historical and future simulation characteristics using a
range of performance metrics. Metric sensitivity is found to be low in some cases,
such as between variable types, with extreme indices often producing
redundant information. In other cases sensitivity is large, particularly for temporal
statistics, though not for spatial pattern statistics. Assessments made over a single
decade are found to be robust with respect to the full 40-year time period.
Two applications of metrics are considered: metric combinations and exploration
of the stationarity of historical RCM bias characteristics. The sensitivity of
the metric combination procedure is found to be low with respect to the combination
method and potentially high for the type of metric included, but remains uncertain
for the number of metrics included. Stationarity of biases appears to be highly
dependent on the potential for underlying causes of model bias to change substantially
in the future, as in the case of surface albedo in the Alps.
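As a hedged illustration of one such combination procedure (the thesis's actual normalisation and weighting choices are not reproduced here), a minimal Python sketch combining several error metrics into one aggregate score:

```python
# Hypothetical metric combination: each metric is z-score normalised across
# models so it is dimensionless, then averaged with equal weights into a
# single aggregate skill score per model. Values below are stand-ins.
import numpy as np

# rows: models, columns: metrics (e.g. RMSE, absolute bias, pattern error)
scores = np.array([
    [1.2, 0.4, 0.10],
    [0.9, 0.6, 0.08],
    [1.5, 0.3, 0.15],
])

normalised = (scores - scores.mean(axis=0)) / scores.std(axis=0)
combined = normalised.mean(axis=1)   # equal weights; lower = better if all are errors
print(combined)
```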
It is concluded that performance metrics and their applications can and should
be considered more systematically, using a range of redundancy and stationarity
tests as indicators of historical and future robustness.
Improved metrics collection and correlation for the CERN cloud storage test framework
Storage space is one of the most important ingredients that the European Organization for Nuclear Research (CERN) needs for its experiments and operation. Part of the Data & Storage Services (IT-DSS) group’s work at CERN is focused on testing and evaluating the cloud storage system provided by the openlab partner Huawei: the Huawei Universal Disk Storage System (UDS). As a whole, the system consists of both software and hardware.
The objective of the Huawei-CERN partnership is to investigate the performance of the cloud storage system. Among the interesting questions are the system’s scalability, reliability and ability to store and retrieve files. During the tests, possible bugs and malfunctions can be discovered and corrected. Different versions of the storage software that runs inside the storage system can also be compared to each other.
The nature of testing and benchmarking a storage system gives rise to several small tasks that can be completed during a short summer internship. In order to test the storage system, a test framework developed by the DSS group is used. The framework consists of various types of file transfer tests, client and server monitoring programs, and log file analysis programs. Part of the work done consisted of additions to the existing framework, and part of developing new tools. Metrics collection was the central theme. Metrics are to be understood here as system statistics, such as memory consumption or processor usage.
Memory usage and disk reads/writes were added to the existing client real-time monitoring framework. CPU and memory usage, network traffic (bytes received/sent) and the number of processes running are collected from a client computer before and after a daily test. Two other additions are visualization for storage system log files and a new monitoring tool for the storage system. This report is divided into parts, each describing a component of the framework that was improved or added, the problem it addresses, and the final solution. A short description of the code and the architecture is also included.
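As a hedged illustration of the before/after client metrics snapshot described above (the framework's actual collectors are not reproduced; psutil is assumed available):

```python
# Hypothetical sketch of a before/after client metrics snapshot: CPU, memory,
# network byte counts and process count are sampled before and after a test
# run, and the two snapshots are reported side by side.
import time
import psutil

def snapshot():
    net = psutil.net_io_counters()
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "mem_used_mb": psutil.virtual_memory().used / 2**20,
        "bytes_sent": net.bytes_sent,
        "bytes_recv": net.bytes_recv,
        "num_procs": len(psutil.pids()),
    }

before = snapshot()
time.sleep(5)          # stand-in for the daily file-transfer test
after = snapshot()

for key in before:
    print(f"{key}: {before[key]} -> {after[key]}")
```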
Quality and Publication of Emergency Medicine Trials Registered in ClinicalTrials.gov
Introduction: Promoting emergency medicine (EM) clinical trials research remains a priority. To characterize the status of clinical EM research, this study assessed trial quality, funding source, and publication of EM clinical trials and compared EM and non-EM trials on these key metrics. We also examined the volume of EM trials and their subspecialty areas.
Methods: We abstracted data from ClinicalTrials.gov (February 2000 - September 2013) and used individual study National Clinical Trial numbers to identify published trials (January 2007 - September 2016). We used descriptive statistics and chi-square tests to examine study characteristics by EM and non-EM status, and Kaplan-Meier curves and log-rank tests to compare time to publication of completed EM and non-EM studies.
Results: We found 638 interventional EM trials and 59,512 non-EM interventional trials conducted in the United States between February 2000 and September 2013, registered on ClinicalTrials.gov. EM studies were significantly less likely than non-EM studies to be National Institutes of Health-funded or to evaluate a drug or biologic. However, EM studies had significantly larger sample sizes, and were significantly more likely to use randomization and blinding. Overall, 34.3% of EM and 26.0% of non-EM studies were published in peer-reviewed journals. By subspecialty, more EM trials concerned medical/surgical and psychiatric/neurological conditions than trauma.
Conclusion: Although EM studies were less likely to have received federal or industry funding, and the EM portfolio consisted of only 638 trials over the 14-year study period, the quality of EM trials surpassed that of non-EM trials, based on indices such as randomization and blinding. This novel finding bodes well for the future of clinical EM research, as does the higher proportion of published EM than non-EM trials. Our study also revealed that trauma studies were under-represented among EM studies. Periodic assessment of EM trials with the metrics used here could provide an informative and valuable longitudinal view of progress in clinical EM research.
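A hedged sketch of the two statistical comparisons named in the Methods: the 2x2 publication table is derived from the counts and proportions the abstract reports, while the durations for the time-to-publication comparison are illustrative stand-ins (lifelines is assumed available):

```python
# Chi-square test on publication proportions (EM vs. non-EM) and a log-rank
# test on time to publication. Durations below are synthetic stand-ins, not
# the study's data.
import numpy as np
from scipy.stats import chi2_contingency
from lifelines.statistics import logrank_test

# published vs. unpublished counts, from 34.3% of 638 and 26.0% of 59,512
table = np.array([
    [219, 419],        # EM: published, unpublished
    [15473, 44039],    # non-EM: published, unpublished
])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-square p-value: {p:.3g}")

# months from completion to publication; event = 1 if published
rng = np.random.default_rng(0)
t_em, t_non = rng.exponential(30, 200), rng.exponential(36, 200)
e_em, e_non = rng.integers(0, 2, 200), rng.integers(0, 2, 200)
result = logrank_test(t_em, t_non, event_observed_A=e_em, event_observed_B=e_non)
print(f"log-rank p-value: {result.p_value:.3g}")
```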
Classifier selection with permutation tests
This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, a recommendation of the classifier likely to perform best is made based on classifier performance over similar known data sets. This similarity is measured according to a data set characterization that includes several state-of-the-art metrics taking into account physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes in a data set to predict class labels, and a comparison of this approach with the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we conducted extensive experimentation involving 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, leading to over 2,331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations.
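The paper's pipeline is not reproduced here, but the core permutation-test idea can be sketched with scikit-learn's permutation_test_score, which compares a classifier's cross-validated score on the true labels against its scores on label permutations:

```python
# Minimal sketch of a permutation test assessing whether a classifier genuinely
# exploits a data set's attributes to predict class labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import permutation_test_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Score on the true labels vs. the null distribution of scores on permuted labels.
score, perm_scores, p_value = permutation_test_score(
    clf, X, y, cv=5, n_permutations=100, random_state=0
)
print(f"accuracy={score:.3f}, permutation p-value={p_value:.3f}")
# A small p-value indicates the attributes carry real signal for the labels.
```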
An investigation of biases in Patient Safety Indicator score distribution among hospital cohorts
Denman Research Forum - 2nd Place, Health Professions - Clinical
The Centers for Medicare and Medicaid Services (CMS) have implemented a hospital reimbursement system that incentivizes payment proportional to the quality of care delivered and performance on certain metrics. One such metric is the Agency for Healthcare Research and Quality’s Patient Safety Indicator 90 (PSI-90). It is composed of eight individual indicators designed to flag adverse patient events that are potentially preventable, such as post-operative wound dehiscence and accidental lacerations. CMS publicly reports four of these individual PSI scores (6, 12, 14 and 15) in addition to the composite PSI-90. Previous studies question the PSIs’ validity beyond screening purposes and, furthermore, question the underlying administrative data’s ability to accurately and reliably flag such events. This study analyzes biases in PSI score distribution for hospitals depending on teaching status and differences in patient demographics, and examines whether interactions between teaching status and patient demographic factors can account for differences in PSI rates. Significant differences were found between teaching and non-teaching hospitals for PSIs 6, 12, 15 and 90 (p<0.01). Inpatient volume and patient severity (p<0.01) were found to be significantly different between teaching status cohorts. Lastly, significant differences in PSI scores were found between patient severity quartiles for PSIs 6, 15 and 90 (p<0.05) and between socio-economic quartiles for PSIs 6, 12, 15 and 90 (p<0.05); but the interaction between patient severity and teaching status was significant only for PSI 90 (p<0.05), and between socioeconomic and teaching statuses only for PSI 6 (p<0.05). These results indicate current PSI score distributions may be biased against teaching hospitals for 4 out of 5 PSI measures. Further studies will involve assessing the adequacy of the risk-adjustment methodology for PSI metrics. Until then, use of PSI metrics to determine federal reimbursement can lead to bias against teaching hospitals.
A three-year embargo was granted for this item.
Academic Major: Health Information Management and Systems
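As a hedged sketch of the interaction analysis described (the data frame below is an illustrative stand-in, not the study's hospital data), an OLS model with a teaching-status by severity-quartile interaction using the statsmodels formula API:

```python
# Hypothetical interaction analysis: PSI-90 score modeled as a function of
# teaching status, patient severity quartile, and their interaction. A
# significant interaction term suggests the teaching-status gap differs
# across severity quartiles.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "psi90": rng.gamma(2.0, 0.5, n),        # stand-in PSI-90 scores
    "teaching": rng.integers(0, 2, n),      # 1 = teaching hospital
    "severity_q": rng.integers(1, 5, n),    # patient severity quartile 1-4
})

model = smf.ols("psi90 ~ teaching * C(severity_q)", data=df).fit()
print(model.summary().tables[1])            # coefficient table incl. interactions
```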