Performance metrics for consolidated servers
In spite of the widespread adoption of virtualization and consolidation, there is no consensus on how to benchmark consolidated servers that run multiple guest VMs on the same physical hardware. For example, VMware proposes VMmark, which essentially computes the geometric mean of normalized throughput values across the VMs; Intel uses vConsolidate, which reports a weighted arithmetic average of normalized throughput values.
These benchmarking methodologies focus on total system throughput (i.e., across all VMs in the system) and do not take per-VM performance into account. We argue that a benchmarking methodology for consolidated servers should quantify both total system throughput and per-VM performance in order to provide a meaningful and precise performance characterization. We therefore present two performance metrics: Total Normalized Throughput (TNT), to characterize total system performance, and Average Normalized Reduced Throughput (ANRT), to characterize per-VM performance.
We compare TNT and ANRT against VMmark using published performance numbers and report several cases for which the VMmark score is misleading. That is, VMmark says one platform yields better performance than another, whereas TNT and ANRT show that the two platforms represent different trade-offs between total system throughput and per-VM performance. Worse, in a couple of cases we observe that VMmark yields the opposite conclusion to TNT and ANRT, i.e., VMmark says one system performs better than another, which the TNT/ANRT performance characterization contradicts.
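To make the contrast concrete, the sketch below aggregates per-VM normalized throughputs in the two ways described above: a geometric mean (as the abstract describes VMmark doing) and a weighted arithmetic mean (as it describes vConsolidate doing). This is a simplified reading of that description, not either benchmark's actual scoring rules, and the two platforms and their per-VM numbers are invented for illustration.

```python
import math

def geometric_mean_score(normalized):
    """Geometric mean of per-VM normalized throughputs (simplified VMmark-style aggregate)."""
    return math.exp(sum(math.log(x) for x in normalized) / len(normalized))

def weighted_arithmetic_score(normalized, weights):
    """Weighted arithmetic mean of per-VM normalized throughputs
    (simplified vConsolidate-style aggregate)."""
    return sum(w * x for w, x in zip(weights, normalized)) / sum(weights)

# Hypothetical platforms: each value is one VM's throughput divided by a reference throughput.
platform_a = [1.0, 1.0]   # both VMs perform like the reference
platform_b = [1.8, 0.4]   # one VM speeds up, the other is starved
equal_weights = [1.0, 1.0]

for name, vms in (("A", platform_a), ("B", platform_b)):
    print(name,
          round(geometric_mean_score(vms), 3),
          round(weighted_arithmetic_score(vms, equal_weights), 3))
# prints:
# A 1.0 1.0
# B 0.849 1.1
```

The ranking flips: the geometric mean prefers A, the arithmetic mean prefers B, and neither single number reveals that B's gain comes from starving one VM, which is the kind of per-VM effect the abstract argues a benchmark should expose.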
True Performance Metrics in Electrochemical Energy Storage
A dramatic expansion of research in the area of electrochemical energy storage (EES) during the past decade has been driven by the demand for EES in handheld electronic devices, transportation, and storage of renewable energy for the power grid (1–3). However, the outstanding properties reported for new electrode materials may not necessarily translate into the performance of electrochemical capacitors (ECs). These devices, also called supercapacitors or ultracapacitors (4), store charge with ions from solution at charged porous electrodes. Unlike batteries, which store large amounts of energy but deliver it slowly, ECs can deliver energy faster (develop high power), but only for a short time. Yet recent work has claimed energy densities for ECs approaching (5) or even exceeding those of batteries. We show that even when some metrics seem to support these claims, actual device performance may be rather mediocre. We will focus here on ECs, but these considerations also apply to lithium (Li)-ion batteries.
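A back-of-the-envelope calculation shows how an impressive electrode-level number shrinks at the device level. The sketch uses the standard capacitor energy relation E = ½CV² and the usual conversion for a symmetric two-electrode cell; the capacitance, voltage, and active-mass fraction are hypothetical values chosen for illustration, not data from the article.

```python
def cell_capacitance_symmetric(c_electrode_f_per_g):
    """Symmetric EC: two identical electrodes in series halve the capacitance, and
    normalizing by the mass of both electrodes halves it again, so C_cell = C_electrode / 4."""
    return c_electrode_f_per_g / 4.0

def energy_wh_per_kg(c_f_per_g, voltage_v):
    """E = 1/2 * C * V^2. With C in F/g the result is in J/g (= kJ/kg);
    dividing by 3.6 converts kJ/kg to Wh/kg."""
    return 0.5 * c_f_per_g * voltage_v ** 2 / 3.6

# Hypothetical inputs (illustrative only).
c_electrode = 150.0      # F/g, measured on a single electrode
voltage = 2.7            # V, cell voltage window
active_fraction = 0.3    # assumed share of total device mass that is active electrode material

naive = energy_wh_per_kg(c_electrode, voltage)      # misapplies single-electrode capacitance
cell = energy_wh_per_kg(cell_capacitance_symmetric(c_electrode), voltage)
device = cell * active_fraction                     # energy spread over the whole package
print(round(naive, 1), round(cell, 1), round(device, 1))
```

With these inputs the three figures come out at roughly 152, 38, and 11 Wh/kg. The factor-of-four reduction and the inactive mass (current collectors, separator, electrolyte, packaging) are why a headline F/g value for a new material says little, on its own, about device energy density.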
Exploring Symmetry of Binary Classification Performance Metrics
Selecting the proper performance metric is a key issue in most classification problems in machine learning. Although the specialized literature has addressed several topics regarding these metrics, their symmetries have yet to be systematically studied. This research focuses on ten metrics based on the binary confusion matrix, and their symmetric behaviour is formally defined under all types of transformations. Through simulated experiments covering the full range of datasets and classification results, the symmetric behaviour of these metrics is explored by exposing them to hundreds of simple or combined symmetric transformations. Cross-symmetries among the metrics and statistical symmetries are also explored. The results show that, in all cases, three and only three types of symmetry arise: labelling inversion (between the positive and negative classes), scoring inversion (between good and bad classifiers), and the combination of these two inversions. Additionally, certain metrics are shown to be independent of the imbalance in the dataset, and two cross-symmetries are identified. These results give deeper insight into the behaviour of various performance metrics, offer an indicator for properly interpreting their values, and provide a guide for selecting them in specific applications.
University of Seville (Spain), Telefónica Chair "Intelligence in Networks"
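The abstract does not reproduce the paper's formal definitions, so the sketch below adopts one plausible reading of the two transformations: labelling inversion swaps which class is called positive, and scoring inversion flips every prediction (turning a good classifier into a bad one). It checks three familiar confusion-matrix metrics (accuracy, F1, and the Matthews correlation coefficient); it does not assume which ten metrics the paper studies or how it classifies them, and the counts are hypothetical.

```python
import math

# A confusion matrix is represented as the tuple (tp, fn, fp, tn).

def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)

def f1(tp, fn, fp, tn):
    return 2 * tp / (2 * tp + fp + fn)

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

def labelling_inversion(tp, fn, fp, tn):
    """Swap the roles of the positive and negative classes."""
    return tn, fp, fn, tp

def scoring_inversion(tp, fn, fp, tn):
    """Flip every prediction, so old misses become hits and old hits become misses."""
    return fn, tp, tn, fp

cm = (40, 10, 5, 45)  # hypothetical counts
for name, metric in (("accuracy", accuracy), ("F1", f1), ("MCC", mcc)):
    print(name,
          round(metric(*cm), 3),
          round(metric(*labelling_inversion(*cm)), 3),
          round(metric(*scoring_inversion(*cm)), 3))
```

Under this reading, accuracy is unchanged by labelling inversion and becomes 1 - accuracy under scoring inversion; MCC is unchanged and negated, respectively; F1 is not invariant under labelling inversion, because it ignores true negatives.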
Public Performance Metrics: Driving Physician Motivation and Performance
Introduction: As providers transition from "fee-for-service" to "pay-for-performance" models, focus has shifted to improving performance. This trend extends to the emergency department (ED), where visits continue to increase across the United States. Our objective was to determine whether displaying public performance metrics of physician triage data could drive intangible motivators and improve triage performance in the ED.
Methods: This is a single-institution, time-series performance study of a physician-in-triage system. Individual physician baseline metrics (number of patients triaged and dispositioned per shift) were obtained and prominently displayed with identifiable labels during each quarterly physician group meeting. Physicians were informed that metrics would be collected and displayed quarterly and that there would be no bonuses, punishments, or required training; physicians were essentially free to do as they wished. It was made explicit that the goal was to increase the number of patients triaged; the number dispositioned would also be displayed but would not be a focus, and thus served as this study's control. At the end of one year, we analyzed the metrics.
Results: The group's average number of patients triaged per shift was as follows: Q1, 29.2; Q2, 31.9; Q3, 34.4; Q4, 36.5 (Q1 vs Q4, p < 0.00001). The average number of patients dispositioned per shift was Q1, 16.4; Q2, 17.8; Q3, 16.9; Q4, 15.3 (Q1 vs Q4, p = 0.14). The top 25% of Q1 performers increased their average number triaged from 36.5 in Q1 to 40.3 in Q4 (an increase of 3.8 patients per shift that was not statistically significant [p = 0.07]). The bottom 25% of Q1 performers, on the other hand, increased their average from 22.4 in Q1 to 34.5 in Q4 (a statistically significant increase of 12.2 patients per shift [p = 0.0013]).
Conclusion: Public performance metrics can drive intangible motivators (e.g., purpose, mastery, and peer pressure) and can be an effective, low-cost strategy to improve individual performance, achieve institutional goals, and thrive in the pay-for-performance era.
Surrogate regret bounds for generalized classification performance metrics
We consider optimization of generalized performance metrics for binary classification by means of surrogate losses. We focus on a class of metrics that are linear-fractional functions of the false positive and false negative rates (examples include the F-measure, the Jaccard similarity coefficient, the AM measure, and many others). Our analysis concerns the following two-step procedure. First, a real-valued function f is learned by minimizing a surrogate loss for binary classification on the training sample. The surrogate loss is assumed to be a strongly proper composite loss function (examples include the logistic, squared-error, and exponential losses). Then, given f, a threshold θ is tuned on a separate validation sample by direct optimization of the target performance metric. We show that the regret of the resulting classifier (obtained by thresholding f at θ), measured with respect to the target metric, is upper-bounded by the regret of f measured with respect to the surrogate loss. We also extend our results to cover multilabel classification and provide regret bounds for micro- and macro-averaging measures. Our findings are further analyzed in a computational study on both synthetic and real data sets.
Comment: 22 pages
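The two-step procedure can be sketched as follows, assuming logistic loss as the surrogate (one of the strongly proper composite losses listed above), plain full-batch gradient descent for step one, and F1 (the F-measure with β = 1) as the target metric. The synthetic data, learning rate, and helper names are illustrative choices, not the authors' code. A micro- versus macro-averaged F1 helper is included to show the distinction the multilabel extension refers to.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Step 1: learn f(x) = x @ w + b by minimizing the mean logistic loss
    with full-batch gradient descent; y takes values in {0, 1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid(f(x)) = estimated P(y = 1 | x)
        g = p - y                                # gradient of the logistic loss w.r.t. f(x)
        w -= lr * (X.T @ g) / n
        b -= lr * g.mean()
    return w, b

def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no actual or predicted positives."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def tune_threshold(scores, y_true):
    """Step 2: choose the threshold on f that maximizes F1 on validation data.
    Trying every distinct score covers every possible split of the sample."""
    best_theta, best_f1 = np.inf, 0.0            # theta = +inf: predict all negative
    for theta in np.unique(scores):
        f1 = f1_score(y_true, (scores >= theta).astype(int))
        if f1 > best_f1:
            best_theta, best_f1 = theta, f1
    return best_theta, best_f1

def micro_macro_f1(Y_true, Y_pred):
    """Multilabel F1 for 0/1 matrices (rows = examples, columns = labels)."""
    tp = np.sum((Y_true == 1) & (Y_pred == 1), axis=0)
    fp = np.sum((Y_true == 0) & (Y_pred == 1), axis=0)
    fn = np.sum((Y_true == 1) & (Y_pred == 0), axis=0)
    # Micro-averaging: pool the counts over all labels, then compute a single F1.
    micro = 2 * tp.sum() / max(2 * tp.sum() + fp.sum() + fn.sum(), 1)
    # Macro-averaging: compute F1 for each label, then take the unweighted mean.
    denom = 2 * tp + fp + fn
    per_label = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return micro, per_label.mean()

# Usage on synthetic, imbalanced data (about 20% positives).
rng = np.random.default_rng(0)

def make_data(n):
    y = (rng.random(n) < 0.2).astype(int)
    X = rng.normal(size=(n, 2)) + 1.5 * y[:, None]   # positives are shifted
    return X, y

X_train, y_train = make_data(2000)
X_val, y_val = make_data(1000)
w, b = fit_logistic(X_train, y_train)
val_scores = X_val @ w + b
theta, f1_tuned = tune_threshold(val_scores, y_val)
f1_default = f1_score(y_val, (val_scores >= 0).astype(int))
print(f"threshold={theta:.3f}  tuned F1={f1_tuned:.3f}  F1 at f>=0: {f1_default:.3f}")
```

Because every distinct validation score is tried, the tuned threshold can never do worse on the validation sample than the default cut at f(x) >= 0. The paper's bound concerns the regret of the thresholded classifier with respect to the target metric, which this toy run does not measure.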
Comparing performance metrics for multi-resource systems: the case of urban metabolism
We investigate different approaches to assessing the performance of multi-resource systems, i.e. networks of processes used to convert resource inputs to useful goods and services. For a given set of system outputs, alternative resource inputs are often possible, so performance measures are needed to determine the best system configuration for a given goal. We define such performance measures according to a novel framework which categorises them into two types: those that can be calculated from a system's aggregate inputs and outputs ("black-box" metrics, e.g. carbon footprint); and those that require knowledge of resource conversion processes within the system ("grey-box" metrics). Urban areas are an important example application, and metrics can be calculated from urban metabolism data. We calculate eight black-box metrics for fifteen global cities and find that performance is poorly correlated between the measures. This suggests that performance assessments should adopt grey-box approaches and consider flows at the level of individual processes within a city, using methods such as exergy analysis and ecological network analysis. We are led to suggest how to: (1) improve urban metabolism accounting to assist grey-box metric calculation, by including greater detail on conversion processes and resource quality; and (2) promote these metrics amongst relevant decision makers.
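Checking whether black-box metrics agree amounts to correlating city rankings across metrics. The sketch below computes pairwise Spearman rank correlations with NumPy, assuming no tied values within a metric; the three-city, three-metric table is invented purely to exercise the code and does not reproduce the study's eight metrics or fifteen cities. Metrics for which lower is better (such as a carbon footprint) should be negated first so that every column ranks cities in the same direction.

```python
import numpy as np

def rank_columns(M):
    """Rank the entries of each column (0 = smallest); assumes no ties within a column."""
    return np.argsort(np.argsort(M, axis=0), axis=0).astype(float)

def spearman_matrix(M):
    """M has one row per city and one column per metric. Spearman's rho between
    two metrics is the Pearson correlation of their ranks."""
    return np.corrcoef(rank_columns(M), rowvar=False)

# Invented example: 3 cities x 3 metrics, all oriented so that higher is better.
M = np.array([
    [1.0, 10.0, 3.0],
    [2.0, 20.0, 1.0],
    [3.0, 30.0, 2.0],
])
print(np.round(spearman_matrix(M), 2))
```

Here the first two metrics rank the cities identically (rho = 1), while the third gives rho = -0.5 against both; low off-diagonal values across many metrics are what "poorly correlated" performance looks like.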