62,543 research outputs found
Model evaluation of target product profiles of an infant vaccine against respiratory syncytial virus (RSV) in a developed country setting
Respiratory syncytial virus (RSV) is a major cause of lower respiratory tract disease in children worldwide and a significant cause of hospital admissions in young children in England. No RSV vaccine has yet been licensed, but a number are under development. In this work, we present two structurally distinct mathematical models, parameterized using RSV data from the UK, which we use to explore the effect of introducing a paediatric RSV vaccine into the national programme. We explore different vaccine properties and dosing regimens, combined with a range of implementation strategies for RSV control. The results suggest that vaccine properties conferring indirect protection have the greatest effect in reducing the burden of disease in children under 5 years. The findings are reinforced by the concurrence of predictions from the two models despite their very different epidemiological structures. The approach described has general application in evaluating vaccine target product profiles.
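The abstract does not describe the structure of its two models, so the following is only a generic illustration of the indirect-protection effect it highlights: a minimal deterministic SIR sketch with invented parameters (not the paper's UK-parameterized models), in which a transmission-blocking paediatric vaccine also lowers infection among the unvaccinated.

```python
# Illustrative sketch only: a simple SIR model with vaccination.
# A vaccine that blocks transmission removes vaccinees from the
# susceptible pool, shielding the unvaccinated (indirect protection).
# A disease-only vaccine leaves transmission dynamics unchanged, so
# the unvaccinated would see the baseline attack rate. All numbers
# here are invented placeholders.
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma):
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

beta, gamma = 0.3, 0.1          # transmission / recovery rates (per day)
coverage = 0.4                  # fraction of population vaccinated
t = np.linspace(0, 365, 1000)

# No vaccine: (almost) everyone susceptible.
baseline = odeint(sir, [1.0 - 1e-4, 1e-4, 0.0], t, args=(beta, gamma))
baseline_attack = (1.0 - 1e-4) - baseline[-1, 0]

# Transmission-blocking vaccine: vaccinees start in the removed class.
s0 = 1.0 - coverage
vaccinated = odeint(sir, [s0, 1e-4, coverage], t, args=(beta, gamma))
attack_rate = s0 - vaccinated[-1, 0]

print(f"attack rate among the unvaccinated: "
      f"{baseline_attack:.3f} -> {attack_rate:.3f}")
```

The point of the sketch is that infection among *unvaccinated* individuals falls when the vaccine blocks transmission, which is the indirect effect the abstract identifies as most important.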
Dropout Model Evaluation in MOOCs
The field of learning analytics needs to adopt a more rigorous approach for predictive model evaluation that matches the complex practice of model-building. In this work, we present a procedure to statistically test hypotheses about model performance which goes beyond the state-of-the-practice in the community to analyze both algorithms and feature extraction methods from raw data. We apply this method to a series of algorithms and feature sets derived from a large sample of Massive Open Online Courses (MOOCs). While a complete comparison of all potential modeling approaches is beyond the scope of this paper, we show that this approach reveals a large gap in dropout prediction performance between forum-, assignment-, and clickstream-based feature extraction methods, where the latter is significantly better than the former two, which are in turn indistinguishable from one another. This work has methodological implications for evaluating predictive or AI-based models of student success, and practical implications for the design and targeting of at-risk student models and interventions.
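The paper's exact testing procedure and data are not given in the abstract; as a rough sketch of the kind of paired comparison it describes, one might test whether one feature-extraction method outperforms another across the same set of courses (all numbers below are synthetic placeholders).

```python
# Hypothetical sketch: paired statistical comparison of two feature sets
# for dropout prediction, evaluated on the same courses.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_courses = 20

# Per-course AUCs for two feature-extraction methods (synthetic data,
# not the paper's results).
auc_clickstream = rng.normal(0.80, 0.03, n_courses)
auc_forum = rng.normal(0.72, 0.04, n_courses)

# Paired t-test: each course yields one AUC per method, so pairing
# controls for course-level difficulty.
t_stat, p_value = stats.ttest_rel(auc_clickstream, auc_forum)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Pairing by course is the key design choice: it removes between-course variance, which a naive unpaired comparison of mean AUCs would conflate with the method effect.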
VPPA weld model evaluation
NASA uses the Variable Polarity Plasma Arc Welding (VPPAW) process extensively for fabrication of Space Shuttle External Tanks. This welding process has been in use at NASA since the late 1970s, but the physics of the process have never been satisfactorily modeled and understood. In an attempt to advance the level of understanding of VPPAW, Dr. Arthur C. Nunes, Jr. (NASA) developed a mathematical model of the process. The work described in this report evaluated and used two versions (level-0 and level-1) of Dr. Nunes' model, as well as a model derived by the University of Alabama in Huntsville (UAH) from Dr. Nunes' level-1 model. Two series of VPPAW experiments were performed, using over 400 different combinations of welding parameters. Observations were made of VPPAW process behavior as a function of specific welding parameter changes. Data from these weld experiments were used to evaluate and suggest improvements to Dr. Nunes' model. Experimental data and correlations with the model were used to develop a multi-variable control algorithm for use with a future VPPAW controller. This algorithm is designed to control weld widths (both on the crown and root of the weld) based upon the weld parameters, base metal properties, and real-time observation of the crown width. The algorithm exhibited accuracy comparable to that of the weld width measurements for both aluminum and mild steel welds.
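The report's control algorithm is not reproduced in the abstract; the following is only a generic illustration of the feedback idea it describes, namely adjusting a welding parameter from a real-time crown-width measurement. The plant model, gains, and numbers are invented placeholders, not VPPAW physics.

```python
# Hypothetical sketch: proportional feedback control of weld crown width.
def control_step(target_width, measured_width, current, kp=5.0):
    """Nudge the weld current toward the target crown width (mm -> A)."""
    error = target_width - measured_width
    return current + kp * error

def plant(current):
    """Toy linearization: crown width grows with current near the
    operating point (invented, not a real VPPAW relationship)."""
    return 0.05 * current + 2.0          # mm

current = 100.0                          # starting weld current (A)
target = 8.0                             # desired crown width (mm)
for _ in range(20):
    width = plant(current)
    current = control_step(target, width, current)

print(f"current = {current:.1f} A, crown width = {plant(current):.2f} mm")
```

A real controller of the kind the report describes would fold in multiple inputs (weld parameters, base-metal properties) rather than a single proportional term; the loop above only shows the closed-loop structure.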
Causal networks for climate model evaluation and constrained projections
Global climate models are central tools for understanding past and future climate change. The assessment of model skill, in turn, can benefit from modern data science approaches. Here we apply causal discovery algorithms to sea level pressure data from a large set of climate model simulations and, as a proxy for observations, meteorological reanalyses. We demonstrate how the resulting causal networks (fingerprints) offer an objective pathway for process-oriented model evaluation. Models with fingerprints closer to observations better reproduce important precipitation patterns over highly populated areas such as the Indian subcontinent, Africa, East Asia, Europe and North America. We further identify expected model interdependencies due to shared development backgrounds. Finally, our network metrics provide stronger relationships for constraining precipitation projections under climate change as compared to traditional evaluation metrics for storm tracks or precipitation itself. Such emergent relationships highlight the potential of causal networks to constrain longstanding uncertainties in climate change projections.
Algorithms to assess causal relationships in data sets have seen increasing applications in climate science in recent years. Here, the authors show that these techniques can help to systematically evaluate the performance of climate models and, as a result, to constrain uncertainties in future climate change projections.
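The abstract does not specify how fingerprints are compared, so the following is only a schematic sketch: representing each causal network as a matrix of link strengths and scoring a model by its distance to the reanalysis network. The matrices below are random placeholders standing in for causal-discovery output.

```python
# Hypothetical sketch: comparing a climate model's causal "fingerprint"
# with the reanalysis fingerprint. Entry [i, j] of each matrix is the
# strength of a directed link from region i to region j (invented data;
# the paper derives these from causal discovery on sea level pressure).
import numpy as np

rng = np.random.default_rng(1)
n_regions = 10

reanalysis_net = rng.random((n_regions, n_regions))
model_net = reanalysis_net + rng.normal(0.0, 0.1, (n_regions, n_regions))

def fingerprint_distance(a, b):
    """Frobenius distance between two causal-network matrices;
    smaller means the model's network is closer to observations."""
    return np.linalg.norm(a - b)

print(f"distance to reanalysis: "
      f"{fingerprint_distance(reanalysis_net, model_net):.3f}")
```

Ranking models by such a distance, and regressing projected changes against it, is the general shape of the emergent-constraint argument the abstract makes; the actual metric in the paper may differ.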
Revisiting Precision and Recall Definition for Generative Model Evaluation
In this article we revisit the definition of Precision-Recall (PR) curves for generative models proposed by Sajjadi et al. (arXiv:1806.00035). Rather than providing a scalar for generative quality, PR curves distinguish mode collapse (poor recall) from bad sample quality (poor precision). We first generalize their formulation to arbitrary measures, removing any restriction to finite support. We also expose a bridge between PR curves and the type I and type II error rates of likelihood-ratio classifiers on the task of discriminating between samples of the two distributions. Building on this new perspective, we propose a novel algorithm to approximate precision-recall curves that shares some interesting methodological properties with the hypothesis-testing technique of Lopez-Paz et al. (arXiv:1610.06545). We demonstrate the advantages of the proposed formulation over the original approach on controlled multi-modal datasets.
Comment: ICML 201
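In the finite-support setting of Sajjadi et al., the PR frontier can be traced by sweeping a parameter lambda over (0, inf), with precision alpha(lambda) = sum_x min(lambda P(x), Q(x)) and recall beta(lambda) = alpha(lambda) / lambda. A toy sketch of that discrete computation (the distributions are invented, and this is the original formulation rather than the article's measure-theoretic generalization):

```python
# Sketch of the discrete precision-recall frontier for generative models.
# P is the data distribution, Q the model distribution, both on a small
# finite support.
import numpy as np

def pr_curve(p, q, n_lambdas=50):
    """Return (precision, recall) arrays for discrete distributions p, q."""
    # Sweep lambda over (0, inf) via the tangent of angles in (0, pi/2).
    angles = np.linspace(1e-3, np.pi / 2 - 1e-3, n_lambdas)
    lambdas = np.tan(angles)
    alphas = np.array([np.minimum(lam * p, q).sum() for lam in lambdas])
    betas = alphas / lambdas
    return alphas, betas

p = np.array([0.5, 0.3, 0.2, 0.0])       # data: three modes
q = np.array([0.25, 0.25, 0.25, 0.25])   # model: covers all modes, plus junk
precision, recall = pr_curve(p, q)
```

A model that drops a mode pushes the recall end of the curve down; a model that places mass off the data support pushes the precision end down, which is exactly the mode-collapse-versus-quality distinction the abstract describes.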
User's appraisal of yield model evaluation criteria
The five major potential USDA users of AgRISTARS crop yield forecast models rated the Yield Model Development (YMD) project Test and Evaluation Criteria by the importance placed on them. These users agreed that the "TIMELINESS" and "RELIABILITY" of the forecast yields would be of major importance in determining whether a proposed yield model was worthy of adoption. Although there was considerable difference of opinion as to the relative importance of the other criteria, "COST", "OBJECTIVITY", "ADEQUACY", and "MEASURES OF ACCURACY" were generally felt to be more important than "SIMPLICITY" and "CONSISTENCY WITH SCIENTIFIC KNOWLEDGE". However, some of the comments accompanying the ratings indicated that several of the definitions and descriptions of the criteria were confusing.
Online Model Evaluation in a Large-Scale Computational Advertising Platform
Online media provides opportunities for marketers to deliver effective brand messages to a wide range of audiences. Advertising technology platforms enable advertisers to reach their target audience by delivering ad impressions to online users in real time. In order to identify the best marketing message for a user and to purchase impressions at the right price, we rely heavily on bid prediction and optimization models. Even though bid prediction models are well studied in the literature, the equally important subject of model evaluation is usually overlooked. Effective and reliable evaluation of an online bidding model is crucial for making faster model improvements as well as for utilizing marketing budgets more efficiently. In this paper, we present an experimentation framework for bid prediction models in which our focus is on the practical aspects of model evaluation. Specifically, we outline the unique challenges we encounter in our platform due to a variety of factors such as heterogeneous goal definitions, varying budget requirements across different campaigns, high seasonality, and the auction-based environment for inventory purchasing. Then, we introduce return on investment (ROI) as a unified model performance (i.e., success) metric and explain its merits over more traditional metrics such as click-through rate (CTR) or conversion rate (CVR). Most importantly, we discuss commonly used evaluation and metric summarization approaches in detail and propose a more accurate method for online evaluation of new experimental models against the baseline. Our meta-analysis-based approach addresses various shortcomings of other methods and yields statistically robust conclusions that allow us to conclude experiments more quickly and reliably. We demonstrate the effectiveness of our evaluation strategy on real campaign data through a set of experiments.
Comment: Accepted to ICDM201
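The abstract names ROI as the unified metric and a meta-analysis-based summarization, without giving formulas; the sketch below shows one plausible reading (all names and numbers are invented): per-campaign ROI, and an inverse-variance (fixed-effect meta-analysis) pooling of per-campaign treatment-versus-baseline lifts.

```python
# Illustrative sketch only: ROI as a success metric, and fixed-effect
# meta-analysis pooling of per-campaign lifts. Not the paper's method
# in detail.
import numpy as np

def roi(value_delivered, spend):
    """Return on investment: net value generated per unit of spend."""
    return (value_delivered - spend) / spend

# Per-campaign ROI lift of the experimental model over the baseline,
# each with a variance estimate (toy placeholders). Campaigns with
# noisier estimates get less weight.
lift = np.array([0.04, 0.10, -0.02, 0.06])
var = np.array([0.001, 0.004, 0.002, 0.003])

weights = 1.0 / var
pooled_lift = np.sum(weights * lift) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
z = pooled_lift / pooled_se
print(f"pooled lift = {pooled_lift:.4f}, z = {z:.2f}")
```

Pooling across heterogeneous campaigns this way addresses the problem the abstract raises: campaigns differ in goals, budgets, and seasonality, so a single global average would let a few large campaigns dominate the conclusion.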