
    Evaluating epistemic uncertainty under incomplete assessments

    This study proposes an extended methodology for laboratory-based Information Retrieval evaluation under incomplete relevance assessments. The methodology aims to identify potential uncertainty during system comparison that may result from incompleteness. Adopting it is advantageous because detecting epistemic uncertainty (the amount of knowledge, or ignorance, we have about the estimate of a system's performance) during the evaluation process can guide researchers when evaluating new systems over existing and future test collections. Across a series of experiments we demonstrate how this methodology can lead towards a finer-grained analysis of systems. In particular, we show experimentally that the current practice in Information Retrieval evaluation of using a measurement depth larger than the pooling depth increases uncertainty during system comparison.
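
    The effect of measuring deeper than the pool can be illustrated with a minimal sketch. It is not the methodology of the study above: it only brackets precision at depth k by treating unjudged documents first as non-relevant and then as relevant. The function name, document ids and judgments are hypothetical.

```python
def precision_bounds(ranking, judgments, k):
    """Return (lower, upper) bounds on precision at depth k.

    ranking   -- list of document ids, best first
    judgments -- dict mapping judged document ids to True (relevant) or False
    Unjudged documents count as non-relevant for the lower bound and as
    relevant for the upper bound.
    """
    top = ranking[:k]
    relevant = sum(1 for d in top if judgments.get(d) is True)
    unjudged = sum(1 for d in top if d not in judgments)
    return relevant / k, (relevant + unjudged) / k


# Hypothetical run: only the top 5 documents were pooled and judged.
ranking = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
judgments = {"d1": True, "d2": False, "d3": True, "d4": True, "d5": False}

at_pool_depth = precision_bounds(ranking, judgments, 5)       # fully judged
beyond_pool_depth = precision_bounds(ranking, judgments, 10)  # half unjudged
```

    At the pooling depth the two bounds coincide, while at depth 10 the interval widens because half of the measured documents have no judgment.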

    A retrieval evaluation methodology for incomplete relevance assessments

    In this paper we propose an extended methodology for laboratory-based Information Retrieval evaluation under incomplete relevance assessments. This new protocol aims to identify potential uncertainty during system comparison that may result from incompleteness. We demonstrate how this methodology can lead towards a finer-grained analysis of systems. This is advantageous because the detection of uncertainty during the evaluation process can guide researchers when evaluating new systems over existing and future test collections.

    A Practical Method to Estimate Information Content in the Context of 4D-Var Data Assimilation. I: Methodology

    Data assimilation obtains improved estimates of the state of a physical system by combining imperfect model results with sparse and noisy observations of reality. Not all observations used in data assimilation are equally valuable. The ability to characterize the usefulness of different data points is important for analyzing the effectiveness of the assimilation system, for data pruning, and for the design of future sensor systems. This paper focuses on the four-dimensional variational (4D-Var) data assimilation framework. Metrics from information theory are used to quantify how much each observation reduces the uncertainty with which the system state is known. We establish a relationship between different information-theoretic metrics and the variational cost function and its gradient under Gaussian linear assumptions. Based on this insight we derive an ensemble-based computational procedure to estimate the information content of various observations in the context of 4D-Var. The approach is illustrated on linear and nonlinear test problems. In the companion paper [Singh et al. (2011)] the methodology is applied to a global chemical data assimilation problem.
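
    As a rough illustration of what such metrics measure, the sketch below evaluates two textbook quantities for the simplest linear Gaussian case: one scalar state observed directly. It is not the ensemble-based 4D-Var procedure of the paper. The function name, the observation names and the variances are hypothetical.

```python
import math


def information_content(background_var, obs_var):
    """Information metrics for one directly observed scalar state.

    Assumes a Gaussian prior with variance background_var and a direct
    observation (identity observation operator) with Gaussian error
    variance obs_var.
    Returns (dfs, shannon): degrees of freedom for signal and Shannon
    information content in nats.
    """
    # Posterior (analysis) variance of the scalar Gaussian update.
    analysis_var = background_var * obs_var / (background_var + obs_var)
    # Fraction of the prior variance removed by the observation.
    dfs = 1.0 - analysis_var / background_var
    # Entropy reduction from prior to posterior.
    shannon = 0.5 * math.log(background_var / analysis_var)
    return dfs, shannon


# Hypothetical observations: (name, background variance, observation error variance)
observations = [("obs_c", 1.0, 9.0), ("obs_a", 4.0, 1.0), ("obs_b", 1.0, 1.0)]

# Rank observations from most to least informative by degrees of freedom for signal.
ranked = sorted(
    observations,
    key=lambda o: information_content(o[1], o[2])[0],
    reverse=True,
)
ranked_names = [name for name, _, _ in ranked]
```

    An observation is most informative where the prior is uncertain and the measurement is accurate, which is why `obs_a` ranks first here.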

    A Practical Method to Estimate Information Content in the Context of 4D-Var Data Assimilation. II: Application to Global Ozone Assimilation

    Data assimilation obtains improved estimates of the state of a physical system by combining imperfect model results with sparse and noisy observations of reality. Not all observations used in data assimilation are equally valuable. The ability to characterize the usefulness of different data points is important for analyzing the effectiveness of the assimilation system, for data pruning, and for the design of future sensor systems. In the companion paper (Sandu et al., 2012) we derive an ensemble-based computational procedure to estimate the information content of various observations in the context of 4D-Var. Here we apply this methodology to quantify two information metrics, the signal and the degrees of freedom for signal, for satellite observations used in a global chemical data assimilation problem with the GEOS-Chem chemical transport model. The assimilation of a subset of data points characterized by the highest information content yields an analysis comparable in quality with the one obtained using the entire data set.

    Multivariate adaptive regression splines for estimating riverine constituent concentrations

    Regression-based methods are commonly used for riverine constituent concentration/flux estimation, which is essential for guiding water quality protection practices and environmental decision making. This paper developed a multivariate adaptive regression splines model for estimating riverine constituent concentrations (MARS-EC). The process, interpretability and flexibility of the MARS-EC modelling approach were demonstrated for total nitrogen in the Patuxent River, a major river input to Chesapeake Bay. Model accuracy and uncertainty of the MARS-EC approach were further analysed using nitrate plus nitrite datasets from eight tributary rivers to Chesapeake Bay. Results showed that the MARS-EC approach integrated the advantages of both parametric and nonparametric regression methods, and model accuracy was demonstrated to be superior to that of the traditionally used ESTIMATOR model. MARS-EC is flexible and allows consideration of auxiliary variables; the variables and interactions can be selected automatically. MARS-EC does not constrain concentration-predictor curves to be constant but rather is able to identify shifts in these curves from mathematical expressions and visual graphics. The MARS-EC approach provides an effective tool that complements existing approaches for estimating riverine constituent concentrations.
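
    Multivariate adaptive regression splines build their fit from hinge functions, which is what lets a curve change slope at a knot. The sketch below evaluates such a model for a single predictor. It is not the MARS-EC model itself: the knot, coefficients and function names are hypothetical, and no fitting or term selection is performed.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) for direction=1,
    max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))


def predict(x, intercept, terms):
    """Evaluate a one-predictor spline model.

    terms -- list of (coefficient, knot, direction) tuples, one per hinge term
    """
    return intercept + sum(c * hinge(x, t, d) for c, t, d in terms)


# Hypothetical model whose slope shifts at a knot located at x = 2.0.
intercept = 3.0
terms = [(1.5, 2.0, 1), (-0.5, 2.0, -1)]

below_knot = predict(1.0, intercept, terms)  # only the left-hand hinge is active
at_knot = predict(2.0, intercept, terms)     # both hinges are zero
above_knot = predict(4.0, intercept, terms)  # only the right-hand hinge is active
```

    The slope is 0.5 to the left of the knot and 1.5 to the right of it, so a shift in the response curve shows up directly in the fitted terms.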

    Content in the Context of 4D-Var Data Assimilation. II: Application to Global Ozone Assimilation

    Data assimilation obtains improved estimates of the state of a physical system by combining imperfect model results with sparse and noisy observations of reality. Not all observations used in data assimilation are equally valuable. The ability to characterize the usefulness of different data points is important for analyzing the effectiveness of the assimilation system, for data pruning, and for the design of future sensor systems. In the companion paper [Sandu et al. (2011)] we derived an ensemble-based computational procedure to estimate the information content of various observations in the context of 4D-Var. Here we apply this methodology to quantify two information metrics (the signal and degrees of freedom for signal) for satellite observations used in a global chemical data assimilation problem with the GEOS-Chem chemical transport model. The assimilation of a subset of data points characterized by the highest information content gives analyses that are comparable in quality to those obtained using the entire data set.