
    Optimal design to discriminate between rival copula models for a bivariate binary response

    We consider a bivariate logistic model for a binary response, and we assume that two rival dependence structures are possible. Copula functions are very useful tools for modelling different kinds of dependence with arbitrary marginal distributions. We consider the Clayton and Gumbel copulae as competing association models. The focus is on applications in testing a new drug, where both efficacy and toxicity outcomes are observed. In this context, one of the main goals is to find the dose which maximizes the probability of efficacy without toxicity, herein called the P-optimal dose. If the P-optimal dose changes under the two rival copulae, then it is relevant to identify the proper association model. To this end, we propose a criterion (called PKL) which enables us to find the optimal doses to discriminate between the rival copulae, subject to a constraint that protects patients against dangerous doses. Furthermore, applying the likelihood ratio test for non-nested models in a simulation study, we confirm that the PKL-optimal design is indeed able to discriminate between the rival copulae.
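
    To make the role of the rival copulae concrete, here is a minimal numerical sketch, assuming illustrative logistic marginal dose-response parameters and the convention P(efficacy = 1, toxicity = 1) = C(u, v); it is not the paper's model specification or its PKL criterion, but it shows how the P-optimal dose can differ under Clayton and Gumbel dependence.

```python
import numpy as np

# Marginal logistic dose-response curves for efficacy and toxicity
# (intercepts and slopes are illustrative, not taken from the paper).
def p_efficacy(dose, a=-2.0, b=1.5):
    return 1.0 / (1.0 + np.exp(-(a + b * dose)))

def p_toxicity(dose, a=-4.0, b=1.2):
    return 1.0 / (1.0 + np.exp(-(a + b * dose)))

# Rival copulae; theta = 2 gives Kendall's tau = 0.5 for both families,
# so the two models encode the same overall strength of association.
def clayton(u, v, theta=2.0):
    return (u**(-theta) + v**(-theta) - 1.0)**(-1.0 / theta)

def gumbel(u, v, theta=2.0):
    s = (-np.log(u))**theta + (-np.log(v))**theta
    return np.exp(-s**(1.0 / theta))

# Modelling the joint success probability as P(eff = 1, tox = 1) = C(u, v),
# the quantity of interest is P(eff = 1, tox = 0) = u - C(u, v).
def p_eff_no_tox(dose, copula):
    u, v = p_efficacy(dose), p_toxicity(dose)
    return u - copula(u, v)

doses = np.linspace(0.0, 5.0, 501)
for name, cop in [("Clayton", clayton), ("Gumbel", gumbel)]:
    probs = p_eff_no_tox(doses, cop)
    print(f"{name}: P-optimal dose ~ {doses[np.argmax(probs)]:.2f}, "
          f"max P(eff, no tox) ~ {probs.max():.3f}")
```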

    Optimal subsample selection in big datasets


    Optimal Design of Experiments and Model-Based Survey Sampling in Big Data

    Big Data are generally huge quantities of digital information accrued automatically and/or merged from several sources, and rarely result from properly planned population surveys. A Big Dataset is herein conceived as a collection of information concerning a finite population. Since the analysis of an entire Big Dataset can require enormous computational effort, we suggest selecting a sample of observations and using this sampling information to achieve the inferential goal. Instead of the design-based survey sampling approach (which relates to the estimation of summary finite population measures, such as means, totals and proportions), we consider the model-based sampling approach, which involves inference about the parameters of a super-population model. This model is assumed to have generated the finite population values, i.e. the Big Dataset. Given a super-population model, we can apply the theory of optimal design to draw a sample from the Big Dataset which contains most of the information about the unknown parameters of interest. In addition, since a Big Dataset might provide poor information despite its size, from the definition of the efficiency of a design we suggest a device to measure the quality of the Big Data.
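
    As a rough illustration of the idea (not the authors' algorithm), the following sketch assumes a linear super-population model and greedily draws a D-optimal subsample from a synthetic Big Dataset by maximizing the determinant of the information matrix; the data, dimensions and greedy rule are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "Big Dataset": N observations of p covariates, assumed to be generated
# by a linear super-population model (everything here is illustrative).
N, p, n = 100_000, 5, 200
X = rng.normal(size=(N, p))

# Under the linear model the information carried by a subsample S is X_S' X_S;
# D-optimality maximizes log det(X_S' X_S). Greedy forward selection: by the
# matrix determinant lemma, the point with the largest determinant gain is the
# one maximizing the leverage-like score x' M^{-1} x.
Minv = np.linalg.inv(1e-6 * np.eye(p))   # small ridge start so M is invertible
selected = []
for _ in range(n):
    scores = ((X @ Minv) * X).sum(axis=1)
    scores[selected] = -np.inf           # sample without replacement
    i = int(np.argmax(scores))
    selected.append(i)
    x = X[i]
    Mx = Minv @ x                        # Sherman-Morrison rank-one update
    Minv -= np.outer(Mx, Mx) / (1.0 + x @ Mx)

Xs = X[selected]
print("log det information of subsample:", np.linalg.slogdet(Xs.T @ Xs)[1])
```

    The D-efficiency of a design relative to such an optimum could then play the role of the quality measure for the Big Data mentioned in the abstract.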

    Optimal subset selection without outliers

    With the advent of ‘Big Data’, massive datasets are becoming increasingly prevalent. Several subdata selection methods have been proposed in the last few years, both to reduce the computational burden and to improve cost effectiveness and learning of the phenomenon. Some of these proposals (Drovandi et al., 2017; Wang et al., 2019; Deldossi and Tommasi, 2021, among others) are inspired by Optimal Experimental Design (OED). However, differently from the OED context, where researchers typically have complete control over the predictors, in subsampling methods both the predictors and the responses are passively observed. Thus, if outliers are present in the ‘Big Data’, they are likely to be included in the sample selected by applying the D-criterion, since D-optimal design points lie on the boundary of the design space. In regression analysis, outliers, and more generally influential points, can have a large impact on the estimates; identifying and excluding them in advance, especially in large datasets, is generally not an easy task. In this study, we propose an exchange procedure to select a compromise-optimal subset which is informative for the inferential goal and avoids outliers and ‘bad’ influential points.
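
    A hedged sketch of the general idea follows: flag likely outliers first, then run a D-criterion exchange restricted to the remaining pool. The flagging rule (robust z-scores from a pilot fit) and the exchange schedule below are illustrative stand-ins, not the compromise criterion proposed in the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy contaminated dataset (illustrative): linear model with gross outliers in y.
N, p, n = 20_000, 4, 100
X = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])
y = X @ rng.normal(size=p) + rng.normal(size=N)
out_idx = rng.choice(N, 50, replace=False)
y[out_idx] += 25.0

# Step 1: flag likely outliers via robust z-scores of residuals from a pilot
# fit (a stand-in for whatever diagnostic one prefers).
pilot = rng.choice(N, 2_000, replace=False)
b_pilot, *_ = np.linalg.lstsq(X[pilot], y[pilot], rcond=None)
r = y - X @ b_pilot
mad = 1.4826 * np.median(np.abs(r - np.median(r)))
clean = np.flatnonzero(np.abs(r - np.median(r)) / mad < 3.0)

# Step 2: exchange restricted to the clean pool: swap a selected point for a
# candidate whenever it increases det(X_S' X_S), i.e. the D-criterion.
def logdet(idx):
    Xs = X[idx]
    return np.linalg.slogdet(Xs.T @ Xs)[1]

S = list(rng.choice(clean, n, replace=False))
for _ in range(3):                                 # a few exchange passes
    for j in range(n):
        base = logdet(S)
        for c in rng.choice(clean, 100, replace=False):
            if c in S:
                continue
            trial = S.copy()
            trial[j] = c
            if logdet(trial) > base + 1e-9:
                S, base = trial, logdet(trial)

print("outliers in the selected subset:", len(set(S) & set(out_idx.tolist())))
```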

    Bayesian sample size determination for Multisite Replication Studies

    To overcome the frequently debated “reproducibility crisis” in science, replicating studies is becoming increasingly common across a variety of disciplines, such as psychology, economics and medicine. The aim is to assess whether the original study is statistically consistent with the replications, and to evaluate the evidence for the presence of an effect of interest. While the majority of the analyses are based on a single replication, multiple replications of the same experiment, usually conducted at different sites, are becoming more frequent. In this framework, our interest concerns the variation of results between sites and, more specifically, the issue of how to design the replication studies (i.e. how many sites and how many subjects per site) in order to yield sufficiently sensitive conclusions. For instance, if interest centers on hypothesis testing, this means that tests should be well powered, as described in Hedges and Schauer (2021) from a frequentist perspective. In this work, we propose a Bayesian scheme for designing multisite replication studies with a view to testing heterogeneity between sites. We adopt a normal-normal hierarchical model and use the Bayes factor as a measure of evidence.
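
    The design question lends itself to a small Monte Carlo sketch, given here under stated assumptions: a normal-normal hierarchical model, the common mean integrated out under a flat prior, a half-normal prior on the between-site standard deviation tau, and an illustrative BF > 3 threshold. None of these specific choices is taken from the work itself.

```python
import numpy as np

rng = np.random.default_rng(2)

def int_loglik(y, s2, tau):
    """log of prod_i N(y_i; mu, s_i^2 + tau^2), integrated over mu (flat prior)."""
    w = 1.0 / (s2 + tau**2)
    mu_hat = np.sum(w * y) / np.sum(w)
    k = len(y)
    return (-0.5 * (k - 1) * np.log(2 * np.pi) + 0.5 * np.sum(np.log(w))
            - 0.5 * np.log(np.sum(w)) - 0.5 * np.sum(w * (y - mu_hat) ** 2))

def log_bf10(y, s2, tau_grid, prior_kernel):
    """Bayes factor for heterogeneity (tau > 0) against homogeneity (tau = 0)."""
    ll = np.array([int_loglik(y, s2, t) for t in tau_grid])
    pw = prior_kernel(tau_grid)
    lmax = ll.max()                       # stabilized quadrature over tau
    m1 = np.trapz(np.exp(ll - lmax) * pw, tau_grid) / np.trapz(pw, tau_grid)
    return lmax + np.log(m1) - int_loglik(y, s2, 0.0)

# Design question: with k sites and n subjects per site (so s_i^2 = sigma^2/n),
# how often does BF10 exceed 3 when the true between-site sd is tau_true?
sigma, mu, tau_true = 1.0, 0.3, 0.25
tau_grid = np.linspace(1e-4, 2.0, 200)
prior_kernel = lambda t: np.exp(-0.5 * (t / 0.5) ** 2)  # half-normal(0.5) on tau

for k, n in [(4, 50), (8, 50), (8, 200), (16, 100)]:
    s2 = np.full(k, sigma**2 / n)
    hits = 0
    for _ in range(200):
        theta = rng.normal(mu, tau_true, size=k)   # site-specific true effects
        y = rng.normal(theta, np.sqrt(s2))         # observed site estimates
        hits += log_bf10(y, s2, tau_grid, prior_kernel) > np.log(3.0)
    print(f"k={k:2d}, n={n:3d}: P(BF10 > 3) ~ {hits / 200:.2f}")
```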