
    Design Issues for Generalized Linear Models: A Review

    Generalized linear models (GLMs) have been used quite effectively in the modeling of a mean response under nonstandard conditions, where discrete as well as continuous data distributions can be accommodated. The choice of design for a GLM is a very important task in the development and building of an adequate model. However, one major problem that handicaps the construction of a GLM design is its dependence on the unknown parameters of the fitted model. Several approaches have been proposed in the past 25 years to solve this problem. These approaches, however, have provided only partial solutions that apply in only some special cases, and the problem, in general, remains largely unresolved. The purpose of this article is to focus attention on the aforementioned dependence problem. We provide a survey of various existing techniques dealing with the dependence problem. This survey includes discussions concerning locally optimal designs, sequential designs, Bayesian designs and the quantile dispersion graph approach for comparing designs for GLMs. Comment: Published at http://dx.doi.org/10.1214/088342306000000105 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
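The dependence problem the survey describes is easy to see in a few lines: for a logistic GLM the Fisher information of a design depends on the unknown coefficients, so the same design can be informative under one parameter guess and poor under another. A minimal sketch (illustrative only; the model and numbers are made up, not taken from the article):

```python
import numpy as np

def logistic_information(design_points, weights, beta):
    """Fisher information matrix M(xi, beta) for a logistic GLM.

    Each support point's weight p(1-p) depends on the unknown beta --
    the 'dependence problem' discussed in the survey.
    """
    M = np.zeros((len(beta), len(beta)))
    for x, w in zip(design_points, weights):
        f = np.asarray(x, dtype=float)            # regression vector (1, x)
        p = 1.0 / (1.0 + np.exp(-(f @ beta)))     # mean response
        M += w * p * (1.0 - p) * np.outer(f, f)   # GLM information weight
    return M

# The same two-point design evaluated under two guesses of beta.
pts = [(1.0, -1.0), (1.0, 1.0)]
w = [0.5, 0.5]
d0 = np.linalg.det(logistic_information(pts, w, np.array([0.0, 1.0])))
d1 = np.linalg.det(logistic_information(pts, w, np.array([2.0, 3.0])))
print(d0 > d1)  # True: the design is much less informative under the second guess
```

This is why locally optimal designs need a prior parameter guess, and why sequential and Bayesian approaches try to spread that risk.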

    Bayesian T-optimal discriminating designs

    The problem of constructing Bayesian optimal discriminating designs for a class of regression models with respect to the T-optimality criterion introduced by Atkinson and Fedorov (1975a) is considered. It is demonstrated that the discretization of the integral with respect to the prior distribution leads to locally T-optimal discriminating design problems with a large number of model comparisons: current methodology can only deal with a few comparisons, but the discretization of the Bayesian prior easily yields discrimination design problems with more than 100 competing models. A new efficient method is developed to deal with problems of this type. It combines some features of the classical exchange-type algorithm with gradient methods. Convergence is proved and it is demonstrated that the new method can find Bayesian optimal discriminating designs in situations where all currently available procedures fail. Comment: 25 pages, 3 figures
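Discretizing the prior turns the Bayesian criterion into a prior-weighted sum of local T-criteria, one per support point. A toy sketch (hypothetical models chosen for illustration, not the paper's algorithm): the "true" model is a quadratic eta1(x) = theta*x^2 with a two-point prior on theta, and the rival model is a straight line, so the inner minimisation is closed-form weighted least squares.

```python
import numpy as np

def t_local(xs, ws, eta1_vals):
    """Local T-criterion: weighted squared distance from eta1 to its best
    linear approximation a + b*x (closed form via weighted least squares)."""
    X = np.column_stack([np.ones_like(xs), xs])
    W = np.diag(ws)
    coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ eta1_vals)
    resid = eta1_vals - X @ coef
    return float(resid @ W @ resid)

def t_bayesian(xs, ws, thetas, prior):
    """Discretized Bayesian T-criterion: prior-weighted sum of local
    T-criteria -- one model comparison per prior support point."""
    return sum(p * t_local(xs, ws, theta * xs**2)   # eta1(x) = theta*x^2
               for theta, p in zip(thetas, prior))

xs = np.array([-1.0, 0.0, 1.0])
ws = np.array([0.25, 0.5, 0.25])
print(t_bayesian(xs, ws, thetas=np.array([1.0, 2.0]), prior=[0.5, 0.5]))  # 0.625
```

With a finely discretized prior, each support point adds one such comparison, which is how the problem grows to 100+ competing models.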

    Single and Multiresponse Adaptive Design of Experiments with Application to Design Optimization of Novel Heat Exchangers

    Engineering design optimization often involves complex computer simulations. Optimization with such simulation models can be time consuming and sometimes computationally intractable. In order to reduce the computational burden, the use of approximation-assisted optimization is proposed in the literature. Approximation involves two phases: the first is the Design of Experiments (DOE) phase, in which sample points in the input space are chosen. These sample points are then used in a second phase to develop a simplified model, termed a metamodel, which is computationally efficient and can reasonably represent the behavior of the simulation response. The DOE phase is crucial to the success of approximation-assisted optimization. This dissertation proposes a new adaptive method for single and multiresponse DOE for approximation, along with an approximation-based framework for multilevel performance evaluation and design optimization of air-cooled heat exchangers. The dissertation is divided into three research thrusts. The first thrust presents a new adaptive DOE method for single-response deterministic computer simulations, called SFCVT. For SFCVT, the problem of adaptive DOE is posed as a bi-objective optimization problem. The two objectives in this problem, i.e., a cross-validation error criterion and a space-filling criterion, are chosen based on the notion that the DOE method has to make a tradeoff between allocating new sample points in regions that are multi-modal and have sensitive responses versus allocating sample points in regions that are sparsely sampled. In the second research thrust, a new approach for multiresponse adaptive DOE is developed (i.e., MSFCVT). Here the approach from the first thrust is extended with the notion that the tradeoff should also consider all responses. SFCVT is compared with three other methods from the literature (i.e., maximum entropy design, maximin scaled distance, and accumulative error). It was found that the SFCVT method leads to better-performing metamodels for the majority of the test problems. The MSFCVT method is also compared with two adaptive DOE methods from the literature and is shown to yield better metamodels, resulting in fewer function calls. In the third research thrust, an approximation-based framework is developed for the performance evaluation and design optimization of novel heat exchangers. There are two parts to this research thrust. The first is a new multi-level performance evaluation method for air-cooled heat exchangers, in which conventional 3D Computational Fluid Dynamics (CFD) simulation is replaced with a 2D CFD simulation coupled with an e-NTU based heat exchanger model. In the second part, the methods developed in research thrusts 1 and 2 are used for design optimization of heat exchangers. The optimal solutions from the methods in this thrust have 44% less volume and utilize 61% less material when compared to current state-of-the-art microchannel heat exchangers. Compared to 3D CFD, the overall computational savings are greater than 95%.
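The refinement-versus-exploration trade-off behind such adaptive DOE can be caricatured with a nearest-neighbour stand-in for the metamodel: each candidate point gets a cross-validation-style error score (refine where the surrogate is bad) and a space-filling score (explore where samples are sparse). A toy sketch (the predictor and the two scores are deliberate simplifications, not the dissertation's actual SFCVT criteria):

```python
import numpy as np

def loo_errors(X, y):
    """Leave-one-out error of a 1-nearest-neighbour surrogate at each
    sampled point -- a cheap stand-in for a cross-validation criterion."""
    errs = np.empty(len(X))
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # exclude the point itself
        errs[i] = abs(y[i] - y[np.argmin(d)])
    return errs

def score_candidates(X, y, candidates):
    """Two objectives per candidate: the LOO error of its nearest sampled
    point (surrogate trouble nearby) and its distance to the closest
    sample (space-filling)."""
    errs = loo_errors(X, y)
    scores = []
    for c in candidates:
        d = np.linalg.norm(X - c, axis=1)
        scores.append((float(errs[np.argmin(d)]), float(d.min())))
    return scores

X = np.array([[0.0], [1.0], [3.0]])
y = np.array([0.0, 1.0, 9.0])               # samples of y = x**2
for err, gap in score_candidates(X, y, np.array([[2.0], [2.5]])):
    print(err, gap)                          # (1.0, 1.0) then (8.0, 0.5)
```

A bi-objective search over candidates would then trade these two scores off, as the abstract describes.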

    Set-valued Data: Regression, Design and Outliers

    The focus of this dissertation is to study set‐valued data from three aspects, namely regression, optimal design and outlier identification. This dissertation consists of three peer‐reviewed published articles, each of them addressing one aspect. Their titles and abstracts are listed below: 1. Local regression smoothers with set‐valued outcome data: This paper proposes a method to conduct local linear regression smoothing in the presence of set‐valued outcome data. The proposed estimator is shown to be consistent, and its mean squared error and asymptotic distribution are derived. A method to build error tubes around the estimator is provided, and a small Monte Carlo exercise is conducted to confirm the good finite sample properties of the estimator. The usefulness of the method is illustrated on a novel dataset from a clinical trial to assess the effect of certain genes’ expressions on the outcomes of different lung cancer treatments. 2. Optimal design for multivariate multiple linear regression with set‐identified response: We consider the partially identified regression model with set‐identified responses, where the estimator is the set of the least squares estimators obtained for all possible choices of points sampled from the set‐identified observations. We address the issue of determining the optimal design for this case and show that, for objective functions mimicking those for several classical optimal designs, their set‐identified analogues coincide with the optimal designs for point‐identified real‐valued responses. 3. Depth and outliers for samples of sets and random sets distributions: We suggest several constructions suitable to define the depth of set‐valued observations with respect to a sample of convex sets or with respect to the distribution of a random closed convex set. With the concept of depth, it is possible to determine whether a given convex set should be regarded as an outlier with respect to a sample of convex closed sets. Some of our constructions are motivated by the known concepts of half‐space depth and band depth for function‐valued data. A novel construction derives the depth from a family of non‐linear expectations of random sets. Furthermore, we address the role of the positions of sets in the evaluation of their depth. Two case studies concern interval regression for Greek wine data and the detection of outliers in a sample of particles.
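The "set of least squares estimators over all choices of points from the observations" in the second article can be approximated crudely by resampling: draw a point from each interval-valued outcome, fit ordinary least squares, and record the range of slopes obtained. A rough sketch (a sampling approximation for illustration; the paper characterises the estimator set directly, not by simulation):

```python
import numpy as np

def slope_range(x, y_lo, y_hi, n_draws=500, seed=0):
    """Approximate the identified set of OLS slopes when each outcome is
    only known to lie in the interval [y_lo[i], y_hi[i]]."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    slopes = []
    for _ in range(n_draws):
        y = rng.uniform(y_lo, y_hi)          # one point per interval outcome
        slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    return min(slopes), max(slopes)

x = np.array([0.0, 1.0, 2.0, 3.0])
lo, hi = slope_range(x, y_lo=x - 0.5, y_hi=x + 0.5)   # y = x, known only to +/-0.5
print(lo, hi)   # an interval of plausible slopes straddling 1
```

With degenerate intervals (y_lo == y_hi) the range collapses to the usual point-identified OLS slope.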

    Using hierarchical information-theoretic criteria to optimize subsampling of extensive datasets

    This paper addresses the challenge of subsampling large datasets, aiming to generate a smaller dataset that retains a significant portion of the original information. To achieve this objective, we present a subsampling algorithm that integrates hierarchical data partitioning with a specialized tool tailored to identify the most informative observations within a dataset for a specified underlying linear model, not necessarily first-order, relating responses and inputs. The hierarchical data partitioning procedure systematically and incrementally aggregates information from smaller-sized samples into new samples. Simultaneously, our selection tool employs Semidefinite Programming for numerical optimization to maximize the information content of the chosen observations. We validate the effectiveness of our algorithm through extensive testing, using both benchmark and real-world datasets. The real-world dataset relates to the physicochemical characterization of white variants of Portuguese Vinho Verde. Our results are highly promising, demonstrating the algorithm's capability to efficiently identify and select the most informative observations while keeping computational requirements at a manageable level.
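The selection step can be imitated with a simple greedy determinant-maximising heuristic in place of the semidefinite-programming tool the paper actually uses (a hypothetical simplification for illustration): repeatedly add the row that most increases log det of the information matrix for the assumed linear model.

```python
import numpy as np

def greedy_d_optimal(X, k, ridge=1e-8):
    """Greedily pick k rows of X maximising det(X_S' X_S) -- a simple
    greedy stand-in for the SDP-based selection of informative rows."""
    n, p = X.shape
    chosen = []
    M = ridge * np.eye(p)          # ridge keeps early determinants nonzero
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            gain = np.linalg.slogdet(M + np.outer(X[i], X[i]))[1]
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
        M += np.outer(X[best], X[best])
    return chosen

X = np.array([[1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [1.0, 1.0]])
print(greedy_d_optimal(X, 2))   # the dominant row (index 2) is picked first
```

Greedy selection scans all rows per pick, which is exactly the cost the paper's hierarchical partitioning is designed to avoid on very large datasets.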

    Optimal Study Designs for Cluster Randomised Trials: An Overview of Methods and Results

    There are multiple cluster randomised trial designs that vary in when the clusters cross between control and intervention states, when observations are made within clusters, and how many observations are made at that time point. Identifying the most efficient study design is complex, though, owing to the correlation between observations within clusters and over time. In this article, we present a review of statistical and computational methods for identifying optimal cluster randomised trial designs. We also adapt methods from the experimental design literature for designs with correlated observations to the cluster trial context. We identify three broad classes of methods: using exact formulae for the variance of the treatment effect estimator under specific models to derive algorithms or weights for cluster sequences; generalised methods for estimating weights for experimental units; and combinatorial optimisation algorithms that select an optimal subset of experimental units. We also discuss methods for rounding weights to whole numbers of clusters and extensions to non-Gaussian models. We present results from multiple cluster trial examples comparing the different methods, including problems of determining the optimal allocation of clusters across a set of cluster sequences and of selecting the optimal number of single observations to make in each cluster-period, for both Gaussian and non-Gaussian models and for exchangeable and exponential decay covariance structures.
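All of these methods rest on the variance of the treatment-effect estimator under within-cluster correlation. For a simple parallel design with an exchangeable correlation structure, that variance can be computed directly by generalised least squares. A small sketch of this standard calculation (a textbook illustration, not one of the review's algorithms):

```python
import numpy as np

def gls_treatment_variance(treat, m, icc, sigma2=1.0):
    """GLS variance of the treatment effect in a parallel cluster trial:
    m observations per cluster, exchangeable within-cluster correlation."""
    # Exchangeable covariance for one cluster's m observations.
    V = sigma2 * ((1 - icc) * np.eye(m) + icc * np.ones((m, m)))
    Vinv = np.linalg.inv(V)
    info = np.zeros((2, 2))                  # (intercept, treatment effect)
    for t in treat:                          # one 0/1 indicator per cluster
        Z = np.column_stack([np.ones(m), np.full(m, float(t))])
        info += Z.T @ Vinv @ Z
    return float(np.linalg.inv(info)[1, 1])

# Balanced allocation of 10 clusters; variance grows with the ICC,
# matching the usual design-effect formula (1 + (m-1)*icc).
v_lo = gls_treatment_variance([0]*5 + [1]*5, m=20, icc=0.01)
v_hi = gls_treatment_variance([0]*5 + [1]*5, m=20, icc=0.10)
print(v_lo < v_hi)  # True
```

Optimal-design methods for cluster trials search over allocations (and, in crossover-type designs, over cluster sequences) to minimise exactly this kind of variance.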

    Design and Analysis of Simulation Experiments: Tutorial


    Discrimination between Gaussian process models: active learning and static constructions

    The paper covers the design and analysis of experiments to discriminate between two Gaussian process models with different covariance kernels, such as those widely used in computer experiments, kriging, sensor location and machine learning. Two frameworks are considered. First, we study sequential constructions, where successive design (observation) points are selected, either as additional points to an existing design or from the beginning of observation. The selection relies on the maximisation of the difference between the symmetric Kullback-Leibler divergences for the two models, which depends on the observations, or on the mean squared error of both models, which does not. Then, we consider static criteria, such as the familiar log-likelihood ratios and the Fréchet distance between the covariance functions of the two models. Other distance-based criteria, simpler to compute than the previous ones, are also introduced, for which, in the framework of approximate design, a necessary condition for the optimality of a design measure is provided. The paper includes a study of the mathematical links between the different criteria, and numerical illustrations are provided.
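The symmetric Kullback-Leibler divergence between two centred Gaussian models observed at the same design points is cheap to evaluate, since the log-determinant terms of the two directed divergences cancel. A small sketch with squared-exponential kernels (the kernel family and length-scales are illustrative choices, not taken from the paper):

```python
import numpy as np

def sym_kl(K0, K1):
    """Symmetric Kullback-Leibler divergence between two centred Gaussian
    vectors with covariances K0 and K1; the log-det terms cancel."""
    n = K0.shape[0]
    return 0.5 * (np.trace(np.linalg.solve(K1, K0))
                  + np.trace(np.linalg.solve(K0, K1)) - 2.0 * n)

def sq_exp(x, ell):
    """Squared-exponential kernel matrix on design points x, with a small
    nugget for numerical stability."""
    d = np.subtract.outer(x, x)
    return np.exp(-0.5 * (d / ell) ** 2) + 1e-10 * np.eye(len(x))

x = np.linspace(0.0, 1.0, 5)
# Divergence between two candidate length-scales at a 5-point design.
print(sym_kl(sq_exp(x, 0.5), sq_exp(x, 1.0)))
```

A sequential construction in this spirit would score each candidate observation point by how much it increases this divergence between the two rival models.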

    A model-based framework assisting the design of vapor-liquid equilibrium experimental plans

    In this paper we propose a framework for Model-based Sequential Optimal Design of Experiments to assist experimenters involved in Vapor-Liquid Equilibrium characterization studies in systematically constructing thermodynamically consistent models. The approach uses an initial continuous optimal design obtained via semidefinite programming, and then iterates between two stages: (i) model fitting using the information available; and (ii) identification of the next experiment, so that the information content in the data is maximized. The procedure stops when the number of experiments reaches the maximum for the experimental program or the dissimilarity between the parameter estimates in two consecutive iterations falls below a given threshold. This methodology is exemplified with the D-optimal design of isobaric experiments for characterizing binary mixtures, using the NRTL and UNIQUAC thermodynamic models for the liquid phase. Significant reductions of the confidence regions for the parameters are achieved compared with experimental plans where the observations are uniformly distributed over the domain.
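For a D-optimal criterion and a linear(ised) model, the "identify the next experiment" step of such a sequential loop amounts to picking the candidate that most increases the determinant of the information matrix. A generic sketch on a toy quadratic model (illustrative only; the paper's designs involve thermodynamic models and a semidefinite-programming initial design):

```python
import numpy as np

def next_d_optimal_point(F_done, candidates):
    """One step of a sequential D-optimal design: index of the candidate
    regression vector that most increases log det of the information
    matrix built from the rows already observed."""
    # Small ridge handles the rank-deficient information of early designs.
    M = F_done.T @ F_done + 1e-10 * np.eye(F_done.shape[1])
    gains = [np.linalg.slogdet(M + np.outer(f, f))[1] for f in candidates]
    return int(np.argmax(gains))

# Quadratic model f(x) = (1, x, x^2) on a grid of candidate settings.
grid = np.linspace(-1.0, 1.0, 21)
F = np.column_stack([np.ones_like(grid), grid, grid ** 2])
i = next_d_optimal_point(F[[0, 20]], F)   # the two endpoints already run
print(grid[i])  # 0.0 -- the centre point completes the classic 3-point design
```

In the paper's framework this selection alternates with refitting the thermodynamic model, so each new experiment is chosen under the latest parameter estimates.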