Design Issues for Generalized Linear Models: A Review
Generalized linear models (GLMs) have been used quite effectively in the
modeling of a mean response under nonstandard conditions, where discrete as
well as continuous data distributions can be accommodated. The choice of design
for a GLM is a very important task in the development and building of an
adequate model. However, one major problem that handicaps the construction of a
GLM design is its dependence on the unknown parameters of the fitted model.
Several approaches have been proposed in the past 25 years to solve this
problem. These approaches, however, have provided only partial solutions that
apply in only some special cases, and the problem, in general, remains largely
unresolved. The purpose of this article is to focus attention on the
aforementioned dependence problem. We provide a survey of various existing
techniques dealing with the dependence problem. This survey includes
discussions concerning locally optimal designs, sequential designs, Bayesian
designs and the quantile dispersion graph approach for comparing designs for
GLMs. Comment: Published at http://dx.doi.org/10.1214/088342306000000105 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
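The dependence problem this survey addresses can be made concrete for logistic regression, where the Fisher information matrix, and hence any D-optimality calculation, depends on the unknown coefficients. A minimal sketch with a hypothetical two-point design and illustrative parameter guesses (not taken from the article):

```python
import numpy as np

def logistic_information(design_points, weights, beta):
    """Fisher information matrix for a one-covariate logistic model
    with intercept: logit p(x) = beta[0] + beta[1] * x."""
    M = np.zeros((2, 2))
    for x, w in zip(design_points, weights):
        p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))
        f = np.array([1.0, x])                    # regression vector
        M += w * p * (1.0 - p) * np.outer(f, f)   # GLM weight p(1-p)
    return M

# The same two-point design has different information content depending
# on the (unknown) parameter values used to evaluate it.
design, weights = [-1.0, 1.0], [0.5, 0.5]
det_at_guess1 = np.linalg.det(logistic_information(design, weights, np.array([0.0, 1.0])))
det_at_guess2 = np.linalg.det(logistic_information(design, weights, np.array([2.0, 3.0])))
print(det_at_guess1, det_at_guess2)  # different values: "optimality" depends on beta
```

The design that maximises this determinant therefore changes with the parameter guess, which is precisely why locally optimal, sequential and Bayesian strategies are needed.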
Bayesian T-optimal discriminating designs
The problem of constructing Bayesian optimal discriminating designs for a
class of regression models with respect to the T-optimality criterion
introduced by Atkinson and Fedorov (1975a) is considered. It is demonstrated
that the discretization of the integral with respect to the prior distribution
leads to locally T-optimal discriminating design problems with a large number
of model comparisons. Current methodology can only deal with a few
comparisons, but the discretization of the Bayesian prior easily yields
discrimination design problems with more than 100 competing models. A new
efficient method is developed to deal with problems of this type. It combines
some features of the classical exchange type algorithm with the gradient
methods. Convergence is proved and it is demonstrated that the new method can
find Bayesian optimal discriminating designs in situations where all currently
available procedures fail. Comment: 25 pages, 3 figures.
Single and Multiresponse Adaptive Design of Experiments with Application to Design Optimization of Novel Heat Exchangers
Engineering design optimization often involves complex computer simulations.
Optimization with such simulation models can be time-consuming and sometimes
computationally intractable. To reduce the computational burden,
approximation-assisted optimization has been proposed in the literature.
Approximation involves two phases. The first is the Design of Experiments
(DOE) phase, in which sample points in the input space are chosen. These
sample points are then used in a second phase to develop a simplified model,
termed a metamodel, which is computationally efficient and can reasonably
represent the behavior of the simulation response. The DOE phase is crucial
to the success of approximation-assisted optimization.
This dissertation proposes a new adaptive method for single and multiresponse
DOE for approximation along with an approximation-based framework for multilevel
performance evaluation and design optimization of air-cooled heat exchangers.
The dissertation is divided into three research thrusts. The first thrust presents a new
adaptive DOE method for single-response deterministic computer simulations,
called SFCVT. In SFCVT, the problem of adaptive DOE is posed as a bi-objective
optimization problem. The two objectives, a cross-validation error criterion
and a space-filling criterion, are chosen based on the notion that the DOE
method must trade off allocating new sample points in regions that are
multi-modal and have a sensitive response against allocating them in regions
that are sparsely sampled. In the second research thrust, a new approach for
multiresponse adaptive DOE is developed (i.e., MSFCVT). Here the approach from
the first thrust is extended with the notion that the tradeoff should also
consider all responses. SFCVT is compared with three other methods from the
literature (i.e., maximum entropy design, maximin scaled distance, and
accumulative error). It was found that the SFCVT method leads to
better-performing metamodels for the majority of
the test problems. The MSFCVT method is also compared with two adaptive DOE
methods from the literature and is shown to yield better metamodels, resulting in
fewer function calls.
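The bi-objective tradeoff behind SFCVT can be illustrated with a simplified selection rule. This is a generic sketch of the idea, using an equal-weight scalarization and a hypothetical error surrogate, not the dissertation's exact formulation:

```python
import numpy as np

def select_next_point(candidates, sampled, cv_error_estimate):
    """Pick the next DOE point by trading off an estimated metamodel error
    (targets hard-to-fit regions) against a max-min distance space-filling
    criterion (targets sparsely sampled regions)."""
    errs = np.array([cv_error_estimate(c) for c in candidates])
    dists = np.array([min(np.linalg.norm(c - s) for s in sampled)
                      for c in candidates])
    # Normalise both objectives to [0, 1] over the candidate set.
    errs = (errs - errs.min()) / (np.ptp(errs) + 1e-12)
    dists = (dists - dists.min()) / (np.ptp(dists) + 1e-12)
    # Equal-weight scalarization of the two objectives.
    return candidates[int(np.argmax(0.5 * errs + 0.5 * dists))]

rng = np.random.default_rng(0)
candidates = [rng.random(2) for _ in range(200)]
sampled = [np.array([0.5, 0.5])]
# Hypothetical error surrogate: the response is assumed hard to fit near (1, 1).
err = lambda x: np.exp(-np.linalg.norm(x - np.array([1.0, 1.0])))
nxt = select_next_point(candidates, sampled, err)
print(nxt)  # a point both far from the existing sample and near (1, 1)
```

A true bi-objective treatment would trace the whole Pareto front rather than fix the weights, but the scalarized version already shows how the two criteria pull the next sample point in different directions.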
In the third research thrust, an approximation-based framework is developed for
the performance evaluation and design optimization of novel heat exchangers. There
are two parts to this research thrust. The first is a new multi-level performance evaluation
method for air-cooled heat exchangers in which conventional 3D Computational
Fluid Dynamics (CFD) simulation is replaced with a 2D CFD simulation coupled
with an e-NTU based heat exchanger model. In the second part, the methods
developed in research thrusts 1 and 2 are used for design optimization of heat
exchangers. The optimal solutions from the methods in this thrust have 44%
less volume and use 61% less material than current state-of-the-art
microchannel heat exchangers. Compared to 3D CFD, the overall computational
savings are greater than 95%.
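The e-NTU coupling mentioned above rests on standard effectiveness relations. A minimal counter-flow sketch with illustrative inputs, not the dissertation's actual solver or data:

```python
import math

def effectiveness_counterflow(ntu, cr):
    """Effectiveness of a counter-flow heat exchanger
    (standard e-NTU relation, cr = C_min / C_max)."""
    if abs(cr - 1.0) < 1e-12:
        return ntu / (1.0 + ntu)
    e = math.exp(-ntu * (1.0 - cr))
    return (1.0 - e) / (1.0 - cr * e)

def heat_duty(ntu, c_hot, c_cold, t_hot_in, t_cold_in):
    """Heat transferred: q = eps * C_min * (T_hot_in - T_cold_in)."""
    c_min, c_max = min(c_hot, c_cold), max(c_hot, c_cold)
    eps = effectiveness_counterflow(ntu, c_min / c_max)
    return eps * c_min * (t_hot_in - t_cold_in)

# Illustrative values: capacity rates in W/K, temperatures in deg C.
q = heat_duty(ntu=2.0, c_hot=500.0, c_cold=1000.0, t_hot_in=80.0, t_cold_in=20.0)
print(q)  # heat duty in W for the inputs above
```

Because the effectiveness relation is algebraic, evaluating it is essentially free compared with a CFD run, which is what makes the 2D-CFD-plus-e-NTU coupling so much cheaper than full 3D CFD.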
Set-valued Data: Regression, Design and Outliers
The focus of this dissertation is to study set-valued data from three aspects, namely regression, optimal design and outlier identification. This dissertation consists of three peer-reviewed published articles, each addressing one aspect. Their titles and abstracts are listed below:
1. Local regression smoothers with set-valued outcome data:
This paper proposes a method to conduct local linear regression smoothing in the presence of set-valued outcome data. The proposed estimator is shown to be consistent, and its mean squared error and asymptotic distribution are derived. A method to build error tubes around the estimator is provided, and a small Monte Carlo exercise is conducted to confirm the good finite-sample properties of the estimator. The usefulness of the method is illustrated on a novel dataset from a clinical trial assessing the effect of certain genes' expression on the outcomes of different lung cancer treatments.
2. Optimal design for multivariate multiple linear regression with set-identified response:
We consider the partially identified regression model with set-identified responses, where the estimator is the set of least-squares estimators obtained for all possible choices of points sampled from the set-identified observations. We address the issue of determining the optimal design for this case and show that, for objective functions mimicking those of several classical optimal designs, their set-identified analogues coincide with the optimal designs for point-identified real-valued responses.
3. Depth and outliers for samples of sets and random sets distributions:
We suggest several constructions suitable for defining the depth of set-valued observations with respect to a sample of convex sets or with respect to the distribution of a random closed convex set. With a concept of depth, it is possible to determine whether a given convex set should be regarded as an outlier with respect to a sample of convex closed sets. Some of our constructions are motivated by the known concepts of half-space depth and band depth for function-valued data. A novel construction derives the depth from a family of non-linear expectations of random sets. Furthermore, we address the role of the positions of sets in the evaluation of their depth. Two case studies concern interval regression for Greek wine data and detection of outliers in a sample of particles.
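The set of least-squares estimators described in the second article can be illustrated in a simple case: with interval-valued outcomes, the slope estimator is linear in the outcomes, so its extremes are attained at interval endpoints, and enumerating them (feasible for small samples) recovers its exact range. A hypothetical sketch, not taken from the articles:

```python
import itertools
import numpy as np

def slope_bounds(x, y_low, y_high):
    """Range of the least-squares slope when each outcome is only known to
    lie in the interval [y_low[i], y_high[i]].  The slope is linear in y,
    so its extremes are attained at interval endpoints."""
    x = np.asarray(x, dtype=float)
    c = (x - x.mean()) / ((x - x.mean()) ** 2).sum()   # slope = c @ y
    slopes = [float(c @ np.where(pick, y_high, y_low))
              for pick in itertools.product([False, True], repeat=len(x))]
    return min(slopes), max(slopes)

lo, hi = slope_bounds(x=[0.0, 1.0, 2.0],
                      y_low=[1.0, 2.0, 3.0], y_high=[2.0, 3.0, 4.0])
print(lo, hi)  # the identified set for the slope is the interval [0.5, 1.5]
```

Since the outcomes vary continuously inside their intervals, every slope between these bounds is attained, so the pair (lo, hi) describes the whole set-identified estimator for the slope.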
Using hierarchical information-theoretic criteria to optimize subsampling of extensive datasets
This paper addresses the challenge of subsampling large datasets, aiming to generate a smaller dataset that retains a significant portion of the original information. To achieve this objective, we present a subsampling algorithm that integrates hierarchical data partitioning with a specialized tool tailored to identify the most informative observations within a dataset for a specified underlying linear model, not necessarily first-order, relating responses and inputs. The hierarchical data partitioning procedure systematically and incrementally aggregates information from smaller-sized samples into new samples. Simultaneously, our selection tool employs Semidefinite Programming for numerical optimization to maximize the information content of the chosen observations. We validate the effectiveness of our algorithm through extensive testing, using both benchmark and real-world datasets. The real-world dataset is related to the physicochemical characterization of white variants of Portuguese Vinho Verde. Our results are highly promising, demonstrating the algorithm's capability to efficiently identify and select the most informative observations while keeping computational requirements at a manageable level.
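The idea of picking the most informative observations for a linear model can be sketched with a greedy determinant-maximising heuristic. This is a simple stand-in for, not a reproduction of, the Semidefinite Programming tool described above:

```python
import numpy as np

def greedy_d_optimal_subsample(X, k, ridge=1e-8):
    """Greedily pick k rows of the model matrix X that (approximately)
    maximise det(X_S' X_S), using the rank-one update identity
    det(M + x x') = det(M) * (1 + x' M^{-1} x)."""
    n, p = X.shape
    chosen, remaining = [], set(range(n))
    M = ridge * np.eye(p)           # regularised information matrix
    for _ in range(k):
        M_inv = np.linalg.inv(M)
        # Add the row with the largest one-step determinant gain.
        best = max(remaining, key=lambda i: X[i] @ M_inv @ X[i])
        chosen.append(best)
        remaining.remove(best)
        M = M + np.outer(X[best], X[best])
    return chosen

# Straight-line model on [-1, 1]: the extreme points are the most informative.
x = np.linspace(-1, 1, 101)
X = np.column_stack([np.ones_like(x), x])   # intercept + slope
idx = greedy_d_optimal_subsample(X, k=4)
print(sorted(x[idx]))  # points near the ends of the interval
```

Greedy selection carries no optimality guarantee for the final subset, which is one motivation for the convex (SDP) formulation the paper actually uses; the rank-one gain formula above is, however, the standard workhorse behind many exchange-type algorithms.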
Optimal Study Designs for Cluster Randomised Trials: An Overview of Methods and Results
There are multiple cluster randomised trial designs that vary in when the
clusters cross between control and intervention states, when observations are
made within clusters, and how many observations are made at that time point.
Identifying the most efficient study design is complex, however, owing to the
correlation between observations within clusters and over time. In this
article, we present a review of statistical and computational methods for
identifying optimal cluster randomised trial designs. We also adapt methods
from the experimental design literature for experimental designs with
correlated observations to the cluster trial context. We identify three broad
classes of methods: using exact formulae for the treatment effect estimator
variance for specific models to derive algorithms or weights for cluster
sequences; generalised methods for estimating weights for experimental units;
and, combinatorial optimisation algorithms to select an optimal subset of
experimental units. We also discuss methods for rounding weights to whole
numbers of clusters, and extensions to non-Gaussian models. We present results
from multiple cluster trial examples comparing the different methods,
including problems of determining the optimal allocation of clusters across
a set of cluster sequences and of selecting the optimal number of single
observations to make in each cluster-period, for both Gaussian and
non-Gaussian models and for both exchangeable and exponential-decay
covariance structures.
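As a concrete instance of the efficiency considerations reviewed above, the variance of the treatment effect estimator in a simple parallel-arm cluster trial with exchangeable within-cluster correlation follows from the classical design effect. A small illustrative sketch with hypothetical numbers, not from the article:

```python
def treatment_effect_variance(n_clusters, m, icc, sigma2=1.0):
    """Variance of the difference-in-means treatment effect estimator in a
    parallel cluster randomised trial (clusters split equally between arms),
    with m observations per cluster and exchangeable correlation `icc`.
    Uses the classical design effect 1 + (m - 1) * icc."""
    deff = 1.0 + (m - 1) * icc
    n_total = n_clusters * m
    return 4.0 * sigma2 * deff / n_total

# With a fixed total of 200 observations, many small clusters beat a few
# large ones whenever the ICC is non-negligible.
v_many_small = treatment_effect_variance(n_clusters=40, m=5, icc=0.05)
v_few_large = treatment_effect_variance(n_clusters=10, m=20, icc=0.05)
print(v_many_small, v_few_large)
```

More elaborate designs (stepped-wedge, crossover) and decaying correlation structures change the algebra, which is why the review covers exact formulae, weight-based methods and combinatorial optimisation rather than a single closed form.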
Discrimination between Gaussian process models: active learning and static constructions
The paper covers the design and analysis of experiments to discriminate between two Gaussian process models with different covariance kernels, such as those widely used in computer experiments, kriging, sensor location and machine learning. Two frameworks are considered. First, we study sequential constructions, where successive design (observation) points are selected, either as additions to an existing design or from the start of observation. The selection relies on maximising the difference between the symmetric Kullback-Leibler divergences for the two models, which depends on the observations, or the mean squared error of both models, which does not. We then consider static criteria, such as the familiar log-likelihood ratios and the Fréchet distance between the covariance functions of the two models. Other distance-based criteria, simpler to compute than the previous ones, are also introduced; for these, within the framework of approximate design, a necessary condition for the optimality of a design measure is provided. The paper includes a study of the mathematical links between the different criteria, and numerical illustrations are provided.
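The symmetric Kullback-Leibler criterion used in the sequential constructions has a simple closed form for zero-mean Gaussian observation vectors, since the log-determinant terms cancel in the symmetrised sum. A brief sketch with two hypothetical kernels and design points:

```python
import numpy as np

def symmetric_kl_gaussians(K1, K2):
    """Symmetric Kullback-Leibler divergence between two zero-mean Gaussian
    vectors with covariances K1 and K2 (the two GP models observed at a
    common set of design points)."""
    K1_inv, K2_inv = np.linalg.inv(K1), np.linalg.inv(K2)
    n = K1.shape[0]
    return 0.5 * (np.trace(K2_inv @ K1) + np.trace(K1_inv @ K2)) - n

def gauss_kernel(x, ell):
    d = np.subtract.outer(x, x)
    return np.exp(-(d / ell) ** 2)

def exp_kernel(x, ell):
    d = np.abs(np.subtract.outer(x, x))
    return np.exp(-d / ell)

# Hypothetical design: five equispaced points; jitter aids invertibility.
x = np.linspace(0.0, 1.0, 5)
jitter = 1e-6 * np.eye(5)
dkl = symmetric_kl_gaussians(gauss_kernel(x, 0.5) + jitter,
                             exp_kernel(x, 0.5) + jitter)
print(dkl)  # larger values mean the design separates the kernels better
```

A sequential scheme of the kind studied in the paper would add, at each step, the candidate point that most increases this divergence for the augmented design.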
A model-based framework assisting the design of vapor-liquid equilibrium experimental plans
In this paper we propose a framework for Model-based Sequential Optimal Design of Experiments to assist experimenters involved in Vapor-Liquid equilibrium characterization studies in systematically constructing thermodynamically consistent models. The approach uses an initial continuous optimal design obtained via semidefinite programming, and then iterates between two stages: (i) model fitting using the information available; and (ii) identification of the next experiment, so that the information content of the data is maximized. The procedure stops when the number of experiments reaches the maximum for the experimental program or the dissimilarity between the parameter estimates in two consecutive iterations falls below a given threshold. The methodology is exemplified with the D-optimal design of isobaric experiments for characterizing binary mixtures, using the NRTL and UNIQUAC thermodynamic models for the liquid phase. Significant reductions of the confidence regions for the parameters are achieved compared with experimental plans where the observations are uniformly distributed over the domain.
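The "identification of the next experiment" stage can be sketched generically for a D-optimality criterion: given the information accumulated so far, the next run maximises the one-step determinant gain. A toy quadratic response surface stands in for the thermodynamic model here (hypothetical values, not the paper's NRTL/UNIQUAC setup):

```python
import numpy as np

def next_experiment(candidates, M, f):
    """One sequential D-optimal step: among candidate settings, choose the
    one maximising det(M + f f') = det(M) * (1 + f' M^{-1} f)."""
    M_inv = np.linalg.inv(M)
    gains = [f(c) @ M_inv @ f(c) for c in candidates]
    return candidates[int(np.argmax(gains))]

# Toy stand-in: quadratic response surface in a composition variable x.
f = lambda x: np.array([1.0, x, x * x])
candidates = list(np.linspace(0.0, 1.0, 21))
# Information accumulated from three hypothetical initial experiments.
M = sum(np.outer(f(xi), f(xi)) for xi in [0.1, 0.5, 0.9]) + 1e-8 * np.eye(3)
x_next = next_experiment(candidates, M, f)
print(x_next)  # the most informative next composition to measure
```

With the initial runs at 0.1, 0.5 and 0.9, the prediction variance is largest just outside the sampled range, so the step selects a boundary composition; iterating fit-then-select in this way is the loop the framework automates.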