Optimal designs for inverse prediction in univariate nonlinear calibration models
Univariate calibration models are intended to link a quantity of interest X (e.g. the concentration of a chemical compound) to a value Y obtained from a measurement device. In this context, a major concern is to build calibration models that are able to provide precise (inverse) predictions of X from measured responses Y. This paper aims at answering the following question: which experiments should be run to set up a (linear or nonlinear) calibration curve that maximises the precision of inverse predictions? The well-known class of optimal designs is presented as a possible solution. The calibration model setup is first reviewed in the linear case and extended to the heteroscedastic nonlinear one. In this general case, asymptotic variance and confidence interval formulae for inverse predictions are derived. Two optimality criteria are then introduced to quantify a priori the quality of the inverse predictions provided by a given experimental design: the VI criterion is based on the integral of the inverse prediction variance over the calibration domain, and the GI criterion on its maximum value. Algorithmic aspects of optimal design generation are discussed. In the last section, the methodology is applied to four calibration models (linear, quadratic, exponential and four-parameter logistic). VI- and GI-optimal designs are compared with classical D-, V- and G-optimal ones. Their predictive quality is also compared with that of simple traditional equidistant designs; it is shown that, even though these designs have very different shapes, their predictive quality is not far from that of the optimal designs. Finally, simulations evaluate the small-sample properties of the asymptotic inverse prediction confidence intervals.
Optimal designs for inverse prediction in nonlinear calibration models
Calibration models are intended to link a quantity of interest X (e.g. the concentration of a chemical compound) to a value Y obtained from a measurement device. In this context, a major concern is to build calibration models that are able to provide precise (inverse) predictions of X from measured responses Y. This paper aims at answering the following question: which experiments should be run to set up a (linear or nonlinear) calibration curve that maximises the precision of inverse predictions? The well-known class of optimal designs is presented as a possible solution. The calibration model setup is first reviewed in the linear case and extended to the heteroscedastic nonlinear one. In this general case, asymptotic variance and confidence interval formulae are derived for inverse predictions. Two optimality criteria are then introduced to quantify a priori the quality of the inverse predictions for a given experimental design: the VI criterion is based on the integral of the inverse prediction variance over the calibration domain, and the GI criterion on its maximum value. Algorithmic aspects of optimal design generation are discussed. In the last section, the methodology is applied to four calibration models (linear, quadratic, exponential and four-parameter logistic). VI- and GI-optimal designs are compared with classical D-, V- and G-optimal designs. Their predictive quality is also compared with that of simple traditional equidistant designs; it is shown that, even though these designs have very different shapes, their predictive quality is not far from that of the optimal designs. Finally, simulations evaluate the small-sample properties of the asymptotic inverse prediction confidence intervals.
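The two abstracts above describe, but do not reproduce, the variance formula and the design criteria. As a hedged sketch (the notation and the delta-method form below are ours, not necessarily the paper's exact expressions), for a heteroscedastic nonlinear calibration model $Y = f(X,\theta) + \varepsilon$ with $\operatorname{Var}(\varepsilon) = \sigma^2(X)$, the asymptotic inverse prediction variance and the two criteria can be written as

\[
\widehat{X} = f^{-1}(Y_0;\hat{\theta}), \qquad
\operatorname{Var}\bigl(\widehat{X}\bigr) \approx
\frac{\sigma^2(X) + \nabla_{\theta} f(X,\theta)^{\top}\,\operatorname{Var}(\hat{\theta})\,\nabla_{\theta} f(X,\theta)}
     {\bigl[\partial f(X,\theta)/\partial X\bigr]^{2}},
\]
\[
V_I(\xi) = \int_{X_{\min}}^{X_{\max}} \operatorname{Var}\bigl(\widehat{X}(X)\bigr)\,dX,
\qquad
G_I(\xi) = \max_{X \in [X_{\min},\,X_{\max}]} \operatorname{Var}\bigl(\widehat{X}(X)\bigr),
\]

where $\xi$ denotes the experimental design and $\operatorname{Var}(\hat{\theta})$ its asymptotic parameter covariance; a VI-optimal design minimises $V_I$ and a GI-optimal design minimises $G_I$.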
Comparison of some chemometric tools for metabonomics biomarker identification
NMR-based metabonomics discovery approaches require statistical methods to extract, from large and complex spectral databases, biomarkers or biologically significant variables that best represent defined biological conditions. This paper explores the respective effectiveness of six multivariate methods: multiple hypothesis testing, supervised extensions of principal component analysis (PCA) and independent component analysis (ICA), discriminant partial least squares, linear logistic regression and classification trees (CART). Each method has been adapted in order to provide a biomarker score for each zone of the spectrum. These scores aim at giving the biologist indications of which metabolites of the analysed biofluid are potentially affected by a stressor factor of interest (e.g. toxicity of a drug, presence of a given disease or therapeutic effect of a drug). Applying the six methods to samples of 60 and 200 spectra drawn from a semi-artificial database allowed their respective properties to be evaluated. In particular, their sensitivities and false discovery rates (FDR) are illustrated through receiver operating characteristic (ROC) curves, and the resulting identifications are used to show their specificities and relative advantages. The paper recommends discarding two methods for biomarker identification: PCA, which shows generally low efficiency, and CART, which is very sensitive to noise. The other four methods give promising results, each having its own specificities.
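As an illustration of the first of the six methods (multiple hypothesis testing), a minimal Python sketch is given below. The bucketed-spectra layout, the two-group comparison, the Benjamini-Hochberg FDR control and the signed -log10(p) score are our assumptions for illustration, not necessarily the implementation used in the paper.

    # Sketch (not the paper's code): biomarker scores per spectral zone via
    # two-sample t-tests with Benjamini-Hochberg control of the FDR.
    import numpy as np
    from scipy import stats

    def biomarker_scores(spectra, groups, alpha=0.05):
        """spectra: (n_samples, n_zones) bucketed NMR intensities;
        groups: boolean array, True for the treated/diseased samples."""
        treated, control = spectra[groups], spectra[~groups]
        # One Welch t-test per spectral zone (each zone summarises a metabolite region).
        t_stat, p_val = stats.ttest_ind(treated, control, axis=0, equal_var=False)
        # Benjamini-Hochberg step-up procedure to control the false discovery rate.
        m = p_val.size
        order = np.argsort(p_val)
        passed = p_val[order] <= alpha * np.arange(1, m + 1) / m
        selected = np.zeros(m, dtype=bool)
        if passed.any():
            k = np.max(np.nonzero(passed)[0])   # largest rank passing its threshold
            selected[order[:k + 1]] = True
        # Biomarker score: -log10(p), signed by the direction of the mean shift.
        score = -np.log10(p_val) * np.sign(treated.mean(axis=0) - control.mean(axis=0))
        return score, selected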
The Expected Design Space for analytical methods: a new perspective based on modeling and prediction
The Design Space (DS) of an analytical method is defined as the set of factor settings that provides satisfactory results with respect to pre-defined constraints. The proposed methodology aims at identifying, through an optimization of the method, a region of the factor space that is likely to provide satisfactory results during the future routine use of the analytical method.
First, the DS is statistically defined as derived from the β-expectation prediction interval. Second, a multi-criteria perspective is added to this definition, as it is often required when optimizing an analytical method. Finally, a Monte-Carlo simulation is used to numerically predict and identify the DS under uncertainty.
Examples based on high-performance liquid chromatography (HPLC) methods are given, illustrating the applicability of the methodology.
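A minimal Monte-Carlo sketch of the idea follows; it is not the paper's HPLC case study. The quadratic response model, its fitted coefficients and covariance, the specification (resolution of at least 1.5) and the quality level are illustrative assumptions.

    # Sketch: Monte-Carlo identification of a Design Space under parameter uncertainty.
    import numpy as np

    rng = np.random.default_rng(0)

    # Fitted response model and its uncertainty (illustrative values, not the paper's).
    beta_hat = np.array([1.8, 0.6, -0.4, -0.3])   # coefficients of a 2-factor model
    cov_beta = 0.02 * np.eye(4)                   # their estimated covariance matrix
    sigma = 0.15                                  # residual standard deviation

    def response(x1, x2, beta):
        """Critical response as a function of two coded factors (e.g. pH, gradient time)."""
        return beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * x1 * x2

    spec, quality_level, n_sim = 1.5, 0.80, 5000          # acceptance limit, quality level
    betas = rng.multivariate_normal(beta_hat, cov_beta, size=n_sim)   # parameter draws

    design_space = []
    for x1 in np.linspace(-1, 1, 21):
        for x2 in np.linspace(-1, 1, 21):
            # Predictive draws of the response at this factor setting.
            y = response(x1, x2, betas.T) + rng.normal(0.0, sigma, n_sim)
            if np.mean(y >= spec) >= quality_level:   # estimated P(result meets the spec)
                design_space.append((x1, x2))         # keep the setting in the DS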
Use of independent component analysis and clustering methods to find and identify compounds of interest in a matrix of spectral data (UV-DAD)
In the framework of the development of new pharmaceutical formulations, statistical methodologies can accelerate and automate the development and optimization of quantitative methods.
To fulfil this objective, designs of experiments (DOE) are widely used. In this context, the same mixture of analytes is injected while the LC operating conditions are varied. This produces many very different chromatograms, which can be tedious and time-consuming to interpret.
Recently, independent component analysis (ICA) has proved useful for interpreting chromatograms, i.e. for separating the numerical signals contained in a data matrix provided by a liquid chromatography system equipped with an ultraviolet diode array detector (LC-UV DAD). A matrix containing peaks corresponding to the different analytes is then obtained. A brief summary of the ICA algorithm, as applied to this problem, is first given.
The aim of the current work is to show that an automated methodology can be used to match ICA-identified peaks that correspond to the same analyte across different chromatograms. In this way, a task usually left to analytical experts can be made quicker and easier. The methodology uses classical hierarchical agglomerative clustering with dedicated dissimilarity measures between spectra.
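As an illustration of such a matching step (not the paper's implementation), the sketch below clusters ICA-recovered UV spectra pooled from all chromatograms; the correlation-based dissimilarity and the cutting threshold are assumptions.

    # Sketch: matching ICA-recovered UV spectra across chromatograms by
    # agglomerative clustering on a correlation-based dissimilarity.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    def match_components(spectra, threshold=0.2):
        """spectra: (n_components, n_wavelengths) UV spectra pooled from all
        chromatograms; returns one cluster label per component, so components
        in the same cluster are attributed to the same analyte."""
        d = pdist(spectra, metric="correlation")       # 1 - Pearson correlation
        tree = linkage(d, method="average")            # average-linkage clustering
        return fcluster(tree, t=threshold, criterion="distance")   # cut the dendrogram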
Risk management for analytical methods based on the total error concept: conciliating the objectives of the pre-study and in-study validation phases
In industries that involve either chemistry or biology, analytical methods are necessary to keep an eye on all the material produced. If the quality of an analytical method is doubtful, then the whole set of decisions based on its measurements is questionable. For this reason, being able to assess the quality of an analytical method is far more than a statistical challenge; it is a matter of ethics and good business practice. The validity of an analytical method must be assessed at two levels. The “pre-study” validation aims to show, by an appropriate set of designed experiments, that the method is able to achieve its objectives. The “in-study” validation is intended to verify, by inserting QC samples in routine runs, that the method remains valid over time. At both levels, the total error approach considers a method valid if a sufficient proportion of analytical results is expected to lie in a given interval around the (unknown) nominal value. This paper discusses two methods, based on this total error concept, for checking the validity of a measurement method at the pre-study level. The first checks whether a tolerance interval for hypothetical future measurements lies within given acceptance limits; the second calculates the probability of a result lying within these limits and checks whether it is greater than a given acceptance level. For the “in-study” validation, the paper assesses the properties of the s–n–λ rule recommended by the FDA. The properties and respective advantages and limitations of these methods are investigated. A crucial point is to ensure that the decisions taken at the pre-study stage and in routine use are coherent. More precisely, a laboratory should not see its method rejected in routine use when it has been proved to be valid and remains so. This paper shows how this goal may be achieved by choosing compatible validation parameters at both the pre- and in-study levels.
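A minimal sketch of the first pre-study decision rule (tolerance interval within acceptance limits) is given below, under the simplifying assumption of a single concentration level with i.i.d. normal results, so that the β-expectation tolerance interval reduces to a normal prediction interval; the paper's setting, which also involves between-run variability, is richer. The function name and the ±15 % limits are illustrative.

    # Sketch: pre-study acceptance check based on the total error concept, for
    # one concentration level (simplified i.i.d. normal case).
    import numpy as np
    from scipy import stats

    def prestudy_valid(results, nominal, beta=0.95, acceptance=0.15):
        """results: replicate measurements at one concentration level."""
        n = len(results)
        mean, sd = np.mean(results), np.std(results, ddof=1)
        # Beta-expectation tolerance interval for a future result (prediction interval).
        half_width = stats.t.ppf(1 - (1 - beta) / 2, df=n - 1) * sd * np.sqrt(1 + 1 / n)
        lower, upper = mean - half_width, mean + half_width
        lo_lim, up_lim = nominal * (1 - acceptance), nominal * (1 + acceptance)
        # Accept the method at this level if the interval lies within the limits.
        return lo_lim <= lower and upper <= up_lim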
A fast exchange algorithm for designing focused libraries in lead optimisation
Combinatorial chemistry is widely used in drug discovery. Once a lead compound has been identified, a series of R-groups and reagents can be selected and combined to generate new potential drugs. The combinatorial nature of this problem leads to chemical libraries that usually contain a very large number of virtual compounds, far too large to permit their chemical synthesis. Therefore, one often wants to select a subset of “good” reagents for each R-group and synthesise all their possible combinations. This raises two difficulties. First, the selection of reagents has to be done such that the compounds of the resulting sub-library simultaneously optimise a series of chemical properties. For each compound, we use a desirability index, a concept proposed by Harrington [20], to summarise those properties in a single fitness value; a loss function is then used as the objective criterion to globally quantify the quality of a sub-library. Second, there is a huge number of possible sub-libraries, and the solution space has to be explored as fast as possible. The WEALD algorithm proposed in this paper starts from a random solution and iterates by applying exchanges, a simple method proposed by Fedorov [13] and often used in the generation of optimal designs. These exchanges are guided by a weighting of the reagents that is adapted recursively as the solution space is explored. The algorithm is applied to a real database and is shown to converge rapidly. It is compared with two other algorithms from the combinatorial chemistry literature: the Piccolo algorithm of W. Zheng et al. [37] and the Ultrafast algorithm of D. Agrafiotis and V. Lobanov [4].
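A bare-bones sketch of the exchange idea on a two-R-group library follows; it illustrates the Fedorov-style exchange loop the paper builds on, not the WEALD weighting scheme itself. The desirability matrix, the library sizes and the mean-desirability loss are toy assumptions.

    # Sketch: greedy reagent-exchange search for a focused sub-library.
    import numpy as np

    rng = np.random.default_rng(1)
    desirability = rng.random((50, 40))   # Harrington-type desirability of each virtual
                                          # compound (50 R1-reagents x 40 R2-reagents)
    k1, k2 = 8, 6                         # sub-library size: 8 R1 and 6 R2 reagents

    def loss(sel1, sel2):
        """Loss of a sub-library: negative mean desirability of its compounds."""
        return -desirability[np.ix_(sel1, sel2)].mean()

    sel1 = list(rng.choice(50, size=k1, replace=False))
    sel2 = list(rng.choice(40, size=k2, replace=False))
    best, improved = loss(sel1, sel2), True
    while improved:                       # iterate until no single exchange improves the loss
        improved = False
        for sel, n_pool in ((sel1, 50), (sel2, 40)):
            for i in range(len(sel)):
                for candidate in set(range(n_pool)) - set(sel):
                    old, sel[i] = sel[i], candidate        # try exchanging one reagent
                    if loss(sel1, sel2) < best:
                        best, improved = loss(sel1, sel2), True   # keep the exchange
                    else:
                        sel[i] = old                       # revert it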