29 research outputs found
Response Surface Methodology for Optimizing Hyper Parameters
The performance of an algorithm often depends heavily on hyper parameters that should be optimized before the algorithm is used. Since most conventional optimization methods suffer from some drawbacks, we developed an alternative way to find the best hyper parameter values. In contrast to the well-known procedures, the new optimization algorithm is based on statistical methods: it combines Linear Mixed Effect Models with Response Surface Methodology techniques. In particular, the Method of Steepest Ascent, which is well known for an Ordinary Least Squares setting with a linear response surface, has been generalized to apply to repeated-measurements situations and to response surfaces of order o ≥ 2. --repeated measurements,Random Intercepts Model,deterministic error terms,Method of Steepest Ascent,Support Vector Machine
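As a point of reference for the abstract above, the following is a minimal sketch of one step of the classical Method of Steepest Ascent in the Ordinary Least Squares setting with a first-order (linear) response surface. It does not implement the generalization to mixed models or higher-order surfaces described by the authors. The function name `steepest_ascent_step`, the `step` parameter, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

def steepest_ascent_step(X, y, center, step=1.0):
    """One step of classical steepest ascent (hypothetical helper).

    X      : (n, p) design points, e.g. a factorial design around `center`
    y      : (n,) observed responses (e.g. algorithm performance)
    center : (p,) current hyper parameter setting
    step   : length of the move along the estimated gradient
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    center = np.asarray(center, dtype=float)

    # Fit the first-order model y = b0 + b^T (x - center) by OLS.
    A = np.column_stack([np.ones(len(X)), X - center])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    grad = beta[1:]  # estimated gradient of the linear response surface

    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return center.copy(), grad  # flat surface: no direction to move in

    # Move a fixed distance in the direction of steepest ascent.
    return center + step * grad / norm, grad
```

In practice the step would be repeated, with a new design placed around each new center, until the fitted linear surface no longer indicates an improvement.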
A computer intensive method for choosing the ridge parameter
In this paper we describe a computer intensive method to find the ridge parameter in a prediction oriented linear model. With the help of a factorial experimental design the method is tested and compared to a classical one. --Ridge Regression,Experimental Design,Prediction model,Generalized Cross Validation
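The abstract does not spell out the classical method used for comparison. Since Generalized Cross Validation appears among the keywords, the sketch below assumes it as the classical baseline and shows how a ridge parameter can be selected with it. The function name `gcv_ridge`, the candidate grid `lambdas`, and the use of NumPy are assumptions for illustration; the computer intensive method of the paper is not reproduced here.

```python
import numpy as np

def gcv_ridge(X, y, lambdas):
    """Pick a ridge parameter by Generalized Cross Validation (sketch).

    Returns the candidate with the smallest GCV score and all scores.
    GCV(lam) = n * RSS(lam) / (n - tr(H_lam))^2, where H_lam is the
    ridge hat matrix X (X^T X + lam I)^{-1} X^T.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lambdas = np.asarray(lambdas, dtype=float)
    n = X.shape[0]

    # Thin SVD: H_lam = U diag(s^2 / (s^2 + lam)) U^T.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y

    scores = []
    for lam in lambdas:
        shrink = s**2 / (s**2 + lam)       # eigenvalues of the hat matrix
        fitted = U @ (shrink * Uty)        # ridge fitted values
        rss = np.sum((y - fitted) ** 2)    # residual sum of squares
        edf = np.sum(shrink)               # effective degrees of freedom
        scores.append(n * rss / (n - edf) ** 2)

    scores = np.array(scores)
    return lambdas[int(np.argmin(scores))], scores
```

The sketch assumes that X and y are already centered, so no intercept is fitted, and that every candidate in `lambdas` is positive.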
Latent Factor Prediction Pursuit for Rank Deficient Regressors
In simulation studies, Latent Factor Prediction Pursuit outperformed classical reduced rank regression methods. The algorithm described so far for Latent Factor Prediction Pursuit had two shortcomings: it was only implemented for situations where the explanatory variables were of full column rank, and only the regression matrix was calculated instead of the projection matrix. These problems are addressed by a new algorithm which finds the prediction optimal projection. --simulated annealing,prediction oriented projections,reduced rank regression,rank deficient regressors,simulation study
Bayesian matching of unlabeled marked point sets using random fields, with an application to molecular alignment
Statistical methodology is proposed for comparing unlabeled marked point
sets, with an application to aligning steroid molecules in chemoinformatics.
Methods from statistical shape analysis are combined with techniques for
predicting random fields in spatial statistics in order to define a suitable
measure of similarity between two marked point sets. Bayesian modeling of the
predicted field overlap between pairs of point sets is proposed, and posterior
inference of the alignment is carried out using Markov chain Monte Carlo
simulation. By representing the fields in reproducing kernel Hilbert spaces,
the degree of overlap can be computed without expensive numerical integration.
Superimposing entire fields rather than the configuration matrices of point
coordinates thereby avoids the problem that there is usually no clear
one-to-one correspondence between the points. In addition, mask parameters are
introduced in the model, so that partial matching of the marked point sets can
be carried out. We also propose an adaptation of the generalized Procrustes
analysis algorithm for the simultaneous alignment of multiple point sets. The
methodology is illustrated with a simulation study and then applied to a data
set of 31 steroid molecules, where the relationship between shape and binding
activity to the corticosteroid binding globulin receptor is explored. Comment: Published at http://dx.doi.org/10.1214/11-AOAS486 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Statistical inference for molecular shapes
This thesis is concerned with developing statistical methods for evaluating and comparing molecular shapes. Techniques from statistical shape analysis serve as a basis for our methods. However, as molecules are fuzzy objects of electron clouds which constantly undergo vibrational motions and conformational changes, these techniques should be modified to be more suitable for the distinctive features of molecular shape.
The first part of this thesis is concerned with the continuous nature of molecules. Based on molecular properties which have been measured at the atom positions, a continuous field-based representation of a molecule is obtained using methods from spatial statistics. Within the framework of reproducing kernel Hilbert spaces, a similarity index for two molecular shapes is proposed which can then be used for the pairwise alignment of molecules. The alignment is carried out using Markov chain Monte Carlo methods and posterior inference. In the Bayesian setting, it is also possible to introduce additional parameters (mask vectors) which allow for the fact that only part of the molecules may be similar. We apply our methods to a dataset of 31 steroid molecules which fall into three activity classes with respect to the binding activity to a common receptor protein. To investigate which molecular features distinguish the activity classes, we also propose a generalisation of the pairwise method to the simultaneous alignment of several molecules.
The second part of this thesis is concerned with the dynamic aspect of molecular shapes. Here, we consider a dataset containing time series of DNA configurations which have been obtained using molecular dynamics simulations. For each DNA duplex considered, both a damaged and an undamaged version are available, and the objective is to investigate whether or not the damage induces a significant difference in the mean shape of the molecule. To do so, we consider bootstrap hypothesis tests for the equality of mean shapes. In particular, we investigate the use of a computationally inexpensive algorithm which is based on the Procrustes tangent space. Two versions of this algorithm are proposed. The first version is designed for independent configuration matrices, while the second version is specifically designed to accommodate temporal dependence of the configurations within each group and is hence more suitable for the DNA data.
Localized Linear Discriminant Analysis
Despite its age, Linear Discriminant Analysis (LDA) performs well even in situations where its underlying premises, such as normally distributed data with constant covariance matrices over all classes, are not met. It is, however, a global technique that does not take into account the nature of the individual observation to be classified. By weighting each training observation according to its distance to the observation of interest, a global classifier can be transformed into an observation-specific approach. So far, this has been done for logistic discrimination. By using LDA instead, the computation of the local classifier becomes much simpler. Moreover, it is ready for applications in multi-class situations. --classification,local models,LDA
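The weighting idea described above can be sketched as follows. The abstract does not state the weight function, so a Gaussian kernel of the squared Euclidean distance with a bandwidth-like parameter `gamma` is assumed here; the function name `localized_lda_predict`, the small ridge term added to the covariance, and the use of NumPy are likewise assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def localized_lda_predict(X, y, x0, gamma=1.0):
    """Classify x0 with an LDA fitted on distance-weighted training data.

    X     : (n, p) training observations
    y     : (n,) class labels
    x0    : (p,) observation to be classified
    gamma : assumed kernel parameter; larger values give a more local fit
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    x0 = np.asarray(x0, dtype=float)

    # Weight each training observation by its distance to x0 (assumed kernel).
    w = np.exp(-gamma * np.sum((X - x0) ** 2, axis=1))

    classes = np.unique(y)
    p = X.shape[1]
    means, priors = [], []
    pooled = np.zeros((p, p))
    for c in classes:
        Xc, wc = X[y == c], w[y == c]
        mu = wc @ Xc / wc.sum()               # weighted class mean
        means.append(mu)
        priors.append(wc.sum() / w.sum())     # weighted class prior
        R = Xc - mu
        pooled += (R * wc[:, None]).T @ R     # weighted within-class scatter
    pooled /= w.sum()
    pooled += 1e-8 * np.eye(p)                # small ridge for numerical stability

    # Linear discriminant score of each class, evaluated at x0.
    scores = []
    for mu, pi in zip(means, priors):
        a = np.linalg.solve(pooled, mu)
        scores.append(x0 @ a - 0.5 * mu @ a + np.log(pi))
    return classes[int(np.argmax(scores))]
```

Because the classifier is refitted for every observation of interest, the approach handles more than two classes without modification. If `gamma` is so large that all weights of a class underflow to zero, the sketch breaks down and a smaller value is needed.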
Bayesian alignment of continuous molecular shapes using random fields
Statistical methodology is proposed for comparing molecular shapes. In order to account for the continuous nature of molecules, classical shape analysis methods are combined with techniques used for predicting random fields in spatial statistics. Applying a modification of Procrustes analysis, Bayesian inference is carried out using Markov chain Monte Carlo methods for the pairwise alignment of the resulting molecular fields. Superimposing entire fields rather than the configuration matrices of nuclear positions thereby solves the problem that there is usually no clear one-to-one correspondence between the atoms of the two molecules under consideration. Using a similar concept, we also propose an adaptation of the generalised Procrustes analysis algorithm for the simultaneous alignment of multiple molecular fields. The methodology is applied to a dataset of 31 steroid molecules.