Modeling regionalized volumetric differences in protein-ligand binding cavities
Identifying elements of protein structures that create differences in protein-ligand
binding specificity is an essential method for explaining the molecular mechanisms
underlying preferential binding. In some cases, influential mechanisms can be
visually identified by experts in structural biology, but subtler mechanisms, whose
significance may only be apparent from the analysis of many structures, are harder to
find. To assist this process, we present a geometric algorithm and two statistical
models for identifying significant structural differences in protein-ligand binding
cavities. We demonstrate these methods in an analysis of sequentially nonredundant
structural representatives of the canonical serine proteases and the enolase
superfamily. Here, we observed that statistically significant structural variations
identified experimentally established determinants of specificity. We also observed
that an analysis of individual regions inside cavities can reveal areas where small
differences in shape can correspond to differences in specificity.
On Parametric and Nonparametric Methods for Dependent Data
In recent years, there has been a surge of research interest in the analysis of time series
and spatial data. While on one hand more and more sophisticated models are being
developed, on the other hand the resulting theory and estimation process have become
more and more involved. This dissertation addresses the development of statistical
inference procedures for data exhibiting dependencies of varied form and structure.
In the first work, we consider estimation of the mean squared prediction error
(MSPE) of the best linear predictor of (possibly) nonlinear functions of finitely many
future observations in a stationary time series. We develop a resampling methodology
for estimating the MSPE when the unknown parameters in the best linear predictor
are estimated. Further, we propose a bias corrected MSPE estimator based on the
bootstrap and establish its second order accuracy. Finite sample properties of the
method are investigated through a simulation study.
The next work considers nonparametric inference on spatial data. In this work
the asymptotic distribution of the Discrete Fourier Transformation (DFT) of spatial
data under pure and mixed increasing domain spatial asymptotic structures is
studied under both deterministic and stochastic spatial sampling designs. The deterministic
design is specified by a scaled version of the integer lattice in ℝ^d while
the data-sites under the stochastic spatial design are generated by a sequence of independent
random vectors, with a possibly nonuniform density. A detailed account
of the asymptotic joint distribution of the DFTs of the spatial data is given which, among other things, highlights the effects of the geometry of the sampling region and
the spatial sampling density on the limit distribution. Further, it is shown that in
both deterministic and stochastic design cases, for "asymptotically distant" frequencies,
the DFTs are asymptotically independent, but this property may be destroyed if
the frequencies are "asymptotically close". Some important implications of the main
results are also given.
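
As an illustration of the quantity studied in the second part of the abstract above, the following minimal sketch computes a scaled DFT of data observed on a two-dimensional integer lattice. The function name `spatial_dft`, the i.i.d. Gaussian toy data, and the normalisation by the square root of the sample size are assumptions made for this example; the dissertation's own scaling and sampling designs may differ.

```python
import cmath
import math
import random


def spatial_dft(values, sites, omega, scale):
    """Scaled DFT at frequency `omega` of observations taken at 2-D `sites`.

    Computes (1 / scale) * sum_k Z(s_k) * exp(-i * <omega, s_k>).
    The choice of `scale` is left to the caller (an assumption of this sketch).
    """
    total = sum(
        z * cmath.exp(-1j * (omega[0] * s[0] + omega[1] * s[1]))
        for z, s in zip(values, sites)
    )
    return total / scale


random.seed(0)
n = 16
# Deterministic design: sites on the integer lattice {0..n-1} x {0..n-1}.
sites = [(i, j) for i in range(n) for j in range(n)]
# Toy data: i.i.d. standard normal noise, used only for illustration.
values = [random.gauss(0.0, 1.0) for _ in sites]
# A Fourier frequency of the lattice.
omega = (2 * math.pi * 3 / n, 2 * math.pi * 5 / n)

d = spatial_dft(values, sites, omega, scale=math.sqrt(len(sites)))
print(abs(d) ** 2)  # periodogram ordinate at omega
```

Evaluating `spatial_dft` at two well-separated frequencies over many simulated fields would be one way to explore empirically the asymptotic independence described in the abstract.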
Regridding Uncertainty for Statistical Downscaling of Solar Radiation
Initial steps in statistical downscaling involve being able to compare
observed data with data from regional climate models (RCMs). This comparison requires (1)
regridding RCM output from their native grids and at differing spatial
resolutions to a common grid in order to be comparable to observed data and (2)
bias correcting RCM data, via quantile mapping, for example, for future
modeling and analysis. The uncertainty associated with (1) is not always
considered for downstream operations in (2). This work examines this
uncertainty, which is not often made available to the user of a regridded data
product. This analysis is applied to RCM solar radiation data from the
NA-CORDEX data archive and observed data from the National Solar Radiation
Database housed at the National Renewable Energy Lab. A case study of the
mentioned methods over California is presented.
Comment: 16 pages, 5 figures, submitted to: Advances in Statistical
Climatology, Meteorology and Oceanography
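
The quantile-mapping bias correction named in step (2) of the abstract above can be sketched as follows. This is a minimal empirical version under stated assumptions: the function names, the mid-rank plotting position, and the toy numbers are all hypothetical and are not taken from the paper, which may use a different variant.

```python
import bisect


def plotting_position(sorted_sample, x):
    """Empirical probability of x in [0, 1], using a mid-rank convention."""
    n = len(sorted_sample)  # assumes n >= 2
    left = bisect.bisect_left(sorted_sample, x)
    right = bisect.bisect_right(sorted_sample, x)
    p = (left + right - 1) / (2 * (n - 1))
    return min(1.0, max(0.0, p))  # clamp values outside the sample range


def quantile(sorted_sample, p):
    """Quantile with linear interpolation between order statistics."""
    n = len(sorted_sample)
    pos = p * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1 - frac) + sorted_sample[hi] * frac


def quantile_map(model_hist, observed, model_values):
    """Map model values onto the observed distribution, quantile by quantile."""
    m = sorted(model_hist)
    o = sorted(observed)
    return [quantile(o, plotting_position(m, v)) for v in model_values]


# Toy example: the model is biased low by 10 units relative to observations.
model_hist = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [11.0, 12.0, 13.0, 14.0, 15.0]
corrected = quantile_map(model_hist, observed, [1.0, 3.0, 5.0])
print(corrected)
```

Note that this sketch treats the regridded model values as exact; the abstract's point is that the regridding in step (1) carries uncertainty that a correction like this one would otherwise ignore.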
New and Fast Block Bootstrap-Based Prediction Intervals for GARCH (1,1) Process with Application to Exchange Rates
In this paper, we propose a new bootstrap algorithm to obtain prediction intervals for the generalized autoregressive conditionally heteroscedastic (GARCH(1,1)) process, which can be applied to construct prediction intervals for future returns and volatilities. The advantages of the proposed method are twofold: it (a) often exhibits improved performance and (b) is computationally more efficient than other available resampling methods. The superiority of this method over other resampling-based prediction intervals is explained with Spearman's rank correlation coefficient. The finite sample properties of the proposed method are also illustrated by an extensive simulation study and a real-world example.
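
To make the idea of a bootstrap prediction interval for a GARCH(1,1) return concrete, here is a minimal sketch. It is not the block bootstrap algorithm proposed in the paper: it swaps in a plain residual-resampling scheme, assumes the parameters `omega`, `alpha`, and `beta` are known rather than estimated, and uses simulated data. All names and numeric values are hypothetical.

```python
import math
import random


def simulate_garch(n, omega, alpha, beta, rng):
    """Simulate n returns from a GARCH(1,1) model with Gaussian innovations.

    Returns the returns, their conditional variances, and the conditional
    variance for the next (unobserved) step.
    """
    sigma2 = omega / (1 - alpha - beta)  # start at the unconditional variance
    returns, variances = [], []
    for _ in range(n):
        r = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        returns.append(r)
        variances.append(sigma2)
        sigma2 = omega + alpha * r * r + beta * sigma2
    return returns, variances, sigma2


def bootstrap_interval(returns, variances, next_sigma2, level, n_boot, rng):
    """One-step-ahead return interval from resampled standardized residuals."""
    resid = [r / math.sqrt(v) for r, v in zip(returns, variances)]
    draws = sorted(
        math.sqrt(next_sigma2) * rng.choice(resid) for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * (n_boot - 1))
    hi_idx = int((1 + level) / 2 * (n_boot - 1))
    return draws[lo_idx], draws[hi_idx]


rng = random.Random(1)
# Parameter values chosen for illustration only (alpha + beta < 1).
returns, variances, next_sigma2 = simulate_garch(1000, 0.05, 0.1, 0.85, rng)
lower, upper = bootstrap_interval(returns, variances, next_sigma2, 0.95, 2000, rng)
print(lower, upper)
```

In practice the parameters would be estimated from the data, and accounting for that estimation uncertainty is part of what separates bootstrap schemes such as the one the paper proposes from this simplified version.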