Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed. Comment: 61 pages
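To illustrate why Euclidean summaries can mislead for data on the unit circle, here is a minimal sketch of the standard circular mean and mean resultant length. The function name `circular_mean` and the two example angles are hypothetical and are not taken from the review.

```python
import numpy as np

def circular_mean(angles_rad):
    """Return the mean direction (radians, in (-pi, pi]) and the mean resultant length."""
    angles_rad = np.asarray(angles_rad, dtype=float)
    # Represent each angle as a unit vector and average the components.
    c = np.mean(np.cos(angles_rad))
    s = np.mean(np.sin(angles_rad))
    mean_direction = np.arctan2(s, c)
    resultant_length = np.hypot(c, s)  # close to 1 means tightly concentrated
    return mean_direction, resultant_length

# Two directions on either side of north (hypothetical data).
angles_deg = np.array([350.0, 10.0])
mean_dir, r_bar = circular_mean(np.deg2rad(angles_deg))

naive_mean_deg = angles_deg.mean()     # arithmetic mean points due south
circ_mean_deg = np.rad2deg(mean_dir)   # circular mean points (numerically) north
print(naive_mean_deg, circ_mean_deg, r_bar)
```

The arithmetic mean of 350 and 10 degrees is 180 degrees, the opposite of where both observations point, whereas the circular mean recovers a direction of about 0 degrees.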
MAP entropy estimation: Applications in robust image filtering
We introduce a new approach for image filtering in a Bayesian framework. In this approach the probability density function (pdf) of the likelihood function is approximated using non-parametric (kernel) estimation. The method is based on generalized Gaussian Markov random fields (GGMRF), a class of Markov random fields used as prior information in Bayes' rule, whose principal objective is to eliminate the effects of excessive smoothing in the reconstruction of images that are rich in contours or edges. Under the hypothesis adopted in the present work, only limited knowledge of the noise pdf is assumed, so the idea is to use a non-parametric estimator to estimate this pdf and then apply the entropy to construct the cost function for the likelihood term. This idea leads to the construction of robust maximum a posteriori (MAP) estimators, since real systems are always exposed to continuous perturbations of unknown nature. Some promising results of three new MAP entropy estimators (MAPEE) for image filtering are presented, together with some concluding remarks.
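The idea of estimating an unknown noise pdf non-parametrically and then applying the entropy can be sketched as follows. This is only an illustration under stated assumptions, not the authors' MAPEE estimators: it uses a Gaussian kernel, a normal-reference rule-of-thumb bandwidth, and a resubstitution entropy estimate, and every name in it is hypothetical.

```python
import numpy as np

def gaussian_kde_at(points, samples, bandwidth):
    """Evaluate a 1-D Gaussian kernel density estimate at `points`."""
    z = (points[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth

def resubstitution_entropy(samples):
    """Entropy estimate -mean(log p_hat(x_i)), with p_hat a kernel density estimate."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Normal-reference rule-of-thumb bandwidth (an assumed choice).
    bandwidth = 1.06 * samples.std(ddof=1) * n ** (-0.2)
    density = gaussian_kde_at(samples, samples, bandwidth)
    return -np.mean(np.log(density))

# Hypothetical noise residuals; in a filtering setting these would be
# differences between the observed image and the current estimate.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=1000)

entropy_estimate = resubstitution_entropy(noise)
gaussian_entropy = 0.5 * np.log(2.0 * np.pi * np.e)  # exact entropy of N(0, 1)
print(entropy_estimate, gaussian_entropy)
```

In a MAP setting, a quantity like `entropy_estimate` would play the role of the likelihood cost and be combined with a prior term such as the GGMRF penalty; that combination is not shown here.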
Density estimation and modeling on symmetric spaces
In many applications, data and/or parameters are supported on non-Euclidean
manifolds. It is important to take into account the geometric structure of
manifolds in statistical analysis to avoid misleading results. Although there
has been a considerable focus on simple and specific manifolds, there is a lack
of general and easy-to-implement statistical methods for density estimation and
modeling on manifolds. In this article, we consider a very broad class of
manifolds: non-compact Riemannian symmetric spaces. For this class, we provide
a very general mathematical result for easily calculating volume changes of the
exponential and logarithm map between the tangent space and the manifold. This
allows one to define statistical models on the tangent space, push these models
forward onto the manifold, and easily calculate induced distributions by
Jacobians. To illustrate the statistical utility of this theoretical result, we
provide a general method to construct distributions on symmetric spaces. In
particular, we define the log-Gaussian distribution as an analogue of the
multivariate Gaussian distribution in Euclidean space. With these new kernels
on symmetric spaces, we also consider the problem of density estimation. Our
proposed approach can use any existing density estimation approach designed for
Euclidean spaces and push it forward to the manifold with an easy-to-calculate
adjustment. We provide theorems showing that the induced density estimators on
the manifold inherit the statistical optimality properties of the parent
Euclidean density estimator; this holds for both frequentist and Bayesian
nonparametric methods. We illustrate the theory and practical utility of the
proposed approach on the space of positive definite matrices
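The push-forward construction can be sketched for positive definite matrices: draw a Gaussian perturbation in the space of symmetric matrices and map it to the manifold with the matrix exponential. This is a simplified illustration under assumptions. It samples only, it omits the Jacobian adjustment that the paper derives for the induced density, and the parametrisation and function names are hypothetical rather than the authors' definition of the log-Gaussian distribution.

```python
import numpy as np

def sym_expm(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def sym_logm(P):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(P)
    return (V * np.log(w)) @ V.T

def sample_log_gaussian_spd(center, scale, n_samples, rng):
    """Draw SPD matrices as exp(log(center) + symmetric Gaussian noise)."""
    d = center.shape[0]
    log_center = sym_logm(center)
    draws = []
    for _ in range(n_samples):
        G = rng.normal(0.0, scale, size=(d, d))
        noise = 0.5 * (G + G.T)  # symmetric perturbation in the tangent space
        draws.append(sym_expm(log_center + noise))
    return np.stack(draws)

rng = np.random.default_rng(0)
center = np.array([[2.0, 0.5], [0.5, 1.0]])  # hypothetical SPD location parameter
samples = sample_log_gaussian_spd(center, scale=0.3, n_samples=100, rng=rng)
min_eigenvalue = min(np.linalg.eigvalsh(P).min() for P in samples)
print(samples.shape, min_eigenvalue)
```

Every draw is symmetric with strictly positive eigenvalues by construction, so a Euclidean model on the tangent space yields valid points on the manifold without any constraint handling.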
Towards more robust and efficient methods for the calculation of Protein-Ligand binding affinities
Biological processes often depend on protein-ligand binding events, so accurate prediction of protein-ligand binding affinities is of central importance in structure-based drug design. Although many techniques exist for calculating protein-ligand binding affinities, ranging from techniques that should be accurate in principle, such as free energy perturbation (FEP) theory, to relatively simple approximations based on empirically derived scoring functions, the counterbalancing demands of speed and accuracy have left us with no completely satisfactory solution thus far. This thesis focuses on methodology development towards more robust and reliable protein-ligand binding affinity calculations. In Part I, we present the WaterMap method, which bridges the gap between the efficiency of empirical scoring functions and the accuracy of rigorous FEP methods. Unlike most other methods, which focus mainly on the direct interaction between the protein and the ligand, the WaterMap method we developed considers the explicit driving force from the solvent, in which several individual water molecules in the binding pocket play an active role in the binding process. We demonstrate that a protein may adopt active-site geometries that destabilize the water molecules in the binding pocket through hydrophobic enclosure and/or correlated hydrogen bonds, and that displacement of these water molecules by ligand groups complementary to the protein surface provides the driving force for ligand binding. In some extreme cases, the interactions are so unfavorable for water molecules that a void is formed in the binding pocket of the protein. Our method also considers the contribution from the occupation of dry regions of the binding pocket by ligand atoms, which in some cases provides the driving force for ligand binding.
FEP provides an in-principle rigorous method to calculate protein-ligand binding affinities within the limitations of the potential energy model, and it may have a large impact on structure-based drug design projects, especially during late-stage lead optimization, when productive decisions about compound modification are made. However, converging explicit solvent simulations to the desired precision is far from trivial, especially when there are large structural reorganizations in the protein or in the ligand upon the formation of the binding complex or upon the alchemical transformation from one ligand to another. In these cases, there can be large energy barriers separating the different conformations, and the ligand or the protein may remain kinetically trapped in the starting configuration for a very long time during brute-force FEP/MD simulations. The incomplete sampling of the configuration space makes the computed binding free energies depend on the starting protein or ligand configurations, giving rise to the well-known quasi-nonergodicity problem in FEP. In Part II, we present a new protocol called FEP/REST, which incorporates the recently developed enhanced sampling technique REST (Replica Exchange with Solute Tempering) into normal FEP to solve the sampling problem in brute-force FEP calculations. The computational cost of this method is comparable to that of normal FEP, and it can be very easily generalized to more complicated systems of pharmaceutical interest. We apply this method to two modifications of protein-ligand complexes which lead to significant conformational changes, the first in the protein and the second in the ligand.
The new approach is shown to facilitate sampling in these challenging cases where high free energy barriers separate the initial and final conformations, and leads to superior convergence of the free energy as demonstrated both by consistency of the results (independence from the starting conformation) and agreement with experimental binding affinity data. Part III focuses on two topics concerning the foundational understanding of hydrophobic and electrostatic interactions. Specifically, the nonadditivity effect of hydrophobic interactions in model enclosures is studied in Chapter 9, and the competition between hydrophobic and electrostatic interactions between a hydrophobe and a model enclosure is studied in Chapter 10. The approximations in popular implicit solvent models, such as the surface area model for hydrophobic interactions and the quadratic dependence of electrostatic interactions on the magnitude of the charge, are investigated. Six of the chapters (Chapters 2-4, Chapter 6, and Chapters 9-10) have been published, and the other one (Chapter 7) has been accepted for publication and is currently in press. Each Part begins with its own introduction. Each chapter also contains its own abstract and introduction, and focuses on one specific topic. They all share a common theme: developing more robust and reliable methods to calculate protein-ligand binding affinities. The conclusions and a discussion of future research directions are presented in Part IV.
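For readers unfamiliar with FEP, the underlying exponential-averaging (Zwanzig) estimator, dF = -kT ln < exp(-dU/kT) >_0, can be sketched on a toy system where the answer is known exactly. The example below is an illustration only and has no connection to the thesis code, WaterMap, or FEP/REST: it perturbs a 1-D harmonic well from spring constant `k0` to `k1`, both hypothetical values, and samples the reference state directly instead of running molecular dynamics.

```python
import numpy as np

def zwanzig_free_energy(delta_u, kT=1.0):
    """Exponential-averaging estimate dF = -kT * ln(mean(exp(-dU / kT)))."""
    x = -np.asarray(delta_u, dtype=float) / kT
    # Log-mean-exp, shifted by the maximum for numerical stability.
    x_max = x.max()
    return -kT * (x_max + np.log(np.mean(np.exp(x - x_max))))

# Toy system: U0 = 0.5 * k0 * x**2 perturbed to U1 = 0.5 * k1 * x**2.
k0, k1, kT = 1.0, 2.0, 1.0
rng = np.random.default_rng(0)

# Exact Boltzmann samples of the reference state (Gaussian for a harmonic well).
x0 = rng.normal(0.0, np.sqrt(kT / k0), size=200_000)
delta_u = 0.5 * (k1 - k0) * x0**2

estimate = zwanzig_free_energy(delta_u, kT)
exact = 0.5 * kT * np.log(k1 / k0)  # from the ratio of partition functions
print(estimate, exact)
```

Here the reference samples are independent and cover the relevant configurations, so the estimate converges quickly. The sampling problem described above arises when the simulation cannot cross barriers between conformations, so that the average is taken over an incomplete set of configurations.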
Computational Inverse Problems for Partial Differential Equations
The problem of determining unknown quantities in a PDE from measurements of (part of) the solution to this PDE arises in a wide range of applications in science, technology, medicine, and finance. The unknown quantity may be, for example, a coefficient, an initial or boundary condition, a source term, or the shape of a boundary. The identification of such quantities is often computationally challenging and requires profound knowledge of the analytical properties of the underlying PDE as well as of numerical techniques. The focus of this workshop was on applications in phase retrieval, imaging with waves in random media, and seismology of the Earth and the Sun; a further emphasis was put on stochastic aspects in the context of uncertainty quantification and parameter identification in stochastic differential equations. Many open problems and mathematical challenges in application fields were addressed, and intensive discussions provided insight into the high potential of combining deep knowledge in numerical analysis, partial differential equations, and regularization with expertise in mathematical statistics, homogenization, optimization, differential geometry, numerical linear algebra, and variational analysis to tackle these challenges.
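As a small illustration of why such identification problems need regularization, the sketch below recovers a source term f in the 1-D model problem -u'' = f with zero boundary values from noisy measurements of u. The model problem, grid size, noise level, and regularization parameter `alpha` are all assumptions chosen for the example and do not come from the workshop report.

```python
import numpy as np

n = 50
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)  # interior grid points of (0, 1)

# Finite-difference matrix for -u'' with zero Dirichlet boundary values.
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
K = np.linalg.inv(A)  # forward operator: source f -> solution u

f_true = np.sin(np.pi * x)
rng = np.random.default_rng(0)
u_noisy = K @ f_true + rng.normal(0.0, 1e-3, size=n)

# Naive inversion applies second differences to the noise and amplifies it.
f_naive = A @ u_noisy

# Tikhonov regularization: minimise ||K f - u||^2 + alpha * ||f||^2.
alpha = 1e-4
f_tik = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ u_noisy)

err_naive = np.linalg.norm(f_naive - f_true) / np.linalg.norm(f_true)
err_tik = np.linalg.norm(f_tik - f_true) / np.linalg.norm(f_true)
print(err_naive, err_tik)
```

With these settings the naive reconstruction is dominated by amplified noise, while the regularized one stays close to the true source. In practice `alpha` would be chosen by a principled rule such as the discrepancy principle instead of being fixed by hand.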