New error measures and methods for realizing protein graphs from distance data
The interval Distance Geometry Problem (iDGP) consists in finding a
realization in $\mathbb{R}^K$ of a simple undirected graph $G=(V,E)$ with
nonnegative intervals assigned to the edges in such a way that, for each edge,
the Euclidean distance between the realization of the adjacent vertices is
within the edge interval bounds. In this paper, we focus on the application to
the conformation of proteins in space, which is a basic step in determining
protein function: given interval estimations of some of the inter-atomic
distances, find their shape. Among different families of methods for
accomplishing this task, we look at mathematical programming based methods,
which are well suited for dealing with intervals. The basic question we want to
answer is: what is the best such method for the problem? The most meaningful
error measure for evaluating solution quality is the coordinate root mean
square deviation. We first introduce a new error measure which addresses a
particular feature of protein backbones, namely that many partial reflections also
yield acceptable backbones. We then present a set of new and existing quadratic
and semidefinite programming formulations of this problem, and a set of new and
existing methods for solving these formulations. Finally, we perform a
computational evaluation of all the feasible solver/formulation combinations
according to new and existing error measures, finding that the best methodology
is a new heuristic method based on multiplicative weights updates.
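To make the iDGP feasibility condition and the coordinate RMSD concrete, here is a minimal Python sketch. The data layout (a dict from vertex to coordinate tuple, edges as `(u, v, lower, upper)`) and the summed-violation score are assumptions chosen for illustration; they are not the paper's new error measure or any of its formulations. The RMSD function also assumes the two realizations are already superimposed, so it performs no rotation or translation fitting.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical edge format: (u, v, lower, upper) bounds on ||x_u - x_v||.
IntervalEdge = Tuple[int, int, float, float]


def interval_violation(x: Dict[int, Point], edges: List[IntervalEdge]) -> float:
    """Sum of the amounts by which edge lengths fall outside their intervals.

    Returns 0.0 exactly when the realization x satisfies every edge interval.
    """
    total = 0.0
    for u, v, lower, upper in edges:
        d = math.dist(x[u], x[v])
        total += max(0.0, lower - d) + max(0.0, d - upper)
    return total


def coordinate_rmsd(x: Sequence[Point], y: Sequence[Point]) -> float:
    """Root mean square deviation between two realizations of the same vertices.

    Assumes x[i] and y[i] refer to the same vertex and that x and y are
    already superimposed (no optimal rotation/translation is computed).
    """
    if len(x) != len(y) or not x:
        raise ValueError("realizations must be non-empty and of equal size")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(x, y))
    return math.sqrt(squared / len(x))
```

For example, with vertices at (0, 0), (3, 0) and (0, 4), an edge whose interval is [4, 4.5] but whose realized length is 5 contributes a violation of 0.5, while edges whose lengths lie inside their intervals contribute nothing.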
Euclidean distance geometry and applications
Euclidean distance geometry is the study of Euclidean geometry based on the
concept of distance. This is useful in several applications where the input
data consists of an incomplete set of distances, and the output is a set of
points in Euclidean space that realizes the given distances. We survey some of
the theory of Euclidean distance geometry and some of the most important
applications: molecular conformation, localization of sensor networks and
statics. Comment: 64 pages, 21 figures
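Sensor network localization, one of the applications the survey covers, gives perhaps the simplest illustration of turning distances into points. The sketch below is a toy under stated assumptions (exact distances, three non-collinear anchors in the plane, and a hypothetical function name `trilaterate`): subtracting the first circle equation $(x-x_i)^2+(y-y_i)^2=d_i^2$ from the other two leaves a $2\times 2$ linear system, solved here with Cramer's rule.

```python
import math
from typing import Tuple

Point2 = Tuple[float, float]


def trilaterate(anchors: Tuple[Point2, Point2, Point2],
                dists: Tuple[float, float, float]) -> Point2:
    """Locate a point in the plane from exact distances to three anchors."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = dists
    # Circle i minus circle 1 gives a linear equation a_i1 * x + a_i2 * y = b_i.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; the position is not unique")
    # Cramer's rule for the 2x2 system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


# Usage: a point at (1, 1) seen from three anchors.
x, y = trilaterate(((0.0, 0.0), (4.0, 0.0), (0.0, 3.0)),
                   (math.sqrt(2), math.sqrt(10), math.sqrt(5)))
```

With noisy or incomplete distances, as in the applications surveyed, this closed form no longer suffices and one typically turns to optimization-based methods instead.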
A Metric for genus-zero surfaces
We present a new method to compare the shapes of genus-zero surfaces. We
introduce a measure of mutual stretching, the symmetric distortion energy, and
establish the existence of a conformal diffeomorphism between any two
genus-zero surfaces that minimizes this energy. We then prove that the energies
of the minimizing diffeomorphisms give a metric on the space of genus-zero
Riemannian surfaces. This metric and the corresponding optimal diffeomorphisms
are shown to have properties that are highly desirable for applications. Comment: 33 pages, 8 figures
Leveraging Citation Networks to Visualize Scholarly Influence Over Time
Assessing the influence of a scholar's work is an important task for funding
organizations, academic departments, and researchers. Common methods, such as
measures of citation counts, can ignore much of the nuance and
multidimensionality of scholarly influence. We present an approach for
generating dynamic visualizations of scholars' careers. This approach uses an
animated node-link diagram showing the citation network accumulated around the
researcher over the course of the career in concert with key indicators,
highlighting influence both within and across fields. We developed our design
in collaboration with one funding organization, the Pew Biomedical Scholars
program, but the methods generalize to other visualizations of scholarly
influence. We applied the design method to the Microsoft Academic Graph, which
includes more than 120 million publications. We validated our abstractions
throughout the process in collaboration with the Pew Biomedical Scholars
program officers and through summative evaluations with their scholars.
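The animated node-link view can be thought of as a sequence of frames, one per year, each showing every citing paper accumulated so far. The sketch below computes such frames as cumulative counts of citing papers inside and outside the scholar's home field. The input format, the field labels, and the function name `citation_frames` are hypothetical, and the authors' actual pipeline over the Microsoft Academic Graph is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: one record per citation, (citing_paper, year, citing_field).
Citation = Tuple[str, int, str]


def citation_frames(citations: List[Citation],
                    home_field: str) -> Dict[int, Tuple[int, int]]:
    """Map each year to cumulative (within-field, cross-field) citing-paper counts."""
    # A paper citing several of the scholar's works is counted once,
    # at the earliest year it appears.
    first_seen: Dict[str, Tuple[int, str]] = {}
    for paper, year, field in citations:
        if paper not in first_seen or year < first_seen[paper][0]:
            first_seen[paper] = (year, field)

    new_by_year: Dict[int, List[str]] = defaultdict(list)
    for year, field in first_seen.values():
        new_by_year[year].append(field)

    frames: Dict[int, Tuple[int, int]] = {}
    within = across = 0
    for year in sorted(new_by_year):
        for field in new_by_year[year]:
            if field == home_field:
                within += 1
            else:
                across += 1
        frames[year] = (within, across)
    return frames
```

Years with no new citing papers produce no frame in this sketch; a renderer would typically carry the previous frame forward so the animation keeps a steady pace.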
kLog: A Language for Logical and Relational Learning with Kernels
We introduce kLog, a novel approach to statistical relational learning.
Unlike standard approaches, kLog does not represent a probability distribution
directly. It is rather a language to perform kernel-based learning on
expressive logical and relational representations. kLog allows users to specify
learning problems declaratively. It builds on simple but powerful concepts:
learning from interpretations, entity/relationship data modeling, logic
programming, and deductive databases. Access by the kernel to the rich
representation is mediated by a technique we call graphicalization: the
relational representation is first transformed into a graph, in particular
a grounded entity/relationship diagram. Subsequently, a choice of graph kernel
defines the feature space. kLog supports mixed numerical and symbolic data, as
well as background knowledge in the form of Prolog or Datalog programs as in
inductive logic programming systems. The kLog framework can be applied to
tackle the same range of tasks that has made statistical relational learning so
popular, including classification, regression, multitask learning, and
collective classification. We also report empirical comparisons, showing
that kLog can be either more accurate, or much faster at the same level of
accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at
http://klog.dinfo.unifi.it along with tutorials.
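Graphicalization can be illustrated with a toy version: entities and relationship tuples both become labeled nodes, and each relationship node is joined to the entities it mentions. The kernel below is a deliberately simple substitute (a dot product of counts of label pairs at the ends of each edge), not the graph kernel kLog actually uses, and all names and the data layout are hypothetical rather than kLog's API.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical interpretation format.
Entities = Dict[str, str]                      # entity id -> entity label
Relations = List[Tuple[str, Tuple[str, ...]]]  # (relation name, argument ids)
Graph = Tuple[Dict[str, str], List[Tuple[str, str]]]


def graphicalize(entities: Entities, relations: Relations) -> Graph:
    """Turn an interpretation into a bipartite entity/relationship graph."""
    labels = dict(entities)
    edges: List[Tuple[str, str]] = []
    for i, (name, args) in enumerate(relations):
        rel_node = f"{name}#{i}"  # one node per grounded relationship tuple
        labels[rel_node] = name
        for arg in args:
            edges.append((rel_node, arg))
    return labels, edges


def edge_label_features(graph: Graph) -> Counter:
    """Feature map: counts of unordered (label, label) pairs over all edges."""
    labels, edges = graph
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)


def kernel(g1: Graph, g2: Graph) -> int:
    """Linear kernel: dot product of the two edge-label feature maps."""
    f1, f2 = edge_label_features(g1), edge_label_features(g2)
    return sum(count * f2[key] for key, count in f1.items())


# Usage: two tiny "molecules", each a single bond between two atoms.
g1 = graphicalize({"a1": "C", "a2": "O"}, [("bond", ("a1", "a2"))])
g2 = graphicalize({"b1": "C", "b2": "C"}, [("bond", ("b1", "b2"))])
```

Because the kernel only sees the graph, swapping in a richer graph kernel changes the feature space without touching the relational representation, which is the separation the abstract describes.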
Consistency of spectral clustering in stochastic block models
We analyze the performance of spectral clustering for community extraction in
stochastic block models. We show that, under mild conditions, spectral
clustering applied to the adjacency matrix of the network can consistently
recover hidden communities even when the order of the maximum expected degree
is as small as $\log n$, with $n$ the number of nodes. This result applies to
some popular polynomial-time spectral clustering algorithms and is further
extended to degree-corrected stochastic block models using a spherical
$k$-median spectral clustering method. A key component of our analysis is a
combinatorial bound on the spectrum of binary random matrices, which is sharper
than the conventional matrix Bernstein inequality and may be of independent
interest. Comment: Published at http://dx.doi.org/10.1214/14-AOS1274 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
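The following pure-Python sketch shows the core idea for the simplest case of two communities: take the adjacency matrix, compute its two leading eigenvectors by power iteration, and split nodes by the sign of the second eigenvector. The sign split stands in for the k-means (or spherical k-median) step analyzed in the paper, and the function names and the toy two-clique graph are assumptions for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def _matvec(a: Matrix, v: List[float]) -> List[float]:
    """Matrix-vector product."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def _normalize(v: List[float]) -> List[float]:
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _leading_eigvec(a: Matrix, orth_to: List[List[float]],
                    iters: int = 1000) -> List[float]:
    """Power iteration on a, kept orthogonal to the unit vectors in orth_to."""
    v = _normalize([float(i + 1) for i in range(len(a))])
    for _ in range(iters):
        w = _matvec(a, v)
        for u in orth_to:  # Gram-Schmidt against already-found eigenvectors
            dot = sum(wi * ui for wi, ui in zip(w, u))
            w = [wi - dot * ui for wi, ui in zip(w, u)]
        v = _normalize(w)
    return v


def spectral_bisection(adj: Matrix) -> List[int]:
    """Assign each node to community 0 or 1 by the sign of the 2nd eigenvector."""
    n = len(adj)
    # Eigenvalues of a nonnegative adjacency matrix lie in [-d_max, d_max];
    # adding d_max * I makes them nonnegative without changing eigenvectors.
    shift = max(sum(row) for row in adj)
    b = [[adj[i][j] + (shift if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    v1 = _leading_eigvec(b, [])
    v2 = _leading_eigvec(b, [v1])
    return [0 if x >= 0 else 1 for x in v2]


def two_cliques(k: int) -> Matrix:
    """Toy graph: two k-cliques joined by a single bridging edge."""
    n = 2 * k
    adj = [[0.0] * n for _ in range(n)]
    for block in (range(k), range(k, n)):
        for i in block:
            for j in block:
                if i != j:
                    adj[i][j] = 1.0
    adj[k - 1][k] = adj[k][k - 1] = 1.0
    return adj


labels = spectral_bisection(two_cliques(4))
```

The shift matters because plain power iteration converges to the eigenvalue of largest magnitude, which for some graphs (for example bipartite ones) is negative; with the shift, it returns the algebraically largest eigenvectors that spectral clustering uses. Which community gets label 0 is arbitrary, since eigenvector signs are.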
Geometric, Feature-based and Graph-based Approaches for the Structural Analysis of Protein Binding Sites : Novel Methods and Computational Analysis
This thesis considers protein binding sites. To enable the extraction of information from the space of protein binding sites, these binding sites must be mapped onto a mathematical space. This can be done by mapping binding sites onto vectors, graphs or point clouds. To impose a structure on this mathematical space, a distance measure is required; such a measure is introduced in this thesis. This distance measure can then be used to extract information by means of data mining techniques.
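As one generic example of a distance measure on binding sites represented as point clouds, the sketch below computes the Hausdorff distance. This is a swapped-in illustration, not the measure introduced in the thesis; it assumes both clouds are already expressed in a common coordinate frame and ignores any chemical labels attached to the points.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def directed_hausdorff(a: Sequence[Point3], b: Sequence[Point3]) -> float:
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point3], b: Sequence[Point3]) -> float:
    """Symmetric Hausdorff distance between two non-empty point clouds."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

The directed version is asymmetric (a small cloud can sit entirely inside a larger one), which is why the symmetric maximum is taken; a measure intended for binding-site comparison would additionally need to handle superposition and point labels.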
Graph-Based Approaches to Protein Structure Comparison - From Local to Global Similarity
The comparative analysis of protein structure data is a central aspect of structural bioinformatics. Drawing upon structural information allows the inference of function for unknown proteins even in cases where no apparent homology can be found on the sequence level.
Regarding the function of an enzyme, the overall fold topology might be less important than the specific structural conformation of the catalytic site or of the surface region of a protein where interactions with other molecules, such as binding partners, substrates, and ligands, occur. Thus, a comparison of these regions is especially interesting for functional inference, since structural constraints imposed by the demands of the catalyzed biochemical function make them more likely to exhibit structural similarity. Moreover, the comparative analysis of protein binding sites is of special interest in pharmaceutical chemistry, to predict cross-reactivities and gain a deeper understanding of the catalytic mechanism.
From an algorithmic point of view, the comparison of structured data, or, more generally, complex objects, can be attempted based on different methodological principles. Global methods aim at comparing structures as a whole, while local methods transfer the problem to multiple comparisons of local substructures. In the context of protein structure analysis, it is not clear a priori which strategy is more suitable.
In this thesis, several conceptually different algorithmic approaches have been developed, based on local, global and semi-global strategies, for the task of comparing protein structure data, more specifically protein binding pockets. The use of graphs for modeling protein structure data has a long-standing tradition in structural bioinformatics. Recently, graphs have been used to model the geometric constraints of protein binding sites. The algorithms developed in this thesis are based on this modeling concept; hence, from a computer scientist's point of view, they can also be regarded as global, local and semi-global approaches to graph comparison. The algorithms were designed mainly to allow a more approximate comparison of protein binding sites, in order to account for the molecular flexibility of protein structures. A main motivation was to enable the detection of more remote similarities that are not apparent when using more rigid methods. The developed approaches were then applied to different problems typically encountered in structural bioinformatics, to assess and compare their performance and suitability for different problems.
Each of the approaches developed during this work was able to improve upon the performance of existing methods in the field. Another major aspect of the experiments was the question of which methodological concept (local, global, or a combination of both) offers the most benefits for the specific task of protein binding site comparison, a question addressed throughout this thesis.
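The difference between global and local comparison can be shown on small node-labeled graphs. In the hypothetical sketch below, the global score compares the label histograms of the whole graphs, while the local score matches each node's one-hop neighborhood to its best counterpart in the other graph. Neither is one of the algorithms developed in the thesis; the labels and graphs are illustrative only.

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical labeled graph: node -> label, plus an adjacency-set map.
Labels = Dict[int, str]
Adj = Dict[int, Set[int]]


def _overlap(c1: Counter, c2: Counter) -> float:
    """Multiset intersection size normalized to [0, 1] (1 = identical)."""
    total = max(sum(c1.values()), sum(c2.values()))
    return sum((c1 & c2).values()) / total if total else 1.0


def global_similarity(l1: Labels, l2: Labels) -> float:
    """Compare the graphs as a whole via their node-label histograms."""
    return _overlap(Counter(l1.values()), Counter(l2.values()))


def local_similarity(l1: Labels, a1: Adj, l2: Labels, a2: Adj) -> float:
    """Average, over nodes of graph 1, of the best neighborhood match in graph 2."""
    def neighborhood(labels: Labels, adj: Adj, v: int) -> Counter:
        # The node's own label plus the labels of its direct neighbors.
        return Counter([labels[v]] + [labels[u] for u in adj[v]])

    scores = []
    for v in l1:
        nv = neighborhood(l1, a1, v)
        scores.append(max(_overlap(nv, neighborhood(l2, a2, u)) for u in l2))
    return sum(scores) / len(scores)
```

For two paths with the same labels in a different order, A-B-C versus A-C-B, the global score is 1.0 while the local score is 8/9, because the local view notices that the neighborhood of the node labeled A differs between the two graphs.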