29,268 research outputs found
Graph theoretic methods for the analysis of structural relationships in biological macromolecules
Subgraph isomorphism and maximum common subgraph isomorphism algorithms from graph theory provide an effective and an efficient way of identifying structural relationships between biological macromolecules. They thus provide a natural complement to the pattern matching algorithms that are used in bioinformatics to identify sequence relationships. Examples are provided of the use of graph theory to analyze proteins for which three-dimensional crystallographic or NMR structures are available, focusing on the use of the Bron-Kerbosch clique detection algorithm to identify common folding motifs and of the Ullmann subgraph isomorphism algorithm to identify patterns of amino acid residues. Our methods are also applicable to other types of biological macromolecule, such as carbohydrate and nucleic acid structures
Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology.Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
Logic Programming as Constructivism
The features of logic programming that
seem unconventional from the viewpoint of classical logic
can be explained in terms of constructivistic logic. We
motivate and propose a constructivistic proof theory of
non-Horn logic programming. Then, we apply this formalization
for establishing results of practical interest.
First, we show that 'stratification can be motivated in a
simple and intuitive way. Relying on similar motivations,
we introduce the larger classes of 'loosely stratified' and
'constructively consistent' programs. Second, we give a
formal basis for introducing quantifiers into queries and
logic programs by defining 'constructively domain
independent* formulas. Third, we extend the Generalized
Magic Sets procedure to loosely stratified and constructively
consistent programs, by relying on a 'conditional
fixpoini procedure
On the relation between Differential Privacy and Quantitative Information Flow
Differential privacy is a notion that has emerged in the community of
statistical databases, as a response to the problem of protecting the privacy
of the database's participants when performing statistical queries. The idea is
that a randomized query satisfies differential privacy if the likelihood of
obtaining a certain answer for a database is not too different from the
likelihood of obtaining the same answer on adjacent databases, i.e. databases
which differ from for only one individual. Information flow is an area of
Security concerned with the problem of controlling the leakage of confidential
information in programs and protocols. Nowadays, one of the most established
approaches to quantify and to reason about leakage is based on the R\'enyi min
entropy version of information theory. In this paper, we analyze critically the
notion of differential privacy in light of the conceptual framework provided by
the R\'enyi min information theory. We show that there is a close relation
between differential privacy and leakage, due to the graph symmetries induced
by the adjacency relation. Furthermore, we consider the utility of the
randomized answer, which measures its expected degree of accuracy. We focus on
certain kinds of utility functions called "binary", which have a close
correspondence with the R\'enyi min mutual information. Again, it turns out
that there can be a tight correspondence between differential privacy and
utility, depending on the symmetries induced by the adjacency relation and by
the query. Depending on these symmetries we can also build an optimal-utility
randomization mechanism while preserving the required level of differential
privacy. Our main contribution is a study of the kind of structures that can be
induced by the adjacency relation and the query, and how to use them to derive
bounds on the leakage and achieve the optimal utility
Graph theoretic analysis of protein interaction networks of eukaryotes
Thanks to recent progress in high-throughput experimental techniques, the
datasets of large-scale protein interactions of prototypical multicellular
species, the nematode worm Caenorhabditis elegans and the fruit fly Drosophila
melanogaster, have been assayed. The datasets are obtained mainly by using the
yeast hybrid method, which contains false-positive and false-negative
simultaneously. Accordingly, while it is desirable to test such datasets
through further wet experiments, here we invoke recent developed network theory
to test such high throughput datasets in a simple way. Based on the fact that
the key biological processes indispensable to maintaining life are universal
across eukaryotic species, and the comparison of structural properties of the
protein interaction networks (PINs) of the two species with those of the yeast
PIN, we find that while the worm and the yeast PIN datasets exhibit similar
structural properties, the current fly dataset, though most comprehensively
screened ever, does not reflect generic structural properties correctly as it
is. The modularity is suppressed and the connectivity correlation is lacking.
Addition of interlogs to the current fly dataset increases the modularity and
enhances the occurrence of triangular motifs as well. The connectivity
correlation function of the fly, however, remains distinct under such interlogs
addition, for which we present a possible scenario through an in silico
modeling.Comment: 7 pages, 6 figures, 2 table
- …