An Exact Algorithm for Side-Chain Placement in Protein Design
Computational protein design aims to construct novel or improved functions on
the structure of a given protein backbone and has important applications in
the pharmaceutical and biotechnological industries. The underlying combinatorial
side-chain placement problem consists of choosing a side-chain placement for
each residue position such that the resulting overall energy is minimized. The
choice of the side-chain then also determines the amino acid for this position.
Many algorithms for this NP-hard problem have been proposed in the context of
homology modeling; these, however, reach their limits when faced with large
protein design instances.
In this paper, we propose a new exact method for the side-chain placement
problem that works well even for the large instance sizes that arise in protein
design. Our main contribution is a dedicated branch-and-bound algorithm that
combines tight upper and lower bounds resulting from a novel Lagrangian
relaxation approach for side-chain placement. Our experimental results show
that our method outperforms alternative state-of-the-art exact approaches and
makes it possible to routinely solve large protein design instances to optimality.
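
To make the underlying combinatorial problem concrete, here is a minimal Python sketch of side-chain placement as energy minimization over rotamer choices, solved with a small branch-and-bound. The toy energy tables and the simple lower bound are illustrative assumptions only; the paper's Lagrangian relaxation yields far tighter bounds.

    # Side-chain placement sketch: pick one rotamer r_i per position i so that
    # the total energy  sum_i E_self[i][r_i] + sum_{i<j} E_pair[i][j][r_i][r_j]
    # is minimal.  Energies and the crude lower bound are illustrative.

    def lower_bound(assigned, E_self, E_pair, n_rot):
        """Cost of the partial assignment plus an optimistic bound on the rest."""
        n, k = len(E_self), len(assigned)
        cost = sum(E_self[i][assigned[i]] for i in range(k))
        cost += sum(E_pair[i][j][assigned[i]][assigned[j]]
                    for i in range(k) for j in range(i + 1, k))
        # unassigned positions: best self energy + best interaction with assigned ones
        for i in range(k, n):
            cost += min(E_self[i][r] + sum(E_pair[j][i][assigned[j]][r] for j in range(k))
                        for r in range(n_rot[i]))
        # pairs of unassigned positions: best possible pairwise energy
        for i in range(k, n):
            for j in range(i + 1, n):
                cost += min(E_pair[i][j][r][s]
                            for r in range(n_rot[i]) for s in range(n_rot[j]))
        return cost

    def branch_and_bound(E_self, E_pair, n_rot):
        n = len(E_self)
        best = [float("inf"), None]

        def recurse(assigned):
            bound = lower_bound(assigned, E_self, E_pair, n_rot)
            if bound >= best[0]:
                return                      # prune: bound cannot beat the incumbent
            if len(assigned) == n:
                best[0], best[1] = bound, list(assigned)   # bound is exact here
                return
            for r in range(n_rot[len(assigned)]):
                recurse(assigned + [r])

        recurse([])
        return best[1], best[0]

    # toy instance: 3 positions with 2 rotamers each; E_pair[i][j] defined for i < j
    E_self = [[1.0, 2.0], [0.5, 0.1], [0.3, 0.4]]
    E_pair = {0: {1: [[0.0, 1.0], [1.0, 0.0]], 2: [[0.2, 0.0], [0.0, 0.2]]},
              1: {2: [[0.1, 0.3], [0.0, 0.5]]}}
    print(branch_and_bound(E_self, E_pair, n_rot=[2, 2, 2]))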
Double Exponential Instability of Triangular Arbitrage Systems
If financial markets displayed the informational efficiency postulated in the
efficient markets hypothesis (EMH), arbitrage operations would be
self-extinguishing. The present paper considers arbitrage sequences in foreign
exchange (FX) markets, in which trading platforms and information are
fragmented. In Kozyakin et al. (2010) and Cross et al. (2012) it was shown that
sequences of triangular arbitrage operations in FX markets containing 4
currencies and trader-arbitrageurs tend to display periodicity or grow
exponentially rather than being self-extinguishing. This paper extends the
analysis to 5 or higher-order currency worlds. The key findings are that in a
5-currency world arbitrage sequences may also follow an exponential law as well
as display periodicity, but that in higher-order currency worlds a double
exponential law may additionally apply. There is an "inheritance of
instability" in the higher-order currency worlds. Profitable arbitrage
operations are thus endemic rather than displaying the self-extinguishing
properties implied by the EMH.
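
As a toy illustration of the arbitrage operations discussed above, the Python sketch below checks a set of quoted FX rates for profitable triangular cycles. The currency quotes are invented, and this one-shot test says nothing about the asynchronous dynamics analysed in the paper.

    # Toy triangular arbitrage check: convert around a closed cycle of currencies
    # and see whether the product of the quoted rates exceeds 1 (a profitable loop).
    from itertools import permutations

    def cycle_return(rates, cycle):
        """Multiply exchange rates around a closed cycle of currencies."""
        total = 1.0
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            total *= rates[(a, b)]
        return total

    def find_triangular_arbitrage(rates, currencies, eps=1e-9):
        """Return all 3-currency cycles whose round-trip return exceeds 1."""
        opportunities = []
        for cycle in permutations(currencies, 3):
            r = cycle_return(rates, list(cycle))
            if r > 1.0 + eps:
                opportunities.append((cycle, r))
        return opportunities

    # hypothetical quotes: rates[(a, b)] = units of b received per unit of a
    rates = {("USD", "EUR"): 0.92, ("EUR", "USD"): 1.09,
             ("USD", "JPY"): 151.0, ("JPY", "USD"): 0.00663,
             ("EUR", "JPY"): 165.5, ("JPY", "EUR"): 0.00602}
    print(find_triangular_arbitrage(rates, ["USD", "EUR", "JPY"]))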
Algorithm engineering for optimal alignment of protein structure distance matrices
Protein structural alignment is an important problem in computational
biology. In this paper, we present first successes on provably optimal pairwise
alignment of protein inter-residue distance matrices, using the popular Dali
scoring function. We introduce the structural alignment problem formally, which
enables us to express a variety of scoring functions used in previous work as
special cases in a unified framework. Further, we propose the first
mathematical model for computing optimal structural alignments based on dense
inter-residue distance matrices. We therefore reformulate the problem as a
special graph problem and give a tight integer linear programming model. We
then present algorithm engineering techniques to handle the huge integer linear
programs of real-life distance matrix alignment problems. Applying these
techniques, we can compute provably optimal Dali alignments for the very first
time.
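
For orientation, the sketch below scores a fixed alignment of two inter-residue distance matrices with a Dali-style similarity term. The constants (0.2 relative-deviation threshold, 20 Angstrom Gaussian envelope) are the commonly cited Dali parameters rather than values stated here, and the paper's integer linear programming model for finding the optimal alignment is not reproduced.

    # Dali-style scoring of a given alignment (a hedged sketch): every pair of
    # aligned residue pairs contributes more when the corresponding
    # intra-structure distances agree.  dist_a and dist_b are the dense
    # inter-residue distance matrices of the two proteins.
    import math

    def dali_style_score(dist_a, dist_b, alignment, theta=0.2, alpha=20.0):
        """alignment: list of (i, j) pairs mapping residues of A onto residues of B."""
        score = 0.0
        for (i1, j1) in alignment:
            for (i2, j2) in alignment:
                if i1 == i2:
                    score += theta                      # diagonal contribution
                    continue
                da, db = dist_a[i1][i2], dist_b[j1][j2]
                mean = 0.5 * (da + db)
                score += (theta - abs(da - db) / mean) * math.exp(-(mean / alpha) ** 2)
        return score

    # toy 3-residue structures, distances in Angstrom
    dist_a = [[0.0, 3.8, 6.1], [3.8, 0.0, 3.9], [6.1, 3.9, 0.0]]
    dist_b = [[0.0, 3.7, 6.3], [3.7, 0.0, 3.8], [6.3, 3.8, 0.0]]
    print(dali_style_score(dist_a, dist_b, alignment=[(0, 0), (1, 1), (2, 2)]))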
A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer
Recently, several classifiers that combine primary tumor data, like gene
expression data, and secondary data sources, such as protein-protein
interaction networks, have been proposed for predicting outcome in breast
cancer. In these approaches, new composite features are typically constructed
by aggregating the expression levels of several genes. The secondary data
sources are employed to guide this aggregation. Although many studies claim
that these approaches improve classification performance over single gene
classifiers, the gain in performance is difficult to assess. This stems mainly
from the fact that different breast cancer data sets and validation procedures
are employed to assess the performance. Here we address these issues by
employing a large cohort of six breast cancer data sets as a benchmark set and by
performing an unbiased evaluation of the classification accuracies of the
different approaches. Contrary to previous claims, we find that composite
feature classifiers do not outperform simple single gene classifiers. We
investigate the effect of (1) the number of selected features; (2) the specific
gene set from which features are selected; (3) the size of the training set and
(4) the heterogeneity of the data set on the performance of composite feature
and single gene classifiers. Strikingly, we find that randomization of
secondary data sources, which destroys all biological information in these
sources, does not result in a deterioration in performance of composite feature
classifiers. Finally, we show that when a proper correction for gene set size
is performed, the stability of single gene sets is similar to the stability of
composite feature sets. Based on these results, there is currently no reason to
prefer prognostic classifiers based on composite features over single gene
classifiers for predicting outcome in breast cancer.
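
The composite-feature construction evaluated above can be sketched in a few lines of Python. The random expression matrix, the randomly drawn gene sets and the nearest-centroid classifier are illustrative assumptions, not the exact pipelines benchmarked in the study.

    # Composite features: each feature averages the expression of the genes in
    # one gene set, and a standard classifier is trained on the result.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import NearestCentroid

    rng = np.random.default_rng(0)
    n_samples, n_genes = 120, 500
    X = rng.normal(size=(n_samples, n_genes))          # gene expression matrix (toy)
    y = rng.integers(0, 2, size=n_samples)             # outcome labels (toy)

    # gene sets that would normally come from a secondary data source
    gene_sets = [rng.choice(n_genes, size=10, replace=False) for _ in range(50)]

    # composite features: mean expression over each gene set
    X_composite = np.column_stack([X[:, gs].mean(axis=1) for gs in gene_sets])

    single = cross_val_score(NearestCentroid(), X, y, cv=5).mean()
    composite = cross_val_score(NearestCentroid(), X_composite, y, cv=5).mean()
    print(f"single-gene accuracy: {single:.2f}, composite accuracy: {composite:.2f}")

Replacing the network-derived gene sets with random ones, as done in this toy, corresponds to the randomization control described in the abstract.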
Improving mungbean growth in a semiarid dryland system with agricultural waste biochars and cattle manure
Mungbean (Vigna radiata L.) productivity in dryland systems has recently decreased due to soil fertility degradation. The objective of this study was to evaluate the effect of biochar type and cattle manure rate on the growth of mungbean in a semi-arid dark soil. Treatments for the field trial were arranged in a 3 x 5 factorial completely randomized block design with four replicates. Two biochars (rice husk and sawdust) at 10 t/ha, in combination with four rates of cattle manure (1, 3, 5 and 10 t/ha) and a control (without biochar and cattle manure), were applied to the soil, incubated for three weeks and then planted with mungbean cv. Fore Belu. The results revealed that, compared to the control, additions of biochar and cattle manure increased soil moisture and soil electrical conductivity by 2-4% and 0.15-0.20, respectively; decreased soil temperature and bulk density by 1-2 °C and 0.2 g/cm3, respectively; and increased plant height, stem diameter, root length, and total, shoot and root dry weights by 4 cm, 0.1 cm, 5 cm, 7 g, 0.9 g and 6 g, respectively. The best growth of mungbean was obtained from the addition of sawdust biochar at 10 t/ha together with cattle manure at 3 t/ha.
A Parsimony Approach to Biological Pathway Reconstruction/Inference for Genomes and Metagenomes
A common biological pathway reconstruction approach—as implemented by many automatic biological pathway services (such as the KAAS and RAST servers) and the functional annotation of metagenomic sequences—starts with the identification of protein functions or families (e.g., KO families for the KEGG database and the FIG families for the SEED database) in the query sequences, followed by a direct mapping of the identified protein families onto pathways. Given a predicted patchwork of individual biochemical steps, some metric must be applied in deciding what pathways actually exist in the genome or metagenome represented by the sequences. Commonly, and straightforwardly, a complete biological pathway can be identified in a dataset if at least one of the steps associated with the pathway is found. We report, however, that this naïve mapping approach leads to an inflated estimate of biological pathways, and thus overestimates the functional diversity of the sample from which the DNA sequences are derived. We developed a parsimony approach, called MinPath (Minimal set of Pathways), for biological pathway reconstructions using protein family predictions, which yields a more conservative, yet more faithful, estimation of the biological pathways for a query dataset. MinPath identified far fewer pathways for the genomes collected in the KEGG database—as compared to the naïve mapping approach—eliminating some obviously spurious pathway annotations. Results from applying MinPath to several metagenomes indicate that the common methods used for metagenome annotation may significantly overestimate the biological pathways encoded by microbial communities.
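
The parsimony idea can be illustrated as a set-cover problem: keep the smallest collection of pathways that still explains every protein family observed in the sample. The greedy Python sketch below, with made-up KO-style identifiers, only demonstrates the principle; MinPath itself solves this minimization exactly (via integer programming) rather than greedily.

    # Parsimony sketch: choose a minimal pathway set covering all observed families.
    def minimal_pathways(pathway_to_families, observed_families):
        """Greedy approximation of the minimal pathway set covering all observed families."""
        uncovered = set(observed_families)
        chosen = []
        while uncovered:
            # pick the pathway explaining the most still-unexplained families
            best = max(pathway_to_families,
                       key=lambda p: len(pathway_to_families[p] & uncovered))
            gain = pathway_to_families[best] & uncovered
            if not gain:
                break                       # remaining families map to no pathway
            chosen.append(best)
            uncovered -= gain
        return chosen

    # toy mapping with illustrative, KO-style identifiers
    pathway_to_families = {"glycolysis": {"K00844", "K01810", "K00873"},
                           "pentose_phosphate": {"K00036", "K01810"},
                           "spurious_pathway": {"K00873"}}
    observed = {"K00844", "K01810", "K00873"}
    print(minimal_pathways(pathway_to_families, observed))   # a single pathway suffices

In this toy, the naïve one-step-per-pathway rule would report all three pathways, while the parsimony criterion keeps only one.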
Optimizing topological cascade resilience based on the structure of terrorist networks
Complex socioeconomic networks such as information, finance and even
terrorist networks need resilience to cascades - to prevent the failure of a
single node from causing a far-reaching domino effect. We show that terrorist
and guerrilla networks are uniquely cascade-resilient while maintaining high
efficiency, but they become more vulnerable beyond a certain threshold. We also
introduce an optimization method for constructing networks with high passive
cascade resilience. The optimal networks are found to be based on cells, where
each cell has a star topology. Counterintuitively, we find that there are
conditions where networks should not be modified to stop cascades because doing
so would come at a disproportionate loss of efficiency. Implementation of these
findings can lead to more cascade-resilient networks in many diverse areas.
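
A minimal sketch of the kind of passive cascade-resilience question raised above: give every node a capacity slightly above its initial load, remove one node, and measure how far the failure propagates. The betweenness-based load model, the tolerance parameter and the cell-of-stars example graph are standard illustrative choices (assuming the networkx library), not the paper's exact model or efficiency measure.

    # Load-capacity cascade sketch: fraction of the network that fails after one removal.
    import networkx as nx

    def cascade_size(G, seed, tolerance=0.2):
        """Fraction of nodes that fail after removing `seed` in a load-capacity cascade."""
        load = nx.betweenness_centrality(G)
        capacity = {v: (1.0 + tolerance) * load[v] for v in G}
        H = G.copy()
        failed = {seed}
        H.remove_node(seed)
        while True:
            new_load = nx.betweenness_centrality(H)
            overloaded = [v for v in H if new_load[v] > capacity[v]]
            if not overloaded:
                break
            failed.update(overloaded)
            H.remove_nodes_from(overloaded)
        return len(failed) / G.number_of_nodes()

    # cell-based topology: several star-shaped cells whose hubs form a ring
    G = nx.Graph()
    hubs = [f"hub{c}" for c in range(4)]
    for c, hub in enumerate(hubs):
        for m in range(4):
            G.add_edge(hub, f"cell{c}_member{m}")
    G.add_edges_from(zip(hubs, hubs[1:] + hubs[:1]))
    print(cascade_size(G, seed="hub0"))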
Simultaneous Optimization of Both Node and Edge Conservation in Network Alignment via WAVE
Network alignment can be used to transfer functional knowledge between
conserved regions of different networks. Typically, existing methods use a node
cost function (NCF) to compute similarity between nodes in different networks
and an alignment strategy (AS) to find high-scoring alignments with respect to
the total NCF over all aligned nodes (or node conservation). However, they then
evaluate the quality of their alignments via some other measure, different from
the node conservation measure used to guide the alignment construction
process. Typically, one measures the number of conserved edges, but only after
alignments are produced. Hence, a recent method aimed to directly maximize the
number of conserved edges while constructing alignments, which improved
alignment accuracy. Here, we aim to directly maximize both node and edge
conservation during alignment construction to further improve alignment
accuracy. For this, we design a novel measure of edge conservation that (unlike
existing measures that treat each conserved edge the same) weighs each
conserved edge so that edges with highly NCF-similar end nodes are favored. As
a result, we introduce a novel AS, Weighted Alignment VotEr (WAVE), which can
optimize any measure of node and edge conservation, and which can be used with
any NCF or combination of multiple NCFs. Using WAVE on top of established
state-of-the-art NCFs leads to superior alignments compared to the existing
methods that optimize only node conservation or only edge conservation or that
treat each conserved edge the same. And while we evaluate WAVE in the
computational biology domain, it is easily applicable in any domain.
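
The weighting idea can be sketched as follows: conserved edges are not all counted equally but are weighted by how NCF-similar their aligned end nodes are, with node conservation added on top. The functional form, the equal end-node averaging and the mixing parameter beta below are illustrative assumptions, not WAVE's actual objective.

    # Weighted node-and-edge conservation sketch for a fixed alignment.
    import networkx as nx

    def weighted_conservation(G1, G2, mapping, ncf_sim, beta=0.5):
        """mapping: node of G1 -> node of G2; ncf_sim[(u, v)]: NCF similarity of u in G1 to v in G2."""
        node_term = sum(ncf_sim[(u, mapping[u])] for u in mapping)
        edge_term = 0.0
        for u, w in G1.edges():
            if u in mapping and w in mapping and G2.has_edge(mapping[u], mapping[w]):
                # conserved edge, weighted by the NCF similarity of both end nodes
                edge_term += 0.5 * (ncf_sim[(u, mapping[u])] + ncf_sim[(w, mapping[w])])
        return beta * node_term + (1.0 - beta) * edge_term

    # toy networks and alignment
    G1 = nx.path_graph(["a", "b", "c"])
    G2 = nx.path_graph([1, 2, 3])
    mapping = {"a": 1, "b": 2, "c": 3}
    ncf_sim = {("a", 1): 0.9, ("b", 2): 0.8, ("c", 3): 0.7}
    print(weighted_conservation(G1, G2, mapping, ncf_sim))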