Identifying functional modules in protein–protein interaction networks: an integrated exact approach
Motivation: With the exponential growth of expression and protein–protein interaction (PPI) data, the frontier of research in systems biology shifts more and more to the integrated analysis of these large datasets. Of particular interest is the identification of functional modules in PPI networks, sharing common cellular function beyond the scope of classical pathways, by means of detecting differentially expressed regions in PPI networks. This requires on the one hand an adequate scoring of the nodes in the network to be identified and on the other hand the availability of an effective algorithm to find the maximally scoring network regions. Various heuristic approaches have been proposed in the literature
Algorithm engineering for optimal alignment of protein structure distance matrices
Protein structural alignment is an important problem in computational
biology. In this paper, we present first successes on provably optimal pairwise
alignment of protein inter-residue distance matrices, using the popular Dali
scoring function. We introduce the structural alignment problem formally, which
enables us to express a variety of scoring functions used in previous work as
special cases in a unified framework. Further, we propose the first
mathematical model for computing optimal structural alignments based on dense
inter-residue distance matrices. We therefore reformulate the problem as a
special graph problem and give a tight integer linear programming model. We
then present algorithm engineering techniques to handle the huge integer linear
programs of real-life distance matrix alignment problems. Applying these
techniques, we can compute provably optimal Dali alignments for the very first
time.
An Exact Algorithm for Side-Chain Placement in Protein Design
Computational protein design aims at constructing novel or improved functions
on the structure of a given protein backbone and has important applications in
the pharmaceutical and biotechnical industry. The underlying combinatorial
side-chain placement problem consists of choosing a side-chain placement for
each residue position such that the resulting overall energy is minimum. The
choice of the side-chain then also determines the amino acid for this position.
Many algorithms for this NP-hard problem have been proposed in the context of
homology modeling, which, however, reach their limits when faced with large
protein design instances.
In this paper, we propose a new exact method for the side-chain placement
problem that works well even for large instance sizes as they appear in protein
design. Our main contribution is a dedicated branch-and-bound algorithm that
combines tight upper and lower bounds resulting from a novel Lagrangian
relaxation approach for side-chain placement. Our experimental results show
that our method outperforms alternative state-of-the-art exact approaches and
makes it possible to optimally solve large protein design instances routinely.
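One common way to state the side-chain placement problem described above is as a minimization over one candidate placement per residue position. The abstract gives no formula, so the following is only a sketch: the symbols $R_i$ (candidate placements at position $i$), $E_i$ (self energy) and $E_{ij}$ (pairwise interaction energy) are assumptions, and so is the pairwise-decomposable form of the energy.

```latex
\min_{r_1 \in R_1, \dots, r_n \in R_n}
  \; \sum_{i=1}^{n} E_i(r_i)
  \; + \sum_{1 \le i < j \le n} E_{ij}(r_i, r_j)
```

Under this reading, a branch-and-bound method fixes the $r_i$ one position at a time and prunes partial assignments whose lower bound already exceeds the best complete assignment found so far.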
CSA: Comprehensive comparison of pairwise protein structure alignments
CSA is a web server for the computation, evaluation and comprehensive comparison of pairwise protein structure alignments. Its exact alignment engine computes either optimal, top-scoring alignments or heuristic alignments with quality guarantee for the inter-residue distance-based scorings of contact map overlap, PAUL, DALI and MATRAS. These and additional, uploaded alignments are compared using a number of quality measures and intuitive visualizations. CSA brings new insight into the structural relationship of the protein pairs under investigation and is a valuable tool for studying structural similarities. It is available at http://csa.project.cwi.nl
PAUL: Protein structural alignment using integer linear programming and Lagrangian relaxation
A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer
Recently, several classifiers that combine primary tumor data, like gene
expression data, and secondary data sources, such as protein-protein
interaction networks, have been proposed for predicting outcome in breast
cancer. In these approaches, new composite features are typically constructed
by aggregating the expression levels of several genes. The secondary data
sources are employed to guide this aggregation. Although many studies claim
that these approaches improve classification performance over single gene
classifiers, the gain in performance is difficult to assess. This stems mainly
from the fact that different breast cancer data sets and validation procedures
are employed to assess the performance. Here we address these issues by
employing a large cohort of six breast cancer data sets as benchmark set and by
performing an unbiased evaluation of the classification accuracies of the
different approaches. Contrary to previous claims, we find that composite
feature classifiers do not outperform simple single gene classifiers. We
investigate the effect of (1) the number of selected features; (2) the specific
gene set from which features are selected; (3) the size of the training set and
(4) the heterogeneity of the data set on the performance of composite feature
and single gene classifiers. Strikingly, we find that randomization of
secondary data sources, which destroys all biological information in these
sources, does not result in a deterioration in performance of composite feature
classifiers. Finally, we show that when a proper correction for gene set size
is performed, the stability of single gene sets is similar to the stability of
composite feature sets. Based on these results there is currently no reason to
prefer prognostic classifiers based on composite features over single gene
classifiers for predicting outcome in breast cancer.
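To make the idea of a composite feature concrete, the sketch below aggregates the expression levels of the genes in one gene set into a single value. It is a minimal illustration, not the procedure of any of the evaluated classifiers: the abstract does not say how the aggregation is done, so plain averaging is an assumption, and the function name `composite_feature` and the gene names are hypothetical.

```python
from statistics import mean


def composite_feature(expression, gene_set):
    """Average the expression levels of the genes in gene_set.

    expression: dict mapping gene name -> expression level for one sample.
    gene_set:   iterable of gene names, e.g. taken from a pathway or a
                subnetwork of a protein-protein interaction network.
    Genes missing from the expression profile are skipped.
    """
    values = [expression[g] for g in gene_set if g in expression]
    if not values:
        raise ValueError("no genes from the set found in the expression profile")
    return mean(values)


# Hypothetical sample: expression levels for three genes.
sample = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": -1.0}

# Hypothetical gene set supplied by a secondary data source.
module = ["GENE_A", "GENE_B"]

score = composite_feature(sample, module)
```

A single gene classifier would instead use the entries of `sample` directly as features, which is the baseline the abstract compares against.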
Optimizing topological cascade resilience based on the structure of terrorist networks
Complex socioeconomic networks such as information, finance and even
terrorist networks need resilience to cascades - to prevent the failure of a
single node from causing a far-reaching domino effect. We show that terrorist
and guerrilla networks are uniquely cascade-resilient while maintaining high
efficiency, but they become more vulnerable beyond a certain threshold. We also
introduce an optimization method for constructing networks with high passive
cascade resilience. The optimal networks are found to be based on cells, where
each cell has a star topology. Counterintuitively, we find that there are
conditions where networks should not be modified to stop cascades because doing
so would come at a disproportionate loss of efficiency. Implementation of these
findings can lead to more cascade-resilient networks in many diverse areas.
Comment: 26 pages. v2: In review at Public Library of Science ON
Enhancing the accuracy of HMM-based conserved pathway prediction using global correspondence scores
- …