Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems
A huge amount of genetic information is available thanks to recent advances in sequencing technologies and greater computational capabilities, but the interpretation of such genetic data at the phenotypic level remains elusive. One of the reasons is that proteins do not act alone, but interact specifically with other proteins and biomolecules, forming intricate interaction networks that are essential for the majority of cell processes and pathological conditions. Thus, characterizing such interaction networks is an important step in understanding how information flows from gene to phenotype. Indeed, structural characterization of protein–protein interactions at atomic resolution has many applications in biomedicine, from diagnosis and vaccine design to drug discovery. However, despite the advances in experimental structure determination, the number of interactions for which structural data are available is still very small. In this context, a complementary approach is computational modeling of protein interactions by docking, which usually consists of two major phases: (i) sampling of the possible binding modes between the interacting molecules and (ii) scoring for the identification of the correct orientations. In addition, prediction of interface and hot-spot residues is very useful to guide and interpret mutagenesis experiments, as well as to understand functional and mechanistic aspects of the interaction. Computational docking is already being applied to specific biomedical problems within the context of personalized medicine, for instance, helping to interpret pathological mutations involved in protein–protein interactions, or providing modeled structural data for drug discovery targeting protein–protein interactions.
Spanish Ministry of Economy grant number BIO2016-79960-R; D.B.B. is supported by a predoctoral fellowship from CONACyT; M.R. is supported by an FPI fellowship from the Severo Ochoa program. We are grateful to the Joint BSC-CRG-IRB Programme in Computational Biology.
Peer Reviewed
Postprint (author's final draft)
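The two docking phases named in the abstract (sampling, then scoring) can be illustrated with a deliberately tiny toy. This is only a sketch under strong assumptions: molecules are sets of cells on a 2D grid, sampling is an exhaustive scan over quarter-turn rotations and integer translations, and the score simply counts touching cells while rejecting overlaps. The names `rotate90`, `score_pose` and `dock` are hypothetical and do not come from the paper or from any docking package.

```python
from itertools import product


def rotate90(cells, times):
    """Rotate a set of (x, y) grid cells by `times` quarter turns about the origin."""
    out = set(cells)
    for _ in range(times % 4):
        out = {(-y, x) for x, y in out}
    return out


def score_pose(receptor, ligand):
    """Toy scoring: None for a steric clash, otherwise the number of contacts."""
    if receptor & ligand:
        return None  # overlapping cells: physically impossible pose
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return sum((x + dx, y + dy) in receptor
               for x, y in ligand for dx, dy in neighbours)


def dock(receptor, ligand, span=3):
    """Sampling phase: scan rotations and translations; return the best (score, pose)."""
    poses = []
    for turns, tx, ty in product(range(4), range(-span, span + 1), range(-span, span + 1)):
        placed = {(x + tx, y + ty) for x, y in rotate90(ligand, turns)}
        score = score_pose(receptor, placed)
        if score is not None:
            poses.append((score, placed))
    return max(poses, key=lambda pose: pose[0])


# An L-shaped "receptor" and a one-cell "ligand": the best pose fills the corner.
receptor = {(0, 0), (1, 0), (0, 1)}
best_score, best_pose = dock(receptor, {(0, 0)})
```

Real docking programs work in 3D with far richer sampling and energy-based or statistical scoring; the toy only shows how the two phases fit together.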
Under-approximating Cut Sets for Reachability in Large Scale Automata Networks
In the scope of discrete finite-state models of interacting components, we
present a novel algorithm for identifying sets of local states of components
whose activity is necessary for the reachability of a given local state. If all
the local states from such a set are disabled in the model, the concerned
reachability is impossible. Those sets are referred to as cut sets and are
computed from a particular abstract causality structure, the so-called Graph of
Local Causality, inspired by previous work and generalised here to finite
automata networks. The extracted sets of local states form an
under-approximation of the complete minimal cut sets of the dynamics: there may
exist smaller or additional cut sets for the given reachability. Applied to
qualitative models of biological systems, such cut sets provide potential
therapeutic targets that are proven to prevent molecules of interest from becoming
active, up to the correctness of the model. Our new method makes the
formal analysis of very large scale networks tractable, as illustrated by the computation
of cut sets within a Boolean model of biological pathway interactions
gathering more than 9000 components.
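The notion of a cut set can be made concrete on a much simpler object than an automata network. The sketch below assumes a plain directed graph in which disabling a node removes it, and it enumerates minimal cut sets by brute force. That is exponential and is not the Graph of Local Causality method of the paper, which exists precisely to avoid such enumeration; the function names and the example graph are hypothetical.

```python
from itertools import combinations


def reachable(edges, source, target, disabled=frozenset()):
    """Depth-first search that never enters a disabled node."""
    if source in disabled or target in disabled:
        return False
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in edges.get(node, ()):
            if nxt not in seen and nxt not in disabled:
                seen.add(nxt)
                stack.append(nxt)
    return False


def minimal_cut_sets(edges, source, target):
    """Enumerate minimal node sets whose removal makes `target` unreachable."""
    nodes = set(edges) | {n for outs in edges.values() for n in outs}
    candidates = sorted(nodes - {source, target})
    found = []
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            cut = frozenset(combo)
            if any(prev <= cut for prev in found):
                continue  # a smaller cut set is already contained in this one
            if not reachable(edges, source, target, cut):
                found.append(cut)
    return found


# Two routes from s to t: s -> a -> t and s -> b -> c -> t.
graph = {"s": ["a", "b"], "a": ["t"], "b": ["c"], "c": ["t"]}
cuts = minimal_cut_sets(graph, "s", "t")
```

Here every cut set has to block both routes, so each one pairs `a` with a node from the second route.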
Hierarchy of protein loop-lock structures: a new server for the decomposition of a protein structure into a set of closed loops
HoPLLS (Hierarchy of protein loop-lock structures)
(http://leah.haifa.ac.il/~skogan/Apache/mydata1/main.html) is a web server that
identifies closed loops - a structural basis for protein domain hierarchy. The
server is based on the loop-and-lock theory for structural organisation of
natural proteins. We describe this web server, the algorithms for the
decomposition of a 3D protein into loops and the results of scientific
investigations into a structural "alphabet" of loops and locks.
Comment: 11 pages, 4 figures
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data of Plasmodium falciparum, but also drawing on the
millions of genomic data points from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes, particularly metabolic pathways, 4)
versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed.
Comment: 43 pages, 4 figures, to appear in Malaria Journal
Computational analysis of a plant receptor interaction network
Master's thesis in Bioinformatics and Computational Biology
In all organisms, complex protein-protein interaction (PPI) networks control major
biological functions, yet studying their structural features presents a major analytical
challenge. In plants, leucine-rich-repeat receptor kinases (LRR-RKs) are key in sensing
and transmitting non-self as well as self-signals from the cell surface. As such, LRR-RKs
have both developmental and immune functions that allow plants to make the most of their
environments. In Arabidopsis thaliana, the model organism of plant molecular biology,
most LRR-RKs are still biochemically and genetically uncharacterized
receptors. To address this, an LRR-based Cell Surface Interaction (CSI LRR) network was
obtained in 2018: a protein-protein interaction network of the extracellular domains of 170
LRR-RKs that contains 567 bidirectional interactions. Several network analyses have been
performed with CSI LRR. However, these analyses have so far not considered the spatial and
temporal expression of its proteins, nor has the role of extracellular domain (ECD) size
in the network structure been characterized in detail. The objective
of the present work is therefore to carry out more in-depth analyses of the CSI LRR network,
which should provide insights that facilitate the functional characterization
of LRR-RKs.
The first aim of this work is to test the fit of the CSI LRR network to a scale-free
topology. To that end, the degree distribution of the CSI LRR network was compared
with the degree distributions of scale-free and random network models.
Additionally, three network attack algorithms were implemented and applied to these two
network models and the CSI LRR network to compare their behavior. However, since the
CSI LRR interaction data come from an in vitro screening, there is no direct evidence
that its protein-protein interactions occur inside plant cells. To gain insight into how
the network composition changes depending on transcriptional regulation, the
CSI LRR interaction data were integrated with 4 different RNA-Seq datasets related
to the biological functions of the network. A Python script was written to automate this task.
Furthermore, the role of the LRR-RKs in the network structure was evaluated depending
on the size of their extracellular domain (large or small). For that, centrality parameters
were measured and size-targeted attacks performed. Finally, gene regulatory information
was integrated into the CSI LRR to classify the network proteins according to the
function of the transcription factors that regulate their expression.
The results show that CSI LRR fits a power-law degree distribution and approximates a
scale-free topology. Moreover, CSI LRR displays high resistance to random attacks and reduced
resistance to hub/bottleneck-directed attacks, similar to the scale-free network model. Also,
the integration of CSI LRR interaction data and RNA-Seq data suggests that the
transcriptional regulation of the network is more relevant for developmental programs than
for defense responses. In addition, the LRR-RKs with a small ECD have a
major role in maintaining the integrity of CSI LRR. Lastly, it was hypothesized that
integrating CSI LRR interaction data with predicted gene regulatory networks could shed
light on the functioning of growth-immunity signaling crosstalk.
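The contrast between random and hub-directed attacks can be sketched in a few lines. The thesis does not spell out its three attack algorithms here, so the following is an assumption-laden illustration: nodes are removed one by one and robustness is tracked as the size of the largest connected component. The function names and the star-shaped example network are hypothetical.

```python
import random


def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def attack(adj, order):
    """Remove nodes in `order`; record the largest component after each removal."""
    removed, sizes = set(), []
    for node in order:
        removed.add(node)
        sizes.append(largest_component(adj, removed))
    return sizes


def hub_order(adj):
    """Hub-directed attack: highest-degree nodes first (degrees taken once, up front)."""
    return sorted(adj, key=lambda node: len(adj[node]), reverse=True)


def random_order(adj, seed=0):
    """Random attack: a seeded shuffle of the nodes, for reproducibility."""
    nodes = list(adj)
    random.Random(seed).shuffle(nodes)
    return nodes


# A star network: node 0 is the only hub, nodes 1..5 are leaves.
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
hub_curve = attack(star, hub_order(star))
leaf_first_curve = attack(star, [1, 2, 0])
```

Removing the hub first shatters the star immediately, whereas removing leaves shrinks it only one node at a time, which is the qualitative behaviour expected of scale-free networks.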
Graph Kernels
We present a unified framework to study graph kernels, special cases of which include the random
walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004;
Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time
complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3).
We find a spectral decomposition approach even more efficient when computing entire kernel matrices.
For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3)
time per iteration, where d is the size of the label set. By extending the necessary linear algebra to
Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels,
and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2)
time per iteration in all cases. Experiments on graphs from bioinformatics and other application
domains show that these techniques can speed up computation of the kernel by an order of magnitude
or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when
specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to
R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment
kernel of Fröhlich et al. (2006) yet provably positive semi-definite.
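A fixed-point iteration for a geometric random walk kernel on two unlabeled graphs can be sketched as follows. The assumptions are mine, not the paper's exact formulation: all-ones start and stop weights, a decay factor `lam`, and dense adjacency matrices stored as nested lists. The update X ← 1 + lam · A1 · X · A2ᵀ applies the direct-product adjacency to X without ever building the product graph, so each iteration costs O(n^3) with this naive matrix product. The iteration converges only when `lam` is smaller than the reciprocal of the product of the two spectral radii.

```python
def matmul(a, b):
    """Plain dense matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose of a nested-list matrix."""
    return [list(row) for row in zip(*m)]


def random_walk_kernel(adj1, adj2, lam=0.1, tol=1e-10, max_iter=1000):
    """Geometric random walk kernel via the fixed point X = 1 + lam * A1 X A2^T."""
    n1, n2 = len(adj1), len(adj2)
    x = [[1.0] * n2 for _ in range(n1)]
    adj2_t = transpose(adj2)
    for _ in range(max_iter):
        walk = matmul(matmul(adj1, x), adj2_t)  # one step on the product graph
        new_x = [[1.0 + lam * walk[i][j] for j in range(n2)] for i in range(n1)]
        delta = max(abs(new_x[i][j] - x[i][j]) for i in range(n1) for j in range(n2))
        x = new_x
        if delta < tol:
            break
    # Summing the entries applies the all-ones stopping weights.
    return sum(map(sum, x))


# Two single-edge graphs: every node pair has exactly one joint neighbour,
# so the walk series sums to 4 / (1 - lam).
edge = [[0, 1], [1, 0]]
k_value = random_walk_kernel(edge, edge, lam=0.5)
```

The example is small enough to check by hand against the closed-form geometric series.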
Functional nucleic acids as substrate for information processing
Information processing applications driven by the self-assembly and conformation dynamics of nucleic acids are possible. These underlying paradigms (self-assembly and conformation dynamics) are essential for natural information processors, as illustrated by proteins. A key advantage of utilising nucleic acids as information processors is the availability of computational tools to support the design process. This provides us with a platform to develop an integrated environment in which an orchestration of molecular building blocks can be realised. Strict arbitrary control over the design of these computational nucleic acids is not feasible: the microphysical behaviour of these molecular materials must be taken into consideration during the design phase. This thesis investigated to what extent the construction of molecular building blocks for a particular purpose is possible with the support of a software environment. In this work we developed a computational protocol that functions on a multi-molecular level, which enables us to directly incorporate the dynamic characteristics of nucleic acid molecules. To allow the implementation of this computational protocol, we developed a designer that is able to solve the nucleic acid inverse prediction problem, not only at the level of multi-stable states, but also including the interactions among molecules that occur in each meta-stable state. The realisation of our computational protocol is evaluated by generating computational nucleic acid units that resemble synthetic RNA devices that have been successfully implemented in the laboratory. Furthermore, we demonstrated the feasibility of the protocol for designing various types of computational units. The accuracy and diversity of the generated candidates are significantly better than those of the best candidates produced by conventional designers. With the computational protocol, the design of nucleic acid information processors using a network of interconnecting nucleic acids is now feasible.
Kernel methods in genomics and computational biology
Support vector machines and kernel methods are increasingly popular in
genomics and computational biology, due to their good performance in real-world
applications and strong modularity that makes them suitable for a wide range of
problems, from the classification of tumors to the automatic annotation of
proteins. Their ability to work in high dimension, to process non-vectorial
data, and the natural framework they provide to integrate heterogeneous data
are particularly relevant to various problems arising in computational biology.
In this chapter we survey some of the most prominent applications published so
far, highlighting the particular developments in kernel methods triggered by
problems in biology, and mention a few promising research directions likely to
expand in the future.
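As an illustration of how a kernel can process non-vectorial data, the sketch below computes a k-mer spectrum kernel between two sequences: the inner product of their k-mer count vectors, obtained without ever materialising those vectors. The spectrum kernel is one well-known string kernel chosen here for illustration; the abstract does not single it out, and the function names and toy sequences are assumptions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of two sequences."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())


# "ABAB" has AB x2 and BA x1; "BABA" has BA x2 and AB x1.
similarity = spectrum_kernel("ABAB", "BABA", k=2)
```

Because the value is an inner product in k-mer count space, it can be passed to any kernel method, such as a support vector machine, as a precomputed similarity.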