Fast Tree Search for Enumeration of a Lattice Model of Protein Folding
Using a fast tree-searching algorithm and a Pentium cluster, we enumerated
all the sequences and compact conformations (structures) for a protein folding
model on a cubic lattice of size . We used two types of amino
acids -- hydrophobic (H) and polar (P) -- to make up the sequences, so there
were different sequences. The total number
of distinct structures was 84,731,192. We made use of a simple solvation model
in which the energy of a sequence folded into a structure is minus the number
of hydrophobic amino acids in the ``core'' of the structure. For every
sequence, we found its ground state or ground states, i.e., the structure or
structures for which its energy is lowest. About 0.3% of the sequences have a
unique ground state. The number of structures that are unique ground states of
at least one sequence is 2,662,050, about 3% of the total number of structures.
However, these ``designable'' structures differ drastically in their
designability, defined as the number of sequences whose unique ground state is
that structure. To understand this variation in designability, we studied the
distribution of structures in a high dimensional space in which each structure
is represented by a string of 1's and 0's, denoting core and surface sites,
respectively.
Comment: 18 pages, 10 figures
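The solvation model and the definition of designability described in this abstract can be illustrated with a minimal sketch. It assumes each structure is written as a string of 1's (core) and 0's (surface), as the abstract describes; the three four-site structures and the function names `energy` and `designability` are hypothetical toy choices, not the lattice structures or code from the paper.

```python
from itertools import product

def energy(sequence, structure):
    """Energy is minus the number of H residues placed on core ('1') sites."""
    return -sum(1 for aa, site in zip(sequence, structure)
                if aa == "H" and site == "1")

def designability(structures, length):
    """For each structure, count the sequences whose unique ground state it is."""
    counts = {s: 0 for s in structures}
    for seq in product("HP", repeat=length):
        energies = [energy(seq, s) for s in structures]
        lowest = min(energies)
        ground = [s for s, e in zip(structures, energies) if e == lowest]
        if len(ground) == 1:  # degenerate ground states do not count
            counts[ground[0]] += 1
    return counts

# Hypothetical toy structures (assumption): 4 sites, 2 of them in the core.
toy_structures = ["1100", "0110", "0011"]
result = designability(toy_structures, 4)
print(result)
```

Even in this toy set the structures differ in designability, which is the kind of variation the abstract sets out to explain.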
FoldExplorer: Fast and Accurate Protein Structure Search with Sequence-Enhanced Graph Embedding
The advent of highly accurate protein structure prediction methods has fueled
an exponential expansion of the protein structure database. Consequently, there
is a rising demand for rapid and precise structural homolog search. Traditional
alignment-based methods are dedicated to precise comparisons between pairs,
exhibiting high accuracy. However, their sluggish processing speed is no longer
adequate for managing the current massive volume of data. In response to this
challenge, we propose a novel deep-learning approach, FoldExplorer. It uses
graph attention neural networks and protein large language models to process
protein structure and sequence data and to generate embeddings for protein
structures. The structural embeddings can be
used for fast and accurate protein search. The embeddings also provide insights
into the protein space. FoldExplorer demonstrates a substantial performance
improvement of 5% to 8% over the current state-of-the-art algorithm on the
benchmark datasets. Meanwhile, FoldExplorer does not compromise on search speed
and excels particularly in searching on a large-scale dataset.
Comment: 14 pages, 8 figures
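How fixed-length embeddings enable fast search can be sketched as a nearest-neighbor lookup. This is only an illustration: the abstract does not state which similarity measure FoldExplorer uses, so cosine similarity is an assumption here, and the three-dimensional vectors and protein names are made-up placeholders rather than real model output.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def search(query, database, top_k=2):
    """Return the names of the top_k database entries most similar to query."""
    scored = [(cosine(query, vec), name) for name, vec in database.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical embeddings (assumption): placeholders, not FoldExplorer output.
database = {
    "protA": [1.0, 0.0, 0.0],
    "protB": [0.9, 0.1, 0.0],
    "protC": [0.0, 1.0, 0.0],
}
hits = search([1.0, 0.05, 0.0], database)
print(hits)
```

Because each structure is reduced to one vector, a query costs one similarity computation per database entry instead of one pairwise structural alignment.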
Functionals linear in curvature and statistics of helical proteins
The effective free energy of a globular protein chain is considered to be a
functional defined on smooth curves in three dimensional Euclidean space. From
the requirement of geometrical invariance, together with basic facts on
conformation of helical proteins and dynamical characteristics of the protein
chains, we are able to determine, in a unique way, the exact form of the free
energy functional. Namely, the free energy density should be a linear function
of the curvature of curves on which the free energy functional is defined. We
briefly discuss the possibility of using the proposed model in Monte Carlo
simulations that search exhaustively for the native stable state of the protein
chain. The relation of this model to rigid relativistic particles and
strings is also considered.
Comment: 18 pages, LaTeX2e, no figures, no tables; the title is changed
slightly, explanations are added concerning the physical content of the
approach; the list of references is enlarged
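The statement that the free energy density is a linear function of the curvature can be written out as a short formula. The notation below is an assumption made for illustration: $s$ is the arc length along the curve, $k(s)$ its curvature, and $a$, $b$ are unspecified constants; the abstract itself does not give these symbols.

```latex
F[\mathbf{x}] = \int f\bigl(k(s)\bigr)\, ds ,
\qquad
f(k) = a + b\, k(s)
```

Here the constant term contributes a free energy proportional to the chain length, while the second term weights the integrated curvature.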
Efficient Algorithms for Node Disjoint Subgraph Homeomorphism Determination
Recently, great effort has been dedicated to research on the management
of large-scale graph-based data such as the WWW, social networks, and biological
networks. In the study of graph-based data management, the node disjoint subgraph
homeomorphism relation between graphs is more suitable than (sub)graph
isomorphism in many cases, especially where node skipping and
node mismatching are allowed. However, no efficient node disjoint subgraph
homeomorphism determination (ndSHD) algorithms have been available. In this
paper, we propose two computationally efficient ndSHD algorithms based on
state-space search with backtracking, which employ many heuristics to prune the
search space. Experimental results on synthetic data sets show that the
proposed algorithms are efficient, require relatively little time in most of the
test cases, scale to large or dense graphs, and accommodate more
complex fuzzy matching cases.
Comment: 15 pages, 11 figures, submitted to DASFAA 200
Kinetics of protein-DNA interaction: facilitated target location in sequence-dependent potential
Recognition and binding of specific sites on DNA by proteins is central to
many cellular functions such as transcription, replication, and recombination.
In the process of recognition, a protein rapidly searches for its specific site
on a long DNA molecule and then strongly binds this site. Here we aim to find a
mechanism that can provide both a fast search (1-10 sec) and high stability of
the specific protein-DNA complex ( M).
Earlier studies have suggested that rapid search involves the sliding of a
protein along the DNA. Here we consider sliding as a one-dimensional (1D)
diffusion in a sequence-dependent rough energy landscape. We demonstrate that,
in spite of the landscape's roughness, rapid search can be achieved if 1D
sliding is accompanied by 3D diffusion. We estimate the range of the specific
and non-specific DNA-binding energy required for rapid search and suggest
experiments that can test our mechanism. We show that optimal search requires a
protein to spend half of its time sliding along the DNA and half diffusing in 3D.
We also establish that, paradoxically, realistic energy functions cannot
provide both rapid search and strong binding of a rigid protein. To reconcile
these two fundamental requirements we propose a search-and-fold mechanism that
involves the coupling of protein binding and partial protein folding.
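The claim that optimal search splits time equally between sliding and 3D diffusion can be checked numerically under a simple round-based picture. Everything in this sketch is an assumption rather than content of the abstract: the search is modeled as repeated rounds of one 3D excursion (duration `tau_3d`) followed by one slide that scans `n` sites in time `n**2 / D`, on a DNA of `M` sites, with arbitrary placeholder parameter values.

```python
def slide_time(n, D=1e5):
    """Time for one 1D slide scanning n sites by diffusion (assumed n**2 / D)."""
    return n ** 2 / D

def search_time(n, M=1e6, D=1e5, tau_3d=1e-3):
    """Total time: M / n rounds, each one slide plus one 3D excursion."""
    return (M / n) * (slide_time(n, D) + tau_3d)

# Scan slide lengths from 1 to 100 sites and keep the fastest search.
candidates = [1 + 0.01 * i for i in range(9901)]
best_n = min(candidates, key=search_time)

tau_1d_at_best = slide_time(best_n)
print(best_n, tau_1d_at_best)
```

Under these assumptions the minimum falls where the time per slide equals the time per 3D excursion, consistent with the half-and-half statement in the abstract.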
The proposed mechanism has several important biological implications for search
in the presence of other proteins and nucleosomes, simultaneous search by
several proteins, etc. The proposed mechanism also provides a new framework for
interpretation of experimental and structural data on protein-DNA interactions
- …