
    Fast Tree Search for Enumeration of a Lattice Model of Protein Folding

    Using a fast tree-searching algorithm and a Pentium cluster, we enumerated all the sequences and compact conformations (structures) for a protein folding model on a cubic lattice of size $4\times3\times3$. We used two types of amino acids, hydrophobic (H) and polar (P), to make up the sequences, so there were $2^{36} \approx 6.87 \times 10^{10}$ different sequences. The total number of distinct structures was 84,731,192. We made use of a simple solvation model in which the energy of a sequence folded into a structure is minus the number of hydrophobic amino acids in the "core" of the structure. For every sequence, we found its ground state or ground states, i.e., the structure or structures for which its energy is lowest. About 0.3% of the sequences have a unique ground state. The number of structures that are unique ground states of at least one sequence is 2,662,050, about 3% of the total number of structures. However, these "designable" structures differ drastically in their designability, defined as the number of sequences whose unique ground state is that structure. To understand this variation in designability, we studied the distribution of structures in a high dimensional space in which each structure is represented by a string of 1's and 0's, denoting core and surface sites, respectively. Comment: 18 pages, 10 figure
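
The solvation model and the definition of designability above can be illustrated on a toy scale. The following is a minimal sketch, not the authors' tree-search code: the three 4-site core masks in `toy_structures` are made up for illustration, and the function names are hypothetical. It enumerates all H/P sequences by brute force, which is only feasible for very short chains.

```python
from itertools import product

def energy(sequence, core_mask):
    """Energy = minus the number of H residues that sit on core ('1') sites."""
    return -sum(1 for aa, site in zip(sequence, core_mask)
                if aa == "H" and site == "1")

def designability(structures, length):
    """For each structure, count the sequences whose unique ground state it is."""
    counts = {s: 0 for s in structures}
    for seq in product("HP", repeat=length):      # all 2**length sequences
        energies = {s: energy(seq, s) for s in structures}
        lowest = min(energies.values())
        ground = [s for s, e in energies.items() if e == lowest]
        if len(ground) == 1:                      # degenerate ground states do not count
            counts[ground[0]] += 1
    return counts

# Hypothetical core/surface strings (1 = core site, 0 = surface site).
toy_structures = ["1100", "0110", "0011"]
counts = designability(toy_structures, 4)
print(counts)
```

In this representation two structures with identical core masks always tie in energy, so neither can be a unique ground state of any sequence.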

    FoldExplorer: Fast and Accurate Protein Structure Search with Sequence-Enhanced Graph Embedding

    The advent of highly accurate protein structure prediction methods has fueled an exponential expansion of the protein structure database. Consequently, there is a rising demand for rapid and precise structural homolog search. Traditional alignment-based methods are dedicated to precise pairwise comparisons and achieve high accuracy. However, their processing speed is no longer adequate for the current volume of data. In response to this challenge, we propose a novel deep-learning approach, FoldExplorer. It combines graph attention neural networks and protein large language models to process protein structure and sequence data and generate embeddings for protein structures. The structural embeddings can be used for fast and accurate protein search. The embeddings also provide insights into the protein space. FoldExplorer demonstrates a substantial performance improvement of 5% to 8% over the current state-of-the-art algorithm on the benchmark datasets. Meanwhile, FoldExplorer does not compromise on search speed and excels particularly in searching on a large-scale dataset. Comment: 14 pages, 8 figure
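
Embedding-based search of the kind described above generally reduces to ranking database vectors by similarity to a query vector. The following is a minimal sketch of that ranking step only, assuming cosine similarity as the metric; it is not FoldExplorer's model or API, and the protein names and 3-dimensional vectors are invented placeholders for real embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def search(query, database, top_k=2):
    """Return the names of the top_k database entries most similar to the query."""
    ranked = sorted(database.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical precomputed structure embeddings.
database = {
    "protA": [1.0, 0.0, 0.0],
    "protB": [0.9, 0.1, 0.0],
    "protC": [0.0, 1.0, 0.0],
}
hits = search([1.0, 0.05, 0.0], database)
print(hits)
```

Because the embeddings are computed once per structure, each query costs one vector comparison per database entry instead of one structural alignment.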

    Functionals linear in curvature and statistics of helical proteins

    The effective free energy of a globular protein chain is considered to be a functional defined on smooth curves in three-dimensional Euclidean space. From the requirement of geometrical invariance, together with basic facts on the conformation of helical proteins and the dynamical characteristics of protein chains, we are able to determine, in a unique way, the exact form of the free energy functional. Namely, the free energy density should be a linear function of the curvature of the curves on which the free energy functional is defined. We briefly discuss the possibility of using the proposed model in Monte Carlo simulations that exhaustively search for the native stable state of the protein chain. The relation of this model to rigid relativistic particles and strings is also considered. Comment: 18 pages, LaTeX2e, no figures, no tables; the title is changed slightly, the explanations are added concerning the physical content of the approach; the list of references is enlarge
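
Written out, a free energy density that is linear in curvature has the following general shape. This is only a schematic restatement of the sentence above: the constants $a$ and $b$, the chain length $L$, and the arc-length parametrization $\mathbf{x}(s)$ are notation assumed here, and the abstract does not give their values or signs.

```latex
F[\mathbf{x}] = \int_{0}^{L} \bigl( a + b\,k(s) \bigr)\, ds ,
\qquad
k(s) = \left| \frac{d^{2}\mathbf{x}}{ds^{2}} \right| ,
```

where $s$ is the arc length along the chain and $k(s)$ is the curvature of the curve at $s$.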

    Efficient Algorithms for Node Disjoint Subgraph Homeomorphism Determination

    Recently, great effort has been dedicated to research on the management of large-scale graph-based data such as the WWW, social networks, and biological networks. In the study of graph-based data management, the node disjoint subgraph homeomorphism relation between graphs is more suitable than (sub)graph isomorphism in many cases, especially those in which node skipping and node mismatching are allowed. However, no efficient node disjoint subgraph homeomorphism determination (ndSHD) algorithms have been available. In this paper, we propose two computationally efficient ndSHD algorithms based on state-space search with backtracking, which employ many heuristics to prune the search space. Experimental results on synthetic data sets show that the proposed algorithms are efficient, require relatively little time in most of the test cases, scale to large or dense graphs, and can accommodate more complex fuzzy matching cases. Comment: 15 pages, 11 figures, submitted to DASFAA 200
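
To make the ndSHD problem concrete, here is a minimal backtracking sketch for small undirected simple graphs given as adjacency dictionaries. It is not either of the paper's two algorithms: it uses plain exhaustive search with a single degree-based pruning rule, and the function names are hypothetical. Pattern nodes are mapped injectively to target nodes, and each pattern edge is routed along a target path whose interior nodes are used by nothing else.

```python
def simple_paths(graph, start, goal, blocked):
    """Yield simple paths from start to goal whose interior avoids `blocked`."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in graph[node]:
            if nxt == goal:
                yield path + [goal]
            elif nxt not in blocked and nxt not in path:
                stack.append((nxt, path + [nxt]))

def find_homeomorphism(pattern, target):
    """Return (node mapping, edge-to-path mapping), or None if none exists."""
    p_nodes = list(pattern)
    p_edges = sorted({tuple(sorted((u, v))) for u in pattern for v in pattern[u]})

    def route_edges(i, mapping, used):
        if i == len(p_edges):
            return {}
        u, v = p_edges[i]
        for path in simple_paths(target, mapping[u], mapping[v], used):
            rest = route_edges(i + 1, mapping, used | set(path[1:-1]))
            if rest is not None:
                rest[(u, v)] = path
                return rest
        return None                      # backtrack: no disjoint path for this edge

    def assign_nodes(i, mapping):
        if i == len(p_nodes):
            paths = route_edges(0, mapping, set(mapping.values()))
            return (dict(mapping), paths) if paths is not None else None
        for cand in target:
            if cand in mapping.values():
                continue
            if len(target[cand]) < len(pattern[p_nodes[i]]):
                continue                 # pruning: image needs at least the same degree
            mapping[p_nodes[i]] = cand
            result = assign_nodes(i + 1, mapping)
            if result is not None:
                return result
            del mapping[p_nodes[i]]      # backtrack
        return None

    return assign_nodes(0, {})

triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
five_cycle = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
result = find_homeomorphism(triangle, five_cycle)
print(result)
```

A triangle is not isomorphic to any subgraph of a 5-cycle, but it is homeomorphic to the cycle itself, because one pattern edge can be stretched over the two skipped nodes.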

    Kinetics of protein-DNA interaction: facilitated target location in sequence-dependent potential

    Recognition and binding of specific sites on DNA by proteins is central for many cellular functions such as transcription, replication, and recombination. In the process of recognition, a protein rapidly searches for its specific site on a long DNA molecule and then strongly binds this site. Here we aim to find a mechanism that can provide both a fast search (1-10 sec) and high stability of the specific protein-DNA complex ($K_d=10^{-15}-10^{-8}$ M). Earlier studies have suggested that rapid search involves the sliding of a protein along the DNA. Here we consider sliding as a one-dimensional (1D) diffusion in a sequence-dependent rough energy landscape. We demonstrate that, in spite of the landscape's roughness, rapid search can be achieved if 1D sliding is accompanied by 3D diffusion. We estimate the range of the specific and non-specific DNA-binding energy required for rapid search and suggest experiments that can test our mechanism. We show that optimal search requires a protein to spend half of its time sliding along the DNA and half diffusing in 3D. We also establish that, paradoxically, realistic energy functions cannot provide both rapid search and strong binding of a rigid protein. To reconcile these two fundamental requirements we propose a search-and-fold mechanism that involves the coupling of protein binding and partial protein folding. The proposed mechanism has several important biological implications for search in the presence of other proteins and nucleosomes, simultaneous search by several proteins, etc. The proposed mechanism also provides a new framework for the interpretation of experimental and structural data on protein-DNA interactions
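
The statement that the optimal search splits time equally between sliding and 3D diffusion can be checked numerically under a simple assumed model. The sketch below is not taken from the abstract: it assumes that one sliding event of duration `tau_1d` scans about `sqrt(d_1d * tau_1d)` sites (ordinary diffusive scaling, order-one prefactor dropped), that each round costs `tau_1d + tau_3d`, and that rounds are independent. The genome length, diffusion coefficient, and `tau_3d` values are hypothetical.

```python
import math

def search_time(tau_1d, tau_3d, genome_length=1e6, d_1d=1e5):
    """Mean search time for alternating rounds of 1D sliding and 3D diffusion."""
    sites_per_slide = math.sqrt(d_1d * tau_1d)   # assumed diffusive scaling
    rounds = genome_length / sites_per_slide     # rounds needed to cover the DNA
    return rounds * (tau_1d + tau_3d)

tau_3d = 1e-3  # hypothetical duration of one 3D excursion, in seconds
# Scan tau_1d over a logarithmic grid spanning 0.01x to 100x tau_3d.
grid = [tau_3d * 10 ** (k / 100) for k in range(-200, 201)]
best_tau_1d = min(grid, key=lambda t: search_time(t, tau_3d))
print(best_tau_1d)
```

Under these assumptions the search time scales as `(tau_1d + tau_3d) / sqrt(tau_1d)`, which is minimized at `tau_1d == tau_3d`, i.e. when the protein spends half of its time in each mode.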