3,149 research outputs found

    Differentiable Programming Tensor Networks

    Full text link
    Differentiable programming is a fresh programming paradigm which composes parameterized algorithmic components and trains them using automatic differentiation (AD). The concept emerges from deep learning but is not only limited to training neural networks. We present theory and practice of programming tensor network algorithms in a fully differentiable way. By formulating the tensor network algorithm as a computation graph, one can compute higher order derivatives of the program accurately and efficiently using AD. We present essential techniques to differentiate through the tensor networks contractions, including stable AD for tensor decomposition and efficient backpropagation through fixed point iterations. As a demonstration, we compute the specific heat of the Ising model directly by taking the second order derivative of the free energy obtained in the tensor renormalization group calculation. Next, we perform gradient based variational optimization of infinite projected entangled pair states for quantum antiferromagnetic Heisenberg model and obtain start-of-the-art variational energy and magnetization with moderate efforts. Differentiable programming removes laborious human efforts in deriving and implementing analytical gradients for tensor network programs, which opens the door to more innovations in tensor network algorithms and applications.Comment: Typos corrected, discussion and refs added; revised version accepted for publication in PRX. Source code available at https://github.com/wangleiphy/tensorgra

    PDA: an automatic and comprehensive analysis program for protein-DNA complex structures

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Knowledge of protein-DNA interactions at the structural-level can provide insights into the mechanisms of protein-DNA recognition and gene regulation. Although over 1400 protein-DNA complex structures have been deposited into Protein Data Bank (PDB), the structural details of protein-DNA interactions are generally not available. In addition, current approaches to comparison of protein-DNA complexes are mainly based on protein sequence similarity while the DNA sequences are not taken into account. With the number of experimentally-determined protein-DNA complex structures increasing, there is a need for an automatic program to analyze the protein-DNA complex structures and to provide comprehensive structural information for the benefit of the whole research community.</p> <p>Results</p> <p>We developed an automatic and comprehensive protein-DNA complex structure analysis program, PDA (for <it>p</it>rotein-<it>D</it>NA complex structure <it>a</it>nalyzer). PDA takes PDB files as inputs and performs structural analysis that includes 1) whole protein-DNA complex structure restoration, especially the reconstruction of double-stranded DNA structures; 2) an efficient new approach for DNA base-pair detection; 3) systematic annotation of protein-DNA interactions; and 4) extraction of DNA subsequences involved in protein-DNA interactions and identification of protein-DNA binding units. Protein-DNA complex structures in current PDB were processed and analyzed with our PDA program and the analysis results were stored in a database. A dataset useful for studying protein-DNA interactions involved in gene regulation was generated using both protein and DNA sequences as well as the contact information of the complexes. WebPDA was developed to provide a web interface for using PDA and for data retrieval.</p> <p>Conclusion</p> <p>PDA is a computational tool for structural annotations of protein-DNA complexes. It provides a useful resource for investigating protein-DNA interactions. Data from the PDA analysis can also facilitate the classification of protein-DNA complexes and provide insights into rational design of benchmarks. The PDA program is freely available at <url>http://bioinfozen.uncc.edu/webpda</url>.</p

    Systematic analysis of short internal indels and their impact on protein folding

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Protein sequence insertions/deletions (indels) can be introduced during evolution or through alternative splicing (AS). Alternative splicing is an important biological phenomenon and is considered as the major means of expanding structural and functional diversity in eukaryotes. Knowledge of the structural changes due to indels is critical to our understanding of the evolution of protein structure and function. In addition, it can help us probe the evolution of alternative splicing and the diversity of functional isoforms. However, little is known about the effects of indels, in particular the ones involving core secondary structures, on the folding of protein structures. The long term goal of our study is to accurately predict the protein AS isoform structures. As a first step towards this goal, we performed a systematic analysis on the structural changes caused by short internal indels through mining highly homologous proteins in Protein Data Bank (PDB).</p> <p>Results</p> <p>We compiled a non-redundant dataset of short internal indels (2-40 amino acids) from highly homologous protein pairs and analyzed the sequence and structural features of the indels. We found that about one third of indel residues are in disordered state and majority of the residues are exposed to solvent, suggesting that these indels are generally located on the surface of proteins. Though naturally occurring indels are fewer than engineered ones in the dataset, there are no statistically significant differences in terms of amino acid frequencies and secondary structure types between the "Natural" indels and "All" indels in the dataset. Structural comparisons show that all the protein pairs with short internal indels in the dataset preserve the structural folds and about 85% of protein pairs have global RMSDs (root mean square deviations) of 2Å or less, suggesting that protein structures tend to be conserved and can tolerate short insertions and deletions. A few pairs with high RMSDs are results of relative domain positions of the proteins, probably due to the intrinsically dynamic nature of the proteins.</p> <p>Conclusions</p> <p>The analysis demonstrated that protein structures have the "plasticity" to tolerate short indels. This study can provide valuable guides in modeling protein AS isoform structures and homologous proteins with indels through placing the indels at the right locations since the accuracy of sequence alignments dictate model qualities in homology modeling.</p

    Set Representations of Linegraphs

    Full text link
    Let GG be a graph with vertex set V(G)V(G) and edge set E(G)E(G). A family S\mathcal{S} of nonempty sets {S1,,Sn}\{S_1,\ldots,S_n\} is a set representation of GG if there exists a one-to-one correspondence between the vertices v1,,vnv_1, \ldots, v_n in V(G)V(G) and the sets in S\mathcal{S} such that vivjE(G)v_iv_j \in E(G) if and only if S_i\cap S_j\neq \es. A set representation S\mathcal{S} is a distinct (respectively, antichain, uniform and simple) set representation if any two sets SiS_i and SjS_j in S\mathcal{S} have the property SiSjS_i\neq S_j (respectively, SiSjS_i\nsubseteq S_j, Si=Sj|S_i|=|S_j| and SiSj1|S_i\cap S_j|\leqslant 1). Let U(S)=i=1nSiU(\mathcal{S})=\bigcup_{i=1}^n S_i. Two set representations S\mathcal{S} and S\mathcal{S}' are isomorphic if S\mathcal{S}' can be obtained from S\mathcal{S} by a bijection from U(S)U(\mathcal{S}) to U(S)U(\mathcal{S}'). Let FF denote a class of set representations of a graph GG. The type of FF is the number of equivalence classes under the isomorphism relation. In this paper, we investigate types of set representations for linegraphs. We determine the types for the following categories of set representations: simple-distinct, simple-antichain, simple-uniform and simple-distinct-uniform

    TFinDit: transcription factor-DNA interaction data depository

    Get PDF
    BACKGROUND: One of the crucial steps in regulation of gene expression is the binding of transcription factor(s) to specific DNA sequences. Knowledge of the binding affinity and specificity at a structural level between transcription factors and their target sites has important implications in our understanding of the mechanism of gene regulation. Due to their unique functions and binding specificity, there is a need for a transcription factor-specific, structure-based database and corresponding web service to facilitate structural bioinformatics studies of transcription factor-DNA interactions, such as development of knowledge-based interaction potential, transcription factor-DNA docking, binding induced conformational changes, and the thermodynamics of protein-DNA interactions. DESCRIPTION: TFinDit is a relational database and a web search tool for studying transcription factor-DNA interactions. The database contains annotated transcription factor-DNA complex structures and related data, such as unbound protein structures, thermodynamic data, and binding sequences for the corresponding transcription factors in the complex structures. TFinDit also provides a user-friendly interface and allows users to either query individual entries or generate datasets through culling the database based on one or more search criteria. CONCLUSIONS: TFinDit is a specialized structural database with annotated transcription factor-DNA complex structures and other preprocessed data. We believe that this database/web service can facilitate the development and testing of TF-DNA interaction potentials and TF-DNA docking algorithms, and the study of protein-DNA recognition mechanisms
    corecore