6 research outputs found

    Discover protein sequence signatures from protein-protein interaction data

    Background: The development of high-throughput technologies such as yeast two-hybrid systems and mass spectrometry technologies has made it possible to generate large protein-protein interaction (PPI) datasets. Mining these datasets for underlying biological knowledge has, however, remained a challenge. Results: A total of 3108 sequence signatures were found, each of which was shared by a set of guest proteins interacting with one of 944 host proteins in the Saccharomyces cerevisiae genome. Approximately 94% of these sequence signatures matched entries in InterPro member databases. We identified 84 distinct sequence signatures from the remaining 172 unknown signatures. The signature sharing information was then applied in predicting sub-cellular localization of yeast proteins, and the novel signatures were used in identifying possible interacting sites. Conclusion: We reported a method of PPI data mining that facilitated the discovery of novel sequence signatures using a large PPI dataset from the S. cerevisiae genome as input. The fact that 94% of discovered signatures were known validated the ability of the approach to identify large numbers of signatures from PPI data. The significance of these discovered signatures was demonstrated by their application in predicting sub-cellular localizations and identifying potential interaction binding sites of yeast proteins.

    MEME: discovering and analyzing DNA and protein sequence motifs

    MEME (Multiple EM for Motif Elicitation) is one of the most widely used tools for searching for novel ‘signals’ in sets of biological sequences. Applications include the discovery of new transcription factor binding sites and protein domains. MEME works by searching for repeated, ungapped sequence patterns that occur in the DNA or protein sequences provided by the user. Users can perform MEME searches via the web server hosted by the National Biomedical Computation Resource and several mirror sites. Through the same web server, users can also access the Motif Alignment and Search Tool to search sequence databases for matches to motifs encoded in several popular formats. By clicking on buttons in the MEME output, users can compare the motifs discovered in their input sequences with databases of known motifs, search sequence databases for matches to the motifs, and display the motifs in various formats. This article describes the freely accessible web server and its architecture, and discusses ways to use MEME effectively to find new sequence patterns in biological sequences and analyze their significance.
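To make the phrase "repeated, ungapped sequence patterns" concrete, the following is a minimal sketch of how an ungapped motif can be represented as a position weight matrix and scanned across a sequence. It is not MEME's expectation-maximization procedure and does not call the MEME web server. The function names `build_pwm` and `best_match`, the uniform background, and the pseudocount of 1 are all assumptions made for illustration.

```python
import math
from collections import Counter

DNA = "ACGT"


def build_pwm(sites, alphabet=DNA, pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length, ungapped sites.

    Assumes a uniform background distribution over the alphabet.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    background = 1.0 / len(alphabet)
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        column = {
            letter: math.log2(((counts[letter] + pseudocount) / total) / background)
            for letter in alphabet
        }
        pwm.append(column)
    return pwm


def best_match(pwm, sequence):
    """Return (score, offset) of the highest-scoring ungapped window.

    Returns None when the sequence is shorter than the motif. Letters
    outside the alphabet used to build the matrix raise KeyError.
    """
    width = len(pwm)
    best = None
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(column[letter] for column, letter in zip(pwm, window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


# Four hypothetical aligned sites and one hypothetical sequence to scan.
example_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
example_pwm = build_pwm(example_sites)
example_score, example_offset = best_match(example_pwm, "GGGCTATAATGCC")
```

Because every window has the same width and no gaps are allowed, scoring reduces to summing one log-odds value per column, which is the sense in which such motifs are "ungapped".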

    Discriminative motif discovery in DNA and protein sequences using the DEME algorithm

    Background: Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences and searching only for patterns that can differentiate the two sets of sequences. Potential applications of discriminative motif discovery include discovering transcription factor binding site motifs in ChIP-chip data and finding protein motifs involved in thermal stability using sets of orthologous proteins from thermophilic and mesophilic organisms. Results: We describe DEME, a discriminative motif discovery algorithm for use with protein and DNA sequences. Input to DEME is two sets of sequences: a "positive" set and a "negative" set. DEME represents motifs using a probabilistic model, and uses a novel combination of global and local search to find the motif that optimally discriminates between the two sets of sequences. DEME is unique among discriminative motif finders in that it uses an informative Bayesian prior on protein motif columns, allowing it to incorporate prior knowledge of residue characteristics. We also introduce four synthetic discriminative motif discovery problems that are designed for evaluating discriminative motif finders in various biologically motivated contexts. We test DEME using these synthetic problems and on two biological problems: finding yeast transcription factor binding motifs in ChIP-chip data, and finding motifs that discriminate between groups of thermophilic and mesophilic orthologous proteins. Conclusion: Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in the "negative" sequences. With real data, we show that DEME is as good as, but not better than, non-discriminative algorithms at discovering yeast transcription factor binding motifs. We also show that DEME can find highly informative thermal-stability protein motifs. Binaries for the stand-alone program DEME are free for academic use and are available at http://bioinformatics.org.au/deme/.
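The idea of using a "positive" and a "negative" set can be illustrated with a much simpler stand-in than DEME itself. The sketch below enumerates exact k-mers and ranks them by the difference between the fraction of positive and negative sequences that contain them. DEME, by contrast, uses a probabilistic motif model with a Bayesian prior and a combined global and local search, so this is only a toy instance of a discriminative objective. The function names and the example sequences are hypothetical.

```python
from collections import Counter


def kmer_presence(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each sequence contributes at most once per k-mer.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def most_discriminative_kmer(positive, negative, k):
    """Return (kmer, score) maximizing positive coverage minus negative coverage.

    Both sets must be non-empty, and at least one positive sequence must
    be at least k letters long. Ties are broken alphabetically.
    """
    pos = kmer_presence(positive, k)
    neg = kmer_presence(negative, k)

    def score(kmer):
        return pos[kmer] / len(positive) - neg[kmer] / len(negative)

    best = max(sorted(pos), key=score)
    return best, score(best)


# Hypothetical data: GATTA occurs in every positive sequence, while
# TTACA acts as a decoy because it also occurs in a negative sequence.
positive_set = ["AAGATTACA", "CCGATTAGG", "TGATTATT"]
negative_set = ["AAAAAAAA", "CCCCGGGG", "TTTTACAT"]
best_kmer, best_score = most_discriminative_kmer(positive_set, negative_set, 5)
```

A pattern that is common in both sets scores near zero under this objective, which mirrors the "decoy" situation described in the abstract.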

    Profiling of target molecules of human astrocytes for selective transduction by the Adeno-associated virus variant AAV9P1

    Astrocytes are among the most abundant cell types in the human central nervous system (CNS). They have critical functions in the brain, including the maintenance of neuronal homeostasis and active contribution to the formation, regulation, and maintenance of synapses and synaptic transmission. Further, astrocytes react to CNS damage with proliferation and have been shown to adopt neuronal functions in mice after the artificial expression of neurogenic factors. Astrocytic dysregulation is associated with a variety of neuropathologies, including psychological and neurodegenerative diseases. Treatment strategies for these diseases that rely on modifying astrocytic gene or protein expression are the subject of current research. The main obstacle in this pursuit is the lack of efficient and specific vectors for targeted astrocyte transduction. Adeno-associated virus (AAV) vectors are considered the gold standard for gene therapy due to their favorable biological characteristics. The variant rAAV9P1 has been described to efficiently transduce astrocytes in vivo and discriminate between astrocytes and neurons. However, the molecular basis of this transduction behavior is still elusive. In this work, we have investigated the underlying molecular profile that enables efficient and selective transduction of astrocytes by rAAV9P1. We show that rAAV9P1 transduces astrocytic cell lines more efficiently than vectors derived from its parental serotype AAV9 and with higher selectivity than vectors carrying a capsid from the well-investigated serotype AAV2. rAAV9P1 was found to follow a transduction mechanism that is distinctly different from the HSPG-dependent, ubiquitous transduction of rAAV2. On the molecular level, rAAV9P1 engages with αv-containing integrins, likely via the RGD sequence in the inserted P1 peptide. These integrins include αvβ8 as a central receptor and αvβ3/αvβ5 as potential redundant auxiliary receptors. In addition, rAAV9P1 transduction is dependent on classical AAV9 receptors such as terminal galactose on N-linked cell surface glycans, the 37/67 kDa laminin receptor (LamR), and the essential AAV receptor KIAA0319L (AAVR). Furthermore, a genome-wide CRISPR/Cas9 screen in a human glioblastoma cell line revealed that intracellular pathways with astrocyte relevance might be involved in efficient and selective transduction of astrocytes by rAAV9P1. Taken together, this work presents the detailed receptor profile of rAAV9P1, which achieves high efficiency and cell-type selectivity by combining the binding to new receptors through capsid modifications with pre-existing receptors of the parental serotype. This multi-factorial binding might pave the way for the future development of more cell-type-selective rAAV vectors, as well as the refinement of rAAV9P1 for future in vivo and gene therapy approaches.

    Discover protein sequence signatures from protein-protein interaction data

    Background: The development of high-throughput technologies such as yeast two-hybrid systems and mass spectrometry technologies has made it possible to generate large protein-protein interaction (PPI) datasets. Mining these datasets for underlying biological knowledge has, however, remained a challenge. Results: A total of 3108 sequence signatures were found, each of which was shared by a set of guest proteins interacting with one of 944 host proteins in the Saccharomyces cerevisiae genome. Approximately 94% of these sequence signatures matched entries in InterPro member databases. We identified 84 distinct sequence signatures from the remaining 172 unknown signatures. The signature sharing information was then applied in predicting sub-cellular localization of yeast proteins, and the novel signatures were used in identifying possible interacting sites. Conclusion: We reported a method of PPI data mining that facilitated the discovery of novel sequence signatures using a large PPI dataset from the S. cerevisiae genome as input. The fact that 94% of discovered signatures were known validated the ability of the approach to identify large numbers of signatures from PPI data. The significance of these discovered signatures was demonstrated by their application in predicting sub-cellular localizations and identifying potential interaction binding sites of yeast proteins.

    Knowledge derivation and data mining strategies for probabilistic functional integrated networks

    PhD thesis. One of the fundamental goals of systems biology is the experimental verification of the interactome: the entire complement of molecular interactions occurring in the cell. Vast amounts of high-throughput data have been produced to aid this effort. However, these data are incomplete and contain high levels of both false positives and false negatives. In order to combat these limitations in data quality, computational techniques have been developed to evaluate the datasets and integrate them in a systematic fashion using graph theory. The result is an integrated network which can be analysed using a variety of network analysis techniques to draw new inferences about biological questions and to guide laboratory experiments. Individual research groups are interested in specific biological problems and, consequently, network analyses are normally performed with regard to a specific question. However, the majority of existing data integration techniques are global and do not focus on specific areas of biology. Currently this issue is addressed by using known annotation data (such as that from the Gene Ontology) to produce process-specific subnetworks. However, this approach discards useful information and is of limited use in poorly annotated areas of the interactome. Therefore, there is a need for network integration techniques that produce process-specific networks without loss of data. The work described here addresses this requirement by extending one of the most powerful integration techniques, probabilistic functional integrated networks (PFINs), to incorporate a concept of biological relevance. Initially, the available functional data for the baker’s yeast Saccharomyces cerevisiae was evaluated to identify areas of bias and specificity which could be exploited during network integration. This information was used to develop an integration technique which emphasises interactions relevant to specific biological questions, using yeast ageing as an exemplar. The integration method improves performance during network-based protein functional prediction in relation to this process. Further, the process-relevant networks complement classical network integration techniques and significantly improve network analysis in a wide range of biological processes. The method developed has been used to produce novel predictions for 505 Gene Ontology biological processes. Of these predictions, 41,610 are consistent with existing computational annotations, and 906 are consistent with known expert-curated annotations. The approach significantly reduces the hypothesis space for experimental validation of genes hypothesised to be involved in the oxidative stress response. Therefore, incorporation of biological relevance into network integration can significantly improve network analysis with regard to individual biological questions.