89 research outputs found

    Qualitative reasoning of dynamic gene regulatory interactions from gene expression data

    Background: A gene regulatory relation often changes over time rather than being constant. But many gene regulatory networks available in databases or in the literature are static in the sense that they are either snapshots of gene regulatory relations at a time point or the union of successive gene regulations over time. Such static networks cannot represent temporal aspects of gene regulatory interactions such as the order of gene regulations or the pace of gene regulations.
    Results: We developed a new qualitative method for representing dynamic gene regulatory relations and algorithms for identifying dynamic gene regulations from time-series gene expression data using two types of scores. The identified gene regulatory interactions and their temporal properties are visualized as a gene regulatory network. All the algorithms have been implemented in a program called GeneNetFinder (http://wilab.inha.ac.kr/genenetfinder/) and tested on several gene expression data sets.
    Conclusions: The dynamic nature of gene regulatory interactions can be inferred and represented qualitatively without deriving a set of differential equations describing the interactions. The approach and the program developed in our study would be useful for identifying dynamic gene regulatory interactions from the large amount of gene expression data available and for analyzing the interactions.

    Prediction of protein-protein interactions between viruses and human by an SVM model

    Background: Several computational methods have been developed to predict protein-protein interactions from amino acid sequences, but most of those methods are intended for interactions within a species rather than for interactions across different species. Methods for predicting interactions between homogeneous proteins are not appropriate for finding those between heterogeneous proteins since they do not distinguish the interactions between proteins of the same species from those of different species.
    Results: We developed a new method for representing a protein sequence of variable length as a frequency vector of fixed length, which encodes the relative frequency of three consecutive amino acids of a sequence. We built a support vector machine (SVM) model to predict human proteins that interact with virus proteins. For two types of viruses, human papillomaviruses (HPV) and hepatitis C virus (HCV), our SVM model achieved an average accuracy above 80%, which is higher than that of another SVM model with a different representation scheme. Using the SVM model and Gene Ontology (GO) annotations of proteins, we predicted new interactions between virus proteins and human proteins.
    Conclusions: Encoding the relative frequency of amino acid triplets of a protein sequence is a simple yet powerful representation method for predicting protein-protein interactions across different species. The representation method has several advantages: (1) it enables a prediction model to achieve a better performance than other representations, (2) it generates feature vectors of fixed length regardless of the sequence length, and (3) the same representation is applicable to different types of proteins.
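The fixed-length triplet encoding described in this abstract can be illustrated with a minimal sketch. It assumes the full 20-letter amino acid alphabet (so 20^3 = 8000 dimensions) and normalization by the number of counted triplets; the abstract does not state the alphabet or the normalization, so both are assumptions, and the function name `triplet_frequency_vector` is hypothetical.

```python
from itertools import product

# Assumption: the 20 standard residues; the paper's actual alphabet may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TRIPLET_INDEX = {
    "".join(t): i for i, t in enumerate(product(AMINO_ACIDS, repeat=3))
}


def triplet_frequency_vector(sequence):
    """Map a protein sequence of any length to a vector of 8000 relative frequencies."""
    vector = [0.0] * len(TRIPLET_INDEX)
    counted = 0
    seq = sequence.upper()
    # Slide a window of three residues over the sequence.
    for i in range(len(seq) - 2):
        idx = TRIPLET_INDEX.get(seq[i:i + 3])
        if idx is None:
            continue  # skip triplets containing non-standard residues
        vector[idx] += 1.0
        counted += 1
    if counted:
        vector = [count / counted for count in vector]
    return vector
```

Because the vector length depends only on the alphabet, sequences of different lengths yield vectors of the same size, which is what lets them be passed to a classifier such as an SVM.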

    An Algorithm for Finding Functional Modules and Protein Complexes in Protein-Protein Interaction Networks

    Biological processes are often performed by a group of proteins rather than by individual proteins, and proteins in the same biological group form a densely connected subgraph in a protein-protein interaction network. Therefore, finding a densely connected subgraph provides useful information to predict the function or protein complex of uncharacterized proteins in the highly connected subgraph. We have developed an efficient algorithm and program for finding cliques and near-cliques in a protein-protein interaction network. Analysis of the interaction network of yeast proteins using the algorithm demonstrates that 59% of the near-cliques identified by our algorithm have at least one function shared by all the proteins within a near-clique, and that 56% of the near-cliques show a good agreement with the experimentally determined protein complexes catalogued in MIPS.
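To make the idea of cliques and near-cliques concrete, here is a rough sketch, not the authors' algorithm. It enumerates maximal cliques with the textbook Bron-Kerbosch procedure and then extends a clique with vertices that miss at most `max_missing` of its members. That definition of a near-clique, and the names `maximal_cliques` and `extend_to_near_clique`, are assumptions made for illustration; the abstract does not define the term.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting).

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def extend_to_near_clique(adj, clique, max_missing=1):
    """Add vertices adjacent to all but at most max_missing clique members.

    Assumed definition; added vertices are not required to be adjacent
    to one another.
    """
    members = set(clique)
    extras = {
        v for v in adj
        if v not in members and len(members - adj[v]) <= max_missing
    }
    return members | extras
```

On a protein-protein interaction network, `adj` would map each protein identifier to its interaction partners.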

    A semi-supervised learning approach to predict synthetic genetic interactions by combining functional and topological properties of functional gene network

    Background: Genetic interaction profiles are highly informative and helpful for understanding the functional linkages between genes, and therefore have been extensively exploited for annotating gene functions and dissecting specific pathway structures. However, our understanding is rather limited to the relationship between double concurrent perturbation and various higher level phenotypic changes, e.g. those in cells, tissues or organs. Modifier screens, such as synthetic genetic arrays (SGA), can help us to understand the phenotype caused by combined gene mutations. Unfortunately, exhaustive tests on all possible combined mutations in any genome are vulnerable to combinatorial explosion and are infeasible either technically or financially. Therefore, an accurate computational approach to predict genetic interaction is highly desirable, and such methods have the potential of alleviating the bottleneck on experiment design.
    Results: In this work, we introduce a computational systems biology approach for the accurate prediction of pairwise synthetic genetic interactions (SGI). First, a high-coverage and high-precision functional gene network (FGN) is constructed by integrating protein-protein interaction (PPI), protein complex and gene expression data; then, a graph-based semi-supervised learning (SSL) classifier is utilized to identify SGI, where the topological properties of protein pairs in the weighted FGN are used as input features of the classifier. We compare the proposed SSL method with the state-of-the-art supervised classifier, the support vector machine (SVM), on a benchmark dataset in S. cerevisiae to validate our method's ability to distinguish synthetic genetic interactions from non-interaction gene pairs. Experimental results show that the proposed method can accurately predict genetic interactions in S. cerevisiae (with a sensitivity of 92% and specificity of 91%). Notably, the SSL method is more efficient than SVM, especially for very small training sets and large test sets.
    Conclusions: We developed a graph-based SSL classifier for predicting the SGI. The classifier employs topological properties of the weighted FGN as input features and simultaneously employs information induced from labelled and unlabelled data. Our analysis indicates that the topological properties of the weighted FGN can be employed to accurately predict SGI. Also, the graph-based SSL method outperforms the traditional standard supervised approach, especially when used with small training sets. The proposed method can alleviate the experimental burden of exhaustive testing and provide a useful guide for the biologist in narrowing down the candidate gene pairs with SGI. The data and source code implementing the method are available from the website: http://home.ustc.edu.cn/~yzh33108/GeneticInterPred.htm
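The abstract does not say which graph-based SSL algorithm was used. As a generic illustration of how labelled and unlabelled examples can be combined on a weighted graph, the sketch below uses plain iterative label propagation with clamped labels. The function name `propagate_labels`, the node names, and the choice of +1.0/-1.0 labels are all assumptions, not details from the paper.

```python
def propagate_labels(weights, labels, iterations=50):
    """Spread labels over a weighted graph by repeated neighbour averaging.

    weights: dict node -> dict neighbour -> non-negative edge weight
             (every neighbour must also be a key of weights).
    labels:  dict node -> +1.0 (interaction) or -1.0 (non-interaction)
             for the labelled nodes only.
    Returns a dict node -> score; the sign of the score is the prediction.
    """
    scores = {node: labels.get(node, 0.0) for node in weights}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in weights.items():
            if node in labels:
                updated[node] = labels[node]  # labelled nodes stay clamped
                continue
            total = sum(neighbours.values())
            if total == 0:
                updated[node] = 0.0  # isolated node: no evidence either way
            else:
                updated[node] = sum(
                    w * scores[m] for m, w in neighbours.items()
                ) / total
        scores = updated
    return scores
```

In a setting like the one described, each node could stand for a candidate gene pair and each edge weight for the similarity of the pairs' topological features, so that a handful of labelled pairs informs the scores of many unlabelled ones.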

    An ontology-based search engine for protein-protein interactions

    Background: Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. These are purely syntactic methods that retrieve the records in the database containing a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database.
    Results: We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions.
    Conclusion: Representing the biological relations of proteins and their GO annotations by modified Gödel numbers lets a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology.
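The abstract does not spell out the modified Gödel numbering, so the following is only a sketch of the general idea under a simple assumed encoding: each ontology term gets its own prime, and a term's code is the product of its prime and the primes of all its ancestors. A term then satisfies a query term exactly when the query's code divides the term's code, which also covers more specific terms. The term names and function names are hypothetical.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a toy ontology)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def encode_ontology(parents):
    """Assign an integer code to every term.

    parents: dict term -> list of direct parent terms (a DAG).
    Code = own prime times the primes of all ancestors (assumed encoding).
    """
    gen = primes()
    own = {term: next(gen) for term in parents}

    def ancestors(term, seen):
        for parent in parents[term]:
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    codes = {}
    for term in parents:
        code = own[term]
        for ancestor in ancestors(term, set()):
            code *= own[ancestor]
        codes[term] = code
    return codes


def satisfies(term_code, query_code):
    """True if the term equals the query term or is more specific than it."""
    return term_code % query_code == 0
```

With this encoding, a protein annotated only with a specific term still matches a query on a more general term, which is the behaviour a purely syntactic keyword match lacks.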

    Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting

    Background: Supervised learning and many stochastic methods for predicting protein-protein interactions require both negative and positive interactions in the training data set. Unlike positive interactions, negative interactions cannot be readily obtained from interaction data, so these must be generated. In protein-protein interactions and other molecular interactions as well, taking all non-positive interactions as negative interactions produces too many negative interactions for the positive interactions. Random selection from non-positive interactions is unsuitable, since the selected data may not reflect the original distribution of data.
    Results: We developed a bootstrapping algorithm for generating a negative data set of arbitrary size from protein-protein interaction data. We also developed an efficient boosting algorithm for finding interacting motif pairs in human and virus proteins. The boosting algorithm showed the best performance (84.4% sensitivity and 75.9% specificity) with balanced positive and negative data sets. The boosting algorithm was also used to find potential motif pairs in complexes of human and virus proteins, for which structural data was not used to train the algorithm. Interacting motif pairs common to multiple folds of structural data for the complexes were proven to be statistically significant. The data set for interactions between human and virus proteins was extracted from BOND and is available at http://virus.hpid.org/interactions.aspx. The complexes of human and virus proteins were extracted from PDB and their identifiers are available at http://virus.hpid.org/PDB_IDs.html.
    Conclusion: When the positive and negative training data sets are unbalanced, the result via the prediction model tends to be biased. Bootstrapping is effective for generating a negative data set, for which the size and distribution are easily controlled. Our boosting algorithm, trained with the balanced data sets generated via the bootstrapping method, could efficiently predict interacting motif pairs from protein interaction and sequence data.
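The abstract does not describe the bootstrapping procedure itself. One plausible reading, sketched below purely as an assumption, is to resample with replacement from the positive pairs so that each protein appears in the negative set about as often as it does in the positive data, rejecting any pair that is a known positive. The function name `bootstrap_negatives`, its parameters, and the protein labels are hypothetical.

```python
import random


def bootstrap_negatives(positives, size, seed=0, max_tries=100000):
    """Generate `size` negative (human, virus) pairs by resampling positives.

    positives: list of (human_protein, virus_protein) tuples.
    Each candidate takes its human protein from one randomly drawn positive
    pair and its virus protein from another, so protein frequencies follow
    the positive data. Known positive pairs are rejected.
    Assumed procedure; not taken from the paper.
    """
    rng = random.Random(seed)
    known = set(positives)
    negatives = []
    tries = 0
    while len(negatives) < size and tries < max_tries:
        tries += 1
        human, _ = rng.choice(positives)
        _, virus = rng.choice(positives)
        if (human, virus) not in known:
            negatives.append((human, virus))  # duplicates allowed (with replacement)
    return negatives
```

Setting `size` equal to the number of positives gives a balanced training set, the condition under which the abstract reports the best performance.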

    Compositional Modeling For Spatial Problems

    Abstract of the dissertation "Compositional Modeling for Spatial Problems" by Kyungsook Han. Dissertation Director: Professor Andrew Gelsey.
    Solving a problem about a complex physical system generally involves the creation and execution of a model needed to reason about the problem. Effective problem solving about a physical system requires the use of an adequate model, the creation of which in turn depends on the types of knowledge available for the physical system and their representation. Such a model is normally created by the person studying the system, but a hand-crafted model is often error-prone. Modifying a hand-crafted model to solve a similar problem about other physical systems is also difficult, and may take more time than building a new model for the systems. My research has two main goals: (1) automating the construction and execution of models of physical systems for spatial problems, where objects are related to each other either geometrically or topologically to satisfy a set of c..