
    An Introductory Guide to Aligning Networks Using SANA, the Simulated Annealing Network Aligner.

    Sequence alignment has had an enormous impact on our understanding of biology, evolution, and disease. The alignment of biological networks holds similar promise. Biological networks generally model interactions between biomolecules such as proteins, genes, metabolites, or mRNAs. There is strong evidence that network topology (the "structure" of the network) is correlated with the functions performed, so that topology can be used to help predict or understand function. However, unlike sequence comparison and alignment, which is essentially a solved problem, network comparison and alignment is NP-complete, so heuristic algorithms must be used. Here we introduce SANA, the Simulated Annealing Network Aligner. SANA is one of many algorithms proposed for biological network alignment. For global network alignment, SANA stands out for its speed, memory efficiency, ease of use, and flexibility in producing alignments between two or more networks. SANA produces better alignments in minutes on a laptop than most other algorithms can produce in hours or days of CPU time on large server-class machines. We walk the user through how to use SANA for several types of biomolecular networks.
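
    The abstract does not describe SANA's internals, so the following is only a minimal sketch of simulated annealing applied to global network alignment, not SANA's implementation. It assumes a one-to-one map from the smaller network G1 into G2, scores alignments by edge coverage (the fraction of G1 edges mapped onto G2 edges), and uses hypothetical "change" and "swap" moves with a geometric cooling schedule; the function names and parameters are illustrative.

```python
import math
import random


def edge_coverage(g1_edges, g2_edge_set, mapping):
    """Fraction of G1 edges whose mapped endpoints are also adjacent in G2."""
    aligned = sum(
        1 for u, v in g1_edges
        if frozenset((mapping[u], mapping[v])) in g2_edge_set
    )
    return aligned / len(g1_edges)


def anneal_alignment(g1_nodes, g1_edges, g2_nodes, g2_edges,
                     steps=20000, t_start=1.0, t_end=1e-4, seed=0):
    """Hypothetical simulated-annealing aligner; requires |G2| >= |G1| nodes."""
    rng = random.Random(seed)
    g1_nodes = list(g1_nodes)
    g1_edges = list(g1_edges)
    g2_edge_set = {frozenset(e) for e in g2_edges}
    targets = list(g2_nodes)
    rng.shuffle(targets)
    mapping = dict(zip(g1_nodes, targets))  # random injective starting map
    unused = targets[len(g1_nodes):]        # G2 nodes not currently aligned
    score = edge_coverage(g1_edges, g2_edge_set, mapping)
    best, best_score = dict(mapping), score

    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        u = rng.choice(g1_nodes)
        if unused and rng.random() < 0.5:
            # Change move: realign u to a currently unused G2 node.
            i = rng.randrange(len(unused))
            mapping[u], unused[i] = unused[i], mapping[u]
            move = ("change", u, i)
        else:
            # Swap move: exchange the G2 partners of two G1 nodes.
            w = rng.choice(g1_nodes)
            mapping[u], mapping[w] = mapping[w], mapping[u]
            move = ("swap", u, w)

        new_score = edge_coverage(g1_edges, g2_edge_set, mapping)
        delta = new_score - score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            score = new_score
            if score > best_score:
                best, best_score = dict(mapping), score
        else:
            kind, a, b = move  # rejected: undo the move
            if kind == "change":
                mapping[a], unused[b] = unused[b], mapping[a]
            else:
                mapping[a], mapping[b] = mapping[b], mapping[a]
    return best, best_score


# Toy example: align a triangle into a triangle with one pendant node.
best, score = anneal_alignment(
    ["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")],
    [1, 2, 3, 4], [(1, 2), (2, 3), (1, 3), (3, 4)],
)
```

    Recomputing the score from scratch at every step keeps the sketch short; a practical aligner would update the score incrementally from the few edges touched by each move.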

    Fifteen years SIB Swiss Institute of Bioinformatics: life science databases, tools and support.

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) was created in 1998 as an institution to foster excellence in bioinformatics. It is renowned worldwide for its databases and software tools, such as UniProtKB/Swiss-Prot, PROSITE, SWISS-MODEL, and STRING, all of which are accessible on ExPASy.org, SIB's Bioinformatics Resource Portal. This article provides an overview of the scientific and training resources SIB has consistently offered to the life science community for more than 15 years.

    The Binary Protein Interactome of Treponema pallidum – The Syphilis Spirochete

    Protein interaction networks shed light on the global organization of proteomes but can also place individual proteins into a functional context. Knowing the function of bacterial proteins will help us understand how these species have adapted to diverse environments, including many extreme habitats. Here we present the protein interaction network for the syphilis spirochete Treponema pallidum, which encodes 1,039 proteins, 726 (or 70%) of which interact via 3,649 interactions as revealed by systematic yeast two-hybrid screens. A high-confidence subset of 991 interactions links 576 proteins. To derive further biological insights from our data, we constructed an integrated network of proteins involved in DNA metabolism. Combining our data with additional evidence, we provide improved annotations for at least 18 proteins (including TP0004, TP0050, and TP0183, which are suggested to be involved in DNA metabolism). We estimate that this “minimal” bacterium contains on the order of 3,000 protein interactions. Profiles of functional interconnections indicate that bacterial proteins interact more promiscuously than eukaryotic proteins, reflecting the non-compartmentalized structure of the bacterial cell. Using our high-confidence interactions, we also predict 417,329 homologous interactions (“interologs”) for 372 completely sequenced genomes and provide evidence that at least one third of them can be experimentally confirmed.
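
    A minimal sketch of the interolog idea, under stated assumptions: an interaction between source proteins A and B is transferred to every pair (A', B') in which A' and B' are homologs of A and B in a target genome. The homolog table and all protein identifiers below are hypothetical, and the study's own homology criteria and confidence filtering are not reproduced.

```python
from itertools import product


def predict_interologs(interactions, homologs):
    """Transfer source-genome interactions to a target genome (hypothetical sketch).

    interactions: iterable of (protein_a, protein_b) pairs in the source genome.
    homologs: dict mapping a source protein to its homologs in the target genome.
    Returns predicted target-genome interactions as unordered pairs (frozensets).
    """
    predicted = set()
    for a, b in interactions:
        # Every combination of a homolog of A with a homolog of B is a candidate.
        for a2, b2 in product(homologs.get(a, []), homologs.get(b, [])):
            predicted.add(frozenset((a2, b2)))
    return predicted


# Toy example with hypothetical identifiers.
source = [("P1", "P2"), ("P2", "P3")]
homologs = {"P1": ["Q1"], "P2": ["Q2a", "Q2b"]}  # P3 has no homolog here
print(sorted(sorted(p) for p in predict_interologs(source, homologs)))
# prints [['Q1', 'Q2a'], ['Q1', 'Q2b']]
```

    Interactions whose partners lack homologs in the target genome (P2-P3 above) simply produce no prediction.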

    A Covering Method for Detecting Genetic Associations between Rare Variants and Common Phenotypes

    Genome-wide association (GWA) studies, which test for association between common genetic markers and a disease phenotype, have shown varying degrees of success. While many factors could potentially confound GWA studies, we focus on the possibility that multiple rare variants (RVs) may act in concert to influence disease etiology. Here, we describe an algorithm for RV analysis, RARECOVER. The algorithm combines a disparate collection of RVs with low effect and modest penetrance. Further, it does not require the rare variants to be adjacent in location. Extensive simulations over a range of assumed penetrance and population attributable risk (PAR) values illustrate the power of our approach over other published methods, including the collapsing and weighted-collapsing strategies. To showcase the method, we apply RARECOVER to re-sequencing data from a cohort of 289 individuals at the extremes of the Body Mass Index distribution (NCT00263042). Individual samples were re-sequenced at two genes, FAAH and MGLL, known to be involved in endocannabinoid metabolism (187 kbp for 148 obese and 150 controls). The RARECOVER analysis identifies exactly one significantly associated region in each gene, each about 5 kbp in the upstream regulatory regions. The data suggest that the RVs help disrupt the expression of the two genes, leading to lowered metabolism of the corresponding cannabinoids. Overall, our results point to the power of including RVs in measuring genetic associations.

    Funding: National Science Foundation (U.S.) (grant IIS-0810905); National Institutes of Health (U.S.) (U19 AG023122-05, R01 MH078151-03, N01 MH22005, U01-DA024417-01, P50 MH081755-01, R01 AG030474-02, N01 MH022005, R01 HL089655-02, R01 MH080134-03, U54 CA143906-01, UL1 RR025774-03); Louis & Harold Price Foundation; Scripps Genomic Medicine Program; National Human Genome Research Institute (U.S.) (Grant Number T32 HG002295).
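
    The abstract says RARECOVER combines rare variants that need not be adjacent, but it does not give the procedure. The sketch below is a hypothetical greedy covering heuristic in that spirit, not the published algorithm: an individual counts as "covered" if they carry any selected variant, and variants are added one at a time while they increase a Pearson chi-square statistic comparing coverage in cases versus controls. The statistic, the stopping rule, and the max_variants cap are assumptions; because the selection is greedy and data-driven, a real analysis would need permutation testing to assess significance.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def greedy_cover(genotypes, is_case, max_variants=10):
    """Greedily pick rare variants whose combined carriers best separate cases from controls.

    genotypes: dict variant_id -> set of indices of individuals carrying the variant.
    is_case: list of booleans, one per individual.
    """
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    chosen, covered, best_stat = [], set(), 0.0
    while len(chosen) < max_variants:
        best_variant = None
        for variant, carriers in genotypes.items():
            if variant in chosen:
                continue
            union = covered | carriers
            cov_cases = sum(1 for i in union if is_case[i])
            cov_controls = len(union) - cov_cases
            stat = chi_square_2x2(cov_cases, cov_controls,
                                  n_cases - cov_cases, n_controls - cov_controls)
            if stat > best_stat:
                best_stat, best_variant = stat, variant
        if best_variant is None:  # no remaining variant improves the statistic
            break
        chosen.append(best_variant)
        covered |= genotypes[best_variant]
    return chosen, best_stat


# Toy data: individuals 0-3 are cases, 4-7 are controls (hypothetical variant IDs).
is_case = [True] * 4 + [False] * 4
genotypes = {"v1": {0, 1}, "v2": {2, 3}, "v3": {4}, "v4": {0, 5}}
chosen, stat = greedy_cover(genotypes, is_case)  # picks v1 then v2
```

    The two selected variants are not adjacent in the toy data; the covering criterion only looks at who carries them, which is the property the abstract emphasizes.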

    MetaMine – A tool to detect and analyse gene patterns in their environmental context

    Background: Modern sequencing technologies allow rapid sequencing and bioinformatic analysis of genomes and metagenomes. With every new sequencing project a vast number of new proteins becomes available, and many genes remain functionally unclassified based on evidence from sequence similarity alone. Extending similarity searches with gene pattern approaches (a gene pattern being a set of genes sharing a distinct genomic neighbourhood) has been shown to significantly increase the number of functional assignments. Further functional evidence can be gained by correlating these gene patterns with prevailing environmental parameters. MetaMine was developed to tackle the large pool of unclassified proteins by searching for recurrent gene patterns across habitats based on key genes.

    Results: MetaMine is an interactive data mining tool that enables the detection of gene patterns in an environmental context. The gene pattern search starts with a user-defined, environmentally interesting key gene. A BLAST search with this gene is carried out against the Microbial Ecological Genomics DataBase (MEGDB), which contains marine genomic and metagenomic sequences. This is followed by the determination of all neighbouring genes within a given distance and a search for functionally equivalent genes. In the final step, a set of common genes present in a defined number of distinct genomes is determined. The gene patterns found are associated with their individual pattern instances, which describe gene order and direction. They are presented together with information about the sample and the habitat. MetaMine is implemented in Java and provided as a client/server application with a user-friendly graphical user interface. The system was evaluated with environmentally relevant genes related to the methane cycle and carbon monoxide oxidation.

    Conclusion: MetaMine offers a targeted, semi-automatic search for gene patterns based on expert input. The graphical user interface of MetaMine provides a user-friendly overview of the computed gene patterns for further inspection in an ecological context. Prevailing biological processes associated with a key gene can be used to infer new annotations and shape hypotheses to guide further analyses. The use cases demonstrate that meaningful gene patterns can be quickly detected using MetaMine.
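
    The neighbourhood step of the search (collect genes within a given distance of each key-gene hit, then keep those present in a defined number of distinct genomes) can be sketched as follows. This is a hypothetical illustration, not MetaMine's Java implementation: genomes are plain lists of function labels in genomic order, exact label matches stand in for BLAST hits and functional equivalence, distance is counted in genes, gene direction is ignored, and the pmo labels are toy data.

```python
from collections import Counter


def gene_pattern(genomes, key_function, window=3, min_genomes=2):
    """Find functions that recur near a key gene across genomes (hypothetical sketch).

    genomes: dict genome_id -> list of function labels in genomic order.
    window: maximum distance, in genes, from a key-gene hit.
    min_genomes: number of distinct genomes a neighbour must appear in.
    """
    support = Counter()
    for genes in genomes.values():
        neighbours = set()
        for pos, function in enumerate(genes):
            if function != key_function:
                continue
            lo, hi = max(0, pos - window), min(len(genes), pos + window + 1)
            neighbours.update(genes[i] for i in range(lo, hi) if i != pos)
        support.update(neighbours)  # count each function at most once per genome
    return {f for f, n in support.items() if n >= min_genomes and f != key_function}


# Toy genomes (hypothetical labels).
genomes = {
    "g1": ["x", "pmoA", "pmoB", "pmoC", "y"],
    "g2": ["pmoC", "pmoA", "pmoB", "z"],
    "g3": ["pmoA", "q", "r", "s", "t", "pmoB"],
}
pattern = gene_pattern(genomes, "pmoA", window=2, min_genomes=2)  # {"pmoB", "pmoC"}
```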

    Correlated Mutations: A Hallmark of Phenotypic Amino Acid Substitutions

    Point mutations resulting in the substitution of a single amino acid can have severe functional consequences, but can also be completely harmless. Understanding what determines the phenotypic impact is important both for planning targeted mutation experiments in the laboratory and for analyzing naturally occurring mutations found in patients. Common wisdom suggests using the extent of evolutionary conservation of a residue or a sequence motif as an indicator of its functional importance, and thus of its vulnerability to mutation. In this work, we put forward the hypothesis that, in addition to conservation, co-evolution of residues in a protein influences the likelihood that a residue is functionally important and thus associated with disease. While the basic idea of a relation between co-evolution and functional sites has been explored before, we have conducted the first systematic and comprehensive analysis of disease-causing point mutations in humans with respect to correlated mutations. We included 14,211 distinct positions with known disease-causing point mutations in 1,153 human proteins in our analysis. Our data show that (1) correlated positions are significantly more likely to be disease-associated than expected by chance, and that (2) this signal cannot be explained by conservation patterns of individual sequence positions. Although correlated residues have primarily been used to predict contact sites, our data agree with previous observations that (3) many such correlations do not relate to physical contacts between amino acid residues. Our analysis results are available at http://webclu.bio.wzw.tum.de/~pagel/supplements/correlated-positions/
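
    The abstract does not state which co-evolution score was used. One common way to quantify correlated mutations between two alignment positions is mutual information (MI) over the residues observed in each column; the sketch below assumes that measure, treats gaps as ordinary symbols, and applies no correction for phylogeny or small sample size. The toy alignment illustrates point (2): a perfectly conserved column carries no MI with any other column, so co-evolution and conservation capture different signals.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information, in bits, between columns i and j of an alignment.

    msa: list of equal-length aligned sequences; gaps count as ordinary symbols.
    """
    n = len(msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 1 co-vary (A<->K, E<->L); column 2 is fully conserved.
msa = ["AKG", "AKG", "ELG", "ELG"]
print(column_mutual_information(msa, 0, 1))  # prints 1.0
print(column_mutual_information(msa, 0, 2))  # prints 0.0
```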

    Protein-Protein Interactions of Tandem Affinity Purified Protein Kinases from Rice

    Eighty-eight rice (Oryza sativa) cDNAs encoding leaf-expressed protein kinases (PKs) were fused to a Tandem Affinity Purification tag (TAP-tag) and expressed in transgenic rice plants. The TAP-tagged PKs and interacting proteins were purified from the T1 progeny of the transgenic rice plants and identified by tandem mass spectrometry. Forty-five TAP-tagged PKs were recovered in this study, and thirteen of these were found to interact with other rice proteins with a high probability score. In vivo phosphorylation sites were found for three of the PKs. A comparison of the TAP-tag data from a combined analysis of 129 TAP-tagged rice protein kinases with a concurrent yeast two-hybrid screen identified an evolutionarily new rice protein that interacts with the well-conserved cell division cycle 2 (CDC2) protein complex.

    Genetic Co-Occurrence Network across Sequenced Microbes

    The phenotype of any organism on earth is, in large part, the consequence of interplay between numerous gene products encoded in the genome, and this interplay affects the evolutionary fate of the genome itself through the resulting phenotype. In this regard, contemporary genomes can be used as molecular records that reveal associations of various genes working in their natural lifestyles. By analyzing thousands of orthologs across ~600 bacterial species, we constructed a map of gene-gene co-occurrence across much of the sequenced biome. Genes that preferentially co-occur in the same organisms are herein called correlogs; genes that preferentially avoid co-occurring are called anti-correlogs. To quantify correlogy and anti-correlogy, we reduced the contribution of indirect correlations between genes by adapting ideas developed for reverse engineering of transcriptional regulatory networks. The resulting correlogous associations are highly enriched for physically interacting proteins and for co-expressed transcripts, clearly differentiating a subgroup of functionally obligatory protein interactions from conditional or transient interactions. Other biochemical and phylogenetic properties were also found to be reflected in correlogous and anti-correlogous relationships. Additionally, our study elucidates the global organization of the gene association map, in which various modules of correlogous genes are strikingly interconnected by anti-correlogous crosstalk between the modules. We then demonstrate the effectiveness of such associations across different domains of life and in environmental microbial communities. These phylogenetic profiling approaches infer functional coupling of genes regardless of mechanistic details, and may be useful for guiding exogenous gene import in synthetic biology.

    Comment: Supporting information is available at PLoS Computational Biology
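
    A minimal sketch of labelling gene pairs from phylogenetic (presence/absence) profiles, assuming the phi coefficient as the co-occurrence measure and a hypothetical cutoff of 0.5. It deliberately omits the study's correction for indirect correlations (adapted from regulatory-network reverse engineering), so it should be read as a first pass only; the gene names and profiles are toy data.

```python
import math
from itertools import combinations


def phi(x, y):
    """Phi coefficient (Pearson correlation) of two equal-length 0/1 profiles."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = len(x) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def classify_pairs(profiles, threshold=0.5):
    """Label gene pairs as correlogs (co-occur) or anti-correlogs (avoid each other).

    profiles: dict gene -> list of 0/1 presence calls, one per species.
    """
    labels = {}
    for a, b in combinations(sorted(profiles), 2):
        r = phi(profiles[a], profiles[b])
        if r >= threshold:
            labels[(a, b)] = "correlog"
        elif r <= -threshold:
            labels[(a, b)] = "anti-correlog"
    return labels


# Toy profiles across four species.
profiles = {
    "geneA": [1, 1, 0, 0],
    "geneB": [1, 1, 0, 0],
    "geneC": [0, 0, 1, 1],
    "geneD": [1, 0, 1, 0],
}
labels = classify_pairs(profiles)
```

    In this toy set geneA and geneB form a correlog, both are anti-correlogs of geneC, and geneD is unrelated to all three.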

    False positive reduction in protein-protein interaction predictions using gene ontology annotations

    Background: Many crucial cellular operations, such as metabolism, signalling, and regulation, are based on protein-protein interactions. However, the lack of robust protein-protein interaction information is a challenge. One reason for this lack is poor agreement between experimental findings and computational sets, which in turn stems from the large number of false-positive predictions made by computational approaches. Reducing false-positive predictions and enhancing the true-positive fraction of computationally predicted protein-protein interaction datasets, based on high-confidence experimental results, has not been adequately investigated.

    Results: Gene Ontology (GO) annotations were used to reduce false-positive protein-protein interaction (PPI) pairs resulting from computational predictions. Using experimentally obtained PPI pairs as a training dataset, eight top-ranking keywords were extracted from GO molecular function annotations. The sensitivity of these keywords is 64.21% in the yeast experimental dataset and 80.83% in the worm experimental dataset. The specificities, a measure of recovery power, of these keywords applied to four predicted PPI datasets for each studied organism are 48.32% and 46.49% (averaged over the four datasets) in yeast and worm, respectively. Based on the eight top-ranking keywords and the co-localization of interacting proteins, a set of two knowledge rules was deduced and applied to remove false-positive protein pairs. The 'strength', a measure of the improvement provided by the rules, was defined based on the signal-to-noise ratio and used to measure the applicability of the knowledge rules to the predicted PPI datasets. Depending on the PPI prediction method employed, the strength ranges from two to ten times that of randomly removing protein pairs from the datasets.

    Conclusion: Gene Ontology annotations, together with the deduced knowledge rules, can be used to partially remove falsely predicted PPI pairs. Removing false positives from predicted datasets increases their true-positive fraction, improves the robustness of predicted pairs compared with random protein pairing, and eventually results in better overlap with experimental results.
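
    The abstract names the ingredients of the two knowledge rules (top-ranking GO molecular-function keywords and co-localization of interacting proteins) but not their exact form or the eight keywords themselves. The sketch below assumes one plausible form, keeping a predicted pair only if both proteins carry a keyword-matching GO term and share an annotated location; the keywords, annotations, and protein IDs are hypothetical. It also computes the true-positive fraction before and after filtering against a reference set.

```python
def passes_rules(pair, go_terms, locations, keywords):
    """Hypothetical two-rule filter for a predicted protein pair.

    Rule 1: each protein has a GO molecular-function term containing a keyword.
    Rule 2: the two proteins share at least one annotated cellular location.
    """
    a, b = pair

    def has_keyword(protein):
        return any(k in term for term in go_terms.get(protein, ()) for k in keywords)

    shares_location = bool(locations.get(a, set()) & locations.get(b, set()))
    return has_keyword(a) and has_keyword(b) and shares_location


def true_positive_fraction(pairs, reference):
    """Fraction of pairs found in a reference set of unordered pairs (frozensets)."""
    if not pairs:
        return 0.0
    return sum(1 for p in pairs if frozenset(p) in reference) / len(pairs)


# Toy data (all hypothetical).
keywords = ["binding", "kinase"]
go_terms = {
    "P1": ["protein kinase activity"],
    "P2": ["ATP binding"],
    "P3": ["structural constituent of ribosome"],
    "P4": ["DNA binding"],
}
locations = {"P1": {"nucleus"}, "P2": {"nucleus", "cytoplasm"},
             "P3": {"cytoplasm"}, "P4": {"mitochondrion"}}
predicted = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
kept = [p for p in predicted if passes_rules(p, go_terms, locations, keywords)]
reference = {frozenset({"P1", "P2"}), frozenset({"P2", "P3"})}
```

    Here only P1-P2 survives the filter, raising the true-positive fraction from 1/3 to 1.0 on this toy set; the paper's 'strength' measure compares such gains with random removal, which this sketch does not implement.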