47 research outputs found

    Efficiency of the immunome protein interaction network increases during evolution

    Details of the mechanisms and selection pressures that shape the emergence and development of complex biological systems, such as the human immune system, are poorly understood. A recent definition of a reference set of proteins essential for the human immunome, combined with information about protein interaction networks for these proteins, facilitates evolutionary study of this biological machinery

    ImmTree: Database of evolutionary relationships of genes and proteins in the human immune system

    BACKGROUND: The immune system, which is a complex machinery, is based on the highly coordinated expression of a wide array of genes and proteins. The evolutionary history of the human immune system is not well characterised. Although several studies related to the development and evolution of immunological processes have been published, a full-scale genome-based analysis is still missing. A database focused on the evolutionary relationships of immune-related genes would contribute to and facilitate research on immunology and evolutionary biology. RESULTS: An Internet resource called ImmTree was constructed for studying the evolution and evolutionary trees of the human immune system. ImmTree contains information about orthologs in 80 species collected from the HomoloGene, OrthoMCL and EGO databases. In addition to phylogenetic trees, the service provides data for the comparison of human-mouse ortholog pairs, including synonymous and non-synonymous mutation rates, Z values, and K(a)/K(s) quotients. A versatile search engine allows complex queries from the database. Currently, data is available for 847 human immune system-related genes and proteins. CONCLUSION: ImmTree provides a unique data set of genes and proteins from the human immune system, their phylogenetics, and information for comparisons of human-mouse ortholog pairs, synonymous and non-synonymous mutation rates, as well as other statistical information
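The K(a)/K(s) quotient mentioned in the abstract above is the non-synonymous substitution rate divided by the synonymous rate. The following is a minimal sketch of that arithmetic only; the function name `ka_ks_quotient` and the example values are hypothetical and are not taken from ImmTree.

```python
from typing import Optional


def ka_ks_quotient(ka: float, ks: float) -> Optional[float]:
    """Return Ka/Ks, or None when Ks is zero and the quotient is undefined.

    ka: non-synonymous substitution rate (hypothetical input).
    ks: synonymous substitution rate (hypothetical input).
    """
    if ka < 0 or ks < 0:
        raise ValueError("substitution rates must be non-negative")
    if ks == 0:
        # No synonymous change observed, so the ratio cannot be computed.
        return None
    return ka / ks


# Illustrative values only, not real ortholog-pair data.
example = ka_ks_quotient(0.05, 0.5)
```

A quotient below 1 is commonly read as a sign of purifying selection and a quotient above 1 as a sign of positive selection; how ImmTree itself estimates the two rates is not described in the abstract.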

    Phylogeny of Tec Family Kinases: Identification of a Pre-Metazoan Origin of Btk, Bmx, Itk, Tec, Txk and the Btk Regulator SH3BP5

    It is generally considered that mammals and birds have five Tec family kinases (TFKs): Btk, Bmx (also known as Etk), Itk, Tec, and Txk (also known as Rlk). Here, we discuss the domains and their functions and regulation in TFKs. Over the last few years, a large number of genomes from various phyla have been sequenced, making it possible to study evolutionary relationships at the molecular and sequence level. Using bioinformatics tools, we demonstrate for the first time that a TFK ancestor exists in the unicellular choanoflagellate Monosiga brevicollis, which is the closest known relative to metazoans with a sequenced genome. The analysis of the genomes for sponges, insects, hagfish, and frogs suggests that these species encode a single TFK. The insect form has a divergent and unique N-terminal region. Duplications generating the five members took place prior to the emergence of vertebrates. Fishes have two or three forms and the platypus, Ornithorhynchus anatinus, has four (lacks Txk). Thus, not all mammals have all five TFKs. The single identified TFK in frogs is an ortholog of Tec. Bmx seems to be unique to mammals and birds. SH3BP5 is a negative regulator of Btk. It is conserved in choanoflagellates and, interestingly, also exists in nematodes, which do not express TFKs, suggesting a broader function in addition to Btk regulation. The related SH3BP5-like protein is not found in nematodes

    Are proposed early genetic codes capable of encoding viable proteins?

    Proteins are elaborate biopolymers balancing between contradicting intrinsic propensities to fold, aggregate or remain disordered. Assessing their primary structural preferences observable without evolutionary optimization has been reinforced by the recent identification of de novo proteins that have emerged from previously non-coding sequences. In this paper we investigate structural preferences of hypothetical proteins translated from random DNA segments using the standard genetic code and three of its proposed evolutionary predecessor models encoding 10, 6 and 4 amino acids, respectively. Our main assumption is that the disorder, aggregation and transmembrane helix predictions used are able to reflect the differences in the trends of the protein sets investigated. We found that the 10-residue code encodes proteins that resemble modern proteins in their predicted structural properties. All of the investigated early genetic codes give rise to proteins with enhanced disorder and diminished aggregation propensities. Our results suggest that an ancestral genetic code similar to the proposed 10-residue one is capable of encoding functionally diverse proteins, but these might have existed under conditions different from today's common physiological ones. The existence of a protein functional repertoire for the investigated earlier stages that was quite distinct from today's can be deduced from the presented results

    Identification of core T cell network based on immunome interactome

    Background: Data-driven studies on the dynamics of reconstructed protein-protein interaction (PPI) networks facilitate investigation and identification of proteins important for particular processes or diseases and reduce the time and costs of experimental verification. Modeling the dynamics of very large PPI networks is computationally costly. Results: To circumvent this problem, we created a link-weighted human immunome interactome and performed filtering. We reconstructed the immunome interactome and weighted the links using jackknife gene expression correlation of integrated, time course gene expression data. Statistical significance of the links was computed using the Global Statistical Significance (GloSS) filtering algorithm. P-values from GloSS were computed for the integrated, time course gene expression data. We filtered the immunome interactome to identify core components of the T cell PPI network (TPPIN). The interconnectedness of the major pathways for T cell survival and response, including the T cell receptor, MAPK and JAK-STAT pathways, is maintained in the TPPIN network. The obtained TPPIN network is supported by both Gene Ontology term enrichment analysis and essential gene enrichment analysis. Conclusions: By integrating gene expression data into the immunome interactome and using a weighted network filtering method, we identified the T cell PPI immune response network. This network reveals the most central and crucial network in T cells. The approach is general and applicable to any dataset that contains sufficient information.
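The abstract above weights each link by a jackknife gene expression correlation but does not spell out the estimator. The sketch below assumes one common leave-one-out definition: compute the Pearson correlation with each time point removed in turn, plus once on the full profiles, and keep the smallest value. That choice, and the names `pearson` and `jackknife_correlation`, are illustrative assumptions rather than the authors' implementation.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def jackknife_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Smallest Pearson correlation over the full data and all leave-one-out subsets.

    Assumed definition: a single outlying time point cannot inflate the link weight.
    """
    xs: List[float] = list(x)
    ys: List[float] = list(y)
    if len(xs) != len(ys) or len(xs) < 4:
        raise ValueError("need two profiles of equal length with at least 4 points")
    estimates = [pearson(xs, ys)]
    for i in range(len(xs)):
        # Drop time point i from both expression profiles.
        estimates.append(pearson(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]))
    return min(estimates)
```

With this definition, two profiles that agree only because of one shared extreme measurement receive a low weight, since removing that measurement exposes the underlying disagreement.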

    DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants

    DoOP (http://doop.abc.hu/) is a database of eukaryotic promoter sequences (upstream regions) aiming to facilitate the recognition of regulatory sites conserved between species. The annotated first exons of human and Arabidopsis thaliana genes were used as queries in BLAST searches to collect the most closely related orthologous first exon sequences from Chordata and Viridiplantae species. Up to 3000 bp DNA segments upstream from these first exons constitute the clusters in the chordate and plant sections of the Database of Orthologous Promoters. Release 1.0 of DoOP contains 21 061 chordate clusters from 284 different species and 7548 plant clusters from 269 different species. The database can be used to find and retrieve promoter sequences of a given gene from various species and it is also suitable to see the most trivial conserved sequence blocks in the orthologous upstream regions. Users can search DoOP with either sequence or text (annotation) to find promoter clusters of various genes. In addition to the sequence data, the positions of the conserved sequence blocks derived from multiple alignments, the positions of repetitive elements and the positions of transcription start sites known from the Eukaryotic Promoter Database (EPD) can be viewed graphically
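The abstract above describes taking up to 3000 bp of DNA upstream of each first exon. The sketch below shows one way such a segment could be cut from a forward-strand sequence; the function name `upstream_segment`, the 0-based half-open coordinates, and the strand handling are assumptions made for illustration and do not describe DoOP's actual pipeline.

```python
# Translation table for complementing DNA, preserving case and N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream_segment(seq: str, exon_start: int, exon_end: int,
                     strand: str, max_len: int = 3000) -> str:
    """Return up to max_len bases upstream of a first exon.

    seq: forward-strand sequence (hypothetical input).
    exon_start, exon_end: 0-based, half-open exon coordinates on the forward strand.
    strand: "+" or "-"; for "-" the result is reverse-complemented so that it
    reads 5' to 3' relative to the gene.
    """
    if not 0 <= exon_start < exon_end <= len(seq):
        raise ValueError("exon coordinates fall outside the sequence")
    if strand == "+":
        # Upstream lies before the exon; clip at the start of the sequence.
        return seq[max(0, exon_start - max_len):exon_start]
    if strand == "-":
        # Upstream lies after the exon on the forward strand.
        region = seq[exon_end:exon_end + max_len]
        return region.translate(_COMPLEMENT)[::-1]
    raise ValueError("strand must be '+' or '-'")
```

The segment is shorter than `max_len` whenever the exon sits close to the end of the available sequence, which matches the "up to 3000 bp" wording.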

    Bioinformatic analysis of beta carbonic anhydrase sequences from protozoans and metazoans

    Background: Despite the high prevalence of parasitic infections, and their impact on global health and economy, the number of drugs available to treat them is extremely limited. As a result, the potential consequences of large-scale resistance to any existing drugs are a major concern. A number of recent investigations have focused on the effects of potential chemical inhibitors on bacterial and fungal carbonic anhydrases. Among the five classes of carbonic anhydrases (alpha, beta, gamma, delta and zeta), beta carbonic anhydrases have been reported in most species of bacteria, yeasts, algae, plants, and particular invertebrates (nematodes and insects). To date, there has been a lack of knowledge on the expression and molecular structure of beta carbonic anhydrases in metazoan (nematodes and arthropods) and protozoan species. Methods: Here, the identification of novel beta carbonic anhydrases was based on the presence of the highly conserved amino acid sequence patterns of the active site. A phylogenetic tree was constructed based on codon-aligned DNA sequences. Subcellular localization prediction for each identified invertebrate beta carbonic anhydrase was performed using the TargetP webserver. Results: We verified a total of 75 beta carbonic anhydrase sequences in metazoan and protozoan species by proteome-wide searches and multiple sequence alignment. Of these, 52 were novel, and contained highly conserved amino acid residues, which are inferred to form the active site in beta carbonic anhydrases. Mitochondrial targeting peptide analysis revealed that 31 enzymes are predicted to have mitochondrial localization; one was predicted to be a secretory enzyme, and the other 43 were predicted to have other undefined cellular localizations. Conclusions: These investigations identified 75 beta carbonic anhydrases in metazoan and protozoan species, and among them there were 52 novel sequences that were not previously annotated as beta carbonic anhydrases. Our results will not only change the current information in proteomics and genomics databases, but will also suggest novel targets for drugs against parasites.

    PseudoGeneQuest – Service for identification of different pseudogene types in the human genome

    Background: Pseudogenes, nonfunctional copies of genes, evolve fast due to the lack of evolutionary pressures and thus appear in several different forms. PseudoGeneQuest is an online tool to search the human genome for a given query sequence and to identify different types of pseudogenes as well as novel genes and gene fragments. Description: The service can detect pseudogenes that have arisen either by retrotransposition or segmental genome duplication, many of which are not listed in the public pseudogene databases. The service has a user-friendly web interface and uses a powerful computer cluster in order to perform parallel searches and provide relatively fast runtimes despite exhaustive database searches and analyses. Conclusion: PseudoGeneQuest is a versatile tool for detecting novel pseudogene candidates from the human genome. The service searches human genome sequences for five types of pseudogenes and provides an output that allows easy further analysis of observations. In addition to the result file, the system provides visualization of the results linked to Ensembl Genome Browser. PseudoGeneQuest service is freely available.

    Identification of candidate disease genes by integrating Gene Ontologies and protein-interaction networks: case study of primary immunodeficiencies

    Disease gene identification is still a challenge despite modern high-throughput methods. Many diseases are very rare or lethal and thus cannot be investigated with traditional methods. Several in silico methods have been developed, but they have some limitations. We introduce a new method that combines information about protein-interaction network properties and Gene Ontology terms. Genes with high calculated network scores and statistically significant Gene Ontology terms based on known diseases are prioritized as candidate genes. The method was applied to identify novel primary immunodeficiency-related genes, 26 of which were found. The investigation uses the protein-interaction network for all essential human immunome genes available in the Immunome Knowledge Base and an analysis of their enriched Gene Ontology annotations. The identified disease gene candidates are mainly involved in cellular signaling, including receptors, protein kinases, and adaptor and binding proteins, as well as enzymes. The method can be generalized for any disease group with sufficient information
