96 research outputs found

    PhyloPat: phylogenetic pattern analysis of eukaryotic genes

    BACKGROUND: Phylogenetic patterns show the presence or absence of certain genes or proteins in a set of species. They can also be used to determine sets of genes or proteins that occur only in certain evolutionary branches. Phylogenetic pattern analysis has routinely been applied to protein databases such as COG and OrthoMCL, but not to gene databases. Here we present a tool named PhyloPat, which allows the complete Ensembl gene database to be queried using phylogenetic patterns. DESCRIPTION: PhyloPat is an easy-to-use webserver that can be used to query the orthologies of all complete genomes within the EnsMart database using phylogenetic patterns. This enables the determination of sets of genes that occur only in certain evolutionary branches or even single species. We found a total of 446,825 genes and 3,164,088 orthologous relationships within the EnsMart v40 database. We used a single linkage clustering algorithm to create 147,922 phylogenetic lineages, using all of the orthologies provided by Ensembl. PhyloPat allows querying with either binary phylogenetic patterns (created by checkboxes) or regular expressions. Specific branches of a phylogenetic tree of the 21 included species can be selected to create a branch-specific phylogenetic pattern. Users can also input a list of Ensembl or EMBL IDs to check which phylogenetic lineage any gene belongs to. The output can be saved in HTML, Excel or plain text format for further analysis. A link to the FatiGO web interface has been incorporated in the HTML output, creating easy access to functional information. Finally, lists of omnipresent, polypresent and oligopresent genes have been included. CONCLUSION: PhyloPat is the first tool to combine complete genome information with phylogenetic pattern querying. Since we used the orthologies generated by the accurate pipeline of Ensembl, the obtained phylogenetic lineages are reliable. The completeness and reliability of these phylogenetic lineages will further increase with the addition of newly found orthologous relationships within each new Ensembl release.
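
The two mechanisms named in this abstract, single linkage clustering of orthology pairs and regular-expression queries over binary presence/absence patterns, can be illustrated with a minimal sketch. It is not PhyloPat's implementation: the gene IDs, the three-species ordering, and the function names (`single_linkage`, `pattern`, `query`) are hypothetical, and a union-find structure is assumed as one convenient way to do single linkage.

```python
import re
from collections import defaultdict


def single_linkage(pairs):
    """Group genes into lineages: genes connected by any chain of
    orthology links end up in the same cluster (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    clusters = defaultdict(set)
    for gene in list(parent):
        clusters[find(gene)].add(gene)
    return list(clusters.values())


def pattern(cluster, species_of, species_order):
    """Binary phylogenetic pattern: '1' if the lineage has a gene in that species."""
    present = {species_of[g] for g in cluster}
    return "".join("1" if s in present else "0" for s in species_order)


def query(clusters, species_of, species_order, regex):
    """Return the lineages whose whole pattern matches the regular expression."""
    rx = re.compile(regex)
    return [c for c in clusters
            if rx.fullmatch(pattern(c, species_of, species_order))]


# Hypothetical toy data (not taken from Ensembl).
species_order = ["human", "mouse", "fly"]
species_of = {"h1": "human", "m1": "mouse", "f1": "fly",
              "h2": "human", "m2": "mouse"}
pairs = [("h1", "m1"), ("m1", "f1"), ("h2", "m2")]

clusters = single_linkage(pairs)
mammal_only = query(clusters, species_of, species_order, "110")
```

Here `mammal_only` holds the single lineage present in human and mouse but absent from fly, while a pattern such as `"11."` would match both lineages.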

    A meta-analysis reveals the commonalities and differences in Arabidopsis thaliana response to different viral pathogens

    Understanding the mechanisms by which plants trigger host defenses in response to viruses has been a challenging problem owing to the multiplicity of factors and complexity of interactions involved. The advent of genomic techniques, however, has opened the possibility to grasp a global picture of the interaction. Here, we used Arabidopsis thaliana to identify and compare genes that are differentially regulated upon infection with seven distinct (+)ssRNA and one ssDNA plant viruses. In the first approach, we established lists of genes differentially affected by each virus and compared their involvement in biological functions and metabolic processes. We found that phylogenetically related viruses significantly alter the expression of similar genes and that viruses naturally infecting Brassicaceae display a greater overlap in the plant response. In the second approach, virus-regulated genes were contextualized using models of transcriptional and protein-protein interaction networks of A. thaliana. Our results confirm that host cells undergo significant reprogramming of their transcriptome during infection, which is possibly a central requirement for the mounting of host defenses. We uncovered a general mode of action in which perturbations preferentially affect genes that are highly connected, central and organized in modules. © 2012 Rodrigo et al. This work was supported by the Spanish Ministerio de Ciencia e Innovacion (MICINN) grants BFU2009-06993 (S. F. E.) and BIO2006-13107 (C. L.) and by Generalitat Valenciana grant PROMETEO2010/016 (S. F. E.). G. R. is supported by a graduate fellowship from the Generalitat Valenciana (BFPI2007-160) and J.C. by a contract from MICINN grant TIN2006-12860. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Rodrigo Tarrega, G.; Carrera Montesinos, J.; Ruiz-Ferrer, V.; Del Toro, F.; Llave, C.; Voinnet, O.; Elena Fito, SF. (2012).
A meta-analysis reveals the commonalities and differences in Arabidopsis thaliana response to different viral pathogens. PLoS ONE. 7(7):40526-40526. https://doi.org/10.1371/journal.pone.0040526

    PhenoFam-gene set enrichment analysis through protein structural information

    BACKGROUND: With the current technological advances in high-throughput biology, the necessity to develop tools that help to analyse the massive amount of data being generated is evident. A powerful method of inspecting large-scale data sets is gene set enrichment analysis (GSEA), and investigation of protein structural features can guide determining the function of individual genes. However, a convenient tool that combines these two features to aid in high-throughput data analysis has not been developed yet. In order to fill this niche, we developed the user-friendly, web-based application, PhenoFam. RESULTS: PhenoFam performs gene set enrichment analysis by employing structural and functional information on families of protein domains as annotation terms. Our tool is designed to analyse complete sets of results from quantitative high-throughput studies (gene expression microarrays, functional RNAi screens, etc.) without prior pre-filtering or hits-selection steps. PhenoFam utilizes Ensembl databases to link a list of user-provided identifiers with protein features from the InterPro database, and assesses whether results associated with individual domains differ significantly from the overall population. To demonstrate the utility of PhenoFam we analysed a genome-wide RNA interference screen and discovered a novel function of plexins containing the cytoplasmic RasGAP domain. Furthermore, a PhenoFam analysis of breast cancer gene expression profiles revealed a link between breast carcinoma and altered expression of PX domain containing proteins. CONCLUSIONS: PhenoFam provides a user-friendly, easily accessible web interface to perform GSEA based on high-throughput data sets and structural-functional protein information, and therefore aids in functional annotation of genes.
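
The idea of assessing whether scores for genes carrying one domain differ from the overall population can be sketched as follows. The abstract does not say which statistical test PhenoFam uses, so the permutation test here is purely an assumption chosen for illustration; the gene names, scores, and the function `domain_permutation_p` are hypothetical.

```python
import random
from statistics import mean


def domain_permutation_p(scores, domain_genes, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the gap between the mean score of
    genes annotated with a domain and the mean score of all genes.
    (Assumed test; not necessarily what PhenoFam applies.)"""
    rng = random.Random(seed)
    genes = list(scores)
    members = [g for g in genes if g in domain_genes]
    overall = mean(scores.values())
    observed = abs(mean(scores[g] for g in members) - overall)

    hits = 0
    for _ in range(n_perm):
        # Draw a random gene set of the same size as the domain's gene set.
        sample = rng.sample(genes, len(members))
        if abs(mean(scores[g] for g in sample) - overall) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical screen: 20 genes with scores 0..19, no pre-filtering of hits.
scores = {f"g{i}": float(i) for i in range(20)}
shifted_domain = {"g15", "g16", "g17", "g18", "g19"}   # all high scorers
spread_domain = {"g0", "g5", "g10", "g14", "g19"}      # scattered scorers

p_shifted = domain_permutation_p(scores, shifted_domain)
p_spread = domain_permutation_p(scores, spread_domain)
```

In this toy case the domain made up of the five highest-scoring genes gets a small p-value, while the scattered domain does not stand out from the population.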

    Human and Non-Human Primate Genomes Share Hotspots of Positive Selection

    Among primates, genome-wide analysis of recent positive selection is currently limited to the human species because it requires extensive sampling of genotypic data from many individuals. The extent to which genes positively selected in humans also present adaptive changes in other primates therefore remains unknown. This question is important because a gene that has been positively selected independently in the human and in other primate lineages may be less likely to be involved in human-specific phenotypic changes such as dietary habits or cognitive abilities. To answer this question, we analysed heterozygous Single Nucleotide Polymorphisms (SNPs) in the genomes of single human, chimpanzee, orangutan, and macaque individuals using a new method aiming to identify selective sweeps genome-wide. We found an unexpectedly high number of orthologous genes exhibiting signatures of a selective sweep simultaneously in several primate species, suggesting the presence of hotspots of positive selection. A similar significant excess is evident when comparing genes positively selected during recent human evolution with genes subjected to positive selection in their coding sequence in other primate lineages and identified using a different test. These findings are further supported by comparing several published human genome scans for positive selection with our findings in non-human primate genomes. We thus provide extensive evidence that the co-occurrence of positive selection in humans and in other primates at the same genetic loci can be measured with only four species, an indication that it may be a widespread phenomenon. The identification of positive selection in humans alongside other primates is a powerful tool to outline those genes that were selected uniquely during recent human evolution.

    Drug-target network in myocardial infarction reveals multiple side effects of unrelated drugs

    The systems-level characterization of drug-target associations in myocardial infarction (MI) has not been reported to date. We report a computational approach that combines different sources of drug and protein interaction information to assemble the myocardial infarction drug-target interactome network (My-DTome). My-DTome comprises approved and other drugs interlinked in a single, highly-connected network with modular organization. We show that approved and other drugs may both be highly connected and represent network bottlenecks. This highlights influential roles for such drugs on seemingly unrelated targets and pathways via direct and indirect interactions. My-DTome modules are associated with relevant molecular processes and pathways. We find evidence that these modules may be regulated by microRNAs with potential therapeutic roles in MI. Different drugs can jointly impact a module. We provide systemic insights into cardiovascular effects of non-cardiovascular drugs. My-DTome provides the basis for an alternative approach to investigate new targets and multidrug treatment in MI.

    Prolonged Application of High Fluid Shear to Chondrocytes Recapitulates Gene Expression Profiles Associated with Osteoarthritis

    BACKGROUND: Excessive mechanical loading of articular cartilage producing hydrostatic stress, tensile strain and fluid flow leads to irreversible cartilage erosion and osteoarthritic (OA) disease. Since application of high fluid shear to chondrocytes recapitulates some of the earmarks of OA, we aimed to screen the gene expression profiles of shear-activated chondrocytes and assess potential similarities with OA chondrocytes. METHODOLOGY/PRINCIPAL FINDINGS: Using a cDNA microarray technology, we screened the differentially-regulated genes in human T/C-28a2 chondrocytes subjected to high fluid shear (20 dyn/cm²) for 48 h and 72 h relative to static controls. Confirmation of the expression patterns of select genes was obtained by qRT-PCR. Using significance analysis of microarrays with a 5% false discovery rate, 71 and 60 non-redundant transcripts were identified to be ≥2-fold up-regulated and ≤0.6-fold down-regulated, respectively, in sheared chondrocytes. Published data sets indicate that 42 of these genes, which are related to extracellular matrix/degradation, cell proliferation/differentiation, inflammation and cell survival/death, are differentially-regulated in OA chondrocytes. In view of the pivotal role of cyclooxygenase-2 (COX-2) in the pathogenesis and/or progression of OA in vivo and regulation of shear-induced inflammation and apoptosis in vitro, we identified a collection of genes that are either up- or down-regulated by shear-induced COX-2. COX-2 and L-prostaglandin D synthase (L-PGDS) induce reactive oxygen species production, and negatively regulate genes of the histone and cell cycle families, which may play a critical role in chondrocyte death. CONCLUSIONS/SIGNIFICANCE: Prolonged application of high fluid shear stress to chondrocytes recapitulates gene expression profiles associated with osteoarthritis. Our data suggest a potential link between exposure of chondrocytes/cartilage to abnormal mechanical loading and the pathogenesis/progression of OA.

    Linking microarray reporters with protein functions

    BACKGROUND: The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. RESULTS: This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset that was derived from and crosslinked back to the highly curated UniProt database. The resulting alignments were filtered using high quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high quality hits. Combining the results of both methods resulted in successful annotation of >58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the proportion of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked to GO nodes and 7.1% to local pathways. CONCLUSION: Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description, the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/.

    Large-Scale Evidence for Conservation of NMD Candidature Across Mammals

    BACKGROUND: Alternatively-spliced (AS) forms can vary protein function, intracellular localization and post-translational modifications. AS coupled with mRNA nonsense-mediated decay (NMD) can also control transcript abundance. Here, we have investigated the genome-scale conservation of alternatively-spliced NMD candidates (AS-NMD candidates) in mammals. METHODOLOGY/PRINCIPAL FINDINGS: We mapped >12 million cDNA/EST library transcripts, comprising pooled data from both older and next-generation sequencing techniques, against genomic sequences to annotate AS-NMD candidates generated by in-frame premature termination codons (PTCs) in the human, mouse, rat and cow genomes. In these genomes, we found populations of genes that harbour AS-NMD candidates, varying in number from approximately 149 to 2,051 genes. We discovered that a highly significant proportion (27%-35%) of AS-NMD candidate genes in mouse, rat and cow also have human orthologs targeted for NMD. Intron retention was the most abundant type of AS-NMD, ranging from 43% to 67% of genes harbouring an AS-NMD candidate. Groupings of AS-NMD candidate genes either with or without intron retentions also have highly significant AS-NMD conservation, indicating that the trend is not due primarily to conservation of intron retentions. As a subset, the AS-NMD intron retentions are distinguished from non-retained introns by higher GC content, and codon usage similar to the usage in protein-coding sequences. This indicates that most of these alternatively spliced sequences have coded for proteins in the recent evolutionary past. In general, the AS-NMD candidate genes showed a similar pattern of Gene Ontology functional category enrichments in all four species. Genes linked to nucleic-acid interaction and apoptosis, and involved in pathways linked with cancer, were the most common. Finally, we mapped the AS-NMD candidates to mass spectrometry-derived proteomics data, and gathered evidence of truncated polypeptides for at least 10% of all human AS-NMD candidate transcripts. CONCLUSIONS/SIGNIFICANCE: In summary, our analysis provides strong statistical evidence for conservation of functional AS-NMD candidature across Mammalia for a large subset of genes. However, because codon usage of AS-NMD intron retentions is similar to the usage in exons, it is difficult to de-couple conservation of AS-NMD-based regulation from conservation for protein-coding ability, for intron retentions.
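
How an in-frame premature termination codon might flag a transcript as an NMD candidate can be shown with a short sketch. The abstract does not give the criteria the authors applied; the version below assumes the commonly cited rule of thumb that a stop codon lying at least about 50 nt upstream of the last exon-exon junction marks a transcript for NMD. The sequence, coordinates, and function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_inframe_stop(seq, cds_start):
    """Index of the first stop codon in the reading frame that begins at
    cds_start, or None if the frame stays open to the end of seq."""
    for i in range(cds_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def is_nmd_candidate(seq, cds_start, last_junction, min_dist=50):
    """Assumed 50-nt rule: the transcript is a candidate when the end of the
    first in-frame stop codon lies at least min_dist nt upstream of the last
    exon-exon junction (0-based index of the first base of the last exon)."""
    stop = first_inframe_stop(seq, cds_start)
    if stop is None:
        return False
    return last_junction - (stop + 3) >= min_dist


# Hypothetical transcript: ATG, two codons, a stop, then 90 nt of filler.
transcript = "ATG" + "GCT" * 2 + "TAA" + "G" * 60 + "C" * 30

far_junction = is_nmd_candidate(transcript, cds_start=0, last_junction=72)
near_junction = is_nmd_candidate(transcript, cds_start=0, last_junction=40)
```

With the junction at position 72 the stop codon sits 60 nt upstream and the transcript is flagged; with the junction at position 40 the distance is only 28 nt and it is not.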

    Comparative GO: a web application for comparative Gene Ontology and Gene Ontology-based gene selection in bacteria

    Extent: 8p. The primary means of classifying new functions for genes and proteins relies on Gene Ontology (GO), which defines genes/proteins using a controlled vocabulary in terms of their Molecular Function, Biological Process and Cellular Component. The challenge is to present this information to researchers to compare and discover patterns in multiple datasets using visually comprehensible and user-friendly statistical reports. Importantly, while there are many GO resources available for eukaryotes, there are none suitable for simultaneous, graphical and statistical comparison between multiple datasets. In addition, none of them provides comprehensive resources for bacteria. Using Streptococcus pneumoniae as a model, we identified and collected GO resources including genes, proteins, taxonomy and GO relationships from NCBI, UniProt and GO organisations. Then, we designed database tables in a PostgreSQL database server and developed a Java application to extract data from source files and load it into the database automatically. We developed a PHP web application based on Model-View-Control architecture, and used a specific data structure as well as current and novel algorithms to estimate GO graph parameters. We designed different navigation and visualization methods on the graphs and integrated these into graphical reports. This tool is particularly significant when comparing GO groups between multiple samples (including those of pathogenic bacteria) from different sources simultaneously. Comparing GO protein distribution among up- or down-regulated genes from different samples can improve understanding of biological pathways and mechanism(s) of infection. It can also aid in the discovery of genes associated with specific function(s) for investigation as novel vaccine or therapeutic targets. Mario Fruzangohar, Esmaeil Ebrahimie, Abiodun D. Ogunniyi, Layla K. Mahdi, James C. Paton, David L. Adelso