
45 research outputs found

Protein Function Prediction using Phylogenomics, Domain Architecture Analysis, Data Integration, and Lexical Scoring

Author: Hallab Asis
Publication venue: Universitäts- und Landesbibliothek Bonn

“As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally.” (Radivojac, Clark, Oron, et al. 2013) With this goal, three new protein function annotation tools were developed, which produce trustworthy and concise protein annotations, are easy to obtain and install, and are capable of processing large sets of proteins with reasonable computational resource demands. Especially for high-throughput analysis, e.g. on genome scale, these tools improve over existing tools in both ease of use and accuracy. They are dubbed: • Automated Assignment of Human Readable Descriptions (AHRD) (github.com/groupschoof/AHRD; Hallab, Klee, Srinivas, and Schoof 2014), • AHRD on gene clusters, and • Phylogenetic predictions of Gene Ontology (GO) terms with specific calibrations (PhyloFun v2). “AHRD” assigns human readable descriptions (HRDs) to query proteins and was developed to mimic the decision-making process of an expert curator. To this end, it processes the descriptions of reference proteins obtained by searching selected databases with BLAST (Altschul, Madden, Schaffer, et al. 1997). Here, the trust a user puts into results found in each of these databases can be weighted separately. In the next step, the descriptions of the found homologous proteins are filtered, removing accessions and species information, and finally discarding uninformative candidate descriptions such as “putative protein”. Afterwards, a dictionary of meaningful words is constructed from those found in the remaining candidates. Here, another filter is applied to ignore words that convey no information, such as the word “protein” itself. In a lexical approach, each word is assigned a score based on its frequency in all candidate descriptions, the sequence alignment quality associated with the candidate reference proteins, and finally the already mentioned trust put into the database the reference was obtained from.
Subsequently, each candidate description is assigned a score, which is computed from the respective scores of the meaningful words contained in that candidate. Also incorporated into this score is the description’s frequency among all regarded candidates. In the final step, the highest scoring description is assigned to the query protein. The performance of this lexical algorithm, implemented in “AHRD”, was subsequently compared with that of two competing methods, Blast2GO and “best Blast”, the latter of which simply passes the description of the best scoring hit to the query protein. To enable this comparison of performance, and lacking a robust evaluation procedure, a new method to measure the accuracy of textual human readable protein descriptions was developed and applied with success. Here, the accuracy of each assigned description was measured with the frequently used “F-measure”, the harmonic mean of precision and recall, which we computed by regarding meaningful words appearing in both the reference and the assigned descriptions as true positives. The results showed that “AHRD” not only outperforms its competitors by far, but is also very robust and thus does not require its users to use carefully selected parameters. In fact, AHRD’s robustness was demonstrated through cross validation and use of three different reference sets. The second annotation tool, “AHRD on gene clusters”, uses conserved protein domains from the InterPro database (Apweiler, Attwood, Bairoch, et al. 2000) to annotate clusters of homologous proteins. In a first step, the domains found in each cluster are filtered, such that only the most informative are retained. For example, family descriptions are discarded if more detailed sub-family descriptions are also found annotated to members of the cluster. Subsequently, the most frequent candidate description is assigned, favoring those of type “family” over “domain”.
Finally, the third tool “PhyloFun (v2)” was developed to annotate large sets of query proteins with terms from the Gene Ontology. This work focussed on extending the “Belief propagation” (Pearl 1988) algorithm implemented in the “Sifter” annotation tool (Engelhardt, Jordan, Muratore, and Brenner 2005; Engelhardt, Jordan, Srouji, and Brenner 2011). Jöcker had developed a phylogenetic pipeline generating the input that was fed into the Sifter program. This pipeline executes stringent sequence similarity searches in a database of selected reference proteins, and reconstructs a phylogenetic tree from the found orthologs and inparalogs. This tree is then used by the Sifter program and interpreted as a “Bayesian Network” into which the GO term annotations of the homologous reference proteins are fed as “diagnostic evidence” (Pearl 1988). Subsequently, the current strength of belief, the probability of this evidence being also the true state of ancestral tree nodes, is spread recursively through the tree towards its root, and then vice versa towards the tips. These, of course, include the query protein, which in the final step is annotated with those GO terms that have the strongest belief. Note that during this recursive belief propagation a given GO term’s annotation probability depends on both the length of the currently processed branch and the type of evolutionary event that took place. This event can be one of “speciation” or “duplication”, such that function mutation becomes more likely on longer branches and particularly after “duplication” events. A particular goal in extending this algorithm was to base the annotation probability of a given GO term not on a preconceived model of function evolution among homologous proteins as implemented in Sifter, but instead to compute these GO term annotation probabilities based on empirical measurements.
To achieve this, calibrations were computed for each GO term separately, and reference proteins annotated with a given GO term were investigated such that the probability of function loss could be assessed empirically for decreasing sequence homology among related proteins. A second goal was to overcome errors in the identification of the type of evolutionary events. These errors arose from missing knowledge in terms of true species trees, which, in version 1 of the PhyloFun pipeline, are compared with the actual protein trees in order to tell “duplication” from “speciation” events (Zmasek and Eddy 2001). As reliable reference species trees are sparse or in many cases not available, the part of the algorithm incorporating the type of evolutionary event was discarded. Finally, the third goal postulated for the development of PhyloFun’s version 2 was to enable easy installation, usage, and calibration on the latest available knowledge. This was motivated by observations made during the application of the first version of PhyloFun, in which maintaining the knowledge-base was barely feasible. This obstacle was overcome in version 2 of PhyloFun by obtaining required reference data directly from publicly available databases. The accuracy and performance of the new PhyloFun version 2 were assessed and compared with selected competing methods. These were chosen based on their widespread usage, as well as their applicability to large sets of query proteins without surpassing reasonable time and computational resource requirements. The measurement of each method’s performance was carried out on a “gold standard”, obtained from the Uniprot/Swissprot public database (Boeckmann, Bairoch, Apweiler, et al. 2003), of 1000 selected reference proteins, all of which had GO term annotations made by expert curators and mostly based on experimental verifications.
Subsequently, the performance assessment was executed with a slightly modified version of the “Critical Assessment of Function Annotation” (CAFA) experiment (Radivojac, Clark, Oron, et al. 2013). CAFA compares the performance of different protein function annotation tools on a worldwide scale using a provided set of reference proteins. Here, the predictions the competitors deliver are evaluated using the already introduced “F-measure”. Our performance evaluation of PhyloFun’s protein annotations showed that PhyloFun outperformed all of its competitors. Its use is further recommended by the highly accurate phylogenetic trees the pipeline computes for each query and the found homologous reference proteins. In conclusion, three new tools addressing important matters in the computational prediction of protein function were developed and, in two cases, their performance assessed. Here, both AHRD and PhyloFun (v2) outperformed their competitors. Further arguments for using all three tools are that they are easy to install and use and have reasonable resource demands. Because of these results, the publications of AHRD and PhyloFun (v2) are in preparation, while AHRD is already applied by different researchers worldwide.
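
The word-level “F-measure” described in this abstract can be illustrated with a minimal Python sketch. It only follows the abstract's definition (harmonic mean of precision and recall, with meaningful words shared by the assigned and the reference description counted as true positives). The tokenization rule, the stop-word list `UNINFORMATIVE`, and the function names are assumptions made for illustration and are not taken from AHRD itself.

```python
import re

# Hypothetical stop list: the abstract only names "protein" and "putative protein"
# as examples of uninformative text; the tool's actual filter is not given here.
UNINFORMATIVE = {"protein", "putative"}


def meaningful_words(description):
    """Lower-case a description, split it into alphanumeric tokens,
    and drop tokens from the (assumed) uninformative word list."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return {t for t in tokens if t not in UNINFORMATIVE}


def f_measure(assigned, reference):
    """Harmonic mean of precision and recall over meaningful words.

    True positives are the meaningful words found in both the assigned
    and the reference description.
    """
    assigned_words = meaningful_words(assigned)
    reference_words = meaningful_words(reference)
    true_positives = len(assigned_words & reference_words)
    if true_positives == 0:
        return 0.0  # also covers empty descriptions
    precision = true_positives / len(assigned_words)
    recall = true_positives / len(reference_words)
    return 2 * precision * recall / (precision + recall)
```

For instance, under these assumptions an assigned description “kinase” scored against the reference “receptor kinase” has precision 1 and recall 0.5, giving an F-measure of 2/3.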

bonndoc – Der Publikationsserver der Universität Bonn

Photoreceptor Activity Contributes to Contrasting Responses to Shade in Cardamine and Arabidopsis Seedlings

Author: Gan Xiangchao
Gómez-Cadenas Aurelio
Hallab Asis
Jenkins Huw
Martínez García Jaime F
Molina-Contreras Maria José
Morelli Luca
Moreno Romero Jordi
Pastor Andreu Pedro
Paulišić Sandi
Rodríguez Concepción Manuel
Roig Villanova Irma
Then Christiane
Tsiantis Miltos
Publication venue: 'American Society of Plant Biologists (ASPB)'
Publication date: 01/01/2019

Plants have evolved two major ways to deal with nearby vegetation or shade: avoidance and tolerance. Moreover, some plants respond to shade in different ways; for example, Arabidopsis thaliana undergoes an avoidance response to shade produced by vegetation, but its close relative Cardamine hirsuta tolerates shade. How plants adopt opposite strategies to respond to the same environmental challenge is unknown. Here, using a genetic strategy, we identified the C. hirsuta slender in shade1 (sis1) mutants, which produce strongly elongated hypocotyls in response to shade. These mutants lack the phytochrome A (phyA) photoreceptor. Our findings suggest that C. hirsuta has evolved a highly efficient phyA-dependent pathway that suppresses hypocotyl elongation when challenged by shade from nearby vegetation. This suppression relies, at least in part, on stronger phyA activity in C. hirsuta; this is achieved by increased ChPHYA expression and protein accumulation combined with a stronger specific intrinsic repressor activity. We suggest that modulation of photoreceptor activity is a powerful mechanism in nature to achieve physiological variation (shade tolerance vs. avoidance) for species to colonize different habitats

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Repositori Institucional de la Universitat Jaume I

Diposit Digital de Documents de la UAB

Digital.CSIC

MPG.PuRe

Pan-European study of genotypes and phenotypes in the Arabidopsis relative Cardamine hirsuta reveals how adaptation, demography, and development shape diversity patterns

Author: Alonso-Blanco Carlos
Baumgarten Lukas
Bazakos Christos
Cartolano Maria
Casimiro Pedro Gp
Cooke Elizabeth L
Filatov Dmitry A
Gan Xiangchao
Hallab Asis
Huettel Bruno
Lamb Jonathan
Laurent Stefan
Lempe Janne
Mane Sébastien
Mott Richard
Neuffer Barbara
Pavlidis Pavlos
Pieper Bjorn
Schaefer Hanno
Song Baoxing
Srivastava Rachita
Strütt Stefan
Tattersall Alexander D
Tsiantis Miltos
Žanko Danijela
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 18/07/2023

We study natural DNA polymorphisms and associated phenotypes in the Arabidopsis relative Cardamine hirsuta. We observed strong genetic differentiation among several ancestry groups and broader distribution of Iberian relict strains in European C. hirsuta compared to Arabidopsis. We found synchronization between vegetative and reproductive development and a pervasive role for heterochronic pathways in shaping C. hirsuta natural variation. A single, fast-cycling ChFRIGIDA allele evolved adaptively allowing range expansion from glacial refugia, unlike Arabidopsis where multiple FRIGIDA haplotypes were involved. The Azores islands, where Arabidopsis is scarce, are a hotspot for C. hirsuta diversity. We identified a quantitative trait locus (QTL) in the heterochronic SPL9 transcription factor as a determinant of an Azorean morphotype. This QTL shows evidence for positive selection, and its distribution mirrors a climate gradient that broadly shaped the Azorean flora. Overall, we establish a framework to explore how the interplay of adaptation, demography, and development shaped diversity patterns of 2 related plant species

UCL Discovery


The giant diploid faba genome unlocks variation in a global protein crop

Author: Andersen Stig Uggerhøj
Angra Deepti
Aubert Grégoire
Bednář Petr
Bornhofen Elesandro
Boussageon Raphaël
Cheung Kwok
Courty Pierre Emmanuel
Doležel Jaroslav
Fechete Lavinia I.
Golicz Agnieszka A.
Gundlach Heidrun
Hallab Asis
Himmelbach Axel
Holm Liisa U.
Imbert Baptiste
Janss Luc L.
Jayakodi Murukarthick
Kaur Sukhjiwan
Keeble-Gagnère Gabriel
Khazaei Hamid
Koblížková Andrea
Kobrlová Lucie
Krejčí Petra
Kreplak Jonathan
Macas Jiří
Mascher Martin
Mouritzen Troels W.
Nadzieja Marcin
Neumann Pavel
Nielsen Linda Kærgaard
Novák Petr
Orabi Jihad
O’Sullivan Donal Martin
Padmarasu Sudharsan
Robertson-Shersby-Harvie Tom
Robledillo Laura Ávila
Schiemann Andrea
Schubert Ingo
Schulman Alan H.
Smýkal Petr
Snowdon Rod J.
Stein Nils
Stoddard Frederick L.
Stougaard Jens
Tanskanen Jaakko
Tayeh Nadim
Torres Ana M.
Törönen Petri
Usadel Björn
Warsame Ahmed O.
Wittenberg Alexander H.J.
Zhang Hailin
Čížková Jana
Publication date: 01/01/2023

Increasing the proportion of locally produced plant protein in currently meat-rich diets could substantially reduce greenhouse gas emissions and loss of biodiversity. However, plant protein production is hampered by the lack of a cool-season legume equivalent to soybean in agronomic value. Faba bean (Vicia faba L.) has a high yield potential and is well suited for cultivation in temperate regions, but genomic resources are scarce. Here, we report a high-quality chromosome-scale assembly of the faba bean genome and show that it has expanded to a massive 13 Gb in size through an imbalance between the rates of amplification and elimination of retrotransposons and satellite repeats. Genes and recombination events are evenly dispersed across chromosomes and the gene space is remarkably compact considering the genome size, although with substantial copy number variation driven by tandem duplication. Demonstrating practical application of the genome sequence, we develop a targeted genotyping assay and use high-resolution genome-wide association analysis to dissect the genetic basis of seed size and hilum colour. The resources presented constitute a genomics-based breeding platform for faba bean, enabling breeders and geneticists to accelerate the improvement of sustainable protein production across the Mediterranean, subtropical and northern temperate agroecological zones.

Central Archive at the University of Reading

Jukuri

Juelich Shared Electronic Resources

Helsingin yliopiston digitaalinen arkisto

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)

Cell type specific transcriptional reprogramming of maize leaves during Ustilago maydis induced tumor formation

Author: Doehlemann Gunther
Ernst Corinna
Hallab Asis
Matei Alexandra
Usadel Björn
Villajuana-Bonequi Mitzi
Publication venue: Macmillan Publishers Limited, part of Springer Nature
Publication date: 01/01/2019

Publikationsserver der RWTH Aachen University

Cell type specific transcriptional reprogramming of maize leaves during Ustilago maydis induced tumor formation

Author: Doehlemann Gunther
Ernst Corinna
Hallab Asis
Matei Alexandra
Usadel Bjoern
Villajuana-Bonequi Mitzi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Ustilago maydis is a biotrophic pathogen and well-established genetic model to understand the molecular basis of biotrophic interactions. U. maydis suppresses plant defense and induces tumors on all aerial parts of its host plant maize. In a previous study we found that U. maydis induced leaf tumor formation builds on two major processes: the induction of hypertrophy in the mesophyll and the induction of cell division (hyperplasia) in the bundle sheath. In this study we analyzed the cell-type specific transcriptome of maize leaves 4 days post infection. This analysis allowed identification of key features underlying the hypertrophic and hyperplasic cell identities derived from mesophyll and bundle sheath cells, respectively. We examined the differentially expressed (DE) genes with particular focus on maize cell cycle genes and found that three A-type cyclins, one B-, D- and T-type are upregulated in the hyperplasic tumorous cells, in which the U. maydis effector protein See1 promotes cell division. Additionally, most of the proteins involved in the formation of the pre-replication complex (pre-RC, which ensures that each daughter cell receives identical DNA copies), the transcription factors E2F and DPa as well as several D-type cyclins are deregulated in the hypertrophic cells.

Kölner UniversitätsPublikationsServer

Publikationsserver der RWTH Aachen University

Plant PhysioSpace: a robust tool to compare stress response across plant species

Author: Buer Benjamin
Hadizadeh Esfahani Ali
Hallab Asis
Maß Janina
Nevarez David
Ott Mark-Christoph
Schuldt Bernhard M
Schuppert Andreas
Usadel Björn
Publication date: 01/01/2021

Generalization of transcriptomics results can be achieved by comparison across experiments. This generalization is based on integration of interrelated transcriptomics studies into a compendium. Such a focus on the bigger picture enables both characterizations of the fate of an organism and distinction between generic and specific responses. Numerous methods for analyzing transcriptomics datasets exist. Yet, most of these methods focus on gene-wise dimension reduction to obtain marker genes and gene sets for, for example, pathway analysis. Relying only on isolated biological modules might result in missing important confounders and relevant contexts. We developed a method called Plant PhysioSpace, which enables researchers to compute experimental conditions across species and platforms without a priori reducing the reference information to specific gene sets. Plant PhysioSpace extracts physiologically relevant signatures from a reference dataset (i.e. a collection of public datasets) by integrating and transforming heterogeneous reference gene expression data into a set of physiology-specific patterns. New experimental data can be mapped to these patterns, resulting in similarity scores between the acquired data and the extracted compendium. Because of its robustness against platform bias and noise, Plant PhysioSpace can function as an inter-species or cross-platform similarity measure. We have demonstrated its success in translating stress responses between different species and platforms, including single-cell technologies. We have also implemented two R packages, one software and one data package, and a Shiny web application to facilitate access to our method and precomputed models

PubMed Central

Juelich Shared Electronic Resources

Plant PhysioSpace: a robust tool to compare stress response across plant species

Author: Buer Benjamin
Hadizadeh Esfahani Ali
Hallab Asis
Maß Janina
Nevarez David
Ott Mark-Christoph
Schuldt Bernhard M
Schuppert Andreas
Usadel Björn
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2021

PubMed Central

Juelich Shared Electronic Resources

GXP: Analyze and Plot Plant Omics Data in Web Browsers

Author: Atemia Joseph
Eiteneuer Constantin
Fahrner Sven
Hallab Asis
Pieruschka Roland
Reimer Julia J.
Schrader Andrea
Schurr Ulrich
Schwacke Rainer
Usadel Björn
Velasco David
Wahl Vanessa
Wang Dan
Publication venue: 'MDPI AG'
Publication date: 01/01/2022

Next-generation sequencing and metabolomics have become very cost- and labor-efficient and are integrated into an ever-growing number of life science research projects. Typically, established software pipelines analyze raw data and produce quantitative data informing about gene expression or concentrations of metabolites. These results need to be visualized and further analyzed in order to support scientific hypothesis building and identification of underlying biological patterns. Some tools for this already exist, but require installation or manual programming. We developed “Gene Expression Plotter” (GXP), an RNAseq and Metabolomics data visualization and analysis tool running entirely in the user’s web browser, thus not needing any custom installation, manual programming or uploading of confidential data to third party servers. Consequently, upon receiving the bioinformatic raw data analysis of RNAseq or other omics results, GXP immediately enables the user to interact with the data according to biological questions by performing knowledge-driven, in-depth data analyses and candidate identification via visualization and data exploration. Thereby, GXP can support and accelerate complex interdisciplinary omics projects and downstream analyses. GXP offers an easy way to publish data, plots, and analysis results either as a simple exported file or as a custom website. GXP is freely available on GitHub (see introduction).

Directory of Open Access Journals

PubMed Central

Juelich Shared Electronic Resources

MPG.PuRe

Recently duplicated sesterterpene (C25) gene clusters in Arabidopsis thaliana modulate root microbiota

Author: Bai Yang
Chen Quingwen
Gan Xiangchao
Hallab Asis
He Juan
Jiang Ting
Jin Tao
Liu Haili
Liu Yong-Xin
Liu Zhixi
Ma Yihua
Schranz M.E.
Wang Guodong
Wang Xuemei
Wang Yong
Zhang Fengxia
Zhao T.
Publication date: 01/01/2019

Land plants co-speciate with a diversity of continually expanding plant specialized metabolites (PSMs) and root microbial communities (microbiota). Homeostatic interactions between plants and root microbiota are essential for plant survival in natural environments. A growing appreciation of microbiota for plant health is fuelling rapid advances in genetic mechanisms of controlling microbiota by host plants. PSMs have long been proposed to mediate plant and single microbe interactions. However, the effects of PSMs, especially those evolutionarily new PSMs, on root microbiota at community level remain to be elucidated. Here, we discovered sesterterpenes in Arabidopsis thaliana, produced by recently duplicated prenyltransferase-terpene synthase (PT-TPS) gene clusters, with neo-functionalization. A single-residue substitution played a critical role in the acquisition of sesterterpene synthase (sesterTPS) activity in Brassicaceae plants. Moreover, we found that the absence of two root-specific sesterterpenoids, with similar chemical structure, significantly affected root microbiota assembly in similar patterns. Our results not only demonstrate the sensitivity of plant microbiota to PSMs but also establish a complete framework of host plants to control root microbiota composition through evolutionarily dynamic PSMs

Wageningen University & Research Publications

MPG.PuRe