11 research outputs found

A new protein linear motif benchmark for multiple sequence alignment software

Author: Chica Claudia
Gibson Toby J
Perrodou Emmanuel
Poch Olivier
Thompson Julie D
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly and protein processing and cleavage. Methods for LM detection are now being developed that are strongly dependent on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs that are specifically optimised to align the globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs.
RESULTS: We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments; however, little correlation was observed between LM and overall alignment quality for individual alignment test cases.
CONCLUSION: We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.

Crossref

Springer - Publisher Connector

PubMed Central

HAL Descartes

Hal-Diderot

Evidence for the Concerted Evolution between Short Linear Protein Motifs and Their Flanking Regions

Author: Berend Snel
Claudia Chica
Francesca Diella
Toby J. Gibson
Publication venue: Public Library of Science
Publication date: 08/07/2009

BACKGROUND: Linear motifs are short modules of protein sequences that play a crucial role in mediating and regulating many protein-protein interactions. The function of linear motifs strongly depends on the context, e.g. functional instances mainly occur inside flexible regions that are accessible for interaction. Sometimes linear motifs appear as isolated islands of conservation in multiple sequence alignments. However, they also occur in larger blocks of sequence conservation, suggesting an active role for the neighbouring amino acids. RESULTS: The evolution of regions flanking 116 functional linear motif instances was studied. The conservation of the amino acid sequence and order/disorder tendency of those regions was related to presence/absence of the instance. For the majority of the analysed instances, the pairs of sequences conserving the linear motif were also observed to maintain a similar local structural tendency and/or to have higher local sequence conservation when compared to pairs of sequences where one is missing the linear motif. Furthermore, those instances have a higher chance to co-evolve with the neighbouring residues in comparison to the distant ones. Those findings are supported by examples where the regulation of the linear motif-mediated interaction has been shown to depend on the modifications (e.g. phosphorylation) at neighbouring positions or is thought to benefit from the binding versatility of disordered regions. CONCLUSION: The results suggest that flanking regions are relevant for linear motif-mediated interactions, both at the structural and sequence level. More interestingly, they indicate that the prediction of linear motif instances can be enriched with contextual information by performing a sequence analysis similar to the one presented here. This can facilitate the understanding of the role of these predicted instances in determining the protein function inside the broader context of the cellular network where they arise

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The identification of short linear motif-mediated interfaces within the human interactome

Author: Altschul
Beltrao
Betel
Brohee
Ceol
Chica
Cline
Copley
Davey
Davey
Davey
Diella
Dinkel
Dosztanyi
E. Petsalaki
Eddy
Edgar
Edwards
Encinar
Eswar
Finn
Fontes
Gavin
Gfeller
Glotzer
Huang
Hui
Joachims
Jorgensen
K. Luck
Kaneko
Kay
King
Lee
Li
Li
Linding
Meszaros
Mi
Michael
Mohan
N. E. Davey
Neduva
Obenauer
Pawson
Perrodou
Peters
Petsalaki
Pfleger
Pop
R. J. Weatheritt
Schymkowitz
Song
Stark
Stein
Stein
Stirnimann
Szklarczyk
T. J. Gibson
UniProt Consortium
Velankar
Yang
Publication venue: Oxford University Press

Motivation: Eukaryotic proteins are highly modular, containing multiple interaction interfaces that mediate binding to a network of regulators and effectors. Recent advances in high-throughput proteomics have rapidly expanded the number of known protein–protein interactions (PPIs); however, the molecular basis for the majority of these interactions remains to be elucidated. There has been a growing appreciation of the importance of a subset of these PPIs, namely those mediated by short linear motifs (SLiMs), particularly the canonical and ubiquitous SH2, SH3 and PDZ domain-binding motifs. However, these motif classes represent only a small fraction of known SLiMs and outside these examples little effort has been made, either bioinformatically or experimentally, to discover the full complement of motif instances

Crossref

PubMed Central

ELM: the status of the 2010 eukaryotic linear motif resource

Author: Ahmed Sayadi
Aidan Budd
Allegra Via
Berman
Bourhis
Cathryn M. Gould
Chatr-aryamontri
Chen
Chenna
Chica
Chica
Christine Gemünd
Claudia Chica
Copley
Corsini
Deakin
Dice
Diella
Diella
Diella
Dingwall
Dinkel
Eddy
Edeling
Edwards
Ferraro
Finn
Fox-Erlich
Francesca Diella
Fuxreiter
Gene Ontology Consortium
Gibson
Gilles Travé
Glotzer
Gnad
Hantschel
He
Hemsley
Hermjakob
Hilser
Honnappa
Hornbeck
Hulo
Hunt
Hunter
Jakub Paś
Jan Christian Bryne
Jensen
Kadaveru
Kadlec
Katoh
Kerrien
Keshava Prasad
Kitano
Krogh
Leszek Rychlewski
Letunic
Machida
Maffei
Manuela Helmer-Citterich
Markus Seiler
Mayer
Meszaros
Michael
Miller
Neduva
Neduva
Neduva
Niall Haslam
Norman E. Davey
Obenauer
Pawson
Pawson
Pelham
Perrodou
Petsalaki
Petsalaki
Pettifer
Privette
Puntervoll
Pål Puntervoll
Rajasekaran
Ramu
Rein Aasland
Ren
Rideau
Robert J. Weatheritt
Rumpf
Rune Linding
Russell
Salsmann
Sayers
Seiler
Smedley
Smock
Sophie Chabanis-Davidson
Stein
Stein
Stein
Steinmetz
Sushama Michael
Tan
Theis
Tim Hughes
Toby J. Gibson
UniProt Consortium
Via
Volonte
Waterhouse
Weisbrich
Whitty
Williamson
Wright
Zhang
Zhu
Publication venue: Oxford University Press
Publication date: 01/01/2009

Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a ‘Bar Code’ format, which also displays known instances from homologous proteins through a novel ‘Instance Mapper’ protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation
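The abstract above describes scanning proteins of interest for candidate instances of known linear motifs. As a rough illustration of that idea only, the sketch below matches a regular expression against a protein sequence. The pattern, the motif name `EXAMPLE_PxxP`, the function `scan_motifs` and the test sequence are all hypothetical and are not taken from the ELM resource; the context filters mentioned in the abstract (domains, disorder, taxonomy) are not modelled.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, simplified pattern for illustration only (a proline-rich
# "PxxP"-style core). It is not copied from any ELM entry.
EXAMPLE_MOTIFS: Dict[str, str] = {"EXAMPLE_PxxP": r"P..P"}


def scan_motifs(sequence: str, motifs: Dict[str, str]) -> List[Tuple[str, int, int, str]]:
    """Return (motif name, 1-based start, 1-based end, matched text) per hit."""
    hits = []
    for name, pattern in motifs.items():
        # Wrapping the pattern in a lookahead reports overlapping matches too.
        for m in re.finditer(f"(?=({pattern}))", sequence):
            start = m.start(1)
            matched = m.group(1)
            hits.append((name, start + 1, start + len(matched), matched))
    return hits


# Made-up sequence; every hit is only a candidate, not a functional instance.
hits = scan_motifs("MAPLPPKPSAPTPP", EXAMPLE_MOTIFS)
for hit in hits:
    print(hit)
```

A plain pattern match like this over-predicts heavily, which is why the abstract stresses using contextual information to discount false positive matches.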

HAL Descartes

Digitala Vetenskapliga Arkivet - Academic Archive On-line

NORA - Norwegian Open Research Archives

ART

Institute of Cancer Research Repository

Publikationer från Uppsala Universitet

PubMed Central

Archivio della ricerca- Università di Roma La Sapienza

A new protein linear motif benchmark for multiple sequence alignment software

Author: Chica Claudia
Gibson Toby J
Perrodou Emmanuel
Poch Olivier
Thompson Julie D
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2008

Background: Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly and protein processing and cleavage. Methods for LM detection are now being developed that are strongly dependent on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs that are specifically optimised to align the globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs.
Results: We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments; however, little correlation was observed between LM and overall alignment quality for individual alignment test cases.
Conclusion: We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.

Directory of Open Access Journals

Automatic and manual functional annotation in a distributed web service environment

Author: Jöcker Anika
Publication date: 01/01/2009

While the number of genomic sequences becoming available is increasing exponentially, most genes are not functionally well characterized. Finding out more about the function of a gene and about functional relationships between genes will be the next big bottleneck in the post-genomic era. On the one hand, improved pipelines and tools are needed in this context, because running experiments for all predicted genes is not feasible. On the other hand, manual curation of the automatic predictions is necessary to judge the reliability of the automatic annotation and to get a more comprehensive view of the function of each individual gene. Automatic functional annotation often applies a homology-based function transfer from functionally characterized genes, using methods like BLAST. However, this approach has many drawbacks and makes systematic errors because it does not account for speciation and duplication events. Phylogenomics has been shown to improve functional prediction accuracy by taking the evolutionary history of genes into account in a phylogenetic tree context. This thesis explains the manual process from the assembly of the DNA sequence to the functional characterization of genes and the identification and comparison of shared syntenic regions, including the identification of candidate genes for pathogen resistance in potato chromosome V, and discusses the problems involved. To improve the automatic functional annotation in genome projects, a phylogenomic pipeline, which includes SIFTER, one of the best phylogenomic tools in this area, is introduced, improved and tested in the Medicago truncatula, Sorghum bicolor and Solanum lycopersicum genome projects. To obtain new candidate genes for the development of new drugs and crop protection products, non-plant-specific genes, such as the transferrin family, which is not yet known in plants, are extracted from the M. truncatula and S. bicolor genomes and further investigated.
For further improvement of the annotation, a new phylogenomic approach is developed. This approach makes use of annotated functional attributes to calculate the functional mutation rate between genes and groups of genes in a phylogenetic tree and to determine whether the function of a gene can be transferred. The new approach is integrated into the SIFTER tool and tested on the blue-light photoreceptor/photolyase family and on a test set of manually curated Arabidopsis thaliana genes. Using both test sets, the prediction accuracy could be significantly improved and a more comprehensive view of the gene function could be obtained. Because no tool is yet able to annotate all functions of a gene with 100% accuracy, I introduce a system for manual functional annotation, called AFAWE. AFAWE runs different web services for the functional annotation and displays the results and intermediate results in a comprehensive web interface that facilitates comparison. It can be used for any organism and any kind of gene. The inputs are the amino acid sequence and the corresponding organism. Because of its flexible structure, new web services and workflows can be easily integrated. Besides BLAST searches against different databases and protein domain prediction tools, AFAWE also includes the phylogenomic pipeline. Different filters help to identify trustworthy results from each analysis. Furthermore, a detailed manual annotation can be assigned to each protein, which will be used to update the functional annotation in public databases like MIPSPlantsDB.

Kölner UniversitätsPublikationsServer

MPG.PuRe

Comparative Evaluation of Methods for Sequence Alignment and Annotation

Author: Pljusnin Ilja
Publication venue: 'University of Helsinki Libraries'
Publication date: 09/10/2020

The speed of DNA and RNA sequencing has long ago surpassed the capacity of laboratories to assign function to these sequences by direct experiment. Fortunately, function and other information can be effectively transferred to novel data from previously accumulated knowledge by sequence homology. This has resulted in the development of hundreds of novel homology-based methods. However, the tendency of method developers to be overoptimistic about their own results, biases in the evaluation metrics used to rank methods, inconsistency between different rankings and evaluation metrics, misplaced popularity of methods relative to their performance all indicate that, in many cases, clear knowledge of the comparative performance of different methods is lacking. This has two main consequences. First, researchers use suboptimal tools. Second, method development may go astray because the merits used for guiding method optimization are biased or unclear. To avoid these difficulties, further research is needed into methodology of evaluation and comparative studies. One core approach for transferring function by sequence homology is to create a multiple sequence alignment (MSA) that represents a given group of similar sequences. The resulting alignment can be applied to annotate novel sequences using profile hidden Markov models (HMMs), to create phylogenetic trees or to compare structural features. The application of MSAs and profile HMMs for genome annotation was explored in publication (I). Creating MSA has been addressed by a vast field of research, however there is a lack of independent comparative studies and no comparative studies for alignment strategies. In publication (II) a novel modular MSA aligner was implemented to aid in comparative evaluation of different MSA strategies. Different MSA strategies were then compared to each other and to the state-of-the-art MSA software on three benchmark databases. 
Another core approach has been to combine homology searches with assignment of annotation terms from a controlled vocabulary such as the Gene Ontology (GO). Hundreds of methods that assign GO terms to novel sequences have been introduced. The research community has also invested in the objective evaluation of these methods via third party competitions. However, the evaluation metrics and merits used in these competitions are still under active debate and need further research and development. In publication (III) a novel framework was introduced for the development of unbiased high-quality evaluation metrics. By testing 37 variations of popular metrics, our approach revealed strong differences between metrics, a list of clearly biased metrics, and a list of high-quality metrics that are well suited for the evaluation of GO annotations. In summary, this thesis presents novel frameworks and implementation platforms for comparative evaluation of two important classes of homology-based methods: MSA aligners and GO sequence classifiers. These results will be instrumental for developing more accurate MSA aligners, for eliminating many forms of bias inherent in contemporary evaluation protocols, for producing informative method rankings for non-specialist users and for guiding method development towards merits that truly reflect the utility of the designed tools.
Owing to the rapid development of DNA and RNA sequencing technology, most biological annotations of sequences are produced by automatic methods based on sequence homology. Hundreds of homology-based methods have been developed, which underlines the importance of objective and independent method comparison. Many sources of error distort and complicate method comparison: overestimation of one's own method, overfitting, selective reporting, and biased and mutually inconsistent evaluation metrics.
Biased method comparison has two significant consequences: (1) suboptimal methods end up in use by the research community, and (2) method development goes astray because the evaluation criteria guiding it are biased or unclear. These difficulties can be avoided by directing research at comparative method evaluation itself. Multiple sequence alignment (MSA) is a sequence homology-based method with a very broad range of applications in molecular biology research. Publication (I) studied the application of MSAs and hidden Markov models to the annotation of bacterial genomes. The MSA field still lacks independent method evaluation, and in particular studies that compare different MSA algorithmic solutions. Publication (II) presented a new modular MSA program, with which several MSA algorithmic solutions were compared with one another and with leading MSA software on three benchmark databases. Based on the comparison, clear recommendations were given on optimal MSA algorithmic solutions and the best MSA programs. Automatic methods that produce sequence annotations most often use the Gene Ontology (GO) vocabulary. Because hundreds of GO methods are published every year, the research community has invested in the comparative evaluation of these methods. Nevertheless, evaluation criteria for GO method comparison are not yet established, and many of the evaluation metrics in use are either biased or mutually inconsistent. Publication (III) proposes as a solution a new method that makes it possible to test and develop high-quality, unbiased evaluation metrics. Publication (III) tested several evaluation metrics and showed that many GO evaluation metrics currently in use are strongly biased. Based on the testing, clear recommendations were also given on evaluation metrics that ensure unbiased method comparison.

Helsingin yliopiston digitaalinen arkisto

A new protein linear motif benchmark for multiple sequence alignment software-4

Author: Claudia Chica (65836)
Emmanuel Perrodou (13945)
Julie D Thompson (19569)
Olivier Poch (13951)
Toby J Gibson (3993)

subset 1, showing the extreme observations (stars or circles), lower quartile, median, upper quartile, and largest observation in each similarity category. b) Execution times in seconds required to construct all the multiple alignments in Subset 1. Programs are displayed in the order of the Friedman test using the SPS scores for group V11 (additional file ), with the highest scoring program on the left.
Copyright information: Taken from "A new protein linear motif benchmark for multiple sequence alignment software", http://www.biomedcentral.com/1471-2105/9/213. BMC Bioinformatics 2008;9():213-213. Published online 25 Apr 2008. PMCID: PMC2374782.
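The caption ranks programs by SPS scores. As a minimal sketch, assuming SPS here means the usual sum-of-pairs score (the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment), it could be computed as below. The function names and the toy alignments are hypothetical, and the benchmark itself may restrict scoring to particular columns or regions, which this sketch does not do.

```python
from itertools import combinations
from typing import List, Set, Tuple

# (sequence i, residue index in i, sequence j, residue index in j)
Pair = Tuple[int, int, int, int]


def aligned_pairs(alignment: List[str], gap: str = "-") -> Set[Pair]:
    """Collect every pair of residues that share an alignment column."""
    counters = [0] * len(alignment)  # next ungapped residue index per sequence
    pairs: Set[Pair] = set()
    for column in zip(*alignment):  # assumes all rows have equal length
        present = []
        for seq_id, char in enumerate(column):
            if char != gap:
                present.append((seq_id, counters[seq_id]))
                counters[seq_id] += 1
        for (i, a), (j, b) in combinations(present, 2):
            pairs.add((i, a, j, b))
    return pairs


def sum_of_pairs_score(test: List[str], reference: List[str]) -> float:
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Toy alignments (made up): the test alignment misplaces one gap.
reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
score = sum_of_pairs_score(test, reference)
print(score)  # 0.75
```

Tracking residue indices rather than column numbers keeps the comparison independent of how many gap columns each alignment contains.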

FigShare

A new protein linear motif benchmark for multiple sequence alignment software-2

Author: Claudia Chica (65836)
Emmanuel Perrodou (13945)
Julie D Thompson (19569)
Olivier Poch (13951)
Toby J Gibson (3993)

different conditions, showing the extreme observations (stars or circles), lower quartile, median, upper quartile, and largest observation. Significant differences, according to a Wilcoxon signed ranks test (p < 0.05), are indicated by an asterisk on the x-axis. P-values for the Wilcoxon tests are available in additional file , table 3. a) SPS scores for alignments of sequences with validated motifs only compared to alignments including sequences with errors. b) SPS scores for alignments of sequences with validated motifs only compared to alignments including sequences containing false positive (FP) motifs. c) SPS scores for alignments of sequences with validated motifs only compared to alignments including sequences that do not contain any examples of the motif.
Copyright information: Taken from "A new protein linear motif benchmark for multiple sequence alignment software", http://www.biomedcentral.com/1471-2105/9/213. BMC Bioinformatics 2008;9():213-213. Published online 25 Apr 2008. PMCID: PMC2374782.
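The caption compares paired SPS scores with a Wilcoxon signed ranks test. The sketch below shows only how the signed-rank sums behind such a test are formed from paired observations; it does not compute a p-value, which needs the reference distribution of the statistic. The function name and the score values are made up for illustration and are not data from the paper.

```python
from typing import List, Tuple


def signed_rank_statistic(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (W_plus, W_minus), the rank sums of positive and negative differences."""
    # Pairs with a zero difference carry no sign and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend over a run of tied absolute differences.
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        average_rank = (pos + end) / 2 + 1  # ranks are 1-based; ties share the mean
        for k in order[pos:end + 1]:
            ranks[k] = average_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical paired scores (percent) for the same test cases under two conditions.
validated_only = [80, 75, 90, 60, 85]
with_false_positives = [70, 78, 82, 60, 80]
w_plus, w_minus = signed_rank_statistic(validated_only, with_false_positives)
print(w_plus, w_minus)  # 9.0 1.0
```

A large imbalance between the two rank sums is what the test treats as evidence of a systematic difference between the paired conditions.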

FigShare