
39 research outputs found

Epigenome-450K-wide methylation signatures of active cigarette smoking: The Young Finns Study

Author: Hanninen I
Holm L
Hurme M
Kahonen M
Lehtimaki T
Marttila S
Mishra BH
Mishra PP
Mononen N
Raitakari O
Raitoharju E
Toronen P
Publication venue: 'Portland Press Ltd.'
Publication date: 28/10/2022

Smoking, a major risk factor for morbidity, affects numerous regulatory systems of the human body, including DNA methylation. Most previous studies with genome-wide methylation data are based on conventional association analysis and earlier threshold-based gene set analysis, which lacks the sensitivity to reveal all the relevant effects of smoking. The aim of the present study was to investigate the impact of active smoking on DNA methylation at three biological levels: 5'-C-phosphate-G-3' (CpG) sites, genes and functionally related genes (gene sets). Gene set analysis was done with mGSZ, a modern threshold-free method previously developed by us that utilizes all the genes in the experiment and their differential methylation scores. Application of such a method in a DNA methylation study is novel. Epigenome-wide methylation levels were profiled from Young Finns Study (YFS) participants' whole blood from the 2011 follow-up using Illumina Infinium HumanMethylation450 BeadChips. We identified three novel smoking-related CpG sites and replicated 57 of the previously identified ones. We found that smoking is associated with hypomethylation in shores (genomic regions 0-2 kilobases from a CpG island). We identified smoking-related methylation changes in 13 gene sets with false discovery rate (FDR) <= 0.05, among which is olfactory receptor activity, the flagship novel finding of the present study. Overall, we extended the current knowledge by identifying: (i) three novel smoking-related CpG sites, (ii) effects on average methylation in shores similar to those of aging, and (iii) a novel finding that the olfactory receptor activity pathway responds to tobacco smoke and toxin exposure through epigenetic mechanisms.
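The abstract reports gene sets passing FDR <= 0.05 but does not say how the FDR was computed. As an illustration only, the sketch below assumes the common Benjamini-Hochberg adjustment; the function names and the toy p-values are hypothetical and are not taken from the study or from mGSZ.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the study's FDR procedure is not stated in the abstract;
    BH is used here purely as a familiar example.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_sets(p_by_set, fdr=0.05):
    """Names of gene sets whose adjusted p-value is at or below `fdr`."""
    names = list(p_by_set)
    adjusted = benjamini_hochberg([p_by_set[name] for name in names])
    return [name for name, q in zip(names, adjusted) if q <= fdr]


# Hypothetical gene-set p-values, for illustration only.
toy = {"set_a": 0.001, "set_b": 0.2, "set_c": 0.02}
selected = significant_sets(toy)
```

With these toy values the adjusted p-values are 0.003, 0.2 and 0.03, so `set_a` and `set_c` pass the 0.05 threshold.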

UTUPub

Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes

Author: A Ruepp
AI Saeed
C Fraley
F Al-Shahrour
FD Gibbons
I Gat-Viks
J Dopazo
J Handl
J Herrero
J Quackenbush
JA Hartigan
K Yeung
KM Kerr
L Kaufman
M Ashburner
MC Abba
MD Robinson
N Bolshakova
NG Waller
P Resnik
P Toronen
PH Sneath
PJ Rousseeuw
R Shamir
S Chu
S Datta
S Datta
S Dudoit
SG Lee
Somnath Datta
Susmita Datta
T Kohonen
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: A cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. In most cases, a post hoc analysis is done to see if the genes in the same clusters can be functionally correlated. While past successes of such analyses have often been reported in a number of microarray studies (most of which used the standard hierarchical clustering, UPGMA, with one minus the Pearson's correlation coefficient as a measure of dissimilarity), such groupings can often be misleading. More importantly, a systematic evaluation of the entire set of clusters produced by such unsupervised procedures is necessary since they also contain genes that are seemingly unrelated or may have more than one common function. Here we quantify the performance of a given unsupervised clustering algorithm applied to a given microarray study in terms of its ability to produce biologically meaningful clusters using a reference set of functional classes. Such a reference set may come from prior biological knowledge specific to a microarray study or may be formed using the growing databases of gene ontologies (GO) for the annotated genes of the relevant species. RESULTS: In this paper, we introduce two performance measures for evaluating the results of a clustering algorithm in its ability to produce biologically meaningful clusters. The first measure is a biological homogeneity index (BHI). As the name suggests, it is a measure of how biologically homogeneous the clusters are. This can be used to quantify the performance of a given clustering algorithm such as UPGMA in grouping genes for a particular data set and also for comparing the performance of a number of competing clustering algorithms applied to the same data set. The second performance measure is called a biological stability index (BSI). For a given clustering algorithm and an expression data set, it measures the consistency of the clustering algorithm's ability to produce biologically meaningful clusters when applied repeatedly to similar data sets. A good clustering algorithm should have high BHI and moderate to high BSI. We evaluated the performance of ten well-known clustering algorithms on two gene expression data sets and identified the optimal algorithm in each case. The first data set deals with SAGE profiles of differentially expressed tags between normal and ductal carcinoma in situ samples of breast cancer patients. The second data set contains the expression profiles over time of positively expressed genes (ORFs) during sporulation of budding yeast. Two separate choices of the functional classes were used for this data set and the results were compared for consistency. CONCLUSION: Functional information of annotated genes available from various GO databases mined using ontology tools can be used to systematically judge the results of an unsupervised clustering algorithm as applied to a gene expression data set in clustering genes. This information could be used to select the right algorithm from a class of clustering algorithms for the given data set.
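The abstract describes the BHI only in words. The sketch below is one plausible reading, assumed for illustration: within each cluster, take the fraction of annotated gene pairs that share at least one functional class, then average over clusters. The function name, the handling of clusters with fewer than two annotated genes (skipped here) and the toy data are assumptions, not the paper's definition.

```python
from itertools import combinations


def biological_homogeneity_index(clusters, annotations):
    """Average, over clusters, of the share of annotated gene pairs
    that have at least one functional class in common.

    clusters:    list of lists of gene identifiers
    annotations: dict mapping gene identifier -> set of class labels
    Assumption: clusters with fewer than two annotated genes are skipped.
    """
    scores = []
    for cluster in clusters:
        # Only genes with at least one annotation can be compared.
        annotated = [gene for gene in cluster if annotations.get(gene)]
        pairs = list(combinations(annotated, 2))
        if not pairs:
            continue
        shared = sum(1 for a, b in pairs if annotations[a] & annotations[b])
        scores.append(shared / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical genes and functional classes, for illustration only.
toy_clusters = [["g1", "g2", "g3"], ["g4", "g5"]]
toy_annotations = {
    "g1": {"A"},
    "g2": {"A", "B"},
    "g3": {"B"},
    "g4": {"C"},
    "g5": {"D"},
}
bhi = biological_homogeneity_index(toy_clusters, toy_annotations)
```

In the toy data the first cluster scores 2/3 (two of three pairs share a class) and the second scores 0, giving an index of 1/3. Higher values mean clusters that agree better with the reference classes.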

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PlasmoDraft: a database of Plasmodium falciparum gene function predictions based on postgenomic data

Author: A Gasch
A Mateos
A Vazquez
C Brun
D LaCount
D Lockhart
E Dahl
E Marcotte
E Pizzi
E Sonnhammer
G Yona
J Dougherty
J Sachs
J Shock
J Young
Jean-François Dufayard
K Le Roch
K Le Roch
L Dice
L Florens
L Wu
Laurent Bréhélin
M Gardner
M Llinas
MB Eisen
MPS Brown
MR Chmielewski
O Bastion
Olivier Gascuel
P Langley
P Toronen
PT Spellman
S Altschul
T Hastie
Y Chen
Y Zhou
Y Zhou
Z Bozdech
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Of the 5,484 predicted proteins of Plasmodium falciparum, the main causative agent of malaria, about 60% do not have sufficient sequence similarity with proteins in other organisms to warrant provision of functional assignments. Non-homology methods are thus needed to obtain functional clues for these uncharacterized genes. Results: We present PlasmoDraft (http://atgc.lirmm.fr/PlasmoDraft/), a database of Gene Ontology (GO) annotation predictions for P. falciparum genes based on postgenomic data. Predictions of PlasmoDraft are achieved with a Guilt By Association method named Gonna. This involves (1) a predictor that proposes GO annotations for a gene based on the similarity of its profile (measured with transcriptome, proteome or interactome data) with genes already annotated by GeneDB; (2) a procedure that estimates the confidence of the predictions achieved with each data source; (3) a procedure that combines all data sources to provide a global summary and confidence estimate of the predictions. Gonna has been applied to all P. falciparum genes using most publicly available transcriptome, proteome and interactome data sources. Gonna provides predictions for numerous genes without any annotations. For example, 2,434 genes without any annotations in the Biological Process ontology are associated with specific GO terms (e.g. Rosetting, Antigenic variation), and among these, 841 have confidence values above 50%. In the Cellular Component and Molecular Function ontologies, 1,905 and 1,540 uncharacterized genes are associated with specific GO terms, respectively (740 and 329 with confidence values above 50%). Conclusion: All predictions along with their confidence values have been compiled in PlasmoDraft, which thus provides an extensive database of GO annotation predictions that can be achieved with these data sources. The database can be accessed in different ways. A global view allows for a quick inspection of the GO terms that are predicted with high confidence, depending on the various data sources. A gene view and a GO term view allow for the search of potential GO terms attached to a given gene, and genes that potentially belong to a given GO term.
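Step (1) above, proposing annotations for a gene from the annotated genes with the most similar profiles, can be pictured with a generic nearest-neighbour vote. This is a minimal sketch under stated assumptions, not Gonna itself: Pearson correlation as the similarity, a fixed neighbour count `k`, the vote fraction as a crude score, and all names and data below are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def predict_terms(query_profile, annotated, k=3):
    """Rank terms by the fraction of the k most similar annotated genes
    that carry them.

    annotated: dict mapping gene id -> (profile, set of term labels)
    Assumption: the vote fraction is only an illustrative score and is
    not the confidence estimate described in the abstract.
    """
    neighbours = sorted(
        annotated.values(),
        key=lambda item: pearson(query_profile, item[0]),
        reverse=True,
    )[:k]
    votes = Counter(term for _, terms in neighbours for term in terms)
    return [(term, count / len(neighbours)) for term, count in votes.most_common()]


# Hypothetical profiles and term labels, for illustration only.
toy_annotated = {
    "gene_a": ([1.0, 2.0, 3.0, 4.0], {"term_x"}),
    "gene_b": ([2.0, 4.0, 6.0, 8.5], {"term_x", "term_y"}),
    "gene_c": ([4.0, 3.0, 2.0, 1.0], {"term_z"}),
}
ranking = predict_terms([1.0, 2.0, 3.0, 5.0], toy_annotated, k=2)
```

Here the two nearest neighbours are `gene_a` and `gene_b`, so `term_x` gets a vote fraction of 1.0 and `term_y` gets 0.5, while `term_z` from the anti-correlated `gene_c` is not proposed.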

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Gene Expression Profiling by 5′-End Sequencing of cDNAs during Reprogramming in the Moss Physcomitrella patens

Stem cells self-renew and repeatedly produce differentiated cells during development and growth. The differentiated cells can be converted into stem cells in some metazoans and land plants with appropriate treatments. After leaves of the moss Physcomitrella patens are excised, leaf cells reenter the cell cycle and commence tip growth, which is characteristic of stem cells called chloronema apical cells. To understand the underlying molecular mechanisms, a digital gene expression profiling method using mRNA 5′-end tags (5′-DGE) was established. The 5′-DGE method produced reproducible data with a dynamic range of four orders of magnitude that correlated well with qRT-PCR measurements. After the excision of leaves, the expression levels of 11% of the transcripts changed significantly within 6 h. Genes involved in stress responses and proteolysis were induced and those involved in metabolism, including photosynthesis, were reduced. The later processes of reprogramming involved photosynthesis recovery and higher macromolecule biosynthesis, including that of RNA and proteins. Auxin and cytokinin signaling pathways, which are activated during stem cell formation via callus in flowering plants, are also activated during reprogramming in P. patens, although no exogenous phytohormone is applied in the moss system, suggesting that an intrinsic phytohormone regulatory system may be used in the moss.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Molecular classification of selective oestrogen receptor modulators on the basis of gene expression profiles of breast cancer cells expressing oestrogen receptor α

The purpose of this study was to classify selective oestrogen receptor modulators based on gene expression profiles produced in breast cancer cells expressing either wtERα or mutant351ERα. In total, 54 microarray experiments were carried out using commercially available Atlas cDNA Expression Arrays (Clontech), containing 588 cancer-related genes. Nine sets of data were generated for each cell line following 24 h of treatment: expression data were obtained for cells treated with vehicle EtOH (Control); with 10⁻⁹ or 10⁻⁸ M oestradiol; with 10⁻⁶ M 4-hydroxytamoxifen; with 10⁻⁶ M raloxifene; with 10⁻⁶ M idoxifene; with 10⁻⁶ M EM 652; with 10⁻⁶ M GW 7604; with 5×10⁻⁵ M resveratrol; and with 10⁻⁶ M ICI 182,780. We developed a new algorithm ‘Expression Signatures’ to classify compounds on the basis of differential gene expression profiles. We created dendrograms for each cell line, in which branches represent relationships between compounds. Additionally, clustering analysis was performed using different subsets of genes to assess the robustness of the analysis. In general, only small differences were observed between the gene expression profiles of cells treated with the compounds, with correlation coefficients ranging from 0.83 to 0.98. This observation may be explained by the use of the same cell context for treatments with compounds that essentially belong to the same class of drugs with oestrogen receptor-related mechanisms. The most surprising observation was that ICI 182,780 clustered together with oestradiol and raloxifene for cells expressing wtERα and clustered together with EM 652 for cells expressing mutant351ERα. These data provide a rationale for a more precise and elaborate study in which custom-made oligonucleotide arrays can be used with comprehensive sets of genes known to have consensus and putative oestrogen response elements in their promoter regions.

Crossref

PubMed Central

Expression profiles of switch-like genes accurately classify tissue and infectious disease phenotypes in model-based classification

Background: Large-scale compilation of gene expression microarray datasets across diverse biological phenotypes provided a means of gathering a priori knowledge in the form of identification and annotation of bimodal genes in the human and mouse genomes. These switch-like genes make up 15% of known human genes, and are enriched with genes coding for extracellular and membrane proteins. It is of interest to determine the prediction potential of bimodal genes for class discovery in large-scale datasets. Results: Use of a model-based clustering algorithm accurately classified more than 400 microarray samples into 19 different tissue types on the basis of bimodal gene expression. Bimodal expression patterns were also highly effective in differentiating between infectious diseases in model-based clustering of microarray data. Supervised classification with feature selection restricted to switch-like genes also recognized tissue-specific and infectious disease-specific signatures in independent test datasets reserved for validation. Determination of "on" and "off" states of switch-like genes in various tissues and diseases allowed for the identification of activated/deactivated pathways. Activated switch-like genes in neural, skeletal muscle and cardiac muscle tissue tend to have tissue-specific roles. A majority of activated genes in infectious disease are involved in processes related to the immune response. Conclusion: Switch-like bimodal gene sets capture genome-wide signatures from microarray data in health and infectious disease. A subset of bimodal genes coding for extracellular and membrane proteins are associated with tissue specificity, indicating a potential role for them as biomarkers provided that expression is altered in the onset of disease. Furthermore, we provide evidence that bimodal genes are involved in temporally and spatially active mechanisms including tissue-specific functions and response of the immune system to invading pathogens.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Inducible cAMP Early Repressor (ICER) and Brain Functions

Author: A Barco
A Klejman
A Mouravlev
A Mouravlev
A Rami
AC Conti
AJ Silva
AK Ho
AM Pliakas
B Mayr
B Mellstrom
B Mioduszewska
B Mioduszewska
BE Lonze
BE Porter
BM Laoide
C Mazzucchelli
C Pittenger
C Tinti
CA Kell
CA Molina
CB Nemeroff
CF Stevens
CL Walters
CM Spencer
CM Spencer
D Balschun
D Bartsch
D Konopka
DA Frank
DM Fass
DP Cain
DP Cain
E Borrelli
E Hummler
E Maronde
EJ Folco
EJ Nestler
Gilyana Borlikova
GV Goddard
H Bading
H Ishiguro
H Tomita
HB Gottlieb
HB Machado
IV Lund
J Jaworski
J Won
JA Blendy
JA Blendy
JC Yin
JD Shepard
JF Habener
JF Staiger
JH Chang
JH Kogan
JH Stehle
JH Stehle
JH Stehle
JI Morgan
JJ Brightwell
JJ Brightwell
JJ Brightwell
JP Hoeffler
K Kashihara
K Misund
KA Lee
KS Kim
L Monaco
LR Fitzgerald
M Karolczak
M Lamas
M Lopez de Armentia
M Montminy
M Pfeffer
M Sheng
M Storvik
M Storvik
MA Della Fazia
MR Montminy
N Kojima
N Sakai
NS Foulkes
NS Foulkes
NS Foulkes
P Gass
P Sassone-Corsi
P Sassone-Corsi
P Toronen
Q Yuan
R Bourtchuladze
R Maldonado
RG Bradley
S Bisler
S Kida
S Ruchaud
S Schulz
SA Josselyn
SA Josselyn
SA Josselyn
SC Pandey
Shogo Endo
SJ Barnes
SM Luckman
T Tully
TA Green
TL Wallace
WA Carlezon Jr
WA Carlezon Jr
WC Abraham
Y Hu
Y Liu
Publication venue: Humana Press Inc
Publication date: 01/01/2009

The inducible cAMP early repressor (ICER) is an endogenous repressor of cAMP-responsive element (CRE)-mediated gene transcription and belongs to the CRE-binding protein (CREB)/CRE modulator (CREM)/activating transcription factor 1 (ATF-1) gene family. ICER plays an important role in regulating the neuroendocrine system and the circadian rhythm. Other aspects of ICER function have recently attracted heightened attention. Being a natural inducible CREB antagonist and, more broadly, an inducible repressor of CRE-mediated gene transcription, ICER regulates long-lasting plastic changes that occur in the brain in response to incoming stimulation. This review brings together data on ICER and its functions in the brain, with a special emphasis on recent findings highlighting the involvement of ICER in the regulation of long-term plasticity underlying learning and memory.

Crossref

Springer - Publisher Connector

PubMed Central

Berry Flesh and Skin Ripening Features in Vitis vinifera as Assessed by Transcriptional Profiling

Author: A Debono
A Inaba
A Maris
A Ruepp
A Vicens
AI Saeed
AL Waterhouse
AM Fortes
B Hollenbach
BG Coombe
BG Coombe
BG Coombe
BG Coombe
C Bottcher
C Conde
C Davies
C Deytieux-Belleau
C Pastore
CA Helliwell
CM Ford
CR Hale
D Afoufa-Bastien
D Bird
D Fournand
D Panikashvili
DA Brummell
Diego Lijavetzky
DL Cawthon
DRE Possner
E Cantos
E Duchene
E Miedes
F Al-Shahrour
F Emanuelli
F Luan
F Rook
FY Peng
G Le Henanff
Gema Bravo
H Cao
H Wada
HP Ruffner
HY Yang
I Hichri
I Medina
J Battilana
J Battilana
J Chen
J Fenoll
J Grimplet
J Grimplet
J Grimplet
J Kossmann
J Schlosser
J Sheen
JA Garcia-Gago
JA Kennedy
JA Pighin
JG Swift
JJ Mateo
JK Rosenquist
JM Escoubas
José Fenoll
José M. Martínez-Zapater
JP Coles
Juan Carlos Oliveros
Jérôme Grimplet
K Chira
K Koyama
K Mockaitis
K Mori
K Yazaki
KD Cameron
KE Reid
KJ Nunan
KJ Nunan
L Laquitaine
LG Deluc
LG Deluc
LG Deluc
M Ashburner
M Gholami
M Muganu
M Piippo
MA Esteban
MA Hayes
MA Pontin
MA Quesada
MB Ali
MC Cravero
MC Cutanda-Perez
ME Smoot
Miguel A. Blazquez
MJ Martinez-Esteso
MK Kerr
N Terrier
P Commenil
P Diakou
P Gatto
P Polaskova
P Ribereau-Gayon
P Toronen
PA Rea
Pablo Carbonell-Bejerano
PE Kriedemann
Pilar Flores
Pilar Hellín
PK Boss
R Tibshirani
RA Irizarry
RA Salzman
RM Pandey
RW Fung
S Guillaumie
S Kobayashi
S Pilati
S Raychaudhuri
S Vidal
S Wheeler
S Zenoni
SB Tiwari
SD Castellarin
SD Castellarin
SD Cohen
SF Altschul
SK Park
SP Robinson
ST Lund
ST Lund
T Pfannschmidt
TJ Guilfoyle
TL Shimada
TR Thomas
V Prasanna
VG Tusher
VL Singleton
W Hardie
WJ Hardie
WM Kliewer
XY Zhang
Y Hayasaka
YB Liu
Z Bernstein
Publication venue: Public Library of Science
Publication date: 01/01/2012

Background: Ripening of fleshy fruit is a complex developmental process involving the differentiation of tissues with separate functions. During grapevine berry ripening, important processes contributing to table and wine grape quality take place, some of them flesh- or skin-specific. In this study, transcriptional profiles throughout flesh and skin ripening were followed during two different seasons in a table grape cultivar ‘Muscat Hamburg’ to determine tissue-specific as well as common developmental programs. Methodology/Principal Findings: Using an updated GrapeGen Affymetrix GeneChip® annotation based on grapevine 12×v1 gene predictions, 2188 differentially accumulated transcripts between flesh and skin and 2839 transcripts differentially accumulated throughout ripening in the same manner in both tissues were identified. Transcriptional profiles were dominated by changes at the beginning of veraison which affect both pericarp tissues, although frequently delayed or with lower intensity in the skin than in the flesh. Functional enrichment analysis identified the decay of biosynthetic processes, photosynthesis and transport as a major part of the program delayed in the skin. In addition, a higher number of functional categories, including several related to macromolecule transport and phenylpropanoid and lipid biosynthesis, were over-represented in transcripts accumulated to higher levels in the skin. Functional enrichment also indicated that auxin, gibberellins and bHLH transcription factors take part in the regulation of pre-veraison processes in the pericarp, whereas WRKY and C2H2 family transcription factors seem to participate more specifically in the regulation of skin and flesh ripening, respectively. Conclusions/Significance: A transcriptomic analysis indicates that a large part of the ripening program is shared by both pericarp tissues, although some components are delayed in the skin. In addition, important tissue differences are present from early stages prior to the ripening onset, including tissue-specific regulators. Altogether, these findings provide key elements to understand berry ripening and its differential regulation in flesh and skin. This study was financially supported by GrapeGen Project funded by Genoma España within a collaborative agreement with Genome Canada. The authors also thank The Ministerio de Ciencia e Innovacion for project BIO2008-03892 and a bilateral collaborative grant with Argentina (AR2009-0021). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Peer reviewed.

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Directory of Open Access Journals

PubMed Central

Repositorio de Universidad de La Rioja

Digital.CSIC

FigShare

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

Author: Alborzi S. Z.
Altenhoff A.
Amezola M.
Antczak M.
Aridhi S.
Asgari E.
Atalay V.
Babbitt P. C.
Barot M.
Ben-Hur A.
Benso A.
Bergquist T. R.
Berselli M.
Bhat P.
Bjorne J.
Black G. S.
Boecker F.
Bonneau R.
Borukhov I.
Bosco G.
Boudellioua I.
Brackenridge D. A.
Brenner S. E.
Cao R.
Carraro M.
Casadio R.
Cetin Atalay R.
Chandler C.
Chang J. -M.
Cheng J.
Chi P. -H.
Cozzetto D.
Crocker A. W.
Dai S.
Dalklran A.
Das S.
Davidovic R. S.
Davis L.
Dayton J. B.
Dessimoz C.
Devignes M. -D.
Di Carlo S.
Dogan T.
Dzeroski S.
Fa R.
Fabris F.
Falda M.
Fang H.
Fernandez J. M.
Fontana P.
Frank Y.
Frasca M.
Freddolino P. L.
Freitas A. A.
Friedberg I.
Gemovic B.
Georghiou G.
Ginter F.
Gligorijevic V.
Goldberg T.
Gough J.
Greene C. S.
Grossi G.
Hakala K.
Hamid M. N.
Hoehndorf R.
Hogan D. A.
Holm L.
Hou J.
Hurto R. L.
Jain A.
Jeffery C. J.
Jiang Y.
Jo D.
Johnson D.
Jones D. T.
Kacsoh B. Z.
Kaewphan S.
Kahanda I.
Kihara D.
Koo D. C. E.
Kulmanov M.
Larsen D. J.
Lavezzo E.
Lee A. J.
Lees J. G.
Lewis K. A.
Liao W. -H.
Lichtarge O.
Linial M.
Liu Y. -W.
Mao Q.
Martelli P. L.
Martin M. J.
McGuffin L. J.
McHardy A. C.
Medlar A. J.
Mehryary F.
Mesiti M.
Moen H.
Mofrad M. R. K.
Mooney S. D.
Nguyen H. N.
Notaro M.
Novikov I.
O'Donovan C.
Omdahl A. R.
Orengo C. A.
Paccanaro A.
Pascarelli S.
Perovic V. R.
Petrini A.
Piovesan D.
Politano G.
Profiti G.
Radivojac P.
Re M.
Reeb J.
Renaux A.
Rifaioglu A. S.
Ritchie D. W.
Roche D. B.
Rodriguez J. M.
Romero A. E.
Rose P. W.
Rost B.
Saidi R.
Salakoski T.
Savojardo C.
Schoof H.
Sillitoe I.
Smuc T.
Suh E.
Sumonja N.
Supek F.
Thurlby N.
Tian W.
Tolvanen M. E. E.
Toppo S.
Toronen P.
Torres M.
Tosatto S. C. E.
Tress M. L.
Tseng W. -C.
Ur Rehman H.
Valentini G.
Veljkovic N.
Vidulin V.
Vucetic S.
Wan C.
Wang Z.
Warwick Vesztrocy A.
Wass M. N.
Wilkins A.
Yang H.
Yao S.
You R.
Yunes J. M.
Zhang C.
Zhang F.
Zhang S.
Zhang Y.
Zhang Z.
Zhao C.
Zhou N.
Zhu S.
Zosa E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Author: Almeida-e-Silva DC
Altenhoff A
Babbitt PC
Bankapur AR
Bargsten JW
Ben-Hur A
Benso A
Bhat P
Bonneau R
Brenner SE
Bryson K
Cao RZ
Casadio R
Cejuela JM
Chapman S
Chen CT
Cheng JL
Cibrian-Uhalte E
Clark WT
Cozzetto D
D'Andrea D
Das S
Dawson NL
del Pozo A
Denny P
Dessimoz C
Di Carlo S
Dogan T
Dukka BKC
ElShal S
Falda M
Fang H
Feng S
Fernandez JM
Ferrari C
Fontana P
Foulger RE
Friedberg I
Funk CS
Gabaldon T
Gemovic B
Gillis J
Ginter F
Giollo M
Glisic S
Goldberg T
Gong QT
Gough J
Greene CS
Hakala K
Hamp T
Hieta R
Holm L
Hsu WL
Huntley RP
Jiang YX
Jones DT
Kaewphan S
Kahanda I
Kansakar L
Khan IK
Kihara D
Koo DCE
Koskinen P
Lavezzo E
Lee D
Lees JG
Legge D
Lepore R
Li B
Lin A
Linial M
Lovering RC
Magrane M
Maietta P
Marcet-Houben M
Martelli PL
Martin MJ
Mehryary F
Melidoni AN
Mesiti M
Minneci F
Mooney SD
Moreau Y
Mutowo-Meullenet P
Nepusz T
Ning W
O'Donovan C
Oates M
Ofer D
Orengo CA
Oron TR
Paccanaro A
Pavlidis P
Penfold-Brown D
Perovic V
Pichler K
Piovesan D
Politano G
Profiti G
Radivojac P
Rappoport N
Re M
Rehman HU
Richter L
Robinson PN
Romero AE
Rost B
Sahraeian SME
Salakoski T
Salamov A
Sasidharan R
Savino A
Sedeno-Cortes AE
Sharan M
Shasha D
Shypitsyna A
Sillitoe I
Skunca N
Smithers B
Stern A
Sternberg MJE
Supek F
Tian WD
Toppo S
Toronen P
Tosatto SCE
Tramontano A
Tranchevent LC
Tress ML
Valencia A
Valentini G
van Dijk ADJ
Veljkovic N
Veljkovic V
Vencio RZN
Verspoor KM
Vogel J
Vucetic S
Wang Z
Wass MN
Yang HX
Youngs N
Zakeri P
Zhang S
Zhong Z
Zhou YP
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/10/2022

Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.

UTUPub