
92 research outputs found

An analysis of the Sargasso Sea resource and the consequences for database composition

Author: Cozzetto D
Tramontano A
Tress ML
Valencia A
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

Background: The environmental sequencing of the Sargasso Sea has introduced a huge new resource of genomic information. Unlike the protein sequences held in the current searchable databases, the Sargasso Sea sequences originate from a single marine environment and have been sequenced from species that are not easily obtainable by laboratory cultivation. The resource also contains very many fragments of whole protein sequences, a side effect of the shotgun sequencing method. These sequences form a significant addendum to the current searchable databases but also present us with some intrinsic difficulties. While it is important to know whether it is possible to assign function to these sequences with the current methods and whether they will increase our capacity to explore sequence space, it is also interesting to know how current bioinformatics techniques will deal with the new sequences in the resource. Results: The Sargasso Sea sequences seem to introduce a bias that decreases the potential of current methods to propose structure and function for new proteins. In particular the high proportion of sequence fragments in the resource seems to result in poor quality multiple alignments. Conclusion: These observations suggest that the new sequences should be used with care, especially if the information is to be used in large scale analyses. On a positive note, the results may just spark improvements in computational and experimental methods to take into account the fragments generated by environmental sequencing techniques.

Springer - Publisher Connector

Directory of Open Access Journals

UCL Discovery

PubMed Central

Digital.CSIC

Evaluation of CASP8 Model Quality Predictions

Author: Cozzetto D
Kryshtafovych A
Tramontano A
Publication venue: 'Wiley'
Publication date: 01/01/2009

Archivio della ricerca- Università di Roma La Sapienza

The assessment of methods for protein structure prediction

Author: Cozzetto D.
Giorgetti A.
Raimondo D.
Tramontano A.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

Methods for protein structure prediction are flourishing and becoming widely available to both experimentalists and computational biologists. But how good are they? What is their range of applicability and how can we know which method is better suited for the task at hand? These are the questions that this chapter tries to address, by describing automatic evaluation methods as well as the world-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP) initiative and focusing on the specific problems of assessing the quality of a protein 3D model.

Catalogo dei prodotti della ricerca

Analysis of temporal transcription expression profiles reveal links between protein function and developmental stages of Drosophila melanogaster

Author: A Singhania
A Sokolov
AE Lobley
B Marita
BR Graveley
Cen Wan
CF Wu
Christine A. Orengo
D Barrell
D Cozzetto
D Fristrom
D Sutherland
David T. Jones
DJ Montell
F Minneci
Federico Minneci
I Ezkurdia
J Friedman
JB Weiss
JC Costello
JG Lees
Jonathan G. Lees
JW Truman
L Breiman
L Lan
M Ashburner
M Friedrich
NK Cho
P Radivojac
P Tomancak
R Cagan
S Hunter
S Roy
SD Hooper
T Bossing
T Chang
T Cover
T Kojima
TR Li
VR Chintapalli
Yanay Ofran
YX Jiang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/10/2017

Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster

Crossref

Directory of Open Access Journals

UCL Discovery

Birkbeck Institutional Research Online

Cytokine responsive networks in human colonic epithelial organoids unveil a molecular classification of inflammatory bowel disease

Author: Bewick G
Cozzetto D
Friedman J
Hayee B
Korcsmaros TD
Li K
Niazi U
Pavlidis P
Powell N
Saqi M
Treveil A
Tsakmaki A
Yang F
Publication venue: 'Elsevier BV'
Publication date: 09/09/2022

Interactions between the epithelium and the immune system are critical in the pathogenesis of inflammatory bowel disease (IBD). In this study, we mapped the transcriptional landscape of human colonic epithelial organoids in response to different cytokines responsible for mediating canonical mucosal immune responses. By profiling the transcriptome of human colonic organoids treated with the canonical cytokines interferon gamma, interleukin-13, -17A, and tumor necrosis factor alpha with next-generation sequencing, we unveil shared and distinct regulation patterns of epithelial function by different cytokines. An integrative analysis of cytokine responses in diseased tissue from patients with IBD (n = 1,009) reveals a molecular classification of mucosal inflammation defined by gradients of cytokine-responsive transcriptional signatures. Our systems biology approach detected signaling bottlenecks in cytokine-responsive networks and highlighted their translational potential as theragnostic targets in intestinal inflammation

Spiral - Imperial College Digital Repository

I-TASSER server for protein 3D structure prediction

Author: A Zemla
AA Canutescu
AG Murzin
B Wallner
CS Pettitt
D Baker
D Cozzetto
D Fischer
HM Berman
J Skolnick
JN Battey
K Ginalski
K Karplus
LE Reichl
M Feig
MR Betancourt
SB Needleman
SC Tosatto
SF Altschul
ST Wu
TF Smith
W Kabsch
Y Zhang
Yang Zhang
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Prediction of 3-dimensional protein structures from amino acid sequences represents one of the most important problems in computational structural biology. The community-wide Critical Assessment of Structure Prediction (CASP) experiments have been designed to obtain an objective assessment of the state-of-the-art of the field, where I-TASSER was ranked as the best method in the server section of the recent 7th CASP experiment. Our laboratory has since then received numerous requests about the public availability of the I-TASSER algorithm and the usage of the I-TASSER predictions. Results: An on-line version of I-TASSER is developed at the KU Center for Bioinformatics which has generated protein structure predictions for thousands of modeling requests from more than 35 countries. A scoring function (C-score) based on the relative clustering structural density and the consensus significance score of multiple threading templates is introduced to estimate the accuracy of the I-TASSER predictions. A large-scale benchmark test demonstrates a strong correlation between the C-score and the TM-score (a structural similarity measurement with values in [0, 1]) of the first models with a correlation coefficient of 0.91. Using a C-score cutoff > -1.5 for the models of correct topology, both false positive and false negative rates are below 0.1. Combining C-score and protein length, the accuracy of the I-TASSER models can be predicted with an average error of 0.08 for TM-score and 2 Å for RMSD. Conclusion: The I-TASSER server has been developed to generate automated full-length 3D protein structural predictions where the benchmarked scoring system helps users to obtain quantitative assessments of the I-TASSER models. The output of the I-TASSER server for each query includes up to five full-length models, the confidence score, the estimated TM-score and RMSD, and the standard deviation of the estimations. The I-TASSER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/I-TASSER.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

KU ScholarWorks (Univ. of Kansas)

PubMed Central

Linear predictive coding representation of correlated mutation for protein sequence alignment

Author: A Elofsson
AG Murzin
AS Yang
BC Lee
Chan-seok Jeong
CM Buslje
D Cozzetto
Dongsup Kim
DT Jones
E Neher
ER Tillier
G Shackelford
GJ Bartlett
GM Süel
J Kleinjung
J Kopp
J Söding
JM Chandonia
JP Dekker
LR Rabiner
M Lee
N Siew
O Olmea
S Wu
SD Dunn
SF Altschul
SW Lockless
T Ohlson
T Pham
U Göbel
WR Atchley
Y Qi
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

MACSIMS : multiple alignment of complete sequences information management system

BACKGROUND: In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. RESULTS: MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. CONCLUSION: MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

HAL Descartes

University of Dundee Online Publications

Hal-Diderot

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

Author: Alborzi S. Z.
Altenhoff A.
Amezola M.
Antczak M.
Aridhi S.
Asgari E.
Atalay V.
Babbitt P. C.
Barot M.
Ben-Hur A.
Benso A.
Bergquist T. R.
Berselli M.
Bhat P.
Bjorne J.
Black G. S.
Boecker F.
Bonneau R.
Borukhov I.
Bosco G.
Boudellioua I.
Brackenridge D. A.
Brenner S. E.
Cao R.
Carraro M.
Casadio R.
Cetin Atalay R.
Chandler C.
Chang J. -M.
Cheng J.
Chi P. -H.
Cozzetto D.
Crocker A. W.
Dai S.
Dalkıran A.
Das S.
Davidovic R. S.
Davis L.
Dayton J. B.
Dessimoz C.
Devignes M. -D.
Di Carlo S.
Dogan T.
Dzeroski S.
Fa R.
Fabris F.
Falda M.
Fang H.
Fernandez J. M.
Fontana P.
Frank Y.
Frasca M.
Freddolino P. L.
Freitas A. A.
Friedberg I.
Gemovic B.
Georghiou G.
Ginter F.
Gligorijevic V.
Goldberg T.
Gough J.
Greene C. S.
Grossi G.
Hakala K.
Hamid M. N.
Hoehndorf R.
Hogan D. A.
Holm L.
Hou J.
Hurto R. L.
Jain A.
Jeffery C. J.
Jiang Y.
Jo D.
Johnson D.
Jones D. T.
Kacsoh B. Z.
Kaewphan S.
Kahanda I.
Kihara D.
Koo D. C. E.
Kulmanov M.
Larsen D. J.
Lavezzo E.
Lee A. J.
Lees J. G.
Lewis K. A.
Liao W. -H.
Lichtarge O.
Linial M.
Liu Y. -W.
Mao Q.
Martelli P. L.
Martin M. J.
McGuffin L. J.
McHardy A. C.
Medlar A. J.
Mehryary F.
Mesiti M.
Moen H.
Mofrad M. R. K.
Mooney S. D.
Nguyen H. N.
Notaro M.
Novikov I.
O'Donovan C.
Omdahl A. R.
Orengo C. A.
Paccanaro A.
Pascarelli S.
Perovic V. R.
Petrini A.
Piovesan D.
Politano G.
Profiti G.
Radivojac P.
Re M.
Reeb J.
Renaux A.
Rifaioglu A. S.
Ritchie D. W.
Roche D. B.
Rodriguez J. M.
Romero A. E.
Rose P. W.
Rost B.
Saidi R.
Salakoski T.
Savojardo C.
Schoof H.
Sillitoe I.
Smuc T.
Suh E.
Sumonja N.
Supek F.
Thurlby N.
Tian W.
Tolvanen M. E. E.
Toppo S.
Toronen P.
Torres M.
Tosatto S. C. E.
Tress M. L.
Tseng W. -C.
Ur Rehman H.
Valentini G.
Veljkovic N.
Vidulin V.
Vucetic S.
Wan C.
Wang Z.
Warwick Vesztrocy A.
Wass M. N.
Wilkins A.
Yang H.
Yao S.
You R.
Yunes J. M.
Zhang C.
Zhang F.
Zhang S.
Zhang Y.
Zhang Z.
Zhao C.
Zhou N.
Zhu S.
Zosa E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

3D Profile-Based Approach to Proteome-Wide Discovery of Novel Human Chemokines

Author: A Bateman
A Gerber
A Tomczak
A Zlotnik
AA Maghazachi
Andrej Shevchenko
Aurelie Tomczak
B Rost
C Boshoff
C Gille
C Pasquier
CH Wu
CJ Sigrist
D Cozzetto
D Van Der Spoel
D Wan
David Drechsel
DT Jones
E Lindahl
EL Sonnhammer
F Cocchi
Frank Buchholz
G Magistrelli
G Wang
HH de Jongh
I Letunic
I Poser
I Prudovsky
IW Chong
J Cheng
J Gough
J Schultz
J Wang
Jana Sontheimer
JD Bendtsen
JE Pease
JE Tabaska
JG Luz
JT Stine
K Hiller
K Ottersbach
KA Roebuck
Karim Fahmy
KY Blain
LN Kinch
M. Teresa Pisabarro
MA Marti-Renom
Marc Gentzel
MJ Betts
MJ Sippl
MT Pisabarro
O Shmueli
P Flicek
P Genin
P Horton
P Puntervoll
P Ruggiero
Paul Wrede
R Colobran
Rainer Hausdorf
RJ Nibbs
S Hunter
S Kumar
S Lata
SF Altschul
Stefanie Eichler
T Fujita
TT Murooka
U Widmer
W Humphrey
WF Van Gunsteren
Y Ueda
Z Johnson
Z Zhang
Publication venue: Public Library of Science
Publication date: 07/05/2012

Chemokines are small secreted proteins with important roles in immune responses. They consist of a conserved three-dimensional (3D) structure, so-called IL8-like chemokine fold, which is supported by disulfide bridges characteristic of this protein family. Sequence- and profile-based computational methods have been proficient in discovering novel chemokines by making use of their sequence-conserved cysteine patterns. However, it has been recently shown that some chemokines escaped annotation by these methods due to low sequence similarity to known chemokines and to different arrangement of cysteines in sequence and in 3D. Innovative methods overcoming the limitations of current techniques may allow the discovery of new remote homologs in the still functionally uncharacterized fraction of the human genome. We report a novel computational approach for proteome-wide identification of remote homologs of the chemokine family that uses fold recognition techniques in combination with a scaffold-based automatic mapping of disulfide bonds to define a 3D profile of the chemokine protein family. By applying our methodology to all currently uncharacterized human protein sequences, we have discovered two novel proteins that, without having significant sequence similarity to known chemokines or characteristic cysteine patterns, show strong structural resemblance to known anti-HIV chemokines. Detailed computational analysis and experimental structural investigations based on mass spectrometry and circular dichroism support our structural predictions and highlight several other chemokine-like features. The results obtained support their functional annotation as putative novel chemokines and encourage further experimental characterization. The identification of remote homologs of human chemokines may provide new insights into the molecular mechanisms causing pathologies such as cancer or AIDS, and may contribute to the development of novel treatments. 
Besides, the genome-wide applicability of our methodology based on 3D protein family profiles may open up new possibilities for improving and accelerating protein function annotation processes.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute