5 research outputs found

    Exploration of methods for classifying repeat proteins based on their structural information using machine learning

    Current methods for classifying and identifying repeat proteins from their structure are complex and computationally expensive. This research work therefore explores machine learning techniques as alternative and complementary solutions to other systems at the repeat protein classification stage. These techniques are known to be effective and fast at automating classification, segmentation, and data transformation procedures, provided that a considerable amount of data is available. Given the volume of structural data generated in recent years for proteins in general and repeat proteins in particular, machine learning can now be applied to their classification. Based on an analysis of the currently available data and a systematic review of the literature, this work proposes possible machine learning solutions for the fast, automated classification of repeat proteins from their structure. Among these, it concludes that a multi-input classifier using the torsion angles and the distances between amino acids of a protein is feasible; this classifier will be implemented and evaluated in future work. Research work
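    The abstract above names two structural inputs, torsion angles and distances between amino acids, without saying how they are computed. The following is a minimal sketch of one common way to derive both from 3D coordinates, using only the Python standard library. The function names `torsion_angle` and `distance_matrix` are hypothetical and are not taken from the thesis; which atoms are used (for example backbone N, CA, C or only CA) is likewise an assumption left to the caller.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p0, p1, p2, p3):
    """Signed torsion (dihedral) angle in degrees defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond p1 -> p2.
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    s0 = _dot(b0, b1)
    s2 = _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def distance_matrix(coords):
    """Pairwise Euclidean distances between points (e.g. one atom per residue)."""
    return [[math.dist(a, b) for b in coords] for a in coords]
```

    A multi-input classifier of the kind the abstract proposes could take the per-residue torsion angles as one input and the distance matrix as another; the model itself is outside the scope of this sketch.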

    3D-Beacons: decreasing the gap between protein sequences and structures through a federated network of protein structure data resources

    While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and their experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational protein modeling approaches. While often powerful on their own, most methods have strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative analysis to identify what models can best address their specific use cases. To make data from a large array of model providers more easily accessible to the broader scientific community, we established 3D-Beacons, a collaborative initiative to create a federated network with unified data access mechanisms. The 3D-Beacons Network allows researchers to collate coordinate files and metadata for experimentally determined and theoretical protein models from state-of-the-art and specialist model providers and also from the Protein Data Bank.

    PED in 2024: improving the community deposition of structural ensembles for intrinsically disordered proteins

    The Protein Ensemble Database (PED) (URL: https://proteinensemble.org) is the primary resource for depositing structural ensembles of intrinsically disordered proteins. This updated version of PED reflects advancements in the field, denoting a continual expansion with a total of 461 entries and 538 ensembles, including those generated without explicit experimental data through novel machine learning (ML) techniques. With this significant increase in the number of ensembles, a few unprecedented new entries entered the database, including those determined or refined using electron paramagnetic resonance or circular dichroism data. In addition, PED was enriched with several new features, including a novel deposition service, an improved user interface, new database cross-referencing options and integration with the 3D-Beacons network, all representing efforts to improve the FAIRness of the database. Foreseeably, PED will keep growing in size and expanding with new types of ensembles generated by accurate and fast ML-based generative models and coarse-grained simulations. Therefore, among future efforts, priority will be given to further developing the database to be compatible with ensembles modeled at a coarse-grained level.

    DisProt in 2024: improving function annotation of intrinsically disordered proteins

    DisProt (URL: https://disprot.org) is the gold standard database for intrinsically disordered proteins and regions, providing valuable information about their functions. The latest version of DisProt brings significant advancements, including a broader representation of functions and an enhanced curation process. These improvements aim to increase both the quality of annotations and their coverage at the sequence level. Higher coverage has been achieved by adopting additional evidence codes. Quality of annotations has been improved by systematically applying Minimum Information About Disorder Experiments (MIADE) principles and reporting all the details of the experimental setup that could potentially influence the structural state of a protein. The DisProt database now includes new thematic datasets and has expanded the adoption of Gene Ontology terms, resulting in an extensive functional repertoire which is automatically propagated to UniProtKB. Finally, we show that DisProt's curated annotations strongly correlate with disorder predictions inferred from AlphaFold2 pLDDT (predicted Local Distance Difference Test) confidence scores. This comparison highlights the utility of DisProt in explaining apparent uncertainty of certain well-defined predicted structures, which often correspond to folding-upon-binding fragments. Overall, DisProt serves as a comprehensive resource, combining experimental evidence of disorder information to enhance our understanding of intrinsically disordered proteins and their functional implications.

    Author affiliations:
    Aspromonte, Maria Cristina: Università di Padova, Italy
    Nugnes, María Victoria: Università di Padova, Italy; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
    Quaglia, Federica: Università di Padova, Italy
    Bouharoua, Adel: Università di Padova, Italy
    Sagris, Vasileios: University of Cyprus (UC)
    Promponas, Vasilis J.: University of Cyprus (UC)
    Chasapi, Anastasia: Centre For Research And Technology - Hellas, Chemical Process & Energy Resources Institute
    Fichó, Erzsébet: Cytocast Hungary, Hungary
    Balatti, Galo Ezequiel: Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
    Parisi, Gustavo Daniel: Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
    González Buitrón, Martín: Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
    Erdos, Gabor: Eötvös University, Hungary
    Pajkos, Matyas: Eotvos University, Budapest, Department Of Materials Physics, Hungary
    Dosztányi, Zsuzsanna: Eotvos University, Budapest, Department Of Materials Physics, Hungary
    Dobson, Laszlo: Semmelweis University, Hungary
    Conte, Alessio Del: Università di Padova, Italy
    Clementel, Damiano: Università di Padova, Italy
    Salladini, Edoardo: Università di Padova, Italy
    DisProt Consortium: Università di Padova, Italy
    Ku, Luiggi G Tenorio: Università di Padova, Italy
    Monzon, Alexander Miguel: Università di Padova, Italy
    Tompa, Peter: Vrije Universiteit Brussel, Belgium
    Lazar, Tamas: Vrije Universiteit Brussel, Belgium
    Tosatto, Silvio C E: Università di Padova, Italy
    Piovesan, Damiano: Università di Padova, Italy