
7 research outputs found

Comparative analysis of protein function text-based embeddings and their applicability to prediction tasks - Poster

Author: Castro Leyla Jael
Hofmann-Apitius Martin
Ravinder Rohitha
Rebholz-Schuhmann Dietrich
Publication date: 01/01/2023

Predicting protein function is a difficult problem in bioinformatics. Many recent techniques employ embeddings to learn representations of protein sequences and infer function from them; however, no studies have used text embeddings of protein function descriptions to predict protein function. Here, we propose to learn and explore text-driven embedding representations of the protein function comment sections kept as part of Swiss-Prot entries, and to understand how the resulting data can be used to enhance protein function annotations. The comparative study is based on protein function text embeddings derived from two approaches, which combine natural language processing frameworks such as Word2Vec, Doc2Vec and dictionary-based Named Entity Recognition. It acts as a preliminary assessment based on direct propagation techniques such as sequence similarity and by-similarity prediction.

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)

Designing a FAIRification game for Research Software

Author: Carta Claudio
Castro Leyla Jael
dos Santos Vieira Bruna
Keller Johannes
Ravinder Rohitha
Roos Marco
Solanki Dhwani
Publication date: 01/01/2024

FAIRification games are training tools used to raise awareness of the FAIR principles. They offer a low-barrier entry point through a gamification approach in which participants play to achieve an overall goal. Building on the FAIRification Game for Rare Disease Data, we present our initial considerations for a FAIRification game for Research Software.

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)

A Comparison of Vector-based Approaches for Document Similarity Using the RELISH Corpus

Author: Castro Leyla Jael
Dadi Vishnu
Fellerhoff Tim
Geist Lukas
Ravinder Rohitha
Rebholz-Schuhmann Dietrich
Rocamora Guillermo
Talha Muhammad
Publication venue: Hochschule Bonn-Rhein-Sieg
Publication date: 29/08/2023

The continuously increasing number of biomedical scholarly publications makes it challenging to construct document recommendation algorithms that can efficiently navigate the literature. Such algorithms would help researchers find similar, relevant, and related publications that align with their research interests. Natural Language Processing offers various alternatives for comparing publications, ranging from entity recognition to document embeddings. In this paper, we present the results of a comparative analysis of vector-based approaches to assess document similarity in the RELISH corpus. We aim to determine which approach best approximates relevance without the need for further training. Specifically, we employ five different techniques to generate vectors representing the text in the documents. These techniques combine various Natural Language Processing frameworks such as Word2Vec, Doc2Vec, dictionary-based Named Entity Recognition, and state-of-the-art models based on BERT. To evaluate the document similarity obtained by these approaches, we use different evaluation metrics that account for relevance judgment, relevance search, and re-ranking of the relevance search. Our results demonstrate that the most promising approach is an in-house version of document embeddings, starting with word embeddings and using centroids to aggregate them by document.

pub H-BRS - Publikationsserver der Hochschule Bonn-Rhein-Sieg
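
The abstract above names centroid aggregation of word embeddings as its most promising approach. The following is only a minimal sketch of that general idea, not the authors' implementation: the function names, the toy two-dimensional vectors and the choice of cosine similarity for comparing documents are all assumptions made for illustration.

```python
import math

def centroid(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

def document_embedding(tokens, word_vectors):
    """Centroid of the vectors of all tokens that have an embedding.

    Returns None when no token in the document is known.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    return centroid(known) if known else None

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical hand-made word vectors; a real setup would load trained embeddings.
TOY_VECTORS = {
    "protein": [1.0, 0.0],
    "kinase": [0.8, 0.2],
    "library": [0.0, 1.0],
}

doc_a = document_embedding(["protein", "kinase"], TOY_VECTORS)
doc_b = document_embedding(["protein", "unknownword"], TOY_VECTORS)  # unknown token is skipped
doc_c = document_embedding(["library"], TOY_VECTORS)

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```

With these toy values, the two documents that share "protein" end up closer to each other than either is to the "library" document, which is the behaviour a similarity ranking would rely on.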

OntoClue, a framework to compare vector-based approaches for document relatedness using the RELISH corpus - Poster

Author: Castro Leyla Jael
Dadi Vishnu Vardhan
Fellerhoff Tim
Geist Lukas
Ravinder Rohitha
Rebholz-Schuhmann Dietrich
Rocamora Guillermo
Talha Muhammad
Publication date: 01/01/2023

The continuous increase in biomedical scholarly publications makes it challenging to construct document recommendation algorithms for navigating the literature, an important feature for researchers who need to keep up with relevant publications. Understanding semantic relatedness and similarity between two documents could improve document recommendations. The objective of this study is to perform a comparative analysis of vector-based approaches to assess document similarity in the RELISH corpus. Here we present our approach to comparing five different techniques to generate vectors representing the text in the documents. These techniques combine various Natural Language Processing frameworks such as Word2Vec, Doc2Vec and dictionary-based Named Entity Recognition, as well as state-of-the-art models based on BERT.

pub H-BRS - Publikationsserver der Hochschule Bonn-Rhein-Sieg

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)

BioHackEU23 report: Enabling FAIR Digital Objects with RO-Crate, Signposting and Bioschemas

Author: Blanchi Christophe
Castro Leyla Jael
Grieb Jonas
Ravinder Rohitha
Rogers Alexander
Soiland-Reyes Stian
Van de Sompel Herbert
Weiland Claus
Publication venue: BioHackrXiv
Publication date: 09/01/2024

As part of the BioHackathon Europe 2023, we report on the progress of hackathon project #15: "Enabling FAIR Digital Objects with RO-Crate, Signposting and Bioschemas". We added Signposting to three existing resources and made a Chrome browser extension to show Signposting headers. We added RO-Crate to two existing resources and explored making a hybrid FDO using both a Handle PID Record and a Signposting/RO-Crate approach.

The University of Manchester - Institutional Repository
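
Signposting, mentioned in the abstract above, exposes typed links through the HTTP `Link` header. As a rough sketch of how such a header could be read, the snippet below groups link targets by relation type. It is not the project's browser extension: the function name and the example.org URLs are hypothetical, and the regular expressions handle only simple, well-formed headers rather than the full Web Linking grammar.

```python
import re
from collections import defaultdict

# Matches "<target>" followed by its parameters, up to the next "<".
_LINK = re.compile(r'<([^>]+)>([^<]*)')
# Matches rel="a b" (quoted) or rel=a (unquoted).
_REL = re.compile(r'\brel\s*=\s*(?:"([^"]*)"|([^\s;,]+))')

def links_by_relation(link_header):
    """Group link targets of a Link header value by relation type."""
    grouped = defaultdict(list)
    for target, params in _LINK.findall(link_header):
        match = _REL.search(params)
        if not match:
            continue  # a link without a rel parameter carries no relation type
        rel_value = match.group(1) if match.group(1) is not None else match.group(2)
        for rel in rel_value.split():  # rel may list several space-separated types
            grouped[rel.lower()].append(target)
    return dict(grouped)

# Hypothetical header value; the URLs are placeholders, not real resources.
EXAMPLE_HEADER = (
    '<https://example.org/pid/123>; rel="cite-as", '
    '<https://example.org/meta/123.json>; rel="describedby"; type="application/ld+json", '
    '<https://example.org/files/123.zip>; rel="item"; type="application/zip"'
)

signposts = links_by_relation(EXAMPLE_HEADER)
```

Here `signposts["cite-as"]` holds the persistent identifier link, while `describedby` and `item` point to the metadata and content resources.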

Document-to-document relevance assessment for TREC Genomics Track 2005

Author: Cadena María Fernanda
Castro Leyla Jael
Fellerhoff Tim
Geist Lukas
Giraldo Olga
Ravinder Rohitha
Rebholz-Schuhmann Dietrich
Robayo-Gama Andrea
Solanki Dhwani
Talha Muhammad
Publication venue: Hochschule Bonn-Rhein-Sieg
Publication date: 22/06/2023

Here we present a doc-2-doc relevance assessment performed on a subset of the TREC Genomics Track 2005 collection. Our approach includes an experimental setup to manually assess doc-2-doc relevance and the corresponding analysis of the results obtained from this experiment. The experiment takes one document as a reference and assesses a second document regarding its relevance to the reference one. The consistency of the assessments made by four domain experts was evaluated. The lack of agreement between annotators may be due to: i) abstracts lacking key information and/or ii) the annotators' lack of experience in evaluating some topics.

pub H-BRS - Publikationsserver der Hochschule Bonn-Rhein-Sieg

Comparative analysis of protein function text-based embeddings and its potential for prediction tasks

Author: Bonn Aachen International Center for Information Technology
Castro Leyla Jael
Hofmann-Apitius Martin
Ravinder Rohitha
Rebholz-Schuhmann Dietrich
ZB MED - Informationszentrum Lebenswissenschaften

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)