9 research outputs found

    The MeSH-gram Neural Network Model: Extending Word Embedding Vectors with MeSH Concepts for UMLS Semantic Similarity and Relatedness in the Biomedical Domain

    Eliciting semantic similarity between concepts in the biomedical domain remains a challenging task. Recent approaches founded on embedding vectors have gained in popularity as they have proven able to capture semantic relationships efficiently. The underlying idea is that two words with close meanings occur in similar contexts. In this study, we propose a new neural network model named MeSH-gram, which relies on a straightforward approach that extends the skip-gram neural network model by considering MeSH (Medical Subject Headings) descriptors instead of words. Trained on the publicly available PubMed MEDLINE corpus, MeSH-gram is evaluated on reference standards manually annotated for semantic similarity. MeSH-gram is first compared to skip-gram with vectors of size 300 and several context window sizes. A deeper comparison is then performed with twenty existing models. All the Spearman's rank correlations obtained between human scores and computed similarities show that MeSH-gram outperforms the skip-gram model and is comparable to the best methods, which however require more computation and external resources. Comment: 6 pages, 2 tables
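The evaluation protocol described in this abstract (correlating human similarity scores with similarities computed from embedding vectors) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of cosine similarity as the computed similarity, and the data layout are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

def evaluate(embeddings, scored_pairs):
    """embeddings: hypothetical dict mapping a term to its vector.
    scored_pairs: list of (term_a, term_b, human_score) from a reference standard."""
    human = [score for _, _, score in scored_pairs]
    model = [cosine(embeddings[a], embeddings[b]) for a, b, _ in scored_pairs]
    return spearman(human, model)
```

A correlation close to 1 means the model orders the term pairs almost as the human annotators do; only the ordering matters, not the absolute similarity values.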

    Etude et Evaluation d'Approches Multiples d'Expansion de Requêtes pour une Recherche d'Information Intelligente en Santé

    This thesis deals with the problem of textual information retrieval on the Web. We propose several query expansion methods founded on knowledge exploitation. Experiments are carried out in the context of the CISMeF project, which indexes a large number of documents according to a termino-ontological resource of the medical domain. We developed the KnowQUE prototype (Knowledge Based QUery Expansion) to correct, refine, and enrich users' queries. Its modules exploit natural language processing, data mining, and the reasoning mechanisms associated with description logics

    Biomedical Concepts Extraction Based on Possibilistic Network and Vector Space Model

    This paper proposes a new approach for indexing biomedical documents based on the combination of a Possibilistic Network and a Vector Space Model. The latter carries out partial matching between documents and biomedical vocabularies. The main contribution of the proposed approach is to combine the cosine similarity with the two measures of possibility and necessity to enhance the estimation of the similarity between a document and a given concept. The possibility estimates the extent to which a document is not similar to the concept. The necessity allows the confirmation that the document is similar to the concept. Experiments were carried out on the OHSUMED corpus and showed encouraging results
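The Vector Space Model side of this approach, partial matching between a document and vocabulary concepts scored by cosine similarity, can be sketched as below. This is only an assumed illustration with raw term-frequency weights and whitespace tokenization; the abstract does not give the weighting scheme, and the possibility and necessity measures are not reproduced here because their formulas are not stated. All function names are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector (assumed weighting: raw counts)."""
    return Counter(text.lower().split())

def cosine_similarity(doc_vec, concept_vec):
    """Cosine similarity between two sparse term-frequency vectors."""
    shared = set(doc_vec) & set(concept_vec)
    dot = sum(doc_vec[t] * concept_vec[t] for t in shared)
    norm_doc = math.sqrt(sum(c * c for c in doc_vec.values()))
    norm_concept = math.sqrt(sum(c * c for c in concept_vec.values()))
    norm = norm_doc * norm_concept
    return dot / norm if norm else 0.0

def rank_concepts(document, vocabulary):
    """Score every concept label against the document and keep partial matches,
    i.e. concepts sharing at least one term with the document."""
    doc_vec = term_vector(document)
    scored = [(concept, cosine_similarity(doc_vec, term_vector(concept)))
              for concept in vocabulary]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

Because any shared term yields a nonzero score, partial matching of this kind also returns loosely related concepts, which is why a further filtering or confirmation step is useful.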

    Indexing biomedical documents with a possibilistic network

    In this article, we propose a new approach for indexing biomedical documents based on a possibilistic network that carries out partial matching between documents and biomedical vocabulary. The main contribution of our approach is to deal with the imprecision and uncertainty of the indexing task using possibility theory. We enhance the estimation of the similarity between a document and a given concept using the two measures of possibility and necessity. Possibility estimates the extent to which a document is not similar to the concept. Necessity can provide confirmation that the document is similar to the concept. Our contribution also reduces the limitations of partial matching: although it allows variants of terms other than those in dictionaries to be extracted from the document, it also generates irrelevant information. Our objective is to filter the index using the knowledge provided by the Unified Medical Language System®. Experiments were carried out on different corpora and showed encouraging results (an improvement of +26.37% in mean average precision compared with the baseline)

    A Search Engine to Access PubMed Monolingual Subsets: Proof of Concept and Evaluation in French

    Background: PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. Objective: The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). Methods: To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. Results: More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. Conclusions: It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese