73 research outputs found

    MeSH indexing based on automatically generated summaries

    BACKGROUND: MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using as reference the Medical Subject Headings (MeSH) controlled vocabulary. For this task, the human indexers read the full text of the article. Due to the growth of MEDLINE, the NLM Indexing Initiative explores indexing methodologies that can support the task of the indexers. Medical Text Indexer (MTI) is a tool developed by the NLM Indexing Initiative to provide MeSH indexing recommendations to indexers. Currently, the input to MTI is MEDLINE citations, title and abstract only. Previous work has shown that using full text as input to MTI increases recall, but decreases precision sharply. We propose using summaries generated automatically from the full text for the input to MTI to use in the task of suggesting MeSH headings to indexers. Summaries distill the most salient information from the full text, which might increase the coverage of automatic indexing approaches based on MEDLINE. We hypothesize that if the results were good enough, manual indexers could possibly use automatic summaries instead of the full texts, along with the recommendations of MTI, to speed up the process while maintaining high quality of indexing results. RESULTS: We have generated summaries of different lengths using two different summarizers, and evaluated the MTI indexing on the summaries using different algorithms: MTI, individual MTI components, and machine learning. The results are compared to those of full text articles and MEDLINE citations. Our results show that automatically generated summaries achieve similar recall but higher precision compared to full text articles. Compared to MEDLINE citations, summaries achieve higher recall but lower precision. CONCLUSIONS: Our results show that automatic summaries produce better indexing than full text articles. 
    Summaries produce similar recall to full text but much better precision, which seems to indicate that automatic summaries can efficiently capture the most important contents within the original articles. The combination of MEDLINE citations and automatically generated summaries could improve the recommendations suggested by MTI. On the other hand, indexing performance might be dependent on the MeSH heading being indexed. Summarization techniques could thus be considered as a feature selection algorithm that might have to be tuned individually for each MeSH heading.

    The road from manual to automatic semantic indexing of biomedical literature: a 10 years journey

    Biomedical experts are facing challenges in keeping up with the vast amount of biomedical knowledge published daily. With millions of citations added to databases like MEDLINE/PubMed each year, efficiently accessing relevant information becomes crucial. Traditional term-based searches may lead to irrelevant or missed documents due to homonyms, synonyms, abbreviations, or term mismatch. To address this, semantic search approaches employing predefined concepts with associated synonyms and relations have been used to expand query terms and improve information retrieval. The National Library of Medicine (NLM) plays a significant role in this area, indexing citations in the MEDLINE database with topic descriptors from the Medical Subject Headings (MeSH) thesaurus, enabling advanced semantic search strategies to retrieve relevant citations despite the synonymy and polysemy of biomedical terms. Over time, advancements in semantic indexing have been made, with Machine Learning facilitating the transition from manual to automatic semantic indexing in the biomedical literature. The paper highlights the journey of this transition, starting with manual semantic indexing and the initial efforts toward automatic indexing. The BioASQ challenge has served as a catalyst in revolutionizing the domain of semantic indexing, further pushing the boundaries of efficient knowledge retrieval in the biomedical field.

    HPV-Related Nonkeratinizing Squamous Cell Carcinoma of the Oropharynx: Utility of Microscopic Features in Predicting Patient Outcome

    Human papilloma virus (HPV) is an etiologic agent in a subset of oropharyngeal squamous cell carcinomas (SCCs). The aim of this study was to sub-classify SCC of the oropharynx based upon histologic features into nonkeratinizing (NK) SCC, keratinizing (K) SCC, and hybrid SCC, and determine the frequency of HPV and patient survival in each group. Patients with oropharyngeal SCC with a minimum of 2 years of clinical follow-up were identified from radiation oncology databases from 1997 to 2004. All patients received either upfront surgery with postoperative radiation or definitive radiation-based therapy. In situ hybridization (ISH) for high-risk HPV subtypes and immunohistochemistry for p16, a protein frequently up-regulated in HPV-associated carcinomas, were performed. Overall and disease-specific survival were assessed. Of 118 cases, 46.6% were NK SCC, 24.6% K SCC and 28.8% hybrid SCC. NK SCC occurred in slightly younger patients who were more often male. It more frequently presented with lymph node metastases and was surgically resected compared to K SCC. NK SCC was significantly more likely to be HPV and p16 positive than K SCC (P < 0.001) and to have better overall and disease-specific survival (P = 0.0002; P = 0.0142, respectively). Hybrid SCC was also more likely than K SCC to be HPV and p16 positive (P = 0.003; P = 0.002, respectively) and to have better overall survival (P = 0.0105). Sub-classification of oropharyngeal SCC by histologic type provides useful clinical information. NK SCC histology strongly predicts HPV-association and better patient survival compared to K SCC. Hybrid SCC appears to have an intermediate frequency of HPV-association and patient survival.

    Automatic Indexing of Specialized Documents: Using Generic vs. Domain-Specific Document Representations

    The shift from paper to electronic documents has caused the curation of information sources in large electronic databases to become more generalized. In the biomedical domain, continuing efforts aim at refining indexing tools to assist with the update and maintenance of databases such as MEDLINE®. In this paper, we evaluate two statistical methods of producing MeSH® indexing recommendations for the genetics literature, including recommendations involving subheadings, which is a novel application for the methods. We show that a generic representation of the documents yields both better precision and recall. We also find that a domain-specific representation of the documents can contribute to enhancing recall.