74 research outputs found

    MeSH indexing based on automatically generated summaries

    BACKGROUND: MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using as reference the Medical Subject Headings (MeSH) controlled vocabulary. For this task, the human indexers read the full text of the article. Due to the growth of MEDLINE, the NLM Indexing Initiative explores indexing methodologies that can support the task of the indexers. Medical Text Indexer (MTI) is a tool developed by the NLM Indexing Initiative to provide MeSH indexing recommendations to indexers. Currently, the input to MTI is MEDLINE citations, title and abstract only. Previous work has shown that using full text as input to MTI increases recall, but decreases precision sharply. We propose using summaries generated automatically from the full text for the input to MTI to use in the task of suggesting MeSH headings to indexers. Summaries distill the most salient information from the full text, which might increase the coverage of automatic indexing approaches based on MEDLINE. We hypothesize that if the results were good enough, manual indexers could possibly use automatic summaries instead of the full texts, along with the recommendations of MTI, to speed up the process while maintaining high quality of indexing results. RESULTS: We have generated summaries of different lengths using two different summarizers, and evaluated the MTI indexing on the summaries using different algorithms: MTI, individual MTI components, and machine learning. The results are compared to those of full text articles and MEDLINE citations. Our results show that automatically generated summaries achieve similar recall but higher precision compared to full text articles. Compared to MEDLINE citations, summaries achieve higher recall but lower precision. CONCLUSIONS: Our results show that automatic summaries produce better indexing than full text articles. 
    Summaries produce similar recall to full text but much better precision, which seems to indicate that automatic summaries can efficiently capture the most important contents within the original articles. The combination of MEDLINE citations and automatically generated summaries could improve the recommendations suggested by MTI. On the other hand, indexing performance might be dependent on the MeSH heading being indexed. Summarization techniques could thus be considered as a feature selection algorithm that might have to be tuned individually for each MeSH heading.
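The abstract above compares inputs (citations, summaries, full text) by precision and recall of the suggested MeSH headings. As a minimal sketch of how such a comparison might be scored, the snippet below computes micro-averaged precision, recall, and F1 over a set of documents. The function name, the document identifiers, and the example headings are hypothetical illustrations; the abstract does not state which averaging scheme or evaluation code the authors used.

```python
from typing import Dict, Set, Tuple


def micro_precision_recall(
    gold: Dict[str, Set[str]],
    predicted: Dict[str, Set[str]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over all documents.

    gold maps a document id to the headings assigned by human indexers;
    predicted maps a document id to the headings suggested automatically.
    """
    tp = fp = fn = 0
    for doc_id, gold_headings in gold.items():
        suggested = predicted.get(doc_id, set())
        tp += len(gold_headings & suggested)   # correct suggestions
        fp += len(suggested - gold_headings)   # spurious suggestions
        fn += len(gold_headings - suggested)   # missed headings
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, for illustration only.
gold = {
    "doc1": {"Humans", "Neoplasms", "Papillomaviridae"},
    "doc2": {"MEDLINE", "Abstracting and Indexing"},
}
predicted = {
    "doc1": {"Humans", "Neoplasms", "Mice"},
    "doc2": {"MEDLINE"},
}

precision, recall, f1 = micro_precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same function once per input type (citation, summary, full text) would give the kind of side-by-side figures the abstract describes, under the assumption that micro-averaging is an acceptable stand-in for the authors' metric.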

    The road from manual to automatic semantic indexing of biomedical literature: a 10 years journey

    Biomedical experts are facing challenges in keeping up with the vast amount of biomedical knowledge published daily. With millions of citations added to databases like MEDLINE/PubMed each year, efficiently accessing relevant information becomes crucial. Traditional term-based searches may lead to irrelevant or missed documents due to homonyms, synonyms, abbreviations, or term mismatch. To address this, semantic search approaches employing predefined concepts with associated synonyms and relations have been used to expand query terms and improve information retrieval. The National Library of Medicine (NLM) plays a significant role in this area, indexing citations in the MEDLINE database with topic descriptors from the Medical Subject Headings (MeSH) thesaurus, enabling advanced semantic search strategies to retrieve relevant citations despite the synonymy and polysemy of biomedical terms. Over time, advancements in semantic indexing have been made, with Machine Learning facilitating the transition from manual to automatic semantic indexing in the biomedical literature. The paper highlights the journey of this transition, starting with manual semantic indexing and the initial efforts toward automatic indexing. The BioASQ challenge has served as a catalyst in revolutionizing the domain of semantic indexing, further pushing the boundaries of efficient knowledge retrieval in the biomedical field.

    Field Demonstration of Carbon Dioxide Miscible Flooding in the Lansing-Kansas City Formation, Central Kansas

    A pilot carbon dioxide miscible flood was initiated in the Lansing-Kansas City C formation in the Hall Gurney Field, Russell County, Kansas. The reservoir zone is an oomoldic carbonate located at a depth of about 2900 feet. The pilot consists of one carbon dioxide injection well and three production wells. Continuous carbon dioxide injection began on December 2, 2003. By the end of June 2005, 16.19 MM lb of carbon dioxide had been injected into the pilot area. Injection was converted to water on June 21, 2005 to reduce operating costs to a breakeven level, with the expectation that sufficient carbon dioxide had been injected to displace the oil bank to the production wells by water injection. By March 7, 2010, 8,736 bbl of oil had been produced from the pilot. Production from wells to the northwest of the pilot region indicates that oil displaced by carbon dioxide injection was produced from Colliver A7, Colliver A3, Colliver A14 and Graham A4, located on adjacent leases. About 19,166 bbl of incremental oil were estimated to have been produced from these wells as of March 7, 2010. There is evidence of a directional permeability trend toward the NW through the pilot region. The majority of the injected carbon dioxide remains in the pilot region, which has been maintained at a pressure at or above the minimum miscibility pressure. Estimated oil recovery attributed to the CO2 flood is 27,902 bbl, which is equivalent to a gross CO2 utilization of 4.8 MCF/bbl. The pilot project is not economic.
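The recovery figures in the abstract above can be cross-checked with simple arithmetic. The sketch below sums the pilot and offset-well production and, from the stated utilization, backs out an implied gross CO2 volume. That implied volume is a derived illustration, not a number reported in the abstract, and the variable names are hypothetical.

```python
# Figures quoted in the abstract (as of March 7, 2010).
pilot_oil_bbl = 8_736        # oil produced from the pilot wells
offset_oil_bbl = 19_166      # incremental oil estimated from adjacent-lease wells
utilization_mcf_per_bbl = 4.8  # stated gross CO2 utilization

# Total oil attributed to the CO2 flood.
total_oil_bbl = pilot_oil_bbl + offset_oil_bbl

# Derived, not stated in the abstract: gross CO2 volume implied by the
# utilization figure (utilization = gross CO2 volume / oil recovered).
implied_co2_mcf = total_oil_bbl * utilization_mcf_per_bbl

print(f"total oil attributed to flood: {total_oil_bbl:,} bbl")
print(f"implied gross CO2 volume: {implied_co2_mcf:,.0f} MCF")
```

The sum reproduces the 27,902 bbl quoted in the abstract; the implied CO2 volume is only as precise as the rounded 4.8 MCF/bbl figure.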

    HPV-Related Nonkeratinizing Squamous Cell Carcinoma of the Oropharynx: Utility of Microscopic Features in Predicting Patient Outcome

    Human papilloma virus (HPV) is an etiologic agent in a subset of oropharyngeal squamous cell carcinomas (SCCs). The aim of this study was to sub-classify SCC of the oropharynx based upon histologic features into nonkeratinizing (NK) SCC, keratinizing (K) SCC, and hybrid SCC, and determine the frequency of HPV and patient survival in each group. Patients with oropharyngeal SCC with a minimum of 2 years of clinical follow-up were identified from radiation oncology databases from 1997 to 2004. All patients received either upfront surgery with postoperative radiation or definitive radiation-based therapy. In situ hybridization (ISH) for high-risk HPV subtypes and immunohistochemistry for p16, a protein frequently up-regulated in HPV-associated carcinomas, were performed. Overall and disease-specific survival were assessed. Of 118 cases, 46.6% were NK SCC, 24.6% K SCC and 28.8% hybrid SCC. NK SCC occurred in slightly younger patients who were more often male. It more frequently presented with lymph node metastases and was surgically resected compared to K SCC. NK SCC was significantly more likely to be HPV and p16 positive than K SCC (P < 0.001) and to have better overall and disease-specific survival (P = 0.0002; P = 0.0142, respectively). Hybrid SCC was also more likely than K SCC to be HPV and p16 positive (P = 0.003; P = 0.002, respectively) and to have better overall survival (P = 0.0105). Sub-classification of oropharyngeal SCC by histologic type provides useful clinical information. NK SCC histology strongly predicts HPV-association and better patient survival compared to K SCC. Hybrid SCC appears to have an intermediate frequency of HPV-association and patient survival.

    Automatic Indexing of Specialized Documents: Using Generic vs. Domain-Specific Document Representations

    The shift from paper to electronic documents has caused the curation of information sources in large electronic databases to become more generalized. In the biomedical domain, continuing efforts aim at refining indexing tools to assist with the update and maintenance of databases such as MEDLINE®. In this paper, we evaluate two statistical methods of producing MeSH® indexing recommendations for the genetics literature, including recommendations involving subheadings, which is a novel application for the methods. We show that a generic representation of the documents yields both better precision and recall. We also find that a domain-specific representation of the documents can contribute to enhancing recall.