4,441 research outputs found

    Chi-square-based scoring function for categorization of MEDLINE citations

    Full text link
    Objectives: Text categorization has been used in biomedical informatics for identifying documents containing relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood of MEDLINE citations containing genetic relevant topic. Methods: Our procedure requires construction of a genetic and a nongenetic domain document corpus. We used MeSH descriptors assigned to MEDLINE citations for this categorization task. We compared frequencies of MeSH descriptors between two corpora applying chi-square test. A MeSH descriptor was considered to be a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest score given to those citations containing MeSH descriptors typical for the genetic domain. Results: Validation was done on a set of 734 manually annotated MEDLINE citations. It achieved predictive accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method by comparing it to three machine learning algorithms (support vector machines, decision trees, na\"ive Bayes). Although the differences were not statistically significantly different, results showed that our chi-square scoring performs as good as compared machine learning algorithms. Conclusions: We suggest that the chi-square scoring is an effective solution to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for gene symbol disambiguation process.Comment: 34 pages, 2 figure

    Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining

    Full text link
    Biomedical knowledge is growing in an astounding pace with a majority of this knowledge is represented as scientific publications. Text mining tools and methods represents automatic approaches for extracting hidden patterns and trends from this semi structured and unstructured data. In Biomedical Text mining, Literature Based Discovery (LBD) is the process of automatically discovering novel associations between medical terms otherwise mentioned in disjoint literature sets. LBD approaches proven to be successfully reducing the discovery time of potential associations that are hidden in the vast amount of scientific literature. The process focuses on creating concept profiles for medical terms such as a disease or symptom and connecting it with a drug and treatment based on the statistical significance of the shared profiles. This knowledge discovery approach introduced in 1989 still remains as a core task in text mining. Currently the ABC principle based two approaches namely open discovery and closed discovery are mostly explored in LBD process. This review starts with general introduction about text mining followed by biomedical text mining and introduces various literature resources such as MEDLINE, UMLS, MESH, and SemMedDB. This is followed by brief introduction of the core ABC principle and its associated two approaches open discovery and closed discovery in LBD process. This review also discusses the deep learning applications in LBD by reviewing the role of transformer models and neural networks based LBD models and its future aspects. Finally, reviews the key biomedical discoveries generated through LBD approaches in biomedicine and conclude with the current limitations and future directions of LBD.Comment: 43 Pages, 5 Figures, 4 Table

    Thirty years of artificial intelligence in medicine (AIME) conferences: A review of research themes

    Get PDF
    Over the past 30 years, the international conference on Artificial Intelligence in MEdicine (AIME) has been organized at different venues across Europe every 2 years, establishing a forum for scientific exchange and creating an active research community. The Artificial Intelligence in Medicine journal has published theme issues with extended versions of selected AIME papers since 1998

    Infectious Disease Ontology

    Get PDF
    Technological developments have resulted in tremendous increases in the volume and diversity of the data and information that must be processed in the course of biomedical and clinical research and practice. Researchers are at the same time under ever greater pressure to share data and to take steps to ensure that data resources are interoperable. The use of ontologies to annotate data has proven successful in supporting these goals and in providing new possibilities for the automated processing of data and information. In this chapter, we describe different types of vocabulary resources and emphasize those features of formal ontologies that make them most useful for computational applications. We describe current uses of ontologies and discuss future goals for ontology-based computing, focusing on its use in the field of infectious diseases. We review the largest and most widely used vocabulary resources relevant to the study of infectious diseases and conclude with a description of the Infectious Disease Ontology (IDO) suite of interoperable ontology modules that together cover the entire infectious disease domain

    Semantic Approaches for Knowledge Discovery and Retrieval in Biomedicine

    Get PDF

    Doctor of Philosophy

    Get PDF
    dissertationThe objective of this work is to examine the efficacy of natural language processing (NLP) in summarizing bibliographic text for multiple purposes. Researchers have noted the accelerating growth of bibliographic databases. Information seekers using traditional information retrieval techniques when searching large bibliographic databases are often overwhelmed by excessive, irrelevant data. Scientists have applied natural language processing technologies to improve retrieval. Text summarization, a natural language processing approach, simplifies bibliographic data while filtering it to address a user's need. Traditional text summarization can necessitate the use of multiple software applications to accommodate diverse processing refinements known as "points-of-view." A new, statistical approach to text summarization can transform this process. Combo, a statistical algorithm comprised of three individual metrics, determines which elements within input data are relevant to a user's specified information need, thus enabling a single software application to summarize text for many points-of-view. In this dissertation, I describe this algorithm, and the research process used in developing and testing it. Four studies comprised the research process. The goal of the first study was to create a conventional schema accommodating a genetic disease etiology point-of-view, and an evaluative reference standard. This was accomplished through simulating the task of secondary genetic database curation. The second study addressed the development iv and initial evaluation of the algorithm, comparing its performance to the conventional schema using the previously established reference standard, again within the task of secondary genetic database curation. The third and fourth studies evaluated the algorithm's performance in accommodating additional points-of-view in a simulated clinical decision support task. The third study explored prevention, while the fourth evaluated performance for prevention and drug treatment, comparing results to a conventional treatment schema's output. Both summarization methods identified data that were salient to their tasks. The conventional genetic disease etiology and treatment schemas located salient information for database curation and decision support, respectively. The Combo algorithm located salient genetic disease etiology, treatment, and prevention data, for the associated tasks. Dynamic text summarization could potentially serve additional purposes, such as consumer health information delivery, systematic review creation, and primary research. This technology may benefit many user groups
    • …
    corecore