4,568 research outputs found

    ChemTextMiner: An open source tool kit for mining medical literature abstracts

    Text mining involves recognizing patterns in the wealth of information latent in unstructured text and deducing explicit relationships among data entities using data mining tools. Text mining of biomedical literature is essential for building biological networks that connect genes, proteins, drugs, therapeutic categories, side effects, etc. related to diseases of interest. We present an approach to text mining biomedical literature that focuses on non-obvious, hidden relationships, and we build biological networks for important human diseases such as MTB, malaria, Alzheimer's disease, and diabetes. The methods, tools, and data used for building biological networks in a distributed computing environment previously used for the ChemXtreme[1] and ChemStar[2] applications are also described.

    Drug prescription support in dental clinics through drug corpus mining

    The rapid increase in the volume and variety of data poses a challenge to safe drug prescription for the dentist. The increasing number of patients who take multiple drugs puts further pressure on the dentist to make the right decision at the point of care. Hence, a robust decision support system will enable dentists to make decisions on drug prescription quickly and accurately. Based on the assumption that similar drug pairs have a higher similarity ratio, this paper suggests an innovative approach to obtain the similarity ratio between the drug that the dentist is going to prescribe and the drug that the patient is currently taking. We conducted experiments to obtain the similarity ratios of both positive and negative drug pairs, using feature vectors generated from term similarities and word embeddings of a biomedical text corpus. This model can be easily adapted and implemented for use in a dental clinic to assist the dentist in deciding whether a drug is suitable for prescription, taking into consideration the medical profile of the patient. Experimental evaluation of our model’s association of the similarity ratio between two drugs yielded an F score of 89%. Hence, such an approach, when integrated within the clinical workflow, will reduce prescription errors and thereby improve the health outcomes of patients.

    Modeling text with generalizable Gaussian mixtures

    We apply and discuss generalizable Gaussian mixture (GGM) models for text mining. The model automatically adapts model complexity for a given text representation. We show that the generalizability of these models depends on the dimensionality of the representation and the sample size. We discuss the relation between supervised and unsupervised learning in text data. Finally, we implement a novelty detector based on the density model. 1. INTRODUCTION Information retrieval is a very active research field which is starting to adapt advanced machine learning techniques for solving hard real world problems [17, 18]. Text mining or pattern recognition in text data is used to categorize text according to topic, to spot new topics, and in a broader sense to create more intelligent searches, e.g., by WWW search engines [12, ?, 14]. Text mining proceeds by pattern recognition based on text features, typically document summary statistics. While there are numerous high-level language models for extr..

    Topic Map Generation Using Text Mining

    Starting from text corpus analysis with linguistic and statistical analysis algorithms, an infrastructure for text mining is described which uses collocation analysis as a central tool. This text mining method may be applied to different domains as well as languages. Some examples taken from large reference databases motivate the applicability to knowledge management using declarative standards of information structuring and description. The ISO/IEC Topic Map standard is introduced as a candidate for rich metadata description of information resources, and it is shown how text mining can be used for automatic topic map generation.

    Doing Things Twice (Or Differently): Strategies to Identify Studies for Targeted Validation

    The "reproducibility crisis" has been a highly visible source of scientific controversy and dispute. Here, I propose and review several avenues for identifying and prioritizing research studies for the purpose of targeted validation. Of the various proposals discussed, I identify scientific data science as being a strategy that merits greater attention among those interested in reproducibility. I argue that the tremendous potential of scientific data science for uncovering high-value research studies is a significant and rarely discussed benefit of the transition to a fully open-access publishing model.

    Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy

    Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. In this paper, background for data and text mining, as well as for knowledge discovery in databases (KDD) and in text (KDT), is presented, then a brief review of Swanson's ideas, followed by a discussion of recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks follow regarding (a) the limits of current strategies for evaluation of hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. A report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians. © 2006 Bekhuis; licensee BioMed Central Ltd.

    Knowledge Organization Research in the last two decades: 1988-2008

    We apply an automatic topic mapping system to records of publications in knowledge organization (KO) published between 1988 and 2008. The data were collected from journals in the Web of Science (WoS) database that publish articles in the KO field. The results showed that while topics in the first decade (1988-1997) were more traditional, the second decade (1998-2008) was marked by a more technological orientation and by the appearance of more specialized topics driven by the pervasiveness of the Web environment.