14 research outputs found

    Automatic construction of rule-based ICD-9-CM coding systems

    Background: In this paper we focus on the problem of automatically constructing ICD-9-CM coding systems for radiology reports. ICD-9-CM codes are used for billing purposes by health institutes and are assigned to clinical records manually following clinical treatment. Since this labeling task requires expert medical knowledge, the process is costly and prone to errors, as human annotators have to consider thousands of possible codes when assigning the right ICD-9-CM labels to a document. In this study we use the datasets made available for training and testing automated ICD-9-CM coding systems by the organisers of an International Challenge on Classifying Clinical Free Text Using Natural Language Processing in spring 2007. The challenge itself was dominated by entirely or partly rule-based systems that solve the coding task using a set of hand-crafted expert rules. Since the feasibility of constructing such systems by hand for thousands of ICD codes is questionable, we decided to examine the problem of automatically constructing similar rule sets that turned out to achieve remarkable accuracy in the shared task challenge. Results: Our results are very promising in the sense that we achieved results comparable with purely hand-crafted ICD-9-CM classifiers. Our best model achieved a 90.26% F-measure on the training dataset and an 88.93% F-measure on the challenge test dataset, using the micro-averaged Fβ=1 measure, the official evaluatio

    MBA: a literature mining system for extracting biomedical abbreviations

    Background: The explosive growth of the biomedical literature presents many challenges for biological researchers. One such challenge is the use of a great many abbreviations. Extracting abbreviations and their definitions accurately is very helpful to biologists and also facilitates biomedical text analysis. Existing approaches fall into four broad categories: rule-based, machine-learning-based, text-alignment-based and statistical. State-of-the-art methods either focus exclusively on acronym-type abbreviations or cannot recognize rare abbreviations. We propose a systematic method to extract abbreviations effectively. First, a scoring method is used to classify the abbreviations into acronym-type and non-acronym-type abbreviations; their corresponding definitions are then identified by two different methods: a text alignment algorithm for the former and a statistical method for the latter. Results: A literature mining system, MBA, was constructed to extract both acronym-type and non-acronym-type abbreviations. An abbreviation-tagged literature corpus, the Medstract gold-standard corpus, was used to evaluate the system. MBA achieved a recall of 88% at a precision of 91% on the Medstract gold-standard evaluation corpus. Conclusion: We present a new literature mining system, MBA, for extracting biomedical abbreviations. Our evaluation demonstrates that the MBA system performs better than the others. It can identify the definitions not only of acronym-type abbreviations, including slightly irregular ones (e.g., <CNS1, cyclophilin seven suppressor>), but also of non-acronym-type abbreviations (e.g., <Fas, CD95>).
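The abstract above quotes precision and recall but no combined score. As a rough illustration only, the sketch below computes the standard F-measure (the weighted harmonic mean of precision and recall) from the two quoted figures. The function name `f_measure` is hypothetical, and the resulting value is derived here rather than reported by the authors.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1 weights precision and recall equally (the usual F1 score).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: avoid division by zero when both are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures quoted in the abstract: precision 91%, recall 88%.
# The combined score is an illustration, not a number from the paper.
mba_f1 = f_measure(0.91, 0.88)
print(f"F1 = {mba_f1:.4f}")
```

With these inputs the score comes out at roughly 0.895, sitting between the two quoted figures, as a harmonic mean always does.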

    EnzyMiner: automatic identification of protein level mutations and their impact on target enzymes from PubMed abstracts

    BACKGROUND: A better understanding of the mechanisms behind an enzyme's functionality and stability, as well as knowledge of mutations and their impact, is crucial for researchers working with enzymes. Although several enzyme databases are currently available, the scientific literature remains the main up-to-date source for learning the effects of a mutation on an enzyme. However, going through vast amounts of scientific documents to extract information on a desired mutation has always been a time-consuming process. In this paper, therefore, we describe a unique method, termed EnzyMiner, which automatically identifies the PubMed abstracts that contain information on the impact of a protein-level mutation on the stability and/or the activity of a given enzyme. RESULTS: We present an automated system which identifies the abstracts that contain an amino-acid-level mutation and then classifies them according to the mutation's effect on the enzyme. For mutation identification, MuGeX, an automated mutation-gene extraction system, has an accuracy of 93.1% with a 91.5 F-measure. For impact analysis, document classification is performed to identify the abstracts that describe a change in the enzyme's stability or activity resulting from the mutation. The system was trained on lipases and tested on amylases with an accuracy of 85%. CONCLUSION: EnzyMiner identifies the abstracts that contain a protein mutation for a given enzyme and, with the help of information extraction and machine learning techniques, checks whether each abstract is related to a disease. For disease-related abstracts, the mutation list and direct links to the abstracts are retrieved from the system and displayed on the Web. Abstracts that are not disease-related are, in addition to receiving a mutation list, categorized into two groups. These two groups indicate whether the mutation affects the enzyme's stability or its functionality, and the results are likewise displayed on the Web.

    Participant recruitment and retention in a pilot program to prevent weight gain in low-income overweight and obese mothers

    Background: Recruitment and retention are key functions for programs promoting nutrition and other lifestyle behavioral changes in low-income populations. This paper describes strategies for recruitment and retention and presents predictors of early (two months post-intervention) and late (eight months post-intervention) dropout (non-retention) and overall retention among young, low-income overweight and obese mothers participating in a community-based randomized pilot trial called Mothers In Motion. Methods: Low-income overweight and obese African American and white mothers ages 18 to 34 were recruited from the Special Supplemental Nutrition Program for Women, Infants, and Children in southern Michigan. Participants (n = 129) were randomly assigned to an intervention (n = 64) or control (n = 65) group according to a stratification procedure to equalize representation in two racial groups (African American and white) and three body mass index categories (25.0-29.9 kg/m², 30.0-34.9 kg/m², and 35.0-39.9 kg/m²). The 10-week theory-based, culturally sensitive intervention focused on healthy eating, physical activity, and stress management messages that were delivered via an interactive DVD and reinforced by five peer-support group teleconferences. Forward stepwise multiple logistic regression was performed to examine whether dietary fat, fruit and vegetable intake behaviors, physical activity, perceived stress, positive and negative affect, depression, and race predicted dropout at the data collection points two months and eight months after the active intervention phase. Results: Trained personnel were successful in recruiting subjects. Increased level of depression was a predictor of early dropout (odds ratio = 1.04; 95% CI = 1.00, 1.08; p = 0.03). Greater stress predicted late dropout (odds ratio = 0.20; 95% CI = 0.00, 0.37; p = 0.01). Dietary fat, fruit, and vegetable intake behaviors, physical activity, positive and negative affect, and race were not associated with either early or late dropout. Less negative affect was a marginal predictor of participant retention (odds ratio = 0.57; 95% CI = 0.31, 1.03; p = 0.06). Conclusion: Dropout rates in this study were higher for participants who reported higher levels of depression and stress. Trial registration: Current Controlled Trials NCT00944060

    Clustering-Based Topic Identification of Transcribed Arabic Broadcast News
