
    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has witnessed booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys
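
The inductive process described in this abstract, building a classifier from a set of preclassified documents, can be illustrated with a minimal sketch. It assumes a bag-of-words representation and a multinomial Naive Bayes learner with add-one smoothing, which is only one of many learners such a survey could cover. The function names (`tokenize`, `train`, `classify`) and the four toy documents are hypothetical and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplest possible document representation: lowercase whitespace tokens.
    return text.lower().split()


def train(labeled_docs):
    """Learn per-category word statistics from (text, category) pairs."""
    doc_counts = Counter()              # number of training docs per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocab = set()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        for word in tokenize(text):
            word_counts[category][word] += 1
            vocab.add(word)
    return {"doc_counts": doc_counts, "word_counts": word_counts, "vocab": vocab}


def classify(model, text):
    """Return the category with the highest smoothed log-probability."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_category, best_score = None, -math.inf
    for category, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][category]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokenize(text):
            if word in model["vocab"]:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical preclassified documents, for illustration only.
training_set = [
    ("stocks fell as markets closed lower", "finance"),
    ("the bank raised interest rates", "finance"),
    ("the team won the final match", "sports"),
    ("the striker scored two goals", "sports"),
]
model = train(training_set)
print(classify(model, "interest rates and markets"))
print(classify(model, "the striker won"))
```

The three functions loosely mirror the three problems the abstract names: `tokenize` stands in for document representation, `train` for classifier construction, and running `classify` on held-out documents is the starting point for classifier evaluation.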

    Automatic construction of rule-based ICD-9-CM coding systems

    Background: In this paper we focus on the problem of automatically constructing ICD-9-CM coding systems for radiology reports. ICD-9-CM codes are used for billing purposes by health institutes and are assigned to clinical records manually following clinical treatment. Since this labeling task requires expert knowledge in the field of medicine, the process itself is costly and is prone to errors as human annotators have to consider thousands of possible codes when assigning the right ICD-9-CM labels to a document. In this study we use the datasets made available for training and testing automated ICD-9-CM coding systems by the organisers of an International Challenge on Classifying Clinical Free Text Using Natural Language Processing in spring 2007. The challenge itself was dominated by entirely or partly rule-based systems that solve the coding task using a set of hand crafted expert rules. Since the feasibility of the construction of such systems for thousands of ICD codes is indeed questionable, we decided to examine the problem of automatically constructing similar rule sets that turned out to achieve a remarkable accuracy in the shared task challenge. Results: Our results are very promising in the sense that we managed to achieve comparable results with purely hand-crafted ICD-9-CM classifiers. Our best model got a 90.26 % F measure on the training dataset and an 88.93 % F measure on the challenge test dataset, using the micro-averaged Fβ=1 measure, the official evaluatio
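
The micro-averaged Fβ=1 measure mentioned in this abstract pools true positives, false positives, and false negatives over all documents and labels before computing precision and recall. The sketch below shows that computation for multi-label predictions. The function name `micro_f_beta` and the placeholder labels "A", "B", "C" are assumptions for illustration, not part of the challenge's official scorer.

```python
def micro_f_beta(gold, predicted, beta=1.0):
    """Micro-averaged F-beta over parallel lists of label sets."""
    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        gold_labels, predicted_labels = set(gold_labels), set(predicted_labels)
        tp += len(gold_labels & predicted_labels)  # correctly assigned labels
        fp += len(predicted_labels - gold_labels)  # assigned but wrong
        fn += len(gold_labels - predicted_labels)  # missed labels
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Placeholder labels stand in for codes; three documents, pooled counts
# are tp=3, fp=1, fn=1, so precision = recall = 0.75.
gold = [{"A"}, {"A", "B"}, {"C"}]
predicted = [{"A"}, {"B"}, {"C", "D"}]
score = micro_f_beta(gold, predicted)
print(round(score, 4))  # 0.75
```

Because the counts are pooled before averaging, frequent labels dominate the score, which is the usual reason for choosing micro-averaging when the label distribution is skewed.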

    Increasing recruitment to randomised trials: a review of randomised controlled trials

    BACKGROUND: Poor recruitment to randomised controlled trials (RCTs) is a widespread and important problem. With poor recruitment being such an important issue with respect to the conduct of randomised trials, a systematic review of controlled trials on recruitment methods was undertaken in order to identify strategies that are effective. METHODS: We searched the register of trials in the Cochrane Library from 1996 to the end of 2004. We also searched Web of Science for 2004. Additional trials were identified from personal knowledge. Included studies had to use random allocation and participants had to be allocated to different methods of recruitment to a 'real' randomised trial. Trials that randomised participants to 'mock' trials and trials of recruitment to non-randomised studies (e.g., case control studies) were excluded. Information on the study design, intervention and control, and number of patients recruited was extracted by the two authors. RESULTS: We identified 14 papers describing 20 different interventions. Effective interventions included: telephone reminders; questionnaire inclusion; monetary incentives; using an 'open' rather than placebo design; and making trial materials culturally sensitive. CONCLUSION: Few trials have been undertaken to test interventions to improve trial recruitment. There is an urgent need for more RCTs of recruitment strategies

    Accreditation Standard Guideline Initiative for Tai Chi and Qigong Instructors and Training Institutions.

    Evidence of the health and wellbeing benefits of Tai Chi and Qigong (TQ) has emerged in the past two decades, but TQ is underutilized in modern health care in Western countries due to a lack of promotion and the limited availability of professionally qualified TQ instructors. To date, there are no government regulations for TQ instructors or for training institutions in China and Western countries, even though TQ is considered to be a part of Traditional Chinese medicine that has the potential to manage many chronic diseases. Based on an integrative health care approach, the accreditation standard guideline initiative for TQ instructors and training institutions was developed in collaboration with health professionals, integrative medicine academics, Tai Chi and Qigong master instructors and consumers, including public safety officers, from several countries, such as Australia, Canada, China, Germany, Italy, Korea, Sweden and the USA. In this paper, the rationale for organizing the Medical Tai Chi and Qigong Association (MTQA) is discussed and the accreditation standard guideline for TQ instructors and training institutions developed by the committee members of MTQA is presented. The MTQA acknowledges that the proposed guidelines are broad, so that the diversity of TQ instructors and training institutions can be integrated, with recognition that these guidelines can be developed with further refinement. Additionally, these guidelines face challenges in understanding the complexity of TQ associated with different principles, philosophies and schools of thought. Nonetheless, these guidelines represent a necessary first step as a primary resource to serve and guide health care professionals and consumers, as well as the TQ community

    Beyond chance? The persistence of performance in online poker

    A major issue in the widespread controversy about the legality of poker and the appropriate taxation of winnings is whether poker should be considered a game of skill or a game of chance. To inform this debate we present an analysis of the role of skill in the performance of online poker players, using a large database with hundreds of millions of player-hand observations from real money ring games at three different stakes levels. We find that players whose earlier profitability was in the top (bottom) deciles perform better (worse) and are substantially more likely to end up in the top (bottom) performance deciles of the following time period. Regression analyses of performance on historical performance and other skill-related proxies provide further evidence for persistence and predictability. Simulations indicate that skill dominates chance when performance is measured over 1,500 or more hands of play

    Automatic medical encoding with SNOMED categories

    BACKGROUND: In this paper, we describe the design and preliminary evaluation of a new type of tool to speed up the encoding of episodes of care using the SNOMED CT terminology. METHODS: The proposed system can be used either as a search tool to browse the terminology or as a categorization tool to support automatic annotation of textual contents with SNOMED concepts. The general strategy is similar for both tools and is based on the fusion of two complementary retrieval strategies with thesaural resources. The first classification module uses a traditional vector-space retrieval engine which has been fine-tuned for the task, while the second classifier is based on regular variations of the term list. For evaluating the system, we use a sample of MEDLINE. SNOMED CT categories have been restricted to Medical Subject Headings (MeSH) using the SNOMED-MeSH mapping provided by the UMLS (version 2006). RESULTS: Consistent with previous investigations applied to biomedical terminologies, our results show that the performance of the hybrid system is significantly improved compared to each single module. For top returned concepts, a precision at high ranks (P0) of more than 80% is observed. In addition, a manual and qualitative evaluation on a dozen MEDLINE abstracts suggests that SNOMED CT could represent an improvement compared to existing medical terminologies such as MeSH. CONCLUSION: Although the precision of the SNOMED categorizer seems sufficient to help professional encoders, it is concluded that clinical benchmarks as well as usability studies are needed to assess the impact of our SNOMED encoding method in real settings. AVAILABILITY: The system is available for research purposes at: http://eagl.unige.ch/SNOCat

    EnzyMiner: automatic identification of protein level mutations and their impact on target enzymes from PubMed abstracts

    BACKGROUND: A better understanding of the mechanisms of an enzyme's functionality and stability, as well as knowledge of mutations and their impact, is crucial for researchers working with enzymes. Although several enzyme databases are currently available, the scientific literature remains by and large the up-to-date source for learning the effects of a mutation on an enzyme. However, going through vast amounts of scientific documents to extract the information on a desired mutation has always been a time-consuming process. In this paper, therefore, we describe a unique method, termed EnzyMiner, which automatically identifies the PubMed abstracts that contain information on the impact of a protein-level mutation on the stability and/or the activity of a given enzyme. RESULTS: We present an automated system which identifies the abstracts that contain an amino-acid-level mutation and then classifies them according to the mutation's effect on the enzyme. In the case of mutation identification, MuGeX, an automated mutation-gene extraction system, has an accuracy of 93.1% with a 91.5 F-measure. For impact analysis, document classification is performed to identify the abstracts that contain a change in the enzyme's stability or activity resulting from the mutation. The system was trained on lipases and tested on amylases with an accuracy of 85%. CONCLUSION: EnzyMiner identifies the abstracts that contain a protein mutation for a given enzyme and checks whether the abstract is related to a disease with the help of information extraction and machine learning techniques. For disease-related abstracts, the mutation list and direct links to the abstracts are retrieved from the system and displayed on the Web. Abstracts that are not disease related are, in addition to having the mutation list, categorized into two groups according to whether the mutation affects the enzyme's stability or its functionality, and these groups are also displayed on the Web

    Creating 'good' self-managers?: Facilitating and governing an online self care skills training course

    BACKGROUND: In chronic disease management, patients are increasingly called upon to undertake a new role as lay tutors within self-management training programmes. The internet constitutes an increasingly significant healthcare setting and a key arena for self-management support and communication. This study evaluates how a new quasi-professional health workforce – volunteer tutors – engage, guide and attempt to manage people with long-term conditions in the ways of 'good' self-management within the context of an online self-management course. METHODS: A qualitative analysis of postings to the discussion centre of 11 online classes (purposively selected from 27) run as part of the Expert Patients Programme. Facilitators (term for tutors online) and participants posted questions, comments and solutions related to self-management of long-term conditions; these were subjected to a textual and discursive analysis to explore: a) how facilitators, through the internet, engaged participants in issues related to self-management; b) how participants responded to and interacted with facilitators. RESULTS: Emergent themes included: techniques and mechanisms used to engage people with self-management; the process facilitators followed – 'sharing', 'modelling' and 'confirming'; and the emergence of a policing role regarding online disclosure. Whilst exchanging medical advice was discouraged, facilitators often professed to understand and give advice on psychological aspects of behaviour. CONCLUSION: The study gave an insight into the roles tutors adopt – one being their ability to 'police' subjective management of long-term conditions and another being to attempt to enhance the psychological capabilities of participants.