
    Approaches to text mining for clinical medical records

    The 21st Annual ACM Symposium on Applied Computing 2006, Technical tracks on Computer Applications in Health Care (CAHC 2006), Dijon, France, April 23-27, 2006. Retrieved 6/21/2006 from http://www.ischool.drexel.edu/faculty/hhan/SAC2006_CAHC.pdf. Clinical medical records contain a wealth of information, largely in free-text form. Extracting structured information from free-text records is therefore an important research endeavor. In this paper, we describe a MEDical Information Extraction (MedIE) system that extracts and mines a variety of information about patients with breast complaints from free-text clinical records. MedIE is part of a medical text mining project being conducted at Drexel University. Three approaches are proposed to solve different IE tasks, and very good performance (precision and recall) was achieved. A graph-based approach that uses the output of a link-grammar parser was devised for relation extraction and achieved high accuracy; a simple but efficient ontology-based approach was adopted to extract medical terms of interest; and an NLP-based feature extraction method coupled with an ID3-based decision tree was used to perform text classification.
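
    The ontology-based term extraction step described above can be pictured as dictionary matching against a controlled vocabulary. Below is a minimal sketch of that idea in Python; the tiny ontology, category labels, and sample record are illustrative assumptions, not the ontology actually used by MedIE.

```python
# Minimal sketch of ontology/dictionary-driven term extraction, loosely in the
# spirit of MedIE's ontology-based step. The ontology entries and categories
# below are illustrative assumptions only.
import re

ONTOLOGY = {
    "breast mass": "Finding",
    "mammogram": "Procedure",
    "biopsy": "Procedure",
    "carcinoma": "Diagnosis",
}

def extract_terms(text: str) -> list[tuple[str, str]]:
    """Return (term, ontology category) pairs found in a free-text record."""
    lowered = text.lower()
    hits = []
    for term, category in ONTOLOGY.items():
        # Whole-word match so a term embedded in another word is not reported.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            hits.append((term, category))
    return hits

record = "Palpable breast mass noted; mammogram and biopsy recommended."
print(extract_terms(record))
# [('breast mass', 'Finding'), ('mammogram', 'Procedure'), ('biopsy', 'Procedure')]
```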

    DCU@FIRE2010: term conflation, blind relevance feedback, and cross-language IR with manual and automatic query translation

    For the first participation of Dublin City University (DCU) in the FIRE 2010 evaluation campaign, information retrieval (IR) experiments on English, Bengali, Hindi, and Marathi documents were performed to investigate term conflation (different stemming approaches and indexing word prefixes), blind relevance feedback, and manual and automatic query translation. The experiments are based on BM25 and on language modeling (LM) for IR. Results show that term conflation always improves mean average precision (MAP) compared to indexing unprocessed word forms, but different approaches seem to work best for different languages. For example, in monolingual Marathi experiments indexing 5-prefixes outperforms our corpus-based stemmer, whereas in Hindi the corpus-based stemmer achieves a higher MAP. For Bengali, the LM retrieval model achieves a much higher MAP than BM25 (0.4944 vs. 0.4526). In all experiments using BM25, blind relevance feedback yields considerably higher MAP than experiments without it. Bilingual IR experiments (English→Bengali and English→Hindi) are based on query translations obtained from native speakers and from the Google Translate web service. For the automatically translated queries, MAP is slightly (but not significantly) lower compared to experiments with manual query translations. The bilingual English→Bengali (English→Hindi) experiments achieve 81.7%-83.3% (78.0%-80.6%) of the MAP of the best corresponding monolingual experiments.
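
    The "indexing word prefixes" form of term conflation mentioned above (e.g. 5-prefixes for Marathi) amounts to truncating every token to its first n characters before indexing and querying, so inflected forms collapse to a single index key. A minimal sketch follows; the tokenisation and the English example words are illustrative assumptions, since the paper's experiments use Marathi, Hindi, and Bengali text.

```python
# Minimal sketch of prefix-based term conflation: morphological variants of a
# word map to the same index term once truncated to their first n characters.
def conflate(tokens: list[str], n: int = 5) -> list[str]:
    """Truncate each token to its first n characters."""
    return [t[:n] for t in tokens]

doc_tokens = "retrieval retrieved retrieving".split()
print(conflate(doc_tokens))  # ['retri', 'retri', 'retri'] -> one index term
```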

    Towards better measures: evaluation of estimated resource description quality for distributed IR

    An open problem for Distributed Information Retrieval (DIR) systems is how to represent large document repositories, also known as resources, both accurately and efficiently. Obtaining resource description estimates is an important phase in DIR, especially in non-cooperative environments. Measuring the quality of an estimated resource description is a contentious issue, as current measures do not provide an adequate indication of quality. In this paper, we provide an overview of the currently applied measures of resource description quality before proposing the Kullback-Leibler (KL) divergence as an alternative. Through experimentation we illustrate the shortcomings of these past measures, whilst providing evidence that KL divergence is a more appropriate measure of quality. When applying KL divergence to compare different Query-Based Sampling (QBS) algorithms, our experiments provide strong evidence in favour of a previously unsupported hypothesis originally posited in the initial Query-Based Sampling work.
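
    As a rough illustration of the proposed measure, KL divergence compares the term distribution of the actual resource with the distribution implied by the estimated description, with a lower value indicating a closer estimate. The sketch below shows one way such a score could be computed; the direction of the divergence and the smoothing choice are assumptions that may differ from the paper, and the toy distributions are invented for illustration.

```python
# Minimal sketch: KL(actual || estimated) = sum_t p(t) * log(p(t) / q(t)),
# computed over the actual resource's term distribution p and the estimated
# description's distribution q. Smoothing avoids infinite divergence when a
# term is missing from the estimate.
import math

def kl_divergence(actual: dict[str, float], estimated: dict[str, float],
                  epsilon: float = 1e-9) -> float:
    return sum(p * math.log(p / estimated.get(term, epsilon))
               for term, p in actual.items() if p > 0.0)

actual = {"cancer": 0.5, "therapy": 0.3, "trial": 0.2}
good_estimate = {"cancer": 0.45, "therapy": 0.35, "trial": 0.20}
poor_estimate = {"cancer": 0.90, "therapy": 0.05, "trial": 0.05}
print(kl_divergence(actual, good_estimate))  # ~0.006: close estimate
print(kl_divergence(actual, poor_estimate))  # ~0.52: poor estimate
```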

    DCU@FIRE-2012: rule-based stemmers for Bengali and Hindi

    For the participation of Dublin City University (DCU) in the FIRE-2012 Morpheme Extraction Task (MET), we investigated rule-based stemming approaches for Bengali and Hindi IR. The MET itself is an attempt to obtain a fair and direct comparison between various stemming approaches by comparing the retrieval effectiveness obtained by each on the same dataset. Linguistic knowledge was used to manually craft rules for removing the commonly occurring plural suffixes in Hindi and Bengali. Additionally, rules for removing classifiers and case markers in Bengali were also formulated. Our rule-based stemming approaches produced the best and the second-best retrieval effectiveness on the Hindi and Bengali datasets respectively.
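
    Stemmers of this kind boil down to longest-match suffix stripping driven by hand-crafted lists of plural, case, and classifier markers. The sketch below shows that mechanism; the suffix lists and example words are small illustrative assumptions, not the rule sets used in the submitted runs.

```python
# Minimal sketch of longest-match suffix stripping. The Hindi and Bengali
# suffix lists here are illustrative samples, not the paper's actual rules.
HINDI_SUFFIXES = ["ाओं", "ों", "ें"]                        # plural / oblique plural markers
BENGALI_SUFFIXES = ["গুলো", "গুলি", "দের", "রা", "টা", "টি"]  # plural, case marker, classifiers

def stem(word: str, suffixes: list[str], min_stem_len: int = 2) -> str:
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

print(stem("লোকগুলো", BENGALI_SUFFIXES))  # -> লোক ("people" -> "person")
print(stem("किताबें", HINDI_SUFFIXES))     # -> किताब ("books" -> "book")
```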

    Mobile Learning Content Authoring Tools (MLCATs): A Systematic Review

    Mobile learning is currently receiving a lot of attention within the education arena, particularly within electronic learning. This is attributed to increasing mobile penetration rates and the subsequent increases in university student enrolments. Mobile learning environments are supported by a number of crucial services, such as content creation, which require an authoring tool. The last decade or so has witnessed increased attention to tools for authoring mobile learning content for education, as can be seen from the large number of conference and journal publications devoted to the topic. The goal of this paper is therefore to review the published work, suggest a new classification framework, and explore each of the classification features. The paper is based on a systematic review of mobile learning content authoring tools (MLCATs) from 2000 to 2009. The framework is developed along a number of dimensions, such as system type, development context, tools and technologies used, tool availability, ICTD relation, support for standards, learning-style support, media supported, and tool purpose. This paper provides a means for researchers to extract assertions and several important lessons for the choice and implementation of MLCATs.
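
    The classification dimensions listed above can be read as a simple record structure for cataloguing each tool. The sketch below captures them as a Python dataclass; the field names paraphrase the abstract's dimensions, and the example values are hypothetical rather than drawn from the review.

```python
# Minimal sketch of a record type mirroring the review's classification
# dimensions; field names paraphrase the abstract, example values are invented.
from dataclasses import dataclass, field

@dataclass
class MLCATEntry:
    name: str
    system_type: str                      # e.g. web-based vs. desktop
    development_context: str
    tools_and_technologies: list = field(default_factory=list)
    availability: str = "unknown"         # e.g. open source, commercial
    ictd_related: bool = False            # ICTD relation
    standards_supported: list = field(default_factory=list)
    learning_styles_supported: list = field(default_factory=list)
    media_supported: list = field(default_factory=list)
    purpose: str = ""

# Hypothetical example entry:
example = MLCATEntry(name="ExampleAuthor", system_type="web-based",
                     development_context="university project",
                     media_supported=["text", "audio", "video"])
```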

    Analysing the Security of Google's implementation of OpenID Connect

    Many millions of users routinely use their Google accounts to log in to relying party (RP) websites supporting the Google OpenID Connect service. OpenID Connect, a newly standardised single sign-on protocol, builds an identity layer on top of the OAuth 2.0 protocol, which has itself been widely adopted to support identity management services. It adds identity management functionality to the OAuth 2.0 system and allows an RP to obtain assurances regarding the authenticity of an end user. A number of authors have analysed the security of the OAuth 2.0 protocol, but whether OpenID Connect is secure in practice remains an open question. We report on a large-scale practical study of Google's implementation of OpenID Connect, involving forensic examination of 103 RP websites which support its use for sign-in. Our study reveals serious vulnerabilities of a number of types, all of which allow an attacker to log in to an RP website as a victim user. Further examination suggests that these vulnerabilities are caused by a combination of Google's design of its OpenID Connect service and RP developers making design decisions which sacrifice security for simplicity of implementation. We also give practical recommendations for both RPs and OpenID Providers (OPs) to help improve the security of real-world OpenID Connect systems.
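
    To make concrete the kind of checks that guard against "log in as a victim user" attacks of the sort described above, the sketch below shows basic RP-side ID token validation for a Google sign-in, written against the PyJWT library. The client identifier is a placeholder, retrieval of Google's signing key from its published JWKS endpoint is assumed rather than shown, and this is an illustrative outline, not the paper's recommended procedure.

```python
# Minimal sketch of RP-side ID token validation for Google OpenID Connect,
# using PyJWT. CLIENT_ID is a placeholder; fetching google_public_key from
# Google's JWKS endpoint is assumed and not shown here.
import jwt  # PyJWT

CLIENT_ID = "your-rp-client-id.apps.googleusercontent.com"   # placeholder
GOOGLE_ISSUERS = ("https://accounts.google.com", "accounts.google.com")

def verify_id_token(id_token: str, google_public_key) -> dict:
    """Verify signature, audience, and expiry, then check the issuer; return claims."""
    claims = jwt.decode(
        id_token,
        key=google_public_key,
        algorithms=["RS256"],   # reject tokens with 'none' or unexpected algorithms
        audience=CLIENT_ID,     # the token must have been issued for this RP
    )
    if claims.get("iss") not in GOOGLE_ISSUERS:
        raise ValueError("unexpected issuer")
    return claims
```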