Approaches to text mining for clinical medical records
The 21st Annual ACM Symposium on Applied Computing 2006, Technical tracks on Computer Applications in Health Care (CAHC 2006), Dijon, France, April 23-27, 2006. Retrieved 6/21/2006 from http://www.ischool.drexel.edu/faculty/hhan/SAC2006_CAHC.pdf. Clinical medical records contain a wealth of information, largely in free-text form. Extracting structured information from free-text records is an important research endeavor. In this paper, we describe a MEDical Information Extraction (MedIE) system that extracts and mines a variety of information about patients with breast complaints from free-text clinical records. MedIE is part of a medical text mining project being conducted at Drexel University. Three approaches are proposed to solve different IE tasks, and very good performance (precision and recall) was achieved. A graph-based approach that uses the output of a link-grammar parser was developed for relation extraction, achieving high accuracy. A simple but efficient ontology-based approach was adopted to extract medical terms of interest. Finally, an NLP-based feature extraction method coupled with an ID3-based decision tree was used to perform text classification
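As a rough illustration of the ontology-based term extraction step described above, the following Python sketch matches terms from a small, purely hypothetical ontology against a free-text record, longest terms first; the actual MedIE ontology and matching logic are not given in this abstract.

```python
# Hypothetical sketch: extract ontology terms from a free-text clinical record.
# The ontology entries below are illustrative, not taken from the paper.
ONTOLOGY = {"breast lump", "nipple discharge", "mastalgia"}

def extract_terms(text, ontology=ONTOLOGY):
    """Return ontology terms found in a record (longest terms matched first)."""
    found, lowered = [], text.lower()
    for term in sorted(ontology, key=len, reverse=True):
        if term in lowered:
            found.append(term)
    return found

record = "Patient presents with a palpable breast lump and mastalgia."
print(extract_terms(record))  # ['breast lump', 'mastalgia']
```

A production system would add tokenisation, negation handling, and synonym lookup; this only shows the dictionary-matching core.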
Artificial Immune Systems - Models, algorithms and applications
Copyright © 2010 Academic Research Publishing Agency. This article has been made available through the Brunel Open Access Publishing Fund. Artificial Immune Systems (AIS) are computational paradigms that belong to the computational intelligence family and are inspired by the biological immune system. During the past decade, they have attracted a lot of interest from researchers aiming to develop immune-based models and techniques to solve complex computational or engineering problems. This work presents a survey of existing AIS models and algorithms, with a focus on the last five years
DCU@FIRE2010: term conflation, blind relevance feedback, and cross-language IR with manual and automatic query translation
For the first participation of Dublin City University (DCU) in the FIRE 2010 evaluation campaign, information retrieval (IR) experiments on English, Bengali, Hindi, and Marathi documents were performed to investigate term conflation (different stemming approaches and indexing word prefixes), blind relevance feedback, and manual and automatic query translation. The experiments are based on BM25 and on language modeling (LM) for IR. Results show that term conflation always improves mean average precision (MAP) compared to indexing unprocessed word forms, but different approaches seem to work best for different languages. For example, in monolingual Marathi experiments indexing 5-prefixes outperforms our corpus-based stemmer; in Hindi, the corpus-based stemmer achieves a higher MAP. For Bengali, the LM retrieval model achieves a much higher MAP than BM25 (0.4944 vs. 0.4526). In all experiments using BM25, blind relevance feedback yields considerably higher MAP in comparison to experiments without it. Bilingual IR experiments (English→Bengali and English→Hindi) are based on query translations obtained from native speakers and the Google Translate web service. For the automatically translated queries, MAP is slightly (but not significantly) lower compared to experiments with manual query translations. The bilingual English→Bengali (English→Hindi) experiments achieve 81.7%-83.3% (78.0%-80.6%) of the best corresponding monolingual experiments
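The prefix-indexing form of term conflation mentioned above can be illustrated with a minimal sketch (shown here on English tokens; the paper applies 5-prefix indexing to languages such as Marathi):

```python
def conflate(token, k=5):
    """Prefix conflation: index only the first k characters of each token,
    so morphological variants collapse onto one index term."""
    return token[:k]

variants = ["retrieval", "retrieving", "retrieved"]
print({conflate(t) for t in variants})  # {'retri'} -- all three variants conflate
```

The appeal of the approach is that it needs no linguistic resources, which is why its effectiveness relative to a corpus-based stemmer varies by language, as the results above show.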
Towards better measures: evaluation of estimated resource description quality for distributed IR
An open problem for Distributed Information Retrieval (DIR) systems is how to represent large document repositories, also known as resources, both accurately and efficiently. Obtaining resource description estimates is an important phase in DIR, especially in non-cooperative environments. Measuring the quality of an estimated resource description is a contentious issue, as current measures do not provide an adequate indication of quality. In this paper, we provide an overview of these currently applied measures of resource description quality, before proposing the Kullback-Leibler (KL) divergence as an alternative. Through experimentation we illustrate the shortcomings of these past measures, whilst providing evidence that KL is a more appropriate measure of quality. When applying KL to compare different Query-Based Sampling (QBS) algorithms, our experiments provide strong evidence in favour of a previously unsupported hypothesis originally posited in the initial QBS work
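A minimal sketch of how KL divergence could score an estimated resource description against the actual one, using toy term distributions and assuming the estimate assigns nonzero probability to every term observed in the actual description (in practice the estimate would be smoothed to guarantee this):

```python
import math

def kl_divergence(actual, estimated):
    """KL(actual || estimated) over a shared term vocabulary, in nats.
    Lower is better; 0 means the estimate matches the actual description."""
    return sum(p * math.log(p / estimated[term])
               for term, p in actual.items() if p > 0)

# Toy term distributions for an actual and an estimated resource description.
actual    = {"retrieval": 0.5, "index": 0.3, "query": 0.2}
estimated = {"retrieval": 0.4, "index": 0.4, "query": 0.2}
print(round(kl_divergence(actual, estimated), 4))  # 0.0253
```

Because KL is computed directly over the term distributions, it rewards estimates that match the resource's term usage, not merely its vocabulary coverage.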
DCU@FIRE-2012: rule-based stemmers for Bengali and Hindi
For the participation of Dublin City University (DCU) in the FIRE-2012 Morpheme Extraction Task (MET), we investigated rule-based stemming approaches for Bengali and Hindi IR. The MET task itself is an attempt to obtain a fair and direct comparison between various stemming approaches, measured by comparing the retrieval effectiveness obtained by each on the same dataset. Linguistic knowledge was used to manually craft the rules for removing the commonly occurring plural suffixes for Hindi and Bengali. Additionally, rules for removing classifiers and case markers in Bengali were also formulated. Our rule-based stemming approaches produced the best and the second-best retrieval effectiveness for the Hindi and Bengali datasets respectively
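A rule-based suffix stripper of the kind described can be sketched as follows; the suffix list here is an illustrative placeholder, not the paper's actual Bengali or Hindi rules:

```python
# Illustrative suffix list only -- not the paper's hand-crafted rules.
SUFFIXES = ["gulo", "era", "der", "ra"]

def strip_suffix(word, suffixes=SUFFIXES, min_stem=2):
    """Remove the longest matching suffix, keeping at least `min_stem`
    characters of stem so short words are left untouched."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[:-len(suf)]
    return word

print(strip_suffix("wordgulo"))  # 'word'
print(strip_suffix("go"))        # 'go' -- too short, left untouched
```

Longest-match-first ordering matters when one suffix is a substring of another (as with "ra" and "era" here), which is why the rules are sorted by length before application.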
An empirical study of evolution of inheritance in Java OSS
Previous studies of Object-Oriented (OO) software have reported avoidance of the inheritance mechanism and cast doubt on the wisdom of "deep" inheritance levels. From an evolutionary perspective, the picture is unclear - we still know relatively little about how, over time, changes tend to be applied by developers. Our conjecture is that an inheritance hierarchy will tend to grow "breadth-wise" rather than "depth-wise". This claim is made on the basis that developers will avoid extending depth in favour of breadth because of the inherent complexity of having to understand the functionality of superclasses. Thus the goal of our study is to investigate this empirically. We conduct an empirical study of seven Java Open-Source Systems (OSSs) over a series of releases to observe the nature and location of changes within the inheritance hierarchies. Results show a strong tendency for classes to be added at levels one and two of the hierarchy (rather than anywhere else). Over 96% of classes added over the course of the versions of all systems were at level 1 or level 2. The results suggest that changes cluster in the shallow levels of a hierarchy; this is relevant for developers since it indicates where remedial activities such as refactoring should be focused
Mobile Learning Content Authoring Tools (MLCATs): A Systematic Review
Mobile learning is currently receiving a lot of attention within the education arena, particularly within electronic learning. This is attributed to increasing mobile penetration rates and the subsequent increases in university student enrolments. Mobile learning environments are supported by a number of crucial services, such as content creation, which require an authoring tool. The last decade or so has witnessed increased attention to tools for authoring mobile learning content for education, as can be seen from the vast number of conference and journal publications devoted to the topic. Therefore, the goal of this paper is to review the published work, suggest a new classification framework, and explore each of the classification features. This paper is based on a systematic review of mobile learning content authoring tools (MLCATs) from 2000 to 2009. The framework is developed along a number of dimensions, such as system type, development context, tools and technologies used, tool availability, ICTD relation, support for standards, learning style support, media supported, and tool purpose. This paper provides a means for researchers to extract assertions and several important lessons for the choice and implementation of MLCATs
Analysing the Security of Google's implementation of OpenID Connect
Many millions of users routinely use their Google accounts to log in to relying party (RP) websites supporting the Google OpenID Connect service. OpenID Connect, a newly standardised single-sign-on protocol, builds an identity layer on top of the OAuth 2.0 protocol, which has itself been widely adopted to support identity management services. It adds identity management functionality to the OAuth 2.0 system and allows an RP to obtain assurances regarding the authenticity of an end user. A number of authors have analysed the security of the OAuth 2.0 protocol, but whether OpenID Connect is secure in practice remains an open question. We report on a large-scale practical study of Google's implementation of OpenID Connect, involving forensic examination of 103 RP websites which support its use for sign-in. Our study reveals serious vulnerabilities of a number of types, all of which allow an attacker to log in to an RP website as a victim user. Further examination suggests that these vulnerabilities are caused by a combination of Google's design of its OpenID Connect service and RP developers making design decisions which sacrifice security for simplicity of implementation. We also give practical recommendations for both RPs and OPs to help improve the security of real world OpenID Connect systems
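As one illustration of the kind of RP-side check whose omission can enable impersonation attacks, the sketch below performs minimal sanity checks on a decoded (and already signature-verified) ID token's standard claims; the helper function and its checks are our own simplification, not the paper's methodology.

```python
import time

def validate_id_token(claims, expected_issuer, client_id, now=None):
    """Minimal RP-side checks on a decoded, signature-verified ID token,
    covering the standard `iss`, `aud`, and `exp` claims."""
    now = now if now is not None else time.time()
    if claims.get("iss") != expected_issuer:
        return False          # minted by a different identity provider
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if client_id not in audiences:
        return False          # issued for a different relying party
    if claims.get("exp", 0) <= now:
        return False          # token has expired
    return True

claims = {"iss": "https://accounts.google.com",
          "aud": "example-client-id", "exp": time.time() + 3600}
print(validate_id_token(claims, "https://accounts.google.com",
                        "example-client-id"))  # True
```

An RP that skips the audience check, for instance, may accept a token issued to some other (possibly malicious) client - exactly the trade of security for implementation simplicity that the abstract describes.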
- …