
    A Classifier to Evaluate Language Specificity in Medical Documents

    Consumer health information written by health care professionals is often inaccessible to the consumers it is written for. Traditional readability formulas examine syntactic features like sentence length and number of syllables, ignoring the target audience's grasp of the words themselves. The use of specialized vocabulary disrupts the understanding of patients with low reading skills, causing a decrease in comprehension. A naive Bayes classifier for three levels of increasing medical terminology specificity (consumer/patient, novice health learner, medical professional) was created with a lexicon generated from a representative medical corpus. Ninety-six percent accuracy in classification was attained. The classifier was then applied to existing consumer health web pages. We found that only 4% of pages were classified at a layperson level, regardless of the Flesch reading ease scores, while the remaining pages were at the level of medical professionals. This indicates that consumer health web pages are not using appropriate language for their target audience.
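    The abstract gives no implementation detail; the following is a minimal sketch of the kind of three-level naive Bayes text classifier it describes, assuming a scikit-learn pipeline and purely illustrative training documents in place of the corpus-derived lexicon.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Hypothetical expert-labelled training data: one toy document per
    # specificity level named in the abstract (illustrative only).
    LEVELS = ["consumer/patient", "novice health learner", "medical professional"]
    train_docs = [
        "take your heart medicine with food and rest if you feel chest pain",
        "the drug is absorbed in the small intestine and lowers blood pressure",
        "pharmacokinetics indicate hepatic first-pass metabolism and beta blockade",
    ]

    # Bag-of-words features plus multinomial naive Bayes stand in for the
    # lexicon-based classifier described above.
    model = make_pipeline(CountVectorizer(lowercase=True), MultinomialNB())
    model.fit(train_docs, LEVELS)

    # Classify the text of an unseen consumer health web page.
    print(model.predict(["myocardial infarction is treated with thrombolysis"]))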

    SVMAUD: Using textual information to predict the audience level of written works using support vector machines

    Information retrieval systems should seek to match resources with the reading ability of the individual user; similarly, an author must choose vocabulary and sentence structures appropriate for his or her audience. Traditional readability formulas, including the popular Flesch-Kincaid Reading Age and the Dale-Chall Reading Ease Score, rely on numerical representations of text characteristics, including syllable counts and sentence lengths, to suggest the audience level of resources. However, the author’s chosen vocabulary, sentence structure, and even the page formatting can alter the predicted audience level by several levels, especially in the case of digital library resources. For these reasons, readability formulas perform very poorly when predicting the audience level of digital library resources. Rather than relying on these inputs, machine learning methods, including cosine, Naïve Bayes, and Support Vector Machines (SVM), can suggest the grade level of an essay based on the vocabulary chosen by the author. The audience level prediction and essay grading problems share the same inputs, expert-labeled documents, and outputs, a numerical score representing quality or audience level. After a human expert labels a representative sample of resources with audience level, the proposed SVM-based audience level prediction program, SVMAUD, constructs a vocabulary for each audience level; the text in an unlabeled resource is then compared with this predefined vocabulary to suggest the most appropriate audience level. Two readability formulas and four machine learning programs are evaluated with respect to predicting human-expert-entered audience levels based on the text contained in an unlabeled resource. In a collection containing 10,238 expert-labeled HTML-based digital library resources, the Flesch-Kincaid Reading Age and the Dale-Chall Reading Ease Score predict the specific audience level with F-measures of 0.10 and 0.05, respectively. Conversely, cosine, Naïve Bayes, the Collins-Thompson and Callan model, and SVMAUD improve these F-measures to 0.57, 0.61, 0.68, and 0.78, respectively. When a term’s weight is adjusted based on the HTML tag in which it occurs, the specific audience level prediction performance of cosine, Naïve Bayes, the Collins-Thompson and Callan model, and SVMAUD improves to 0.68, 0.70, 0.75, and 0.84, respectively. When title, keyword, and abstract metadata is used for training, the cosine, Naïve Bayes, Collins-Thompson and Callan model, and SVMAUD specific audience level prediction F-measures are found to be 0.61, 0.68, 0.75, and 0.86, respectively. When cosine, Naïve Bayes, the Collins-Thompson and Callan model, and SVMAUD are trained and tested using resources from a single subject category, the specific audience level prediction F-measure improves to 0.63, 0.70, 0.77, and 0.87, respectively. SVMAUD achieves the highest audience level prediction performance among all methods evaluated in this study. After SVMAUD is properly trained, it can be used to predict the audience level of any written work.
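    SVMAUD's code is not part of the abstract; as a rough, assumption-laden sketch of the SVM-based approach it describes, the fragment below pairs TF-IDF vocabulary features with a linear support vector machine from scikit-learn. The HTML-tag term weighting, metadata training, and subject-specific models reported above are not reproduced, and the labels and texts are invented for illustration.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    # Hypothetical expert-labelled resources: extracted text paired with an
    # audience level assigned by a human expert (illustrative only).
    train_texts = [
        "plants need sunlight and water to grow",
        "photosynthesis converts light energy into chemical energy",
        "the Calvin cycle fixes carbon dioxide via ribulose-1,5-bisphosphate",
    ]
    train_levels = ["elementary", "secondary", "post-secondary"]

    # Vocabulary-driven features feeding a linear SVM, analogous in spirit to
    # the SVMAUD design described above.
    audience_model = make_pipeline(TfidfVectorizer(), LinearSVC())
    audience_model.fit(train_texts, train_levels)

    # Suggest an audience level for an unlabeled resource.
    print(audience_model.predict(["enzymes lower the activation energy of reactions"]))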

    Personalising patient Internet searching using electronic patient records

    The research reported in this thesis addresses a patient's information requirements when searching the Internet for health information. A patient's lack of information about his/her health condition and its care is officially acknowledged, and traditional patient information sources do not address today's patient information needs. Internet health information resources have become the foremost health information platform. However, patient Internet searching is currently manual, uncustomised, and hindered by health information vocabulary and quality challenges. Patient access to quality Internet health information is currently ensured through national health gateways, medical search engines, third-party accredited search engines and charity health websites. However, such resources are generic, i.e. they do not cater for a patient's particular information needs. In this study, we propose personalising patient Internet searching by enabling a patient's access to their Electronic Patient Records (EPRs) and using this EPR data in Internet information searching. The feasibility of patient access to EPRs has recently been promoted by national health information programmes. Very recently, the literature has reported pilot studies of Personal Health Record (PHR) systems that offer a patient online access to their medical records and related health information. However, extensive literature searching shows no reports of patient-personalised search engines within the reported PHR prototypes that utilise a patient's own data to personalise search features, especially with regard to health information vocabulary needs. The thesis presents a novel approach to personalising patient information searching based on linking EPR data with relevant Internet information resources, integrating medical and lay perspectives in a diagnosis vocabulary that distinguishes between medical and lay information needs, and accommodating a variable perspective on online information quality. To demonstrate our research work, we have implemented a prototype online patient personal health information system, known as the Patient Health Base (PHB), that offers a patient a Summary Medical Record (SMR) and a Personal Internet Search (PerlS) service. PerlS addresses the patient Internet search challenges identified in the project. Evaluation of PerlS's approach to improving a patient's medical Internet searching demonstrated improvements in terms of search capabilities, focusing techniques, and results. This research explored a new direction for patient Internet searching and foresees great potential for further customising Internet information searching for patients, families, and the public as a whole.
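    The thesis abstract stops short of implementation detail; the sketch below is a speculative illustration of how EPR diagnosis entries might seed personalised search queries through a medical-to-lay vocabulary map. The record, mapping, and query template are invented for illustration and are not taken from PHB or PerlS.

    # Hypothetical extract from one patient's Electronic Patient Record.
    epr = {"diagnoses": ["myocardial infarction", "type 2 diabetes mellitus"]}

    # Illustrative diagnosis vocabulary mapping medical terms to lay equivalents,
    # standing in for the medical/lay vocabulary integration described above.
    LAY_TERMS = {
        "myocardial infarction": "heart attack",
        "type 2 diabetes mellitus": "type 2 diabetes",
    }

    def build_queries(record, lay=True):
        """Turn EPR diagnoses into search queries at a lay or clinical reading level."""
        terms = record["diagnoses"]
        if lay:
            terms = [LAY_TERMS.get(t, t) for t in terms]
        return [f"{t} treatment options" for t in terms]

    print(build_queries(epr, lay=True))   # queries for a patient/lay searcher
    print(build_queries(epr, lay=False))  # queries preserving clinical vocabulary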
