Automated Information Extraction to Support Biomedical Decision Model Construction: A Preliminary Design
We propose an information extraction framework to support the automated construction of decision models in biomedicine. The proposed technique classifies text documents from a large biomedical literature repository, e.g., MEDLINE, into predefined categories and identifies important keywords for each category based on their discriminative power. Relevant documents for each category are retrieved using these keywords, and a classification algorithm based on machine learning techniques is developed to build the final classifier. We apply the HITS algorithm to select the authoritative and typical documents within a category and construct templates in the form of Bayesian networks. Data mining and information extraction techniques are then applied to extract the semantic knowledge needed to fill in the templates and construct the final decision models.
Singapore-MIT Alliance (SMA)
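The abstract names the HITS algorithm but does not say how the document graph is built. As a minimal sketch, assuming each document is a node and directed edges (for example, citation or similarity links) are already given, Kleinberg's hub/authority iteration can be written as follows. The function name `hits` and the toy graph are illustrative assumptions, not part of the framework.

```python
import math


def hits(links, iterations=50):
    """Kleinberg's HITS: return (hub, authority) score dicts.

    links: {doc_id: [doc_ids it points to]}; how edges are defined
    (citations, similarity, ...) is an assumption, not from the source.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of n = sum of hub scores of documents pointing to n.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub of n = sum of authority scores of documents n points to.
        hub = {n: sum(auth[dst] for dst in links.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical toy graph: d3 is pointed to by every other document.
links = {"d1": ["d3"], "d2": ["d3", "d4"], "d4": ["d3"]}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # d3
print(max(hub, key=hub.get))    # d2
```

Documents with the highest authority scores would be natural candidates for the "authoritative and typical" documents of a category; a fixed iteration count stands in for a convergence test to keep the sketch short.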
Doctor of Philosophy
Dissertation
Disease-specific ontologies, designed to structure and represent medical knowledge about disease etiology, diagnosis, treatment, and prognosis, are essential for many advanced applications, such as predictive modeling, cohort identification, and clinical decision support. However, manually building disease-specific ontologies is very labor-intensive, especially the knowledge acquisition process. On the other hand, medical knowledge is documented in a variety of biomedical knowledge resources, such as textbooks, clinical guidelines, research articles, and clinical data repositories, which offer a great opportunity for automated knowledge acquisition. In this dissertation, we aim to facilitate the large-scale development of disease-specific ontologies through automated extraction of disease-specific vocabularies from existing biomedical knowledge resources. The three studies presented in this dissertation explore both manual and automated vocabulary extraction. The first study addresses whether disease-specific reference vocabularies derived from manual concept acquisition can achieve near-saturated coverage (i.e., capture nearly all disease-pertinent concepts) using a small number of literature sources. Using a general-purpose manual acquisition approach that we developed, this study concludes that a small number of expert-curated biomedical literature resources can suffice for acquiring near-saturated disease-specific vocabularies. The second and third studies introduce automated techniques for extracting disease-specific vocabularies from MEDLINE citations (titles and abstracts) and from a clinical data repository. In the second study, we developed and assessed a pipeline-based system that extracts disease-specific treatments from PubMed citations. The system achieved a mean precision of 0.8 for the top 100 extracted treatment concepts. 
In the third study, we applied classification models to filter out irrelevant disease-concept associations extracted from MEDLINE citations and electronic medical records. This study suggested that combining measures of relevance from disparate sources improves the identification of truly relevant concepts through classification, and it also demonstrated that the studied classification model generalizes to new diseases. From these studies, we concluded that existing biomedical knowledge resources are valuable sources for extracting disease-concept associations, and that classification based on statistical measures of relevance can assist the semi-automated generation of disease-specific vocabularies.
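The reported result is a precision measured over the top 100 extracted treatment concepts. The sketch below shows one common way to compute precision at k and average it over several ranked lists; it assumes one ranked list per disease, which the abstract does not state, and the concept ids and judgments are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items judged relevant (always divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    n_relevant = sum(1 for item in ranked[:k] if item in relevant)
    return n_relevant / k


def mean_precision_at_k(runs, k):
    """Average precision@k over (ranked_items, relevant_set) pairs.

    Assumption: one ranked list of extracted concepts per disease.
    """
    return sum(precision_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical toy judgments, not data from the dissertation.
run_1 = (["c1", "c2", "c3", "c4", "c5"], {"c1", "c2", "c4", "c5"})  # P@5 = 0.8
run_2 = (["c6", "c7", "c8", "c9", "c10"], {"c6", "c7", "c8", "c9", "c10"})  # P@5 = 1.0
print(mean_precision_at_k([run_1, run_2], k=5))
```

Dividing by k rather than by the number of items returned penalizes systems that return fewer than k concepts, which is the usual convention for a fixed cutoff such as the top 100.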
Analysis of Search on Clinical Narrative within the EHR
Electronic Health Records (EHRs) are increasingly used in hospital and outpatient settings, and patients are amassing digitized clinical information. On the one hand, aggregating all of a patient's clinical information can greatly assist health care workers in making sound decisions. On the other hand, it can lead to information overload, making it difficult to browse for information within the health record. Given the time constraints clinicians face, one way to reduce information overload is a search utility. However, traditional free-text search engines within the EHR can miss documents that are relevant to the clinical user's search but do not contain the query terms. This dissertation addresses this gap by analyzing within-patient search of the EHR and examining various semantic search approaches on clinical narrative. Our work consists of three studies, in which clinical users' search needs are examined, traditional string matching is analyzed, and semantic search approaches on clinical narrative are evaluated. The first study applied a mixed-methods approach to better understand clinical users' search needs within the EHR. It comprised a retrospective analysis of search log files and a survey administered to clinical professionals within our institution. The log analysis attempted to categorize how users of a search system query for information, and the survey sought to understand users' search preferences. This study showed that clinical users were very interested in search functionality within the EHR and that different types of users use a search utility differently. Overall, most users searched for specific laboratory tests and diseases within the health record. The last two studies rely on a gold standard developed specifically for this dissertation. 
The gold standard contained a document collection, a set of queries, and a relevance judgment for each document/query pair. It was used to evaluate and compare different search models on clinical narrative. The second study was an error analysis of the traditional vector-space model search approach: it examined the approach's false positives and false negatives and categorized the errors to identify gaps that semantic approaches may fill. The last study was a systematic evaluation of five semantic search approaches, comprising distributional semantic approaches and an ontology-based approach. It found that a mixed topic modeling and vector-space model approach was the best-performing search algorithm on our gold standard. Together, these studies lay the foundation for a deeper understanding of information retrieval methods within the electronic health record. Ultimately, this will allow health care professionals to easily access pertinent patient information, which could result in better health care delivery.
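The abstract does not describe the exact configuration of the vector-space baseline. The sketch below assumes a common textbook form (TF-IDF weights, cosine similarity, naive whitespace tokenization) and shows how comparing retrieved documents with gold-standard relevance judgments yields the false positives and false negatives that an error analysis examines. The function names, notes, and judgments are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenizer; real clinical text needs more care.
    return text.lower().split()


def build_index(docs):
    """Return TF-IDF vectors for {doc_id: text} and the IDF table."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    n_docs = len(docs)
    idf = {term: math.log(n_docs / freq) for term, freq in df.items()}
    vectors = {d: {t: tf * idf[t] for t, tf in c.items()} for d, c in counts.items()}
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, vectors, idf, threshold=0.0):
    """Return ids of documents whose cosine score with the query exceeds threshold."""
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    return {d for d, v in vectors.items() if cosine(q, v) > threshold}


def error_analysis(retrieved, relevant):
    """Compare retrieved ids with gold-standard relevant ids for one query."""
    return {
        "true_positives": retrieved & relevant,
        "false_positives": retrieved - relevant,
        "false_negatives": relevant - retrieved,
    }


# Hypothetical notes and relevance judgments for a single query.
notes = {
    "n1": "patient denies chest pain",
    "n2": "troponin elevated rule out myocardial infarction",
    "n3": "history of MI on aspirin",
}
vectors, idf = build_index(notes)
retrieved = search("myocardial infarction", vectors, idf)
errors = error_analysis(retrieved, relevant={"n2", "n3"})
print(errors)
```

In this toy collection, `n3` is judged relevant but uses the abbreviation "MI", so term matching misses it; that is the kind of false negative semantic approaches are meant to recover.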
Methods in literature-based drug discovery
This dissertation implemented two literature-based methods for predicting new therapeutic uses for drugs, known as drug reprofiling (also called drug repositioning or drug repurposing). Both methods used data stored in ChemoText, a repository of MeSH terms extracted from Medline records and designed to support drug discovery algorithms. The first method was an implementation of Swanson's ABC paradigm: it used explicit connections between disease, protein, and chemical annotations to find implicit connections between drugs and diseases that could represent potential new therapeutic uses. The validation approach in the ABC study divided the corpus into two segments at a cutoff year. Data from the earlier (baseline) period were used to generate the hypotheses, and data from the later period were used to validate them. Ranking approaches were used to place the likeliest drug reprofiling candidates near the top of the hypothesis set. These approaches reproduced Swanson's link between magnesium and migraine and identified other significant reprofiled drugs. The second literature-based discovery method used patterns in side effect annotations to predict drug molecular activity, specifically 5-HT6 binding and dopamine antagonism. Following a study design adopted from QSAR experiments, side effect information for chemicals with known activity was encoded as binary vectors and input into classification algorithms, and models were trained on these data to predict molecular activity. When the best validated models were applied to a large set of chemicals in a virtual screening step, they successfully identified known 5-HT6 binders and dopamine antagonists based solely on side effect profiles. Both studies addressed research areas relevant to current drug discovery, and both incorporated rigorous validation steps. 
For these reasons, the text mining methods presented here, together with the ChemoText repository, have the potential to be adopted in computational drug discovery laboratories and integrated into existing toolsets.
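Swanson's ABC paradigm infers an implicit A-C link (drug to disease) from explicit A-B and B-C links that share an intermediate term. The sketch below combines that idea with a year-cutoff validation like the one described for the ABC study. It assumes annotations are available as (year, term, term) co-occurrence records and that intermediate B terms are proteins; ChemoText's actual schema is not described here, and ranking by the number of shared intermediates is one simple illustrative choice, not necessarily one of the dissertation's ranking approaches. All names and records are hypothetical.

```python
from collections import defaultdict


def abc_hypotheses(pairs, drugs, diseases, intermediates):
    """Swanson-style open discovery over co-occurrence pairs.

    Returns {(drug, disease): shared_B_terms} for drug/disease pairs that are
    not directly linked but share at least one intermediate (B) term.
    """
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    hypotheses = {}
    for drug in drugs:
        for disease in diseases:
            if disease in neighbours[drug]:
                continue  # explicit A-C link already exists: not a new hypothesis
            shared = neighbours[drug] & neighbours[disease] & intermediates
            if shared:
                hypotheses[(drug, disease)] = shared
    return hypotheses


def temporal_validation(records, cutoff, drugs, diseases, intermediates):
    """Generate hypotheses from records up to `cutoff`; check them against later ones."""
    baseline = [(a, b) for year, a, b in records if year <= cutoff]
    later_links = {frozenset((a, b)) for year, a, b in records if year > cutoff}
    hyps = abc_hypotheses(baseline, drugs, diseases, intermediates)
    # Assumed ranking: more shared intermediates first, ties broken alphabetically.
    ranked = sorted(hyps, key=lambda pair: (-len(hyps[pair]), pair))
    confirmed = [pair for pair in ranked if frozenset(pair) in later_links]
    return ranked, confirmed


# Hypothetical toy records: (year, term, term).
records = [
    (1990, "drugA", "protein1"),
    (1991, "protein1", "disease1"),
    (1992, "drugA", "protein2"),
    (1993, "protein2", "disease1"),
    (1992, "drugB", "protein2"),
    (1994, "drugB", "disease2"),
    (2001, "drugA", "disease1"),
]
ranked, confirmed = temporal_validation(
    records, cutoff=1995,
    drugs={"drugA", "drugB"}, diseases={"disease1", "disease2"},
    intermediates={"protein1", "protein2"},
)
print(ranked)     # [('drugA', 'disease1'), ('drugB', 'disease1')]
print(confirmed)  # [('drugA', 'disease1')]
```

Only pairs that first co-occur after the cutoff count as confirmed, which mirrors the idea of using later literature to validate hypotheses generated from the baseline period.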
Front-Line Physicians' Satisfaction with Information Systems in Hospitals
Day-to-day operations management in hospital units is difficult because of continuously changing situations, the multiple actors involved, and the vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with the existing information systems needed to support day-to-day operations management in hospitals. A cross-sectional survey was used, and data from a stratified random sample were collected in nine hospitals. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65% (n = 111). The physicians reported that information systems support their decision making to some extent, but that the systems neither improve access to information nor are tailored to physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer a single information system for accessing important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision-making process.
Peer reviewed