
    Source authenticity in the UMLS – A case study of the Minimal Standard Terminology

    As the UMLS integrates multiple source vocabularies, the integration process requires that certain adaptations be applied to the source. Our interest is in examining the relationship between the UMLS representation of a source vocabulary and the source vocabulary itself. We investigated the integration of the Minimal Standard Terminology (MST) into the UMLS in order to examine how closely its UMLS representation matches the source MST. The MST was conceived as a “minimal” list of terms and structure intended for use within computer systems to facilitate standardized reporting of gastrointestinal endoscopic examinations. Although the MST has an overall schema and an implied relationship structure, many of the UMLS-integrated MST terms were found to be hierarchically orphaned, with lateral relationships that do not closely adhere to the source MST. Thus, the MST representation within the UMLS differs significantly from the source MST. These representation discrepancies may affect the usability of the UMLS representation of the MST for knowledge acquisition. Furthermore, they pose a problem from the perspective of application developers. While these findings may not necessarily apply to other source terminologies, they highlight the conflict between preserving authentic concept orientation and the UMLS's overall goal of providing fully specified names for all source terms.
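
    The hierarchical-orphan finding above can be checked directly against a UMLS release. The following minimal sketch (not code from the paper) scans the standard pipe-delimited RRF files for source concepts that lack any PAR/CHD link within their own source; the source abbreviation MST_SAB and the column indices are assumptions to verify against the release documentation.

```python
# Hypothetical sketch: find hierarchically orphaned concepts of one UMLS source.
MST_SAB = "MTHMST"  # assumed source abbreviation for the MST; check MRSAB.RRF

def source_cuis(mrconso_path, sab):
    """Collect the CUIs contributed by the given source vocabulary."""
    cuis = set()
    with open(mrconso_path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("|")
            if fields[11] == sab:          # SAB column of MRCONSO.RRF
                cuis.add(fields[0])        # CUI column
    return cuis

def hierarchically_linked(mrrel_path, sab):
    """Collect CUIs taking part in a PAR/CHD relationship asserted by the source."""
    linked = set()
    with open(mrrel_path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("|")
            if fields[10] == sab and fields[3] in ("PAR", "CHD"):
                linked.update((fields[0], fields[4]))  # CUI1 and CUI2
    return linked

cuis = source_cuis("MRCONSO.RRF", MST_SAB)
orphans = cuis - hierarchically_linked("MRREL.RRF", MST_SAB)
print(f"{len(orphans)} of {len(cuis)} concepts lack a PAR/CHD link in {MST_SAB}")
```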

    Structural indicators for effective quality assurance of SNOMED CT

    The Systematized Nomenclature of Medicine -- Clinical Terms (SNOMED CT, further abbreviated as SCT) has been endorsed as a premier clinical terminology by many national and international organizations. The US Government has chosen SCT to play a significant role in its initiative to promote Electronic Health Record (EHR) use country-wide. However, there is evidence suggesting that, at the moment, SCT is not optimally modeled for its intended use by healthcare practitioners. There is a need to perform quality assurance (QA) of SCT to help expedite its use as a reference terminology for clinical purposes, as planned for EHR use. The central theme of this dissertation is to define a group-based auditing methodology to effectively identify concepts of SCT that require QA. To this end, similarity sets are introduced: groups of concepts that are lexically identical except for one word. Concepts in a similarity set are expected to be modeled in a consistent way; if they are not, the set is considered inconsistent and submitted for review by an auditor. Initial studies found 38% of such sets to be inconsistent. The effectiveness of these sets is further improved through the use of three structural indicators. Using the number of parents, relationships and role groups as indicators, up to 70% of the similarity sets and 32.6% of the concepts are found to exhibit inconsistencies. Furthermore, positional similarity sets, which are similarity sets with the same position of the differing word in the concepts' terms, are introduced to improve the likelihood of finding errors at the concept level. This strictness in the position of the differing word increases the lexical similarity between the concepts of a set, thereby sharpening the contrast between lexical similarity and modeling differences and increasing the likelihood of finding inconsistencies. The effectiveness of positional similarity sets in finding inconsistencies is further improved by using the same three structural indicators in the generation of these sets. An analysis of 50 sample sets with differences in the number of relationships reveals 41.6% of the concepts to be inconsistent. Moreover, a study is performed to fully automate the process of suggesting attributes to enhance the modeling of SCT concepts using positional similarity sets, together with a technique to automatically suggest the corresponding target values. An analysis of 50 sample concepts shows that, of the 103 suggested attributes, 67 are manually confirmed to be correct. Finally, a study is conducted to examine the readiness of the SCT problem list (PL) to support meaningful use of EHRs. The results show that the concepts in the PL suffer from the same issues as general SCT concepts, although to a slightly lesser extent, and do require further QA efforts. To support such efforts, structural indicators in the form of the number of parents and the number of words are shown to be effective in ferreting out potentially problematic concepts on which QA efforts should be focused. A structural indicator for finding concepts with synonymy problems is also presented, based on finding pairs of SCT concepts that map to the same UMLS concept.
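
    As a rough sketch of how the similarity sets described above can be generated, the following Python groups terms that are identical except for one word at the same position (the positional variant); the toy terms are illustrative only. In an audit, the structural indicators (numbers of parents, relationships, role groups) would then be compared within each set.

```python
from collections import defaultdict

def positional_similarity_sets(terms):
    """Group terms that are lexically identical except for one word
    at the same position (a sketch of positional similarity sets)."""
    buckets = defaultdict(set)
    for term in terms:
        tokens = term.lower().split()
        for i in range(len(tokens)):
            # mask word i; terms colliding on this key differ only there
            key = (i, tuple(tokens[:i]), tuple(tokens[i + 1:]))
            buckets[key].add(term)
    return [sorted(group) for group in buckets.values() if len(group) > 1]

terms = [
    "Fracture of left femur",
    "Fracture of right femur",
    "Fracture of left tibia",
]
for candidate_set in positional_similarity_sets(terms):
    print(candidate_set)  # each set would then be audited for consistent modeling
```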

    Medical Informatics

    Information technology has been revolutionizing the everyday life of the common man, while medical science has been making rapid strides in understanding disease mechanisms, developing diagnostic techniques and effecting successful treatment regimens, even for cases which would have been given a poor prognosis a decade earlier. The confluence of information technology and biomedicine has brought into its ambit computerized databases of patient conditions, revolutionizing the way health care and patient information is recorded, processed, interpreted and utilized for improving the quality of life. This book consists of seven chapters dealing with three primary issues: medical information acquisition from the patient's and the health care professional's perspective, translational approaches from a researcher's point of view, and the application potential as required by clinicians and physicians. The book covers modern issues in information technology, bioinformatics methods and clinical applications. The chapters describe the basic process of acquiring information in a health system, recent technological developments in biomedicine, and the realistic evaluation of medical informatics.

    Longitudinal Patient Records: A Re-Examination of the Possibility

    The Longitudinal Patient Record (LPR) has long been defined as “a life-long incremental process where each clinical encounter is merely an updating of the file” (Gabrieli, 1997). Understanding a patient's health condition longitudinally is very important to the care of that patient. However, it is not clear to what extent a longitudinal patient record is in fact possible, since a true longitudinal patient record would need to include all information for a patient, from cradle to grave, across all healthcare providers and systems, and across all corporate, geographic and national boundaries. Compiling or maintaining such a record is a problem of staggering practical difficulty. Yet there is no doubt of the potential benefit to the patient of making such a record available to the patient's caregivers and providers. In this thesis, we re-examine the possibility of a longitudinal patient record, both in its pure logical sense and in a practical sense. One point of view that we stress is to model the longitudinal patient record not as a static thing, but rather as a functional entity: a set of processes that provide the physician or other clinical decision maker (or, for that matter, the patient) with whatever longitudinal view of the patient information is available and practical for the current context of decision making. That is, the model we suggest is one of making the most of whatever patient information is available to the decision maker.
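
    The functional view of the LPR suggested above can be made concrete with a small sketch: instead of one complete static record, a function assembles whatever longitudinal view the currently reachable sources can supply. The types and names below are illustrative, not from the thesis.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable

@dataclass
class Encounter:
    when: date
    provider: str
    summary: str

# A source is any callable returning the encounters it can supply for a patient.
Source = Callable[[str], Iterable[Encounter]]

def longitudinal_view(patient_id: str, sources: list[Source]) -> list[Encounter]:
    """Assemble the best available longitudinal view, not a guaranteed-complete record."""
    encounters: list[Encounter] = []
    for source in sources:
        try:
            encounters.extend(source(patient_id))
        except OSError:
            # an unreachable source degrades the view but does not void it
            continue
    return sorted(encounters, key=lambda e: e.when)
```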

    Comparison of automated literature-based gene-disease association using gene set enrichment analysis

    Cancer is a leading cause of death in Australia: more than 43,000 people were estimated to have died from cancer in 2010. However, the genetic causes of cancer remain elusive despite voluminous genetic data in the public domain. Our goal is to identify genes in order to understand the molecular mechanisms of cancer so that diagnosis, prognosis and treatment can be optimized. Microarrays measure gene expression levels in disease tissue relative to normal tissue. However, microarray data are noisy, and computational methods are required to associate aberrant gene expression with disease. Subramanian et al. (2005) developed an approach called Gene Set Enrichment Analysis (GSEA) that annotates microarray data with functional terms from a background ontology. Enriched gene sets have been shown to improve the quality of microarray annotation compared to single-gene annotation. Nevertheless, GSEA falls short when used to predict disease-gene associations. We hypothesized that GSEA's shortfall is caused by the limited knowledge embedded in its ontology. Thus we proposed a novel method, which automatically constructs ontologies for use in GSEA directly from the biomedical literature and then associates genes with diseases. This thesis tests this hypothesis. My results show that using knowledge derived automatically from the biomedical literature outperforms GSEA's default catalogues and achieves high area under the receiver operating characteristic curve (AUC) scores when tested on breast and colorectal cancer samples. The results indicate that the automated literature-based approach is a promising method for discovering novel gene-disease associations. In conclusion, I have shown that literature-generated catalogues are accurate and viable for the prediction of gene-disease associations.
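
    For reference, the core of GSEA is a weighted Kolmogorov-Smirnov-like running-sum statistic (Subramanian et al., 2005). The sketch below computes that enrichment score for one gene set; permutation-based significance testing and the thesis's literature-derived catalogues are omitted.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Running-sum enrichment score of Subramanian et al. (2005).

    ranked_genes: genes ordered by correlation with the phenotype
    correlations: the matching correlation values, in the same order
    gene_set:     the candidate gene set being tested
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_r = sum(abs(c) ** p for c, hit in zip(correlations, hits) if hit)
    if n_r == 0:
        return 0.0  # gene set does not overlap the ranked list
    n_miss = len(ranked_genes) - sum(hits)
    running, best = 0.0, 0.0
    for c, hit in zip(correlations, hits):
        running += (abs(c) ** p) / n_r if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running            # maximum deviation from zero
    return best
```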

    Building standardized and secure mobile health services based on social media

    Mobile devices and social media have been used to create empowering healthcare services. However, privacy and security concerns remain. Furthermore, the integration of interoperable biomedical standards is a strategic feature. Thus, the objective of this paper is to build enhanced healthcare services by merging all these components. Methodologically, current mobile health telemonitoring architectures and their limitations are described, leading to the identification of new potentialities for a novel architecture. As a result, a standardized, secure/private, social-media-based mobile health architecture is proposed and discussed. Additionally, a technical proof of concept (two Android applications) has been developed by selecting a social medium (Twitter), a security envelope (OpenPGP), a standard (Health Level 7 (HL7)) and an information-embedding algorithm (modifying the transparency channel, in two versions). The tests performed included a small-scale and a boundary scenario: for the former, two sizes of images were tested; for the latter, the two versions of the embedding algorithm were tested. The results show that the system is fast enough (less than 1 s) for most mHealth telemonitoring services. The architecture provides users with friendly (images shared via social media), straightforward (fast and inexpensive), secure/private and interoperable mHealth services.
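
    As a toy illustration of the information-embedding idea (not the paper's exact algorithm), the sketch below hides a payload in a PNG's transparency channel by overwriting one alpha byte per pixel, prefixed with a 4-byte length header; in the proposed architecture the payload would already be an HL7 message protected with OpenPGP.

```python
from PIL import Image  # assumes the Pillow library

def embed(cover_path, payload: bytes, out_path):
    """Write the payload into the alpha channel, one byte per pixel."""
    img = Image.open(cover_path).convert("RGBA")
    pixels = list(img.getdata())
    data = len(payload).to_bytes(4, "big") + payload  # length header + body
    if len(data) > len(pixels):
        raise ValueError("payload too large for this cover image")
    stego = [(r, g, b, data[i]) if i < len(data) else (r, g, b, a)
             for i, (r, g, b, a) in enumerate(pixels)]
    img.putdata(stego)
    img.save(out_path, "PNG")  # PNG keeps the alpha channel lossless

def extract(stego_path) -> bytes:
    """Read the length header, then the payload, back out of the alpha channel."""
    pixels = Image.open(stego_path).convert("RGBA").getdata()
    alphas = bytes(a for _, _, _, a in pixels)
    length = int.from_bytes(alphas[:4], "big")
    return alphas[4:4 + length]
```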

    Health systems data interoperability and implementation

    Objective: The objective of this study was to use machine learning and health standards to address the problem of clinical data interoperability across healthcare institutions. Addressing this problem has the potential to make clinical data comparable, searchable and exchangeable between healthcare providers.

    Data sources: Structured and unstructured data have been used to conduct the experiments in this study. The data was collected from two disparate sources, MIMIC-III and NHANES. The MIMIC-III database stores data from two electronic health record systems, CareVue and MetaVision. The data stored in these systems was not recorded with the same standards and therefore was not directly comparable: some values conflicted, one system would store an abbreviation of a clinical concept while the other stored the full concept name, and some attributes contained missing information. These issues make this data a good candidate for this study. From the identified sources, laboratory, physical examination, vital signs, and behavioural data were used.

    Methods: This research employed the CRISP-DM framework as a guideline for all stages of data mining. Two sets of classification experiments were conducted, one for structured data and one for unstructured data. For the first experiment, edit distance, TF-IDF and Jaro-Winkler were used to calculate similarity weights between two datasets, one coded with the LOINC terminology standard and one not coded. Similar sets of data were classified as matches while dissimilar sets were classified as non-matches. A Soundex indexing method was then used to reduce the number of potential comparisons. Thereafter, three classification algorithms were trained and tested, and the performance of each was evaluated through the ROC curve. The second experiment was aimed at extracting patients' smoking status from a clinical corpus. A sequence-oriented classification algorithm, the conditional random field (CRF), was used for learning related concepts from the given clinical corpus, with word embedding, random indexing, and word shape features used to capture meaning in the corpus.

    Results: Having optimized all the model's parameters through v-fold cross validation on a sampled training set of structured data, only 8 of 24 features were selected for the classification task. RapidMiner was used to train and test all the classification algorithms. On the final run of the classification process, the last contenders were SVM and the decision tree classifier. SVM yielded an accuracy of 92.5% once its parameters were tuned. These results were obtained after more relevant features were identified, having observed that the classifiers were biased on the initial data. The unstructured data was annotated via the UIMA Ruta scripting language, then trained through CRFSuite, which comes with the CLAMP toolkit. The CRF classifier obtained an F-measure of 94.8% for the “nonsmoker” class, 83.0% for “currentsmoker”, and 65.7% for “pastsmoker”. It was observed that as more relevant data was added, the performance of the classifier improved. The results point to FHIR resources for exchanging clinical data between healthcare institutions: FHIR is free, and it uses profiles to extend coding standards, a RESTful API to exchange messages, and JSON, XML and Turtle to represent messages. Data could be stored in JSON format in a NoSQL database such as CouchDB, making it available for further post-extraction exploration.

    Conclusion: This study has provided a method for learning a clinical coding standard by a computer algorithm and then applying that learned standard to unstandardized data, so that the data becomes exchangeable, comparable and searchable, ultimately achieving data interoperability. Even though this study was applied on a limited scale, future work would explore the standardization of patients' long-lived data from multiple sources using the SHARPn open-source tools and data scaling platforms.
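
    A rough sketch of the structured-data matching step is given below, assuming the jellyfish library for the string measures: Soundex blocking first limits which pairs are compared, and a Jaro-Winkler weight then classifies each pair as a match or non-match. The 0.85 threshold is illustrative, not the study's tuned value.

```python
from collections import defaultdict
import jellyfish  # assumed library providing Soundex and Jaro-Winkler

def soundex_blocks(names):
    """Soundex blocking: only names sharing a phonetic code get compared."""
    blocks = defaultdict(list)
    for name in names:
        blocks[jellyfish.soundex(name)].append(name)
    return blocks

def match_pairs(coded_names, uncoded_names, threshold=0.85):
    """Pair terminology-coded names with uncoded names by similarity weight."""
    blocks = soundex_blocks(coded_names)
    matches = []
    for name in uncoded_names:
        for candidate in blocks.get(jellyfish.soundex(name), []):
            weight = jellyfish.jaro_winkler_similarity(name.lower(),
                                                       candidate.lower())
            if weight >= threshold:
                matches.append((name, candidate, round(weight, 3)))
    return matches

# toy run: "Haemoglobin" blocks and matches with "Hemoglobin"; "HGB" does not
print(match_pairs(["Hemoglobin", "Hematocrit"], ["Haemoglobin", "HGB"]))
```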

    Electronic Medical Records and E-Discovery: With New Technology Come New Challenges

    Electronic medical records have created new challenges for lawyers because all of the digitized information is printed on reams of paper during discovery. This makes the record both voluminous and difficult to interpret. This Note examines potential solutions that would allow lawyers to view electronic medical records in a digital format while preserving patient privacy. Two solutions are explored: 1) accessing the electronic medical record remotely by adapting tools that are already in place for doctors to remotely access patient records and 2) detailing a method to export an electronic medical record to a common, interoperable format

    Word-sense disambiguation in biomedical ontologies

    With the ever-increasing volume of biomedical literature, text mining has emerged as an important technology to support bio-curation and search. Word sense disambiguation (WSD), the correct identification of the sense of an ambiguous term in text, is an important problem in text mining. Since the late 1940s, many approaches based on supervised machine learning (decision trees, naive Bayes, neural networks, support vector machines) and unsupervised machine learning (context clustering, word clustering, co-occurrence graphs) have been developed. Knowledge-based methods that make use of the WordNet computational lexicon have also been developed. But only a few make use of ontologies, i.e. hierarchical controlled vocabularies, to solve the problem, and none exploit inference over ontologies together with metadata from publications. This thesis addresses the WSD problem in biomedical ontologies by suggesting different disambiguation approaches that use ontologies and metadata. The “Closest Sense” method assumes that the ontology defines multiple senses of the term and computes the shortest path from co-occurring terms in the document to one of these senses. The “Term Cooc” method defines a log-odds ratio for co-occurring terms, including inferred co-occurrences. The “MetaData” approach trains a classifier on metadata; it does not require an ontology, but it requires training data, which the other methods do not. These approaches are compared to each other when applied to a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. Over all conditions, the approaches achieve an 80% success rate on average. The MetaData approach performs best, with 96% when trained on high-quality data; its performance deteriorates as the quality of the training data decreases. The Term Cooc approach performs better on the Gene Ontology (92% success) than on MeSH (73% success), as MeSH is not a strict is-a/part-of hierarchy but rather a loose is-related-to hierarchy. The Closest Sense approach achieves an 80% success rate on average. Furthermore, the thesis showcases applications ranging from ontology design to semantic search where WSD is important.
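
    A hedged sketch of the “Term Cooc” idea follows: a smoothed log-odds ratio from a 2x2 co-occurrence table scores how strongly a context term favours one sense, and a document's context terms then vote for the sense with the highest total score. The exact formula and counts used in the thesis are not reproduced here.

```python
import math

def log_odds_ratio(both, term_only, sense_only, neither):
    """Log-odds ratio of a 2x2 co-occurrence table, with Haldane correction.

    both:       documents containing the context term and the sense
    term_only:  documents containing the context term but not the sense
    sense_only: documents containing the sense but not the context term
    neither:    documents containing neither
    """
    a, b, c, d = (x + 0.5 for x in (both, term_only, sense_only, neither))
    return math.log((a * d) / (b * c))

def disambiguate(context_terms, senses, tables):
    """Pick the sense whose summed log-odds with the context terms is highest.

    tables maps (term, sense) to a (both, term_only, sense_only, neither) tuple.
    """
    def score(sense):
        return sum(log_odds_ratio(*tables[(term, sense)])
                   for term in context_terms if (term, sense) in tables)
    return max(senses, key=score)
```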