17 research outputs found
The CAP cancer protocols – a case study of caCORE based data standards implementation to integrate with the Cancer Biomedical Informatics Grid
BACKGROUND: The Cancer Biomedical Informatics Grid (caBIG™) is a network of individuals and institutions, creating a world wide web of cancer research. An important aspect of this informatics effort is the development of consistent practices for data standards development, using a multi-tier approach that facilitates semantic interoperability of systems. The semantic tiers include (1) information models, (2) common data elements, and (3) controlled terminologies and ontologies. The College of American Pathologists (CAP) cancer protocols and checklists are an important reporting standard in pathology, for which no complete electronic data standard is currently available. METHODS: In this manuscript, we provide a case study of Cancer Common Ontologic Representation Environment (caCORE) data standard implementation of the CAP cancer protocols and checklists model – an existing and complex paper based standard. We illustrate the basic principles, goals and methodology for developing caBIG™ models. RESULTS: Using this example, we describe the process required to develop the model, the technologies and data standards on which the process and models are based, and the results of the modeling effort. We address difficulties we encountered and modifications to caCORE that will address these problems. In addition, we describe four ongoing development projects that will use the emerging CAP data standards to achieve integration of tissue banking and laboratory information systems. CONCLUSION: The CAP cancer checklists can be used as the basis for an electronic data standard in pathology using the caBIG™ semantic modeling methodology
caGrid-Enabled caBIG™ Silver Level Compatible Head and Neck Cancer Tissue Database System
Research labs in each cancer institution generate huge amounts of biomedical data, stored in various formats and accessed through numerous interfaces. This makes it very difficult to exchange and integrate data among cancer institutions, or even among research labs within the same institution, in order to discover useful biomedical knowledge for the healthcare community. In this paper, we present the design and implementation of a caGrid-enabled, caBIG™ Silver level compatible head and neck cancer tissue database system. The system is implemented using open source software and tools developed by the NCI, such as the caCORE SDK and caGrid. It has four interfaces: Web-based, Java API, XML utility, and Web service. The system has been shown to provide robust and programmatically accessible biomedical information services that syntactically and semantically interoperate with other resources.
The development and deployment of Common Data Elements for tissue banks for translational research in cancer – An emerging standard based approach for the Mesothelioma Virtual Tissue Bank
Background: Recent advances in genomics and proteomics, together with the increasing demand for biomarker validation studies, have catalyzed changes in the landscape of cancer research, fueling the development of tissue banks for translational research. A result of this transformation is the need for sufficient quantities of clinically annotated and well-characterized biospecimens to support the growing needs of the cancer research community. Clinical annotation allows samples to be better matched to the research question at hand and ensures that experimental results are better understood and can be verified. To facilitate and standardize such annotation in bio-repositories, we have combined three accepted and complementary sets of data standards: the College of American Pathologists (CAP) Cancer Checklists, the protocols recommended by the Association of Directors of Anatomic and Surgical Pathology (ADASP) for pathology data, and the North American Association of Central Cancer Registries (NAACCR) elements for epidemiology, therapy and follow-up data. Combining these approaches creates a set of International Organization for Standardization (ISO)-compliant Common Data Elements (CDEs) for the mesothelioma tissue banking initiative supported by the National Institute for Occupational Safety and Health (NIOSH) of the Centers for Disease Control and Prevention (CDC).
Methods: The purpose of the project is to develop a core set of data elements for annotating mesothelioma specimens, following standards established by the CAP checklist, ADASP cancer protocols, and the NAACCR elements. We have associated these elements with a modeling architecture to enhance both syntactic and semantic interoperability. The system has a Java-based multi-tiered architecture based on Unified Modeling Language (UML).
Results: Common Data Elements were developed using controlled vocabulary, ontology and semantic modeling methodology. The CDEs for each case are of different types: demographic data, epidemiologic data, clinical history, pathology data including block level annotation, and follow-up data including treatment, recurrence and vital status. Such an effort would eventually provide researchers with a larger sample set and make the system interoperable between institutions.
Conclusion: The CAP, ADASP and NAACCR elements represent widely established data elements that are utilized in many cancer centers. Herein, we have shown that these representations can be combined and formalized to create a core set of annotations for banked mesothelioma specimens. Because these data elements are collected as part of the normal workflow of a medical center, data sets developed on the basis of these elements can be easily implemented and maintained.
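As a rough illustration of what a common data element carries, the sketch below pairs a definition with an enumerated set of permissible values and rejects anything outside that set. The class name, field names, and the vital-status values are hypothetical and are not taken from the project's actual CDE set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical, simplified CDE: a named question plus its allowed answers."""
    name: str
    definition: str
    permissible_values: frozenset

    def validate(self, value: str) -> str:
        """Return the value unchanged if it is permitted, else raise ValueError."""
        if value not in self.permissible_values:
            raise ValueError(f"{value!r} is not a permissible value for {self.name}")
        return value


# Illustrative element only; the values are assumptions, not the published CDEs.
vital_status = CommonDataElement(
    name="vital_status",
    definition="Whether the patient was alive or dead at last follow-up.",
    permissible_values=frozenset({"alive", "dead", "unknown"}),
)

print(vital_status.validate("alive"))
```

Constraining each element to a fixed value domain is what lets annotations from different institutions be compared without per-site cleanup.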
Development of the Lymphoma Enterprise Architecture Database: A caBIG™ Silver level compliant System
Lymphomas are the fifth most common cancer in the United States, with numerous histological subtypes. Integrating existing clinical information on lymphoma patients provides a platform for understanding biological variability in presentation and treatment response and aids development of novel therapies. We developed a cancer Biomedical Informatics Grid™ (caBIG™) Silver level compliant lymphoma database, called the Lymphoma Enterprise Architecture Data-system™ (LEAD™), which integrates the pathology, pharmacy, laboratory, cancer registry, clinical trials, and clinical data from institutional databases. We utilized the Cancer Common Ontological Representation Environment Software Development Kit (caCORE SDK) provided by the National Cancer Institute's Center for Bioinformatics to establish the LEAD™ platform for data management. The caCORE SDK generated system utilizes an n-tier architecture with open Application Programming Interfaces, controlled vocabularies, and registered metadata to achieve semantic integration across multiple cancer databases. We demonstrated that the data elements and structures within LEAD™ could be used to manage clinical research data from phase 1 clinical trials, cohort studies, and registry data from the Surveillance Epidemiology and End Results database. This work provides a clear example of how semantic technologies from caBIG™ can be applied to support a wide range of clinical and research tasks and to integrate data from disparate systems into a single architecture. This illustrates the central importance of caBIG™ to the management of clinical and biological data.
BMC Cancer
1U19OH009077-01/OH/NIOSH CDC HHS/United States; UL1 TR000005/TR/NCATS NIH HHS/United States
Modeling a description logic vocabulary for cancer research
The National Cancer Institute has developed the NCI Thesaurus, a biomedical vocabulary for cancer research, covering terminology across a wide range of cancer research domains. A major design goal of the NCI Thesaurus is to facilitate translational research. We describe: the features of Ontylog, a description logic used to build NCI Thesaurus; our methodology for enhancing the terminology through collaboration between ontologists and domain experts, and for addressing certain real world challenges arising in modeling the Thesaurus; and finally, we describe the conversion of NCI Thesaurus from Ontylog into Web Ontology Language Lite. Ontylog has proven well suited for constructing big biomedical vocabularies. We have capitalized on the Ontylog constructs Kind and Role in the collaboration process described in this paper to facilitate communication between ontologists and domain experts. The artifacts and processes developed by NCI for collaboration may be useful in other biomedical terminology development efforts
Utility and applications of synoptic reporting in pathology
Background: Synoptic reports in routine pathology practice provide composite documents that include information from morphology and molecular technologies. They present clear, accurate, structured information and are developed by incorporating standardized data elements in the form of checklists for pathology reporting. This helps pathologists document their findings and ultimately improves the overall quality of pathology reports.
Objectives: The goal of this review article is to discuss (1) the importance of synoptic reporting in pathology, (2) its utility and applications, (3) its impact on pathology reporting and patient care, and (4) the challenges and barriers of implementing synoptic reporting. Pertinent literature will also be reviewed.
Design: The synoptic reporting system provides a complete set of data elements in the form of synoptic templates or “worksheets” for pathology tumor reporting based on the World Health Organization (WHO) Classification and the College of American Pathologists (CAP) Cancer Checklists. These standards provide the most up-to-date and supplemented classification schemes, specimen details, and staging as well as prognostic information. Data from the synoptic reporting tool can be imported into a relational database, where they are organized and can be efficiently searched and retrieved. Since search and retrieval are streamlined, synoptic databases enhance basic science, clinical, and translational cancer research.
Conclusion: Synoptic reporting facilitates a standards-based, structured method for entering diagnostic and prognostic information accurately and consistently for a particular pathology specimen, thus reducing transcription services, specimen turnaround time, and typographical and transcription errors. The structured data can be imported into the Laboratory Information Service (LIS) database, which facilitates swift data access and improved communication for cancer management. Finally, these synoptic templates act as a robust source of high-quality data from the various biospecimens, which can be shared across multiple ongoing research projects to enhance basic and translational research.
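To make the relational-database point concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table name, column names, and the three records are invented for illustration; they are not the CAP checklist fields or any real LIS schema.

```python
import sqlite3

# In-memory database standing in for a synoptic-report store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE synoptic_report (
        case_id         TEXT PRIMARY KEY,
        tumor_site      TEXT NOT NULL,
        histologic_type TEXT NOT NULL,
        margin_status   TEXT NOT NULL
    )
    """
)

# Made-up example rows: each checklist item becomes one discrete column value.
reports = [
    ("S-001", "colon", "adenocarcinoma", "negative"),
    ("S-002", "colon", "adenocarcinoma", "positive"),
    ("S-003", "colon", "mucinous adenocarcinoma", "negative"),
]
conn.executemany("INSERT INTO synoptic_report VALUES (?, ?, ?, ?)", reports)


def find_cases(connection, site, margin):
    """Return case IDs matching a tumor site and margin status, in ID order."""
    rows = connection.execute(
        "SELECT case_id FROM synoptic_report "
        "WHERE tumor_site = ? AND margin_status = ? ORDER BY case_id",
        (site, margin),
    ).fetchall()
    return [row[0] for row in rows]


print(find_cases(conn, "colon", "negative"))
```

Because every checklist answer lands in its own column, a cohort query is a plain `WHERE` clause rather than a free-text search over narrative reports.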
Ontology-based Semantic Harmonization of HIV-associated Common Data Elements for Integration of Diverse HIV Research Datasets
Analysis of integrated, diverse, Human Immunodeficiency Virus (HIV)-associated datasets can increase knowledge and guide the development of novel and effective interventions for disease prevention and treatment by increasing breadth of variables and statistical power, particularly for sub-group analyses. This topic has been identified as a National Institutes of Health research priority, but few efforts have been made to integrate data across HIV studies. Our aims were to: 1) Characterize the semantic heterogeneity (SH) in the HIV research domain; 2) Identify HIV-associated common data elements (CDEs) in empirically generated and knowledge-based resources; 3) Create a formal representation of HIV-associated CDEs in the form of an HIV-associated Entities in Research Ontology (HERO); 4) Assess the feasibility of using HERO to semantically harmonize HIV research data. Our approach was guided by information/knowledge theory and the DIKW (Data Information Knowledge Wisdom) hierarchical model.
Our systematized review of the literature revealed that synergistic use of both ontologies and CDEs included integration, interoperability, data exchange, and data standardization. Moreover, methods and tools included use of experts for CDE identification, the Unified Medical Language System, natural language processing, Extensible Markup Language, Health Level 7, and ontology development tools (e.g., Protégé). Additionally, evaluation methods included expert assessment, quantification of mapping tasks between raters, assessment of interrater reliability, and comparison to established standards. We used these findings to inform our process for achieving the study aims.
For Aim 1, we analyzed eight disparate HIV-associated data dictionaries and developed a String Metric-assisted Assessment of Semantic Heterogeneity (SMASH) method, which aided identification of 127 (13%) homogeneous data element (DE) pairs and 1,048 (87%) semantically heterogeneous DE pairs. Most heterogeneous pairs (97%) were semantically-equivalent/syntactically-different, allowing us to determine that SH in the HIV research domain was high.
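The abstract does not say which string metric SMASH uses, so the following is only a sketch of the general idea: score cross-dictionary data element names with a string similarity measure and pass the high-scoring pairs to experts for review. It assumes `difflib.SequenceMatcher` as the metric, a hypothetical threshold of 0.7, and made-up element names.

```python
from difflib import SequenceMatcher
from itertools import product


def normalize(name: str) -> str:
    """Lower-case a data element name and treat underscores as spaces."""
    return " ".join(name.lower().replace("_", " ").split())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized element names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_pairs(dict_a, dict_b, threshold=0.7):
    """Return (name_a, name_b, score) for pairs at or above threshold, best first."""
    scored = [(a, b, name_similarity(a, b)) for a, b in product(dict_a, dict_b)]
    kept = [pair for pair in scored if pair[2] >= threshold]
    return sorted(kept, key=lambda pair: pair[2], reverse=True)


# Hypothetical element names from two data dictionaries.
dictionary_1 = ["HIV_status", "date_of_birth", "cd4_count"]
dictionary_2 = ["hiv status", "birth date", "CD4 cell count"]

for name_a, name_b, score in candidate_pairs(dictionary_1, dictionary_2):
    print(f"{name_a!r} ~ {name_b!r}: {score:.2f}")
```

A string score only flags likely matches; deciding whether a pair is semantically equivalent but syntactically different still requires the expert review the abstract describes.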
To achieve Aim 2, we used Clinicaltrials.gov, Google Search, and text mining in R to identify HIV-associated CDEs in HIV journal articles, HIV-associated datasets, AIDSinfo HIV/AIDS Glossary, AIDSinfo Drug Database, Logical Observation Identifiers Names and Codes (LOINC), Systematized Nomenclature of Medicine (SNOMED), and RxNORM (understood as prescription normalization). Two HIV experts then manually reviewed DEs from the journal articles and data dictionaries to confirm DE commonality and resolved semantic discrepancies through discussion. Ultimately, we identified 2,179 unique CDEs. Of all CDEs, data-driven approaches identified 2,055 (94%) (999 from the HIV/AIDS Glossary, 398 from the Drug Database, 91 from journal articles, and a total of 567 from LOINC, SNOMED, and RxNorm cumulatively). Expert-based approaches identified 124 (6%) unique CDEs from data dictionaries and confirmed the 91 CDEs from journal articles.
In Aim 3, we used the Protégé suite of ontology development tools and the 2,179 CDEs to develop the HERO. We modeled the ontology using the semantic structure of the Medical Entities Dictionary, available hierarchical information from the CDE knowledge resources, and expert knowledge. The ontology fulfilled most relevant criteria from Cimino’s desiderata and OntoClean ontology engineering principles, and it successfully answered eight competency questions.
Finally, for Aim 4, we assessed the feasibility of using HERO to semantically harmonize and integrate the data dictionaries from two diverse HIV-associated datasets. Two HIV experts involved in the development of HERO independently assessed each data dictionary. Of the 367 DEs in data dictionary 1 (D1), 181 (49.32%) were identified as CDEs and 186 (50.68%) were not CDEs, and of the 72 DEs in data dictionary 2 (D2), 37 (51.39%) were CDEs and 35 (48.61%) were not CDEs. The HIV experts then traversed HERO’s hierarchy to map CDEs from D1 and D2 to CDEs in HERO. Of the 181 CDEs in D1, 156 (86.19%) were found in HERO, and 25 (13.81%) were not. Similarly, of the 37 CDEs in D2, 32 (86.49%) were found in HERO, and 5 (13.51%) were not. Interrater reliability for CDE identification as measured by Cohen’s Kappa was 0.900 for D1 and 0.892 for D2. Cohen’s Kappas for CDEs in D1 and D2 that were also identified in HERO were 0.885 and 0.688, respectively.
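For readers unfamiliar with the statistic, Cohen's kappa compares the observed agreement between two raters, p_o, with the agreement expected by chance, p_e, as kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two raters; the ten ratings are invented for illustration and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: is each data element a CDE ("C") or not ("N")?
expert_1 = ["C", "C", "C", "C", "C", "N", "N", "N", "N", "N"]
expert_2 = ["C", "C", "C", "C", "N", "N", "N", "N", "N", "N"]

print(round(cohens_kappa(expert_1, expert_2), 3))
```

In this toy case the raters agree on 9 of 10 items (p_o = 0.9) and chance agreement is 0.5, giving kappa = 0.8.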
Subsequently, to demonstrate the integration of the two HIV-associated datasets, a sample of semantically harmonized CDEs in both datasets was categorically selected (e.g. administrative, demographic, and behavioral), and D2 sample size increases were calculated for race (e.g., White, African American/Black, Asian/Pacific Islander, Native American/Indian, and Hispanic/Latino) and for “intravenous drug use” from the integrated datasets. The average increase of D2 CDEs for six selected CDEs was 1,928%.
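The abstract does not give the formula behind the sample size increases. One plausible reading, assumed here, is the relative gain in records for a harmonized CDE once the two datasets are pooled. The function name and the counts are hypothetical.

```python
def percent_increase(original_n: int, integrated_n: int) -> float:
    """Relative growth from original_n to integrated_n, as a percentage."""
    if original_n <= 0:
        raise ValueError("original_n must be positive.")
    return (integrated_n - original_n) / original_n * 100


# Made-up counts: 50 D2 records for a CDE, 1,000 after pooling with D1.
print(percent_increase(50, 1000))
```

Under this reading, an increase above 1,000% simply means the pooled dataset holds more than eleven times as many records for that element as D2 alone.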
Despite the limitation of HERO developers also serving as evaluators, the contributions of the study to the fields of informatics and HIV research were substantial. Confirmatory contributions include: identification of effective CDE/ontology tools, and use of data-driven and expert-based methods. Novel contributions include: development of SMASH and HERO; and new contributions include documenting that SH is high in HIV-associated datasets, identifying 2,179 HIV-associated CDEs, creating two additional classifications of SH, and showing that using HERO for semantic harmonization of HIV-associated data dictionaries is feasible. Our future work will build upon this research by expanding the numbers and types of datasets, refining our methods and tools, and conducting an external evaluation
Modern Computer Grid Technologies and Their Application in Medical Research
This paper considers the development and application in medicine and biology of a new and promising direction in information technology: Grid technology. Its most important feature is that it opens the way to transforming a global network of computers into a single, practically unlimited computing resource, which may be of crucial importance for the development of medicine and biology. Grid is also defined as a universal infrastructure that unites computers and supercomputers into one geographically distributed system. The undisputed leader in building Grid networks is the USA, where a strategic Grid program aimed at creating a unified national space for high-performance computing has been under way since 2004. In Europe, the large ENABLING GRIDS FOR E-SCIENCE project, under way since April 2004, is creating a pan-European infrastructure based on Grid technologies.
Biomedicine is one of the areas chosen in Europe for developing and deploying Grid technologies. This primarily concerns the creation of databases of patients' hereditary diseases; in addition, biomedical Grids are being built to compile databases from different clinics with the aim of creating a virtual hospital. Grid medicine is a Grid infrastructure containing specialized computing services adapted to the processing of biomedical data. Accordingly, the resources in Grid medicine are computing resources, specialized medical databases, and specialized medical devices and systems. The first applications of Grid technologies have already shown the importance of the Grid computing paradigm for genomic research and medical image processing, in particular in oncology, neurosurgery, and radiotherapy. The key concept of Grid technology is the virtual organization: a group of geographically distributed users who have a common goal and share their resources.
Some examples of virtual laboratories and projects in Grid medicine are considered: Area 1. Medical graphics and image processing. Area 2. Modeling a patient's body for choosing treatment tactics and surgical intervention. Area 3. Grid technologies in pharmacy. Area 4. Grid in genomic medicine. Area 5. Virtual biomedical universities and e-learning.
In Ukraine, an informatization program of the National Academy of Sciences has been under way since 2005, within which the Ukrainian National Grid was created for the first time. In 2007, on the initiative of the Ministry of Education and Science of Ukraine, the start of work on a nationwide Grid infrastructure to support research and education in Ukraine was announced. Use of the already created first Grid segment of the National Academy of Sciences, and in the future of the nationwide network, will make it possible to integrate successfully into international research projects carried out in Europe and in other world centers of science. The development of Grid technologies and their introduction into practical health care, research, and education will undoubtedly help bring the training of medical students and medical specialists up to the best world standards.