418 research outputs found

    Building a semantically annotated corpus of clinical texts

    In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains

    Service-oriented subscription management of medical decision data in the intensive care unit

    Objectives: This paper addresses the design of a platform for the management of medical decision data in the ICU. Whenever new medical data from laboratories or monitors is available, or at fixed times, the appropriate medical support services are activated and generate a medical alert or suggestion to the bedside terminal, the physician's PDA, smart phone or mailbox. Since future ICU systems will rely ever more on medical decision support, a generic and flexible subscription platform is of high importance. Methods: Our platform is designed based on the principles of service-oriented architectures, and is fundamental for service deployment since the medical support services only need to implement their algorithm and can rely on the platform for general functionalities. A secure communication and execution environment is also provided. Results: A prototype, where medical support services can be easily plugged in, has been implemented using Web service technology and is currently being evaluated by the Department of Intensive Care of the Ghent University Hospital. To illustrate the platform operation and performance, two prototype medical support services are used, showing that the extra response time introduced by the platform is less than 150 ms. Conclusions: The platform allows for easy integration with hospital information systems. The platform is generic and offers user-friendly patient/service subscription, transparent data and service resource management and priority-based filtering of messages. The performance has been evaluated and it was shown that the response time of platform components is negligible compared to the execution time of the medical support services

    BEAT: Bioinformatics Exon Array Tool to store, analyze and visualize Affymetrix GeneChip Human Exon Array data from disease experiments

    Background: It is known from recent studies that more than 90% of human multi-exon genes are subject to Alternative Splicing (AS), a key molecular mechanism in which multiple transcripts may be generated from a single gene. It is widely recognized that a breakdown in AS mechanisms plays an important role in cellular differentiation and pathologies. Polymerase Chain Reactions, microarrays and sequencing technologies have been applied to the study of transcript diversity arising from alternative expression. Latest-generation Affymetrix GeneChip Human Exon 1.0 ST Arrays offer a more detailed view of the gene expression profile, providing information on the AS patterns. The exon array technology, with more than five million data points, can detect approximately one million exons, and it allows analyses at both gene and exon level. In this paper we describe BEAT, an integrated user-friendly bioinformatics framework to store, analyze and visualize exon array datasets. It combines a data warehouse approach with rigorous statistical methods for assessing the AS of genes involved in diseases. Meta statistics are proposed as a novel approach to explore the analysis results. BEAT is available at http://beat.ba.itb.cnr.it. Results: BEAT is a web tool which allows uploading and analyzing exon array datasets using standard statistical methods and an easy-to-use graphical web front-end. BEAT has been tested on a dataset with 173 samples and tuned using new datasets of exon array experiments from 28 colorectal cancer and 26 renal cell cancer samples produced at the Medical Genetics Unit of IRCCS Casa Sollievo della Sofferenza. To highlight all possible AS events, alternative names, accession IDs, Gene Ontology terms and biochemical pathway annotations are integrated with exon and gene level expression plots. The user can customize the results by choosing custom thresholds for the statistical parameters and exploiting the available clinical data of the samples for a multivariate AS analysis. Conclusions: Despite exon array chips being widely used for transcriptomics studies, there is a lack of analysis tools offering advanced statistical features and requiring no programming knowledge. BEAT provides a user-friendly platform for a comprehensive study of AS events in human diseases, displaying the analysis results with easily interpretable and interactive tables and graphics

    Engineering Agile Big-Data Systems

    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design. Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems

    Design and development of a comprehensive data management platform for cytomics: cytomicsDB

    In a cytomics environment, scientists continuously deal with large volumes of structured and unstructured data, which makes interoperability a challenge for any platform developed for cytomics. The CytomicsDB approach is an effort to develop a framework that takes care of standardizing the unstructured data and provides a common data model layer for HTS experiments. This model is also suitable for integration with other systems in cytomics, in particular other repositories, which allows validation of key metadata used in the experiments and thus ensures the reliability of the stored data. Other possible solutions for cytomics data management should take special care to use data model standards in order to enhance collaboration and data sharing in the scientific community. BAPE Erasmus Mundus Program. Computer Systems, Imagery and Medi

    The Secondary Use of Longitudinal Critical Care Data

    Aims: To examine the strengths and limitations of a novel United Kingdom (UK) critical care data resource that repurposes routinely collected physiological data for research. Exemplar clinical research studies will be developed to explore the unique longitudinal nature of the resource. Objectives: (1) To evaluate the suitability of the National Institute for Health Research (NIHR) Critical Care theme of the Health Informatics Collaborative (CC-HIC) data model as a representation of the Electronic Health Record (EHR) for secondary research use. (2) To conduct a data quality evaluation of data stored within the CC-HIC research database. (3) To use the CC-HIC research database to conduct two clinical research studies that make use of the longitudinal data supported by the CC-HIC: (a) the association between cumulative exposure to excess oxygen and outcomes in the critically ill; (b) the association between different morphologies of longitudinal physiology, in particular organ dysfunction, and outcomes in sepsis. The CC-HIC: The EHR is now routinely used for the delivery of patient care throughout the UK. This has presented the opportunity to learn from a large volume of routinely collected data. The CC-HIC data model represents 255 distinct clinical concepts including demographics, outcomes and granular longitudinal physiology. This model is used to harmonise EHR data of 12 contributing Intensive Care Units (ICUs). This thesis evaluates the suitability of the CC-HIC data model in this role and the quality of data within. While representing an important first step in this field, the CC-HIC data model lacks the necessary normalisation and semantic expressivity to excel in this role. The quality of the CC-HIC research database was variable between contributing sites. High levels of missing data, missing meta-data, non-standardised units and temporal drop out of submitted data are amongst the most challenging features to tackle. It is the principal finding of this thesis that the CC-HIC should transition towards implementing internationally agreed standards for interoperability. Exemplar Clinical Studies: Two exemplar studies are presented, each designed to make use of the longitudinal data made available by the CC-HIC and address domains that are both contemporaneous and of importance to the critical care community. Exposure to Excess Oxygen: Longitudinal data from the CC-HIC cohort were used to explore the association between the cumulative exposure to excess oxygen and outcomes in the critically ill. A small (likely less than 1% absolute risk reduction) dose-independent association was found between exposure to excess oxygen and mortality. The lack of dose dependency challenges a causal interpretation of these findings. Physiological Morphologies in Sepsis: The joint modelling paradigm was applied to explore the different longitudinal profiles of organ failure in sepsis, while accounting for informative censoring from patient death. The rate of change of organ failure was found to play a more significant role in outcomes than the absolute value of organ failure at a given moment. This has important implications for how the critical care community views the evolution of physiology in sepsis. DECOVID: The Decoding COVID-19 (DECOVID) project is presented as future work. DECOVID is a collaborative data sharing project that pools clinical data from two large NHS trusts in England. Many of the lessons learnt from the prior work with the CC-HIC fed into the development of the DECOVID data model and its quality evaluation

    Knowledge Management Approaches for predicting Biomarker and Assessing its Impact on Clinical Trials

    The recent success of companion diagnostics, along with increasing regulatory pressure for better identification of the target population, has created an unprecedented incentive for drug discovery companies to invest in novel strategies for stratified biomarker discovery. In keeping with this trend, trials with stratified biomarkers in drug development have quadrupled in the last decade but still represent a small part of all interventional trials, reflecting the multiple co-development challenges of therapeutic compounds and companion diagnostics. To overcome these challenges, varied knowledge management and systems biology approaches are adopted in the clinic to analyze/interpret an ever increasing collection of OMICS data. By semi-automatic screening of more than 150,000 trials, we filtered trials with stratified biomarkers to analyse their therapeutic focus and major drivers, and elucidated the impact of stratified biomarker programs on trial duration and completion. The analysis clearly shows that cancer is the major focus for trials with stratified biomarkers. However, targeted therapies in cancer require more accurate stratification of the patient population. This can be augmented by a fresh approach of selecting a new class of biomolecules, i.e. miRNA, as candidate stratification biomarkers. miRNA plays an important role in tumorigenesis by regulating the expression of oncogenes and tumor suppressors, thus affecting cell proliferation, differentiation, apoptosis, invasion and angiogenesis. miRNAs are potential biomarkers in different cancers. However, the relationship between the response of cancer patients to targeted therapy and the resulting modifications of the miRNA transcriptome in pathway regulation is poorly understood. Together with ever-growing pathway and miRNA-mRNA interaction databases, freely available mRNA and miRNA expression data for multiple cancer therapies have created an unprecedented opportunity to decipher the role of miRNAs in early prediction of therapeutic efficacy in diseases. We present a novel SMARTmiR algorithm to predict the role of miRNA as a therapeutic biomarker for treatment with an anti-EGFR monoclonal antibody, i.e. cetuximab, in colorectal cancer. An optimised and fully automated version of the algorithm has the potential to be used as a clinical decision support tool. Moreover, this research will also provide the scientific community with a comprehensive and valuable knowledge map demonstrating functional biomolecular interactions in colorectal cancer. This research also detected seven miRNAs, i.e. hsa-miR-145, hsa-miR-27a, hsa-miR-155, hsa-miR-182, hsa-miR-15a, hsa-miR-96 and hsa-miR-106a, as top stratified biomarker candidates for cetuximab therapy in CRC which were not reported previously. Finally, a prospective plan for the future of biomarker research in cancer drug development has been drawn up, focusing on reducing the risk of the most expensive phase III drug failures