
    A Vision for the Systematic Monitoring and Improvement of the Quality of Electronic Health Data

    In parallel with the implementation of information and communications systems, health care organizations are beginning to amass large-scale repositories of clinical and administrative data. Many nations seek to leverage so-called Big Data repositories to support improvements in health outcomes, drug safety, health surveillance, and care delivery processes. An unsupported assumption is that electronic health care data are of sufficient quality to enable the varied use cases envisioned by health ministries. The reality is that many electronic health data sources are of suboptimal quality and unfit for particular uses. To more systematically define, characterize, and improve electronic health data quality, we propose a novel framework for health data stewardship. The framework is adapted from prior data quality research outside of health care, but it has been reshaped to apply a systems approach to data quality with an emphasis on health outcomes. The proposed framework is a beginning, not an end. We invite the biomedical informatics community to use and adapt the framework to improve health data quality and outcomes for populations in nations around the world.

    Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data

    Objectives: Existing approaches to derive decision models from plaintext clinical data frequently depend on medical dictionaries as the sources of potential features. Prior research suggests that decision models developed using non-dictionary-based feature sourcing approaches and “off the shelf” tools can predict cancer with performance metrics between 80% and 90%. We sought to compare non-dictionary-based models to models built using features derived from medical dictionaries. Materials and methods: We evaluated the detection of cancer cases from free-text pathology reports using decision models built with combinations of dictionary or non-dictionary feature sourcing approaches, 4 feature subset sizes, and 5 classification algorithms. Each decision model was evaluated using the following performance metrics: sensitivity, specificity, accuracy, positive predictive value, and area under the receiver operating characteristic (ROC) curve. Results: Decision models parameterized using dictionary and non-dictionary feature sourcing approaches produced performance metrics between 70% and 90%. Neither the source of features nor the feature subset size had an impact on the performance of a decision model. Conclusion: Our study suggests there is little value in leveraging medical dictionaries for extracting features for decision model building. Decision models built using features extracted from the plaintext reports themselves achieve results comparable to those built using medical dictionaries. Overall, this suggests that existing “off the shelf” approaches can be leveraged to perform accurate cancer detection using less complex Named Entity Recognition (NER) based feature extraction, automated feature selection, and modeling approaches.
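
    As a rough illustration of the comparison described above, the sketch below builds one model from a vocabulary learned from the reports themselves and one from a fixed dictionary vocabulary, then scores both with an off-the-shelf classifier. This is a minimal sketch, not the authors' pipeline: the toy reports, the four-term stand-in for a medical dictionary, and the choice of scikit-learn with logistic regression are all assumptions.

        # Minimal sketch: compare plaintext-derived features against a fixed
        # "dictionary" vocabulary for cancer detection. All data below is toy
        # data; the four-term vocabulary is a hypothetical stand-in for a real
        # medical dictionary (e.g. a UMLS-derived term list).
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        reports = [
            "infiltrating ductal carcinoma identified in the specimen",
            "sections show malignant neoplasm with necrosis",
            "benign fibrous tissue, no evidence of malignancy",
            "normal colonic mucosa, negative for carcinoma",
        ]
        labels = [1, 1, 0, 0]  # 1 = cancer case

        # Non-dictionary features: vocabulary learned from the reports themselves.
        plain_vec = CountVectorizer()
        # Dictionary features: vocabulary restricted to known medical terms.
        dict_vec = CountVectorizer(vocabulary=["carcinoma", "malignancy", "neoplasm", "benign"])

        for name, vec in [("plaintext-derived", plain_vec), ("dictionary-based", dict_vec)]:
            X = vec.fit_transform(reports)
            clf = LogisticRegression(max_iter=1000)
            auc = cross_val_score(clf, X, labels, cv=2, scoring="roc_auc").mean()
            print(f"{name} features: mean ROC AUC = {auc:.2f}")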

    DCMS: A data analytics and management system for molecular simulation

    Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate very large numbers of atoms whose spatial and temporal relationships must be observed for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly because they lack a platform to support applications that involve intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system our team developed over the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is to handle the analytical queries that are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We have also used it as a platform to test other data management issues such as security and compression.
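
    The core idea, storing simulation frames in a relational DBMS and pushing analysis into SQL, can be sketched as below. The table layout, connection string, and box-range query are hypothetical stand-ins for illustration, not the actual DCMS schema or its co-processor-accelerated operators.

        # Minimal sketch of the database-centric idea: per-frame atom positions
        # live in a relational table and analytical queries are expressed in SQL.
        # The DSN and schema are hypothetical, not the actual DCMS design.
        import psycopg2

        conn = psycopg2.connect("dbname=msdata user=postgres")  # hypothetical DSN
        cur = conn.cursor()
        cur.execute("""
            CREATE TABLE IF NOT EXISTS atoms (
                frame   INTEGER,
                atom_id INTEGER,
                x REAL, y REAL, z REAL,
                PRIMARY KEY (frame, atom_id)
            )
        """)

        # A typical analytical query: all atoms inside a spatial box at frame 100.
        # An index on (frame, x, y, z) would let the DBMS optimize this scan.
        cur.execute("""
            SELECT atom_id, x, y, z
            FROM atoms
            WHERE frame = %s
              AND x BETWEEN %s AND %s
              AND y BETWEEN %s AND %s
              AND z BETWEEN %s AND %s
        """, (100, 0.0, 10.0, 0.0, 10.0, 0.0, 10.0))
        for row in cur.fetchall():
            print(row)
        conn.commit()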

    Decision Support System For Geriatric Care

    Poster abstract: Geriatrics is a branch of medicine that focuses on the health care of the elderly. We propose to build a decision support system for elderly care based on a knowledge base that incorporates best practices reported in the literature. A Bayesian network model is then used to provide decision support within the geriatric care tool we develop.
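
    As a toy illustration of how a Bayesian model can drive such decision support, the sketch below computes a posterior for an invented two-variable network. The variables and probabilities are made up for the example; they are not clinical guidance and not the authors' model.

        # Hand-rolled two-node Bayesian inference, kept self-contained. All
        # numbers are invented for illustration only.

        # Prior: P(high fall risk)
        p_risk = 0.2
        # Likelihoods: P(impaired mobility | risk status)
        p_impaired_given_risk = 0.7
        p_impaired_given_no_risk = 0.1

        # Posterior via Bayes' rule: P(high fall risk | impaired mobility)
        evidence = (p_impaired_given_risk * p_risk
                    + p_impaired_given_no_risk * (1 - p_risk))
        posterior = p_impaired_given_risk * p_risk / evidence
        print(f"P(high fall risk | impaired mobility) = {posterior:.2f}")  # ~0.64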

    Identification of a gene signature in cell cycle pathway for breast cancer prognosis using gene expression profiling data

    Background: Numerous studies have used microarrays to identify gene signatures for predicting cancer patient clinical outcome and responses to chemotherapy. However, the potential impact of gene expression profiling on cancer diagnosis, prognosis, and the development of personalized treatment may not be fully exploited due to the lack of consensus gene signatures and poor understanding of the underlying molecular mechanisms. Methods: We developed a novel approach to derive gene signatures for breast cancer prognosis in the context of known biological pathways. Using unsupervised methods, cancer patients were separated into distinct groups based on gene expression patterns in one of the following pathways: apoptosis, cell cycle, angiogenesis, metastasis, p53, DNA repair, and several receptor-mediated signaling pathways including chemokines, EGF, FGF, HIF, MAP kinase, JAK, and NF-κB. The survival probabilities were then compared between the patient groups to determine whether differential gene expression in a specific pathway is correlated with differential survival. Results: Our results revealed that expression of cell cycle genes is strongly predictive of breast cancer outcomes. We further confirmed this observation by building a cell cycle gene signature model using supervised methods. Validated in multiple independent datasets, the cell cycle gene signature is a more accurate predictor of breast cancer clinical outcome than the previously identified Amsterdam 70-gene signature, which has been developed into the FDA-approved clinical test MammaPrint®. Conclusion: Taken together, the gene expression signature model we developed from well-defined pathways is not only a consistently powerful prognosticator but also mechanistically linked to cancer biology. Our approach provides an alternative to the current methodology of identifying gene expression markers for cancer prognosis and drug responses using whole-genome gene expression data.
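
    The pathway-based procedure, unsupervised grouping on one pathway's genes followed by a survival comparison, might look roughly like the sketch below. The random expression matrix and survival columns are placeholders for real data, and scikit-learn's k-means plus the lifelines log-rank test are one plausible tool choice, not necessarily the authors'.

        # Sketch: cluster patients on cell-cycle gene expression, then test
        # whether the clusters differ in survival. `expr`, `time`, and `event`
        # are random stand-ins for a real cohort.
        import numpy as np
        from sklearn.cluster import KMeans
        from lifelines.statistics import logrank_test

        rng = np.random.default_rng(0)
        expr = rng.normal(size=(100, 20))      # patients x cell-cycle genes
        time = rng.exponential(60, size=100)   # follow-up time in months
        event = rng.integers(0, 2, size=100)   # 1 = event observed

        # Unsupervised step: split patients by pathway expression pattern.
        groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(expr)

        # Survival step: log-rank test between the two patient groups.
        result = logrank_test(
            time[groups == 0], time[groups == 1],
            event_observed_A=event[groups == 0],
            event_observed_B=event[groups == 1],
        )
        print(f"log-rank p-value: {result.p_value:.3f}")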

    Efficient indexing techniques for the update intensive environment

    For environments such as moving object and sensor databases, where data is constantly evolving, traditional database index structures suffer from the need for frequent updates and deliver poor performance. We propose and develop new indexing and querying techniques for the update-intensive environment. Our approaches exploit properties of the applications, such as the nature of the queries and the nature of data changes. Update-intensive applications usually require monitoring of continuously changing data, so queries in these applications tend to be continuous queries with answers reported at multiple points in time. We introduce the Query Index (QI) and Velocity Constrained Index (VCI) for efficient and scalable execution of multiple continuous queries. Our work also exploits the nature of changes in data. We address common and important classes of data, including moving object data and constantly evolving numerical data such as sensor readings. Based on the nature of data changes, we introduce four new index structures: the Q+Rtree and Change-tolerant Rtree (CTRtree) for indexing moving object data, and the Mean Variance Tree (MVTree) and Forecasted Interval Index (FI-Index) for indexing constantly evolving numerical data. The design of these index structures is based not only on the current values of the data being indexed but also on the nature of changes in data values. This approach maximizes the opportunity for the index to cover the updated values and reduces the number of expensive updates to the index structures. Experimental results establish the superior performance of the proposed index structures over traditional indexes. We also introduce the notion of change-tolerant indexing and design indexes with the explicit goal of optimizing both update and query performance. These indexes trade slightly poorer query performance for much better update performance, resulting in better overall performance.
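
    The query-indexing idea, treating the long-lived continuous queries rather than the fast-changing objects as the thing to index, can be sketched as below. A uniform grid stands in for the real Query Index structure, and the coordinates and cell size are invented for illustration.

        # Sketch: continuous range queries change rarely, so index the queries
        # and probe each object update against that index instead of
        # reindexing the objects. A uniform grid stands in for a real structure.
        from collections import defaultdict

        CELL = 10.0
        queries = {1: (0, 0, 15, 15), 2: (20, 20, 40, 40)}  # qid -> (x1, y1, x2, y2)
        query_grid = defaultdict(list)  # grid cell -> queries overlapping it

        for qid, (x1, y1, x2, y2) in queries.items():
            for cx in range(int(x1 // CELL), int(x2 // CELL) + 1):
                for cy in range(int(y1 // CELL), int(y2 // CELL) + 1):
                    query_grid[(cx, cy)].append(qid)

        def on_update(x, y):
            """Report the continuous queries satisfied by an object's new position."""
            candidates = query_grid[(int(x // CELL), int(y // CELL))]
            return [q for q in candidates
                    if queries[q][0] <= x <= queries[q][2]
                    and queries[q][1] <= y <= queries[q][3]]

        print(on_update(12.0, 5.0))   # [1]
        print(on_update(25.0, 30.0))  # [2]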

    Change Tolerant Indexing for Constantly Evolving Data

    Index structures are designed to optimize search performance while at the same time supporting efficient data updates. Although not explicit, existing index structures are typically based upon the assumption that the rate of updates will be small compared to the rate of querying. This assumption is not valid in streaming data environments such as sensor and moving object databases, where updates are received incessantly. In fact, for many applications, the rate of updates may well exceed the rate of querying. In such environments, index structures suffer from poor performance due to the large overhead of keeping the index updated with the latest data. Some existing approaches avoid this problem by assuming that objects move in a well-behaved but restrictive manner (e.g. in straight lines with constant velocity). In this paper, we propose and develop an index structure that is explicitly designed to perform well for both querying and updating. We present techniques for altering the design of an index in order to optimize for both updates and querying. The paper is developed with the example of R-trees, but the ideas can be extended to other index structures as well. We present the design of the Change-Tolerant R-tree and an experimental evaluation of its performance.
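
    A minimal sketch of the change-tolerance trade-off follows: each entry keeps an expanded bounding box so small movements are absorbed without touching the index, at the price of looser boxes at query time. The flat dictionary and fixed margin are illustrative stand-ins, not the Change-Tolerant R-tree algorithm itself.

        # Sketch of change tolerance: store an expanded bounding box per object
        # so that most position updates stay inside it and skip the expensive
        # index update. The margin is an invented tuning knob.
        TOLERANCE = 5.0  # expansion margin in space units

        entries = {}  # object id -> expanded bounding box (x1, y1, x2, y2)

        def upsert(obj_id, x, y):
            """Update the index only when the object escapes its tolerant box."""
            box = entries.get(obj_id)
            if box and box[0] <= x <= box[2] and box[1] <= y <= box[3]:
                return False  # still covered: no index update needed
            entries[obj_id] = (x - TOLERANCE, y - TOLERANCE,
                               x + TOLERANCE, y + TOLERANCE)
            return True       # only this case would touch a real index

        print(upsert(7, 50.0, 50.0))  # True  (first insertion)
        print(upsert(7, 52.0, 49.0))  # False (small move absorbed by the box)
        print(upsert(7, 70.0, 50.0))  # True  (escaped: reinsert with a new box)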

    Q+Rtree: Efficient Indexing for Moving Object Databases

    Moving object environments contain large numbers of queries and continuously moving objects. Traditional spatial index structures do not work well in this environment because of the need to frequently update the index, which results in very poor performance. In this paper, we present a novel indexing structure, the Q+Rtree, based on the observations that (i) most moving objects are in a quasi-static state most of the time, and (ii) the moving patterns of objects are strongly related to the topography of the space. The Q+Rtree is a hybrid tree structure consisting of both an R*tree and a Quadtree. The R*tree component indexes quasi-static objects: those that are currently moving slowly and are often crowded together in buildings or houses. The Quadtree component indexes fast-moving objects, which are dispersed over wider regions. We also present an experimental evaluation of our approach.
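
    The routing idea behind the hybrid structure can be sketched as below: classify each object by its current speed and send it to the sub-index suited to its motion profile. Plain sets stand in for the R*tree and Quadtree components, and the speed threshold is an invented parameter.

        # Sketch of the Q+Rtree routing idea: quasi-static objects go to one
        # sub-index, fast movers to another. Sets stand in for the real
        # R*tree and Quadtree; the threshold is hypothetical.
        SPEED_THRESHOLD = 1.0  # units per tick, invented for illustration

        quasi_static_index = set()  # would be the R*tree component
        fast_moving_index = set()   # would be the Quadtree component

        def insert(obj_id, speed):
            """Route an object to the sub-index matching its motion profile."""
            if speed < SPEED_THRESHOLD:
                quasi_static_index.add(obj_id)  # slow, often clustered in buildings
            else:
                fast_moving_index.add(obj_id)   # fast, dispersed over wide regions

        insert("person_3", speed=0.1)  # quasi-static side
        insert("car_17", speed=14.2)   # fast-moving side
        print(sorted(quasi_static_index), sorted(fast_moving_index))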