
    A Vision for the Systematic Monitoring and Improvement of the Quality of Electronic Health Data

    In parallel with the implementation of information and communications systems, health care organizations are beginning to amass large-scale repositories of clinical and administrative data. Many nations seek to leverage so-called Big Data repositories to support improvements in health outcomes, drug safety, health surveillance, and care delivery processes. An unsupported assumption is that electronic health care data are of sufficient quality to enable the varied use cases envisioned by health ministries. The reality is that many electronic health data sources are of suboptimal quality and unfit for particular uses. To more systematically define, characterize, and improve electronic health data quality, we propose a novel framework for health data stewardship. The framework is adapted from prior data quality research outside of health care, but it has been reshaped to apply a systems approach to data quality with an emphasis on health outcomes. The proposed framework is a beginning, not an end. We invite the biomedical informatics community to use and adapt the framework to improve health data quality and outcomes for populations in nations around the world.

    Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data

    Objectives: Existing approaches to derive decision models from plaintext clinical data frequently depend on medical dictionaries as the sources of potential features. Prior research suggests that decision models developed using non-dictionary-based feature sourcing approaches and “off the shelf” tools can predict cancer with performance metrics between 80% and 90%. We sought to compare non-dictionary-based models to models built using features derived from medical dictionaries. Materials and methods: We evaluated the detection of cancer cases from free-text pathology reports using decision models built with combinations of dictionary or non-dictionary feature sourcing approaches, 4 feature subset sizes, and 5 classification algorithms. Each decision model was evaluated using the following performance metrics: sensitivity, specificity, accuracy, positive predictive value, and area under the receiver operating characteristic (ROC) curve. Results: Decision models parameterized using dictionary and non-dictionary feature sourcing approaches produced performance metrics between 70% and 90%. Neither the source of features nor the feature subset size had an impact on the performance of a decision model. Conclusion: Our study suggests there is little value in leveraging medical dictionaries for extracting features for decision model building. Decision models built using features extracted from the plaintext reports themselves achieve results comparable to those built using medical dictionaries. Overall, this suggests that existing “off the shelf” approaches can be leveraged to perform accurate cancer detection using less complex Named Entity Recognition (NER) based feature extraction, automated feature selection, and modeling approaches.
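
    As a rough illustration of the comparison described above, the sketch below builds one model from a vocabulary learned from the reports themselves and one from a fixed dictionary vocabulary, then scores both with an off-the-shelf classifier. This is a minimal sketch, not the authors' pipeline: the toy reports, the four-term stand-in for a medical dictionary, and the choice of scikit-learn with logistic regression are all assumptions.

        # Minimal sketch: compare plaintext-derived features against a fixed
        # "dictionary" vocabulary for cancer detection. All data below is toy
        # data; the four-term vocabulary is a hypothetical stand-in for a real
        # medical dictionary (e.g. a UMLS-derived term list).
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        reports = [
            "infiltrating ductal carcinoma identified in the specimen",
            "sections show malignant neoplasm with necrosis",
            "benign fibrous tissue, no evidence of malignancy",
            "normal colonic mucosa, negative for carcinoma",
        ]
        labels = [1, 1, 0, 0]  # 1 = cancer case

        # Non-dictionary features: vocabulary learned from the reports themselves.
        plain_vec = CountVectorizer()
        # Dictionary features: vocabulary restricted to known medical terms.
        dict_vec = CountVectorizer(vocabulary=["carcinoma", "malignancy", "neoplasm", "benign"])

        for name, vec in [("plaintext-derived", plain_vec), ("dictionary-based", dict_vec)]:
            X = vec.fit_transform(reports)
            clf = LogisticRegression(max_iter=1000)
            auc = cross_val_score(clf, X, labels, cv=2, scoring="roc_auc").mean()
            print(f"{name} features: mean ROC AUC = {auc:.2f}")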

    DCMS: A data analytics and management system for molecular simulation

    Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate very large numbers of atoms whose spatial and temporal relationships must be observed for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly because they lack a platform to support applications that involve intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system our team developed over the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is to handle the analytical queries that are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We have also used it as a platform to test other data management issues such as security and compression.
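
    The core idea, storing simulation frames in a relational DBMS and pushing analysis into SQL, can be sketched as below. The table layout, connection string, and box-range query are hypothetical stand-ins for illustration, not the actual DCMS schema or its co-processor-accelerated operators.

        # Minimal sketch of the database-centric idea: per-frame atom positions
        # live in a relational table and analytical queries are expressed in SQL.
        # The DSN and schema are hypothetical, not the actual DCMS design.
        import psycopg2

        conn = psycopg2.connect("dbname=msdata user=postgres")  # hypothetical DSN
        cur = conn.cursor()
        cur.execute("""
            CREATE TABLE IF NOT EXISTS atoms (
                frame   INTEGER,
                atom_id INTEGER,
                x REAL, y REAL, z REAL,
                PRIMARY KEY (frame, atom_id)
            )
        """)

        # A typical analytical query: all atoms inside a spatial box at frame 100.
        # An index on (frame, x, y, z) would let the DBMS optimize this scan.
        cur.execute("""
            SELECT atom_id, x, y, z
            FROM atoms
            WHERE frame = %s
              AND x BETWEEN %s AND %s
              AND y BETWEEN %s AND %s
              AND z BETWEEN %s AND %s
        """, (100, 0.0, 10.0, 0.0, 10.0, 0.0, 10.0))
        for row in cur.fetchall():
            print(row)
        conn.commit()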

    Decision Support System For Geriatric Care

    Poster abstract: Geriatrics is a branch of medicine that focuses on the health care of the elderly. We propose to build a decision support system for elderly care based on a knowledge base that incorporates best practices reported in the literature. A Bayesian network model is then used to provide decision support within the geriatric care tool we develop.
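
    As a toy illustration of how a Bayesian model can drive such decision support, the sketch below computes a posterior for an invented two-variable network. The variables and probabilities are made up for the example; they are not clinical guidance and not the authors' model.

        # Hand-rolled two-node Bayesian inference, kept self-contained. All
        # numbers are invented for illustration only.

        # Prior: P(high fall risk)
        p_risk = 0.2
        # Likelihoods: P(impaired mobility | risk status)
        p_impaired_given_risk = 0.7
        p_impaired_given_no_risk = 0.1

        # Posterior via Bayes' rule: P(high fall risk | impaired mobility)
        evidence = (p_impaired_given_risk * p_risk
                    + p_impaired_given_no_risk * (1 - p_risk))
        posterior = p_impaired_given_risk * p_risk / evidence
        print(f"P(high fall risk | impaired mobility) = {posterior:.2f}")  # ~0.64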

    Identification of a gene signature in cell cycle pathway for breast cancer prognosis using gene expression profiling data

    Background: Numerous studies have used microarrays to identify gene signatures for predicting cancer patient clinical outcome and responses to chemotherapy. However, the potential impact of gene expression profiling on cancer diagnosis, prognosis, and the development of personalized treatment may not be fully exploited due to the lack of consensus gene signatures and poor understanding of the underlying molecular mechanisms. Methods: We developed a novel approach to derive gene signatures for breast cancer prognosis in the context of known biological pathways. Using unsupervised methods, cancer patients were separated into distinct groups based on gene expression patterns in one of the following pathways: apoptosis, cell cycle, angiogenesis, metastasis, p53, DNA repair, and several receptor-mediated signaling pathways including chemokines, EGF, FGF, HIF, MAP kinase, JAK, and NF-κB. The survival probabilities were then compared between the patient groups to determine whether differential gene expression in a specific pathway is correlated with differential survival. Results: Our results revealed that expression of cell cycle genes is strongly predictive of breast cancer outcomes. We further confirmed this observation by building a cell cycle gene signature model using supervised methods. Validated in multiple independent datasets, the cell cycle gene signature is a more accurate predictor of breast cancer clinical outcome than the previously identified Amsterdam 70-gene signature, which has been developed into the FDA-approved clinical test MammaPrint®. Conclusion: Taken together, the gene expression signature model we developed from well-defined pathways is not only a consistently powerful prognosticator but also mechanistically linked to cancer biology. Our approach provides an alternative to the current methodology of identifying gene expression markers for cancer prognosis and drug responses using whole-genome gene expression data.
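
    The pathway-based procedure, unsupervised grouping on one pathway's genes followed by a survival comparison, might look roughly like the sketch below. The random expression matrix and survival columns are placeholders for real data, and scikit-learn's k-means plus the lifelines log-rank test are one plausible tool choice, not necessarily the authors'.

        # Sketch: cluster patients on cell-cycle gene expression, then test
        # whether the clusters differ in survival. `expr`, `time`, and `event`
        # are random stand-ins for a real cohort.
        import numpy as np
        from sklearn.cluster import KMeans
        from lifelines.statistics import logrank_test

        rng = np.random.default_rng(0)
        expr = rng.normal(size=(100, 20))      # patients x cell-cycle genes
        time = rng.exponential(60, size=100)   # follow-up time in months
        event = rng.integers(0, 2, size=100)   # 1 = event observed

        # Unsupervised step: split patients by pathway expression pattern.
        groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(expr)

        # Survival step: log-rank test between the two patient groups.
        result = logrank_test(
            time[groups == 0], time[groups == 1],
            event_observed_A=event[groups == 0],
            event_observed_B=event[groups == 1],
        )
        print(f"log-rank p-value: {result.p_value:.3f}")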

    Efficient indexing techniques for the update intensive environment

    For environments such as moving object and sensor databases, where data is constantly evolving, traditional database index structures suffer from the need for frequent updates and deliver poor performance. We propose and develop new indexing and querying techniques for the update-intensive environment. Our approaches exploit properties of the applications, such as the nature of the queries and the nature of data changes. Update-intensive applications usually require monitoring of continuously changing data, so queries in these applications tend to be continuous queries with answers reported at multiple points in time. We introduce the Query Index (QI) and Velocity Constrained Index (VCI) for efficient and scalable execution of multiple continuous queries. Our work also exploits the nature of changes in data. We address common and important classes of data, including moving object data and constantly evolving numerical data such as sensor readings. Based on the nature of data changes, we introduce four new index structures: the Q+Rtree and Change-tolerant Rtree (CTRtree) for indexing moving object data, and the Mean Variance Tree (MVTree) and Forecasted Interval Index (FI-Index) for indexing constantly evolving numerical data. The design of these index structures is based not only on the current values of the data being indexed but also on the nature of changes in data values. This approach maximizes the opportunity for the index to cover the updated values and reduces the number of expensive updates to the index structures. Experimental results establish the superior performance of the proposed index structures over traditional indexes. We also introduce the notion of change-tolerant indexing and design indexes with the explicit goal of optimizing both update and query performance. These indexes trade slightly poorer query performance for much better update performance, resulting in better overall performance.
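
    The query-indexing idea, treating the long-lived continuous queries rather than the fast-changing objects as the thing to index, can be sketched as below. A uniform grid stands in for the real Query Index structure, and the coordinates and cell size are invented for illustration.

        # Sketch: continuous range queries change rarely, so index the queries
        # and probe each object update against that index instead of
        # reindexing the objects. A uniform grid stands in for a real structure.
        from collections import defaultdict

        CELL = 10.0
        queries = {1: (0, 0, 15, 15), 2: (20, 20, 40, 40)}  # qid -> (x1, y1, x2, y2)
        query_grid = defaultdict(list)  # grid cell -> queries overlapping it

        for qid, (x1, y1, x2, y2) in queries.items():
            for cx in range(int(x1 // CELL), int(x2 // CELL) + 1):
                for cy in range(int(y1 // CELL), int(y2 // CELL) + 1):
                    query_grid[(cx, cy)].append(qid)

        def on_update(x, y):
            """Report the continuous queries satisfied by an object's new position."""
            candidates = query_grid[(int(x // CELL), int(y // CELL))]
            return [q for q in candidates
                    if queries[q][0] <= x <= queries[q][2]
                    and queries[q][1] <= y <= queries[q][3]]

        print(on_update(12.0, 5.0))   # [1]
        print(on_update(25.0, 30.0))  # [2]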

    Change Tolerant Indexing for Constantly Evolving Data

    Index structures are designed to optimize search performance while at the same time supporting efficient data updates. Although not explicit, existing index structures are typically based upon the assumption that the rate of updates will be small compared to the rate of querying. This assumption is not valid in streaming data environments such as sensor and moving object databases, where updates are received incessantly. In fact, for many applications, the rate of updates may well exceed the rate of querying. In such environments, index structures suffer from poor performance due to the large overhead of keeping the index updated with the latest data. Some existing approaches avoid this problem by assuming that objects move in a well-behaved but restrictive manner (e.g. in straight lines with constant velocity). In this paper, we propose and develop an index structure that is explicitly designed to perform well for both querying and updating. We present techniques for altering the design of an index in order to optimize for both updates and querying. The paper is developed with the example of R-trees, but the ideas can be extended to other index structures as well. We present the design of the Change-Tolerant R-tree and an experimental evaluation of its performance.
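
    A minimal sketch of the change-tolerance trade-off follows: each entry keeps an expanded bounding box so small movements are absorbed without touching the index, at the price of looser boxes at query time. The flat dictionary and fixed margin are illustrative stand-ins, not the Change-Tolerant R-tree algorithm itself.

        # Sketch of change tolerance: store an expanded bounding box per object
        # so that most position updates stay inside it and skip the expensive
        # index update. The margin is an invented tuning knob.
        TOLERANCE = 5.0  # expansion margin in space units

        entries = {}  # object id -> expanded bounding box (x1, y1, x2, y2)

        def upsert(obj_id, x, y):
            """Update the index only when the object escapes its tolerant box."""
            box = entries.get(obj_id)
            if box and box[0] <= x <= box[2] and box[1] <= y <= box[3]:
                return False  # still covered: no index update needed
            entries[obj_id] = (x - TOLERANCE, y - TOLERANCE,
                               x + TOLERANCE, y + TOLERANCE)
            return True       # only this case would touch a real index

        print(upsert(7, 50.0, 50.0))  # True  (first insertion)
        print(upsert(7, 52.0, 49.0))  # False (small move absorbed by the box)
        print(upsert(7, 70.0, 50.0))  # True  (escaped: reinsert with a new box)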

    Q+Rtree: Efficient Indexing for Moving Object Databases

    Moving object environments contain large numbers of queries and continuously moving objects. Traditional spatial index structures do not work well in this environment because of the need to frequently update the index, which results in very poor performance. In this paper, we present a novel indexing structure, the Q+Rtree, based on the observations that (i) most moving objects are in a quasi-static state most of the time, and (ii) the moving patterns of objects are strongly related to the topography of the space. The Q+Rtree is a hybrid tree structure consisting of both an R*tree and a Quadtree. The R*tree component indexes quasi-static objects: those that are currently moving slowly and are often crowded together in buildings or houses. The Quadtree component indexes fast-moving objects, which are dispersed over wider regions. We also present an experimental evaluation of our approach.
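
    The routing idea behind the hybrid structure can be sketched as below: classify each object by its current speed and send it to the sub-index suited to its motion profile. Plain sets stand in for the R*tree and Quadtree components, and the speed threshold is an invented parameter.

        # Sketch of the Q+Rtree routing idea: quasi-static objects go to one
        # sub-index, fast movers to another. Sets stand in for the real
        # R*tree and Quadtree; the threshold is hypothetical.
        SPEED_THRESHOLD = 1.0  # units per tick, invented for illustration

        quasi_static_index = set()  # would be the R*tree component
        fast_moving_index = set()   # would be the Quadtree component

        def insert(obj_id, speed):
            """Route an object to the sub-index matching its motion profile."""
            if speed < SPEED_THRESHOLD:
                quasi_static_index.add(obj_id)  # slow, often clustered in buildings
            else:
                fast_moving_index.add(obj_id)   # fast, dispersed over wide regions

        insert("person_3", speed=0.1)  # quasi-static side
        insert("car_17", speed=14.2)   # fast-moving side
        print(sorted(quasi_static_index), sorted(fast_moving_index))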