15,964 research outputs found

    A Survey of Parallel Data Mining

    Get PDF
    With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackle the problem of scalability in data mining. Recently there has been a considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm/paradigm. This paper surveys parallel data mining with a broader perspective. More precisely, we discuss the parallelization of data mining algorithms of four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms

    Profiling risk factors for chronic uveitis in juvenile idiopathic arthritis: a new model for EHR-based research.

    Get PDF
    BackgroundJuvenile idiopathic arthritis is the most common rheumatic disease in children. Chronic uveitis is a common and serious comorbid condition of juvenile idiopathic arthritis, with insidious presentation and potential to cause blindness. Knowledge of clinical associations will improve risk stratification. Based on clinical observation, we hypothesized that allergic conditions are associated with chronic uveitis in juvenile idiopathic arthritis patients.MethodsThis study is a retrospective cohort study using Stanford's clinical data warehouse containing data from Lucile Packard Children's Hospital from 2000-2011 to analyze patient characteristics associated with chronic uveitis in a large juvenile idiopathic arthritis cohort. Clinical notes in patients under 16 years of age were processed via a validated text analytics pipeline. Bivariate-associated variables were used in a multivariate logistic regression adjusted for age, gender, and race. Previously reported associations were evaluated to validate our methods. The main outcome measure was presence of terms indicating allergy or allergy medications use overrepresented in juvenile idiopathic arthritis patients with chronic uveitis. Residual text features were then used in unsupervised hierarchical clustering to compare clinical text similarity between patients with and without uveitis.ResultsPreviously reported associations with uveitis in juvenile idiopathic arthritis patients (earlier age at arthritis diagnosis, oligoarticular-onset disease, antinuclear antibody status, history of psoriasis) were reproduced in our study. Use of allergy medications and terms describing allergic conditions were independently associated with chronic uveitis. The association with allergy drugs when adjusted for known associations remained significant (OR 2.54, 95% CI 1.22-5.4).ConclusionsThis study shows the potential of using a validated text analytics pipeline on clinical data warehouses to examine practice-based evidence for evaluating hypotheses formed during patient care. Our study reproduces four known associations with uveitis development in juvenile idiopathic arthritis patients, and reports a new association between allergic conditions and chronic uveitis in juvenile idiopathic arthritis patients

    A model for Business Intelligence Systems’ Development

    Get PDF
    Often, Business Intelligence Systems (BIS) require historical data or data collected from var-ious sources. The solution is found in data warehouses, which are the main technology used to extract, transform, load and store data in the organizational Business Intelligence projects. The development cycle of a data warehouse involves lots of resources, time, high costs and above all, it is built only for some specific tasks. In this paper, we’ll present some of the aspects of the BI systems’ development such as: architecture, lifecycle, modeling techniques and finally, some evaluation criteria for the system’s performance.BIS (Business Intelligence Systems), Data Warehouses, OLAP (On-Line Analytical Processing), Object-Oriented Modeling

    Building XML data warehouse based on frequent patterns in user queries

    Get PDF
    [Abstract]: With the proliferation of XML-based data sources available across the Internet, it is increasingly important to provide users with a data warehouse of XML data sources to facilitate decision-making processes. Due to the extremely large amount of XML data available on web, unguided warehousing of XML data turns out to be highly costly and usually cannot well accommodate the users’ needs in XML data acquirement. In this paper, we propose an approach to materialize XML data warehouses based on frequent query patterns discovered from historical queries issued by users. The schemas of integrated XML documents in the warehouse are built using these frequent query patterns represented as Frequent Query Pattern Trees (FreqQPTs). Using hierarchical clustering technique, the integration approach in the data warehouse is flexible with respect to obtaining and maintaining XML documents. Experiments show that the overall processing of the same queries issued against the global schema become much efficient by using the XML data warehouse built than by directly searching the multiple data sources

    Association Rules Mining Based Clinical Observations

    Full text link
    Healthcare institutes enrich the repository of patients' disease related information in an increasing manner which could have been more useful by carrying out relational analysis. Data mining algorithms are proven to be quite useful in exploring useful correlations from larger data repositories. In this paper we have implemented Association Rules mining based a novel idea for finding co-occurrences of diseases carried by a patient using the healthcare repository. We have developed a system-prototype for Clinical State Correlation Prediction (CSCP) which extracts data from patients' healthcare database, transforms the OLTP data into a Data Warehouse by generating association rules. The CSCP system helps reveal relations among the diseases. The CSCP system predicts the correlation(s) among primary disease (the disease for which the patient visits the doctor) and secondary disease/s (which is/are other associated disease/s carried by the same patient having the primary disease).Comment: 5 pages, MEDINFO 2010, C. Safran et al. (Eds.), IOS Pres

    Can the US Minimum Data Set Be Used for Predicting Admissions to Acute Care Facilities?

    Get PDF
    This paper is intended to give an overview of Knowledge Discovery in Large Datasets (KDD) and data mining applications in healthcare particularly as related to the Minimum Data Set, a resident assessment tool which is used in US long-term care facilities. The US Health Care Finance Administration, which mandates the use of this tool, has accumulated massive warehouses of MDS data. The pressure in healthcare to increase efficiency and effectiveness while improving patient outcomes requires that we find new ways to harness these vast resources. The intent of this preliminary study design paper is to discuss the development of an approach which utilizes the MDS, in conjunction with KDD and classification algorithms, in an attempt to predict admission from a long-term care facility to an acute care facility. The use of acute care services by long term care residents is a negative outcome, potentially avoidable, and expensive. The value of the MDS warehouse can be realized by the use of the stored data in ways that can improve patient outcomes and avoid the use of expensive acute care services. This study, when completed, will test whether the MDS warehouse can be used to describe patient outcomes and possibly be of predictive value
    • 

    corecore