2 research outputs found

    The Applicability Of Data Mining Techniques In Eurostat Databases: An Example Of The Decision Tree

    As a result of the rapid development of technology, the infrastructure used for obtaining and storing data has also developed continuously. In addition, the importance of knowledge as an indispensable asset for individuals and institutions increases with each passing day. However, earlier data management techniques have become insufficient for the rapidly growing mass of data, which created a need for new methods. Data mining is a field that offers a variety of data extraction techniques to meet these requirements. Eurostat is the statistical office of the European Union. Its purpose is to provide objective and accurate data to decision makers, and its statistics are open for everyone to use. Although Eurostat databases are very comprehensive and useful, it is particularly difficult to find academic publications that apply data mining to them. This study aims to carry out a data mining study using the statistics provided by Eurostat. For the analysis, the “Information Society” field was selected and the data were analyzed using a decision tree algorithm. The results of the analysis are then presented. The study is also intended to shed some light on future studies.

    Doctor of Philosophy

    In its report To Err is Human, the Institute of Medicine recommended the implementation of internal and external voluntary and mandatory automatic reporting systems to increase detection of adverse events. Knowledge Discovery in Databases (KDD) allows the detection of patterns and trends that would be hidden or less detectable if analyzed by conventional methods. The objective of this study was to examine novel KDD techniques used by other disciplines to create predictive models using healthcare data and validate the results through clinical domain expertise and performance measures. Patient records for the present study were extracted from the enterprise data warehouse (EDW) from Intermountain Healthcare. Patients with reported adverse events were identified from ICD9 codes. A clinical classification of the ICD9 codes was developed, and the clinical categories were analyzed for risk factors for adverse events including adverse drug events. Pharmacy data were categorized and used for detection of drugs administered in temporal sequence with antidote drugs. Data sampling and data boosting algorithms were used as signal amplification techniques. Decision trees, Naïve Bayes, Canonical Correlation Analysis, and Sequence Analysis were used as machine learning algorithms. Performance measures of the classification algorithms demonstrated statistically significant improvement after the transformation of the dataset through KDD techniques, data boosting and sampling. Domain expertise was applied to validate clinical significance of the results. KDD methodologies were applied successfully to a complex clinical dataset. The use of these methodologies was empirically proven effective in healthcare data through statistically significant measures and clinical validation. Although more research is required, we demonstrated the usefulness of KDD methodologies in knowledge extraction from complex clinical data.