
    Schema Matching for Large-Scale Data Based on Ontology Clustering Method

    Holistic schema matching is the process of identifying semantic correspondences among multiple schemas at once. The key challenge lies in selecting a method that maintains both effectiveness and efficiency. Effectiveness refers to the quality of the matching, while efficiency refers to the time and memory consumed by the matching process. Several approaches have been proposed for holistic schema matching, most of them dependent on clustering techniques. Clustering aims to group similar fields across the schemas into multiple groups or clusters. However, fields in schemas carry complicated semantic relations owing to differences in schema level. An ontology, which is a hierarchy of taxonomies, can identify semantic correspondences at various levels. Hence, this study proposes an ontology-based clustering approach for holistic schema matching. Two datasets, Airfare and Job, have been used from the ICQ query interfaces, consisting of 40 interfaces. The ontology used in this study has been built using XBenchMatch, a benchmark lexicon that contains rich semantic correspondences for the field of schema matching. To accommodate schema matching with the ontology, a rule-based clustering approach is used with multiple distance measures, including Dice, Cosine and Jaccard. The evaluation has been conducted using the common information retrieval metrics: precision, recall and f-measure. To assess the performance of the proposed ontology-based clustering, two experiments have been compared. The first experiment applies the ontology-based clustering approach (i.e. using the ontology and rule-based clustering), while the second applies traditional clustering approaches without the ontology.
Results show that the proposed ontology-based clustering approach outperformed the traditional clustering approaches without ontology, achieving an f-measure of 94% for the Airfare dataset and 92% for the Job dataset. This emphasizes the strength of ontology in identifying correspondences that vary in semantic level.
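
The abstract names the Dice, Cosine and Jaccard measures and the precision, recall and f-measure metrics without defining them. The following is a minimal sketch of their standard set-based forms, not the authors' implementation: the tokenization of field labels, the function names and the example labels are all assumptions made for illustration.

```python
import math


def tokens(label):
    """Split a field label into a set of lowercase word tokens (assumed tokenization)."""
    return set(label.lower().replace("_", " ").split())


def jaccard(a, b):
    """Jaccard coefficient: shared tokens over all distinct tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient: twice the shared tokens over the summed set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity for binary (set) vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def precision_recall_f(predicted, reference):
    """Score predicted correspondences against a reference set of correct ones."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical field labels from two query interfaces.
a = tokens("Departure City")
b = tokens("departure_date")
print(jaccard(a, b), dice(a, b), cosine(a, b))
```

These are similarity coefficients; a clustering step that needs a distance can use one minus the similarity.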

    Use of statistical analysis, data mining, decision analysis and cost effectiveness analysis to analyze medical data : application to comparative effectiveness of lumpectomy and mastectomy for breast cancer.

    Statistical models have been the first choice for comparative effectiveness in clinical research. Though effective, these models are limited when the data to be analyzed do not fit the assumed distributions, which is mostly the case when the study is not a clinical trial. In this project, data mining, decision analysis and cost effectiveness analysis methods were used to supplement statistical models in comparing lumpectomy to mastectomy for surgical treatment of breast cancer. Mastectomy has been the gold standard for breast cancer treatment since the 1800s. In the 20th century, an equivalence of mastectomy and lumpectomy was established in terms of long-term survival and disease-free survival. However, short-term comparative effectiveness in post-operative outcomes has not been fully explored. Studies using administrative data are lacking, and no study has used new technologies of self-expression, particularly the internet discussion board. In this study, data used were from the Nationwide Inpatient Sample (NIS) 2005, the Thomson Reuters MarketScan 2000 - 2001, the medical literature on clinical trials and online individuals' posts in discussion boards on breastcancer.org. The NIS was used to compare lumpectomy to mastectomy in terms of hospital length of stay, total charges and in-hospital death at the time of surgery. MarketScan data were used to evaluate the comparative follow-up outcomes in terms of risk of repeat hospitalization, risk of repeat operation, number of outpatient services, number of prescribed medications, length of stay, and total charges per post-operative hospital admission over a period averaging eight months. The MarketScan data were also used to construct a simple post-operative hospital admission predictive model and to perform short-term cost-effectiveness analysis. The medical literature was used to analyze long-term (10-year) mortality and recurrence for both treatments. 
The web postings were used to evaluate the comparative cost to improve quality of life in terms of patient satisfaction. In NIS and MarketScan data, International Classification of Diseases, 9th revision, Clinical Modification (ICD-9-CM) diagnosis codes were used to extract cases of breast cancer, and ICD-9-CM procedure codes and Current Procedural Terminology, 4th edition procedure codes were used to form treatment groups. Data were pre-processed and prepared for analysis using data mining techniques such as clustering, sampling and text mining. To clean the data for statistical models, some continuous variables were normalized using methods such as logarithmic transformation. Statistical models such as linear regression, generalized linear models, logistic and proportional hazard (Cox) regressions were used to compare post-operative outcomes of lumpectomy versus mastectomy. Neural network, decision tree and logistic regression predictive modeling techniques were compared to create a simple model predicting 90-day post-operative hospital re-admission. Cost and effectiveness were compared with the Incremental Cost Effectiveness Ratio (ICER). A simple method to process and analyze online postings was created and used for patients' input in the comparison of lumpectomy to mastectomy. All statistical analyses were performed in SAS 9.2. Data mining was performed in SAS Enterprise Miner (EM) 6.1 and SAS Text Miner. Decision analysis and cost effectiveness analysis were performed in TreeAge Pro 2011. A simple comparison of the two procedures using the NIS 2005, a discharge-level dataset, showed that in general, a lumpectomy surgery is associated with a significantly longer stay and more charges on average. From the MarketScan data, a person-level dataset in which a patient can be followed longitudinally, it was found that for the initial hospitalization, patients who underwent mastectomy had a non-significantly longer hospital stay and significantly lower charges. 
The post-operative number of outpatient services and prescribed medications, as well as length of stay and charges for post-operative hospital admissions, showed no statistically significant differences. Using the MarketScan data, it was also found that the best model to predict 90-day post-operative hospital admission was logistic regression. A logistic regression revealed that the risk of a hospital re-admission within 90 days after surgery was 65% for a patient who underwent lumpectomy and 48% for a patient who underwent mastectomy. A cost effectiveness analysis using Markov models for up to 100 days after surgery showed that having lumpectomy saved hospital-related costs every day, with a minimum saving of $33 on day 10. In terms of long-term outcomes, the use of decision analysis methods on the literature review data revealed that, 10 years after surgery, 739 recurrences and 84 deaths were prevented among 10,000 women who had mastectomy instead of lumpectomy. Factoring patients' preferences into the comparison of the two procedures, it was found that patients who undergo lumpectomy are non-significantly more satisfied than their peers who undergo mastectomy. In terms of cost, it was found that lumpectomy saves $517 for each satisfied individual in comparison to mastectomy. In conclusion, the current project showed how to use data mining, decision analysis and cost effectiveness methods to supplement statistical analysis when using real-world non-clinical-trial data for a more complete analysis. 
The application of this combination of methods to the comparative effectiveness of lumpectomy and mastectomy showed that, in terms of cost and patients' quality of life measured as satisfaction, lumpectomy was the better choice.
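
The abstract compares cost and effectiveness with the Incremental Cost Effectiveness Ratio (ICER) but does not spell out the formula. The sketch below uses the standard definition, the difference in cost divided by the difference in effect between two options. The function name, parameter names and the numbers in the example are hypothetical and are not taken from the study.

```python
def icer(cost_new, cost_ref, effect_new, effect_ref):
    """Incremental cost per additional unit of effect of one option over a reference.

    Standard definition: (cost_new - cost_ref) / (effect_new - effect_ref).
    """
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        # With no difference in effect the ratio is undefined.
        raise ValueError("ICER is undefined when both options have the same effect")
    return (cost_new - cost_ref) / delta_effect


# Hypothetical inputs: costs in dollars, effect as the proportion of satisfied patients.
print(icer(cost_new=1500.0, cost_ref=1000.0, effect_new=0.75, effect_ref=0.5))
```

A negative ratio needs care in interpretation: it arises both when an option is cheaper and more effective and when it is costlier and less effective, so the signs of the two differences should be inspected alongside the ratio.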