1,153 research outputs found

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Due to recent technology advances, large volumes of medical data are obtained. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We mainly emphasize the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various methodologies is highlighted. Generally, association mining is suitable for extracting rules; it has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for the diagnosis of erythemato-squamous diseases. Methods used in this area include neural networks, genetic algorithms, and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find a structure in the given data by finding similarities between data items according to their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data

    Association Rules Mining Based Clinical Observations

    Healthcare institutes keep enlarging their repositories of patients' disease-related information, which could be made more useful by relational analysis. Data mining algorithms have proven useful for finding correlations in large data repositories. In this paper we implement a novel idea, based on association rule mining, for finding co-occurrences of diseases carried by a patient using the healthcare repository. We have developed a system prototype for Clinical State Correlation Prediction (CSCP), which extracts data from the patients' healthcare database and transforms the OLTP data into a data warehouse by generating association rules. The CSCP system helps reveal relations among diseases. It predicts the correlation(s) between the primary disease (the disease for which the patient visits the doctor) and the secondary disease(s) (other associated diseases carried by the same patient). Comment: 5 pages, MEDINFO 2010, C. Safran et al. (Eds.), IOS Press
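The abstract does not describe how CSCP computes its rules, so the following is only a minimal sketch of the general idea of primary-to-secondary disease rules, scored with the standard support and confidence measures. The patient records, the function names `support` and `rules`, and both thresholds are hypothetical and chosen purely for illustration.

```python
from itertools import permutations

# Hypothetical toy records: each set lists the conditions recorded for one patient.
records = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "retinopathy"},
    {"diabetes"},
    {"hypertension"},
    {"asthma"},
]

def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= r for r in records) / len(records)

def rules(records, min_support=0.2, min_confidence=0.6):
    """Return (primary, secondary, support, confidence) for one-to-one rules."""
    items = sorted(set().union(*records))
    found = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b}, records)   # how often both conditions co-occur
        s_a = support({a}, records)       # how often the primary condition occurs
        if s_ab >= min_support and s_ab / s_a >= min_confidence:
            found.append((a, b, s_ab, s_ab / s_a))
    return found

for primary, secondary, s, c in rules(records):
    print(f"{primary} -> {secondary}: support={s:.2f}, confidence={c:.2f}")
```

A real system would mine multi-item rules with an algorithm such as Apriori rather than enumerate pairs; the pairwise loop here only keeps the sketch short.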

    Application of association rule mining on gene data using interestingness measures

    Aim: Data mining is the process of discovering useful, previously unrevealed information in large-scale data. One of the fields in which data mining is widely used is health. With data mining, the diagnosis and treatment of a disease and the risk factors affecting it can be determined quickly. Association rules are one of the data mining techniques. The aim of this study is to determine patient profiles by obtaining strong association rules with the Apriori algorithm, which is one of the association rule algorithms. Material and Method: The data set used in the study consists of 205 acute myocardial infarction (AMI) patients. The patients also carry genotypes of the FNDC5 polymorphisms (rs3480, rs726344, rs16835198). Support and confidence measures are used to evaluate the rules obtained with the Apriori algorithm. The rules obtained with these measures are correct but not strong. Therefore, interest measures are used in addition to these two basic measures in order to obtain stronger rules. In this study, the interest measures lift, conviction, certainty factor, cosine, phi (correlation coefficient), and mutual information are applied. Results: In this study, 108 rules were obtained. After the proposed interest measures were applied, 29 of the rules qualified as strong. Conclusion: Stronger rules have been obtained by using interest measures in the clinical decision-making process. The strong rules obtained will facilitate patient profile determination and clinical decision-making for AMI patients
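The abstract names the interest measures but does not give their formulas. As a hedged illustration, the sketch below computes the commonly used textbook definitions for a single rule A -> B from four counts. The function name `interest_measures` and the example counts are hypothetical and are not taken from the study's data; the code assumes 0 < n_a < n and 0 < n_b < n.

```python
import math

def interest_measures(n, n_a, n_b, n_ab):
    """Measures for rule A -> B from counts: all records, A, B, and A-and-B."""
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    conf = p_ab / p_a                      # confidence = P(B | A)
    lift = conf / p_b                      # > 1 means A and B co-occur more than chance
    conviction = math.inf if conf == 1 else (1 - p_b) / (1 - conf)
    if conf > p_b:                         # certainty factor, in [-1, 1]
        cf = (conf - p_b) / (1 - p_b)
    elif conf < p_b:
        cf = (conf - p_b) / p_b
    else:
        cf = 0.0
    cosine = p_ab / math.sqrt(p_a * p_b)
    phi = (p_ab - p_a * p_b) / math.sqrt(p_a * p_b * (1 - p_a) * (1 - p_b))
    # Mutual information (in bits) over the 2x2 contingency table of A and B.
    cells = [
        (p_ab, p_a, p_b),
        (p_a - p_ab, p_a, 1 - p_b),
        (p_b - p_ab, 1 - p_a, p_b),
        (1 - p_a - p_b + p_ab, 1 - p_a, 1 - p_b),
    ]
    mi = sum(p * math.log2(p / (px * py)) for p, px, py in cells if p > 0)
    return {"support": p_ab, "confidence": conf, "lift": lift,
            "conviction": conviction, "certainty_factor": cf,
            "cosine": cosine, "phi": phi, "mutual_information": mi}

# Hypothetical counts: 200 records, 80 with A, 100 with B, 60 with both.
for name, value in interest_measures(200, 80, 100, 60).items():
    print(f"{name}: {value:.3f}")
```

With these counts the rule has confidence 0.75 and lift 1.5, so A raises the probability of B by half over its base rate; a rule with acceptable support and confidence but lift near 1 would be the kind of "correct but not strong" rule the abstract describes.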

    Prediction of peptides binding to MHC class I alleles by partial periodic pattern mining

    MHC (Major Histocompatibility Complex) is a key player in the immune response of an organism. It is important to be able to predict which antigenic peptides will bind to a specific MHC allele and which will not, creating possibilities for controlling the immune response and for applications of immunotherapy. However, a problem for MHC class I is the presence of bulges and loops in the peptides, which change the total length. Most machine learning methods in use today require the sequences to be of the same length to successfully mine the binding motifs. We propose the use of time-based data mining methods in motif mining in order to mine motifs position-independently. In addition, information from both binding and non-binding peptides is used, in contrast to other methods, which rely only on binding peptides. The prediction results range from 60% to 95% for the tested alleles

    Analysis of medical opinions about the nonrealization of autopsies in a Mexican hospital using association rules and bayesian networks

    This research identifies the factors influencing the reduction of autopsies in a hospital in Veracruz. The study is based on the application of data mining techniques such as association rules and Bayesian networks to data sets obtained from the opinions of physicians. For knowledge exploration and extraction, we analyzed algorithms such as Apriori, FPGrowth, PredictiveApriori, Tertius, J48, NaiveBayes, MultilayerPerceptron, and BayesNet, all of them provided by the WEKA API. To generate mining models and present the new knowledge in natural language, we also developed a web application. The results presented in this study are those obtained from the best-evaluated algorithms, which have been validated by specialists in the field of pathology

    Doctor of Philosophy

    With the growing national dissemination of the electronic health record (EHR), there are expectations that the public will benefit from biomedical research and discovery enabled by electronic health data. Clinical data are needed for many diseases and conditions to meet the demands of rapidly advancing genomic and proteomic research. Many biomedical research advancements require rapid access to clinical data as well as broad population coverage. A fundamental issue in the secondary use of clinical data for scientific research is the identification of study cohorts of individuals with a disease or medical condition of interest. The problem addressed in this work is the need for generalized, efficient methods to identify cohorts in the EHR for use in biomedical research. To approach this problem, an associative classification framework was designed with the goal of accurate and rapid identification of cases for biomedical research: (1) a set of exemplars for a given medical condition are presented to the framework, (2) a predictive rule set comprised of EHR attributes is generated by the framework, and (3) the rule set is applied to the EHR to identify additional patients that may have the specified condition. Based on this functionality, the approach was termed the 'cohort amplification' framework. The development and evaluation of the cohort amplification framework are the subject of this dissertation. An overview of the framework design is presented. Improvements to some standard associative classification methods are described and validated. A qualitative evaluation of predictive rules to identify diabetes cases and a study of the accuracy of identification of asthma cases in the EHR using framework-generated prediction rules are reported. The framework demonstrated accurate and reliable rules to identify diabetes and asthma cases in the EHR and contributed to methods for identification of biomedical research cohorts
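The three steps listed in the abstract can be sketched as follows. This is a deliberately simplified illustration and not the dissertation's method: it learns only single-attribute rules, and it assumes labeled negative examples are available alongside the exemplars, which the abstract does not state. All attribute names, patient ids, thresholds, and the functions `learn_rules` and `amplify` are hypothetical.

```python
# Step 1 (assumed form): labeled training records as (attribute set, has_condition).
training = [
    ({"metformin", "hba1c_high"}, True),
    ({"metformin", "hba1c_high", "statin"}, True),
    ({"hba1c_high"}, True),
    ({"statin"}, False),
    ({"albuterol"}, False),
    ({"metformin", "statin"}, False),
]

def learn_rules(training, min_support=2, min_confidence=0.75):
    """Step 2: keep attributes that predict the condition with high confidence."""
    counts = {}  # attribute -> [records with attribute, positive records with attribute]
    for attrs, label in training:
        for a in attrs:
            seen = counts.setdefault(a, [0, 0])
            seen[0] += 1
            seen[1] += label  # True counts as 1, False as 0
    return {a for a, (n, pos) in counts.items()
            if pos >= min_support and pos / n >= min_confidence}

def amplify(ehr, rules):
    """Step 3: return ids of patients whose attributes match at least one rule."""
    return sorted(pid for pid, attrs in ehr.items() if attrs & rules)

# Hypothetical unlabeled EHR records to search for additional candidate cases.
ehr = {
    "u1": {"hba1c_high", "statin"},
    "u2": {"albuterol"},
    "u3": {"metformin"},
}

rule_set = learn_rules(training)
print(rule_set)
print(amplify(ehr, rule_set))
```

Here `metformin` is dropped because only two of its three occurrences are positive cases (confidence 0.67), so a patient with that attribute alone is not flagged. Patients returned by `amplify` are candidates that may have the condition, matching the abstract's wording, not confirmed cases.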

    Use association rules to study the relation between variables that affect high blood pressure

    Introduction: With the increasing development of industrial societies, certain diseases have spread. In Iran in particular, the prevalence of these diseases is high because of unhealthy eating habits and a lack of physical activity. One of these diseases is high blood pressure, which is the origin of many other diseases, so an increasing share of the health budget is allocated to it. Type of job, lack of exercise, and poor diet can have a large impact on the disease. Methods: In this study, we use data mining algorithms to find important relations between high blood pressure and the features that affect it, using data on 1000 patients who entered our survey. Results and conclusions: Using association rules, employment, physical factors, and smoking were seen together in people with low blood pressure. Obesity (high BMI) and low green fruit consumption were seen together in people with high blood pressure

    Internet Medical Privacy Disclosure Mining and Prediction Model Construction Based on Association Rules

    In recent years, China's Internet medical industry has developed rapidly and the market scale has been expanding. Medical privacy is an important research point in the Internet medical field. If the patient cannot fully communicate with the doctor on the other end of the Internet, then it is obvious that the patient will not be well treated. Then it becomes very worthwhile to mine the factors affecting patients' privacy disclosure and predict patients' disclosure behavior. This paper uses the classical and improved multidimensional Apriori (MD-Apriori) to mine patient privacy disclosure factors, which proves that the improved MD-Apriori algorithm is more applicable in this study. In order to prove the validity and authority of the mining results, this paper uses SPSS to analyze 331 valid questionnaires. The results show that the privacy disclosure factors obtained by the two methods are almost the same. Finally, based on the above factors, we establish the Internet medical privacy disclosure intention prediction model, in order to guide the construction and improvement of internet medical

    Data mining techniques on satellite images for discovery of risk areas

    The high rate of cholera epidemic mortality in less developed countries is a challenge for health facilities, which need to equip themselves with epidemiological surveillance. To strengthen the capacity of epidemiological surveillance, this paper focuses on processing remote sensing satellite data with data mining methods to discover risk areas of the epidemic disease by connecting the environment, climate, and health. These satellite data are combined with field data collected during the same set of periods in order to explain and deduce the causes of the epidemic's evolution from one period to another in relation to the environment. The existing techniques (algorithms) for processing satellite images are mature and efficient, so the challenge today is to provide the most suitable means for the best interpretation of the results obtained. For that, we focus on a supervised classification algorithm to process a set of satellite images of the same area from different periods. A novel research methodology (describing pre-treatment, data mining, and post-treatment) is proposed to ensure suitable means for transforming data, generating information, and extracting knowledge. This methodology consists of six phases: (1.A) Acquisition of information from the field about the epidemic, (1.B) Satellite data acquisition, (2) Selection and transformation of data (data derived from images), (3) Remote sensing measurements, (4) Discretization of data, (5) Data treatment, and (6) Interpretation of results. The main contributions of the paper are to establish the nature of the links between the environment and the epidemic, and to highlight the risky environments where public awareness of the problem and prevention policies are absolutely necessary to mitigate the propagation and emergence of the epidemic. This will allow national governments, local authorities, and public health officials to manage effectively according to risk areas.
The case study concerns knowledge discovery in databases related to risk areas of the cholera epidemic in the Mopti region, Mali (West Africa). The results generated by association rule mining indicate that the level of the Niger River in the wintering periods and some societal factors have an impact on the variation of the cholera epidemic rate in Mopti town. The higher the river level, the higher the rate of contamination (at 66%)