2,258 research outputs found

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Due to recent technological advances, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various methodologies is highlighted. Generally, association mining is suitable for extracting rules; it has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for diagnosing erythemato-squamous diseases. Methods applied in this area include neural networks, genetic algorithms and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find structure in the given data by identifying similarities between data items according to their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data.

    Integrating Economic Knowledge in Data Mining Algorithms

    The assessment of knowledge derived from databases depends on many factors. Decision makers often need to convince others of the correctness and effectiveness of knowledge induced from data. Current data mining techniques do not contribute much to this process of persuasion. Part of this limitation can be removed by integrating knowledge from experts in the field, encoded in some accessible way, with knowledge derived from patterns stored in the database. In this paper we discuss in particular methods for implementing monotonicity constraints in economic decision problems. This prior knowledge is combined with data mining algorithms based on decision trees and neural networks. The method is illustrated with a hedonic price model.
    Keywords: knowledge; neural network; data mining; decision trees
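
As a rough illustration of what a monotonicity constraint means in a hedonic price setting, the sketch below checks a small data set for pairs of observations that violate it: a house that is at least as good in every attribute should not have a lower price. The function names, the attributes (floor area, number of bathrooms) and the prices are hypothetical and are not taken from the paper; the paper's own methods for enforcing monotonicity in decision trees and neural networks are not reproduced here.

```python
from itertools import combinations


def dominates(a, b):
    """Return True if a <= b in every attribute (componentwise order)."""
    return all(x <= y for x, y in zip(a, b))


def monotonicity_violations(rows):
    """List index pairs (i, j) where row i is dominated by row j
    but row i carries a strictly higher label (price)."""
    violations = []
    for (i, (xi, yi)), (j, (xj, yj)) in combinations(enumerate(rows), 2):
        if dominates(xi, xj) and yi > yj:
            violations.append((i, j))
        elif dominates(xj, xi) and yj > yi:
            violations.append((j, i))
    return violations


# Hypothetical toy data: ((floor area in m^2, bathrooms), price)
houses = [
    ((50, 1), 100),
    ((70, 2), 150),
    ((90, 2), 140),  # larger than row 1 but cheaper: a violation
    ((60, 3), 130),
]

violations = monotonicity_violations(houses)
print(violations)
```

A check of this kind only detects inconsistencies in the data; a model could still be non-monotone even when the training data are clean, which is why constraints on the learning algorithm itself are of interest.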

    Adaptive Data Mining Approach for PCB Defect Detection and Classification

    Objective: To develop a model for PCB defect detection and classification with the help of a soft computing technique. Methodology: To improve the performance of prediction and classification, we propose a hybrid approach for feature reduction and classification. The proposed approach is divided into three main stages: (i) data pre-processing, (ii) feature selection and reduction, and (iii) classification. In this approach, pre-processing and feature selection and reduction are carried out by measuring confidence with an adaptive genetic algorithm. Prediction and classification are carried out using a neural network classifier. Findings: The system is implemented using MATLAB 2013b. The resulting analysis shows that the proposed approach is capable of detecting and classifying defects in PCBs.

    Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics

    The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics as well as some representative examples of RF applications in this context and possible directions for future research

    A new unsupervised feature selection method for text clustering based on genetic algorithms

    Nowadays a vast amount of textual information is collected and stored in various databases around the world, including the Internet as the largest database of all. This rapid growth of published text means that even the most avid reader cannot hope to keep up with all the reading in a field, and consequently nuggets of insight or new knowledge are at risk of languishing undiscovered in the literature. Text mining offers a solution to this problem by replacing or supplementing the human reader with automatic systems undeterred by the text explosion. It involves analyzing a large collection of documents to discover previously unknown information. Text clustering is one of the most important areas in text mining; it includes text preprocessing, dimension reduction by selecting some terms (features), and finally clustering using the selected terms. Feature selection appears to be the most important step in the process. Conventional unsupervised feature selection methods define a measure of the discriminating power of terms in order to select suitable terms from the corpus. However, the evaluation of terms in groups has not so far been investigated in reported works. In this paper a new and robust unsupervised feature selection approach is proposed that evaluates terms in groups. In addition, a new Modified Term Variance measure is proposed for evaluating groups of terms. Furthermore, a genetic algorithm is designed and implemented for finding the most valuable groups of terms based on the new measure. These terms are then used to generate the final feature vector for the clustering process. To evaluate and justify our approach, the proposed method and a conventional term variance method are implemented and tested using the Reuters-21578 corpus collection. For a more accurate comparison, the methods have been tested on three corpora; for each corpus the clustering task has been run ten times and the results averaged. The comparison of the two methods is very promising and shows that our method produces better average accuracy and F1-measure than the conventional term variance method.
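
For orientation, the sketch below shows one common formulation of the conventional term variance baseline mentioned in the abstract: each term is scored by the variance of its frequency across documents, and the highest-scoring terms are kept as features. This is an assumption about the baseline; the abstract does not give the formula, and neither the Modified Term Variance measure nor the genetic search over groups of terms is reproduced. The function names and the four toy documents are hypothetical.

```python
from collections import Counter


def term_variance(docs):
    """Score each term by the variance of its frequency across documents."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = set().union(*counts)  # all distinct terms in the corpus
    n = len(docs)
    scores = {}
    for term in vocab:
        freqs = [c[term] for c in counts]  # Counter returns 0 if absent
        mean = sum(freqs) / n
        scores[term] = sum((f - mean) ** 2 for f in freqs) / n
    return scores


def top_terms(docs, k):
    """Return the k terms with the highest variance (ties broken by name)."""
    scores = term_variance(docs)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Hypothetical toy corpus, already stripped of punctuation
docs = [
    "oil oil price",
    "oil exports",
    "wheat wheat wheat harvest",
    "wheat price",
]

scores = term_variance(docs)
selected = top_terms(docs, 2)
print(selected)
```

Terms that occur unevenly across documents score highest, which is the intuition behind using variance as a proxy for discriminating power. Scoring terms one at a time, as here, is exactly the limitation the abstract says the group-based approach is meant to address.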

    ENHANCEMENT OF CHURN PREDICTION ALGORITHMS

    Customer churn can be described as the process by which consumers of goods and services discontinue the consumption of a product or service and switch over to a competitor. It is of great concern to many companies. Decision support systems are therefore needed to address this pressing issue and ensure a good return on investment for organizations. Decision support systems use analytical models to provide the intelligence needed to analyze an integrated customer record database, predict which customers will churn, and offer recommendations that will prevent them from churning. Customer churn prediction, unlike most conventional business intelligence techniques, deals with customer demographics, net worth and market opportunities. It is used to determine which customers are likely to churn and which are likely to remain loyal to the organization, and to predict future churn rates. Customer defection is naturally a slow event, and it is not easily detected by most business intelligence solutions available on the market, especially when the data are skewed, large and distinct. Accurate and precise prediction methods are therefore needed to detect the churning trend. In this study, a churn model is proposed that applies business intelligence techniques to detect the possibility that a customer will churn, using churn trend analysis of customer records. The model applies clustering algorithms and enhanced SPRINT decision tree algorithms to explore the customer record database and identify customer profiles and behavior patterns. The model then predicts the possibility that a customer will churn. Additionally, it offers solutions for retaining customers and making them loyal to a business entity by recommending customer relationship management measures.

    Data mining for the diagnosis of type 2 diabetes

    Diabetes is the most common disease nowadays in all populations and in all age groups. Diabetes contributes to heart disease and increases the risk of developing kidney disease, blindness, nerve damage and blood vessel damage. Diagnosing diabetes through proper interpretation of diabetes data is an important classification problem. Different artificial intelligence techniques have been applied to the diabetes problem. The purpose of this study is to apply artificial metaplasticity on a multilayer perceptron (AMMLP) as a data mining (DM) technique for diabetes diagnosis. The Pima Indians diabetes data set was used to test the proposed AMMLP model. The results obtained by AMMLP were compared with those of a decision tree (DT), a Bayesian classifier (BC) and other algorithms recently proposed by other researchers that were applied to the same database. The robustness of the algorithms is examined using classification accuracy, sensitivity and specificity analysis, and the confusion matrix. The results obtained by AMMLP are superior to those obtained by DT and BC.
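
The evaluation measures named in the abstract (classification accuracy, sensitivity, specificity, confusion matrix) can be computed from predicted and true labels as in the minimal sketch below. The function names and the eight example labels are hypothetical; they are not results from the study, and the AMMLP classifier itself is not implemented here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": safe_ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
    }


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = diagnostic_metrics(y_true, y_pred)
print(counts, metrics)
```

Reporting sensitivity and specificity alongside accuracy matters in diagnosis because a classifier can reach high accuracy on an imbalanced data set while still missing many positive cases.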