134,986 research outputs found
Identifying Road Accidents Severity Problems Using Data Mining Approaches
Roadway traffic safety is a major concern for transportation governing agencies as well as ordinary citizens. In order to give safe driving suggestions, careful analysis of roadway traffic data is critical to find the variables that are closely related to fatal accidents. In this paper we apply statistical analysis and data mining algorithms to the FARS Fatal Accident dataset in an attempt to address this problem. The relationships between fatal rate and other attributes, including collision manner, weather, surface condition, light condition, and drunk driver, were investigated. Association rules were discovered by the Apriori algorithm, a classification model was built with a Naive Bayes classifier, and clusters were formed by the simple K-means clustering algorithm. We also use one more classification technique for comparison with the Naive Bayes classifier. Safe-driving suggestions were made based on the statistics, association rules, classification model, and clusters obtained.
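To make the association-rule step concrete, here is a minimal Apriori sketch in plain Python. It is not the paper's implementation: the records, the attribute labels (such as `light=dark`) and the support threshold are hypothetical stand-ins, and nothing here is taken from the FARS data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical accident records (illustrative only, not FARS data).
records = [
    {"light=dark", "surface=wet", "fatal=yes"},
    {"light=dark", "surface=wet", "fatal=yes"},
    {"light=day", "surface=dry", "fatal=no"},
    {"light=dark", "surface=dry", "fatal=no"},
    {"light=day", "surface=wet", "fatal=no"},
]

frequent = apriori(records, min_support=0.4)
rule_conf = confidence(frequent, {"light=dark", "surface=wet"}, {"fatal=yes"})
print(rule_conf)
```

On this toy data the rule "dark and wet implies fatal" holds in every record where the antecedent occurs, which is the kind of pattern an analyst would then check against the full dataset before drawing any safety conclusion.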
Application of data mining techniques in bioinformatics
With the widespread use of databases and the explosive growth in their sizes, there is a need to effectively utilize these massive volumes of data. This is where data mining comes in handy, as it scours databases to extract hidden patterns and information and to support decision making and hypothesis testing. Bioinformatics, an emerging field that relies on large databases, can be searched effectively with data mining techniques to derive useful rules. Based on the type of knowledge that is mined, data mining techniques [1] can be mainly classified into association rules, decision trees and clustering. Until recently, biology lacked the tools to analyze massive repositories of information such as the human genome database [3]. Data mining techniques are used effectively to extract meaningful relationships from these data. Data mining is especially used in microarray analysis, which studies the activity of different cells under different conditions. Two algorithms under each mining technique were implemented for a large database and compared with each other.
1. Association Rule Mining: (a) Apriori (b) partition
2. Clustering: (a) k-means (b) k-medoids
3. Classification Rule Mining: decision tree generation using (a) Gini index (b) entropy value
Genetic algorithms were applied to the association and classification techniques. Further, k-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering techniques [1] were applied to a microarray dataset and compared. The microarray dataset was downloaded from the internet using the Gene Array Analyzer Software (GAAS). The clustering was done on the basis of the signal color intensity of the genes in the microarray experiment. The following results were obtained:
1. Association: For smaller databases, the Apriori algorithm works better than the partition algorithm; for larger databases, partition works better.
2. Clustering: With respect to the number of interchanges, the k-medoids algorithm works better than the k-means algorithm.
3. Classification: The results were similar for both indices (Gini index and entropy value).
The application of genetic algorithms improved the efficiency of the association and classification techniques. For the microarray dataset, DBSCAN was found to be less efficient than k-means when the database is small, but for larger databases DBSCAN is more accurate and efficient in terms of the number of clusters and the execution time. DBSCAN execution time increases linearly with database size and was much lower than that of k-means for larger databases. Owing to the involvement of large datasets and the need to derive results from them, data mining techniques can be put to effective use in the field of bioinformatics [2]. The techniques can be applied to find associations among genes, cluster similar gene and protein sequences, and draw decision trees to classify genes. Further, the data mining techniques can be made more efficient by applying genetic algorithms, which greatly improve the search procedure and reduce the execution time.
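The finding that the Gini index and entropy give similar decision trees can be illustrated with a short sketch. The class counts below are hypothetical and the helper names (`gini`, `entropy`, `weighted_impurity`) are introduced only for illustration; the abstract does not describe its own implementation.

```python
import math


def gini(counts):
    """Gini impurity of a node given per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy (in bits) of a node given per-class sample counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def weighted_impurity(split, measure):
    """Impurity of a split: child impurities weighted by child size."""
    n = sum(sum(child) for child in split)
    return sum(sum(child) / n * measure(child) for child in split)


# Two hypothetical candidate splits of 20 samples over two classes.
split_a = [[8, 2], [2, 8]]   # children are mostly pure
split_b = [[5, 5], [5, 5]]   # children are fully mixed

for name, measure in (("gini", gini), ("entropy", entropy)):
    print(name, weighted_impurity(split_a, measure), weighted_impurity(split_b, measure))
```

Both measures score the purer split lower, so a tree builder that minimizes either one picks the same split here. The two criteria can disagree on close calls, which is consistent with results that are similar rather than identical.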
Enhancing Fuzzy Associative Rule Mining Approaches for Improving Prediction Accuracy. Integration of Fuzzy Clustering, Apriori and Multiple Support Approaches to Develop an Associative Classification Rule Base
Building an accurate and reliable prediction model for different application domains is one of the most significant challenges in knowledge discovery and data mining. This thesis focuses on building and enhancing a generic predictive model for estimating a future value by extracting association rules (knowledge) from a quantitative database. This model is applied to several data sets obtained from different benchmark problems, and the results are evaluated through extensive experimental tests.
The thesis presents an incremental development process for the prediction model with three stages. Firstly, a Knowledge Discovery (KD) model is proposed by integrating Fuzzy C-Means (FCM) with the Apriori approach to extract Fuzzy Association Rules (FARs) from a database for building a Knowledge Base (KB) to predict a future value. The KD model has been tested with two road-traffic data sets.
Secondly, the initial model has been further developed by including a diversification method in order to improve the reliability of the FARs and to find the best and most representative rules. The resulting Diverse Fuzzy Rule Base (DFRB) maintains high-quality and diverse FARs, offering a more reliable and generic model. The model uses FCM to transform quantitative data into fuzzy data, while a Multiple Support Apriori (MSapriori) algorithm is adapted to extract the FARs from the fuzzy data. The correlation values for these FARs are calculated, and an efficient orientation for filtering FARs is performed as a post-processing method. The diversity of the FARs is maintained through the clustering of FARs, based on the concept of the sharing function technique used in multi-objective optimization. The best and most diverse FARs are obtained as the DFRB for use within the Fuzzy Inference System (FIS) for prediction.
The third stage of development proposes a hybrid prediction model called the Fuzzy Associative Classification Rule Mining (FACRM) model. This model integrates the improved Gustafson-Kessel (G-K) algorithm, the proposed Fuzzy Associative Classification Rules (FACR) algorithm and the proposed diversification method. The improved G-K algorithm transforms quantitative data into fuzzy data, while the FACR algorithm generates significant rules (Fuzzy Classification Association Rules (FCARs)) by employing the improved multiple support threshold, associative classification and vertical scanning format approaches. These FCARs are then filtered by calculating the correlation value and the distance between them. The advantage of the proposed FACRM model is that it builds a generalized prediction model able to deal with different application domains. The validation of the FACRM model is conducted using different benchmark data sets from the University of California, Irvine (UCI) machine learning repository and the KEEL (Knowledge Extraction based on Evolutionary Learning) repository, and the results of the proposed FACRM are also compared with other existing prediction models. The experimental results show that the error rate and generalization performance of the proposed model are better than those of the commonly used models on the majority of data sets.
A new method for feature selection entitled Weighting Feature Selection (WFS) is also proposed. The WFS method aims to improve the performance of the FACRM model. The prediction performance is improved by minimizing the prediction error and reducing the number of generated rules. The prediction results of FACRM with WFS have been compared with those of FACRM and Stepwise Regression (SR) models for different data sets. The performance analysis and comparative study show that the proposed prediction model provides an effective approach that can be used within a decision support system. Applied Science University (ASU) of Jordan
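The step in which fuzzy clustering turns a quantitative value into fuzzy data can be sketched with the standard Fuzzy C-Means membership formula, u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)). This covers only the membership computation for fixed cluster centers, not the full iterative FCM or the improved G-K algorithm from the thesis. The centers, the speed labels and the fuzzifier m = 2 are assumptions chosen for illustration.

```python
def fcm_memberships(x, centers, m=2.0):
    """Membership of scalar x in each cluster, using the standard FCM formula.

    Assumes fixed, already-computed centers; m > 1 is the fuzzifier.
    """
    dists = [abs(x - c) for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster
    # (shared equally if several centers coincide).
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


# Hypothetical cluster centers for a traffic-speed attribute (km/h),
# read as the fuzzy terms "low", "medium" and "high".
centers = [20.0, 60.0, 100.0]

u_mid = fcm_memberships(40.0, centers)   # halfway between "low" and "medium"
u_hit = fcm_memberships(60.0, centers)   # exactly on the "medium" center
print(u_mid, u_hit)
```

Each value receives a membership degree in every fuzzy term, and the degrees sum to one. A fuzzy association rule miner would then count these degrees, rather than crisp 0/1 occurrences, when computing support.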
A survey on utilization of data mining approaches for dermatological (skin) diseases prediction
Due to recent technological advances, large volumes of medical data are being collected. These data contain valuable information, and data mining techniques can be used to extract useful patterns from them. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various data mining methodologies is highlighted. Generally, association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for the diagnosis of erythemato-squamous diseases. Different methods are used in this area, such as Neural Networks, Genetic Algorithms and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data.
Survey of data mining approaches to user modeling for adaptive hypermedia
The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of the data, the noise within that data, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make these techniques suitable for automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the application.