
    A Survey of Parallel Data Mining

    With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to the problem of scalability in data mining. Recently there has been considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm/paradigm. This paper surveys parallel data mining from a broader perspective. More precisely, we discuss the parallelization of data mining algorithms from four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms.

    Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods

    In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance (FOQA)-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general, data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of 0.91 to 0.92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about 0.99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.
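The abstract above quotes two kinds of figures: a share of explained variation (about 85%) and correlation coefficients (0.91 to 0.99). As a reading aid, the sketch below shows how the two relate for an ordinary least-squares line with an intercept, where the coefficient of determination equals the squared Pearson correlation between observed and fitted values. The data and the helper names (`fit_line`, `pearson_r`, `r_squared`) are made up for illustration and are not taken from the study.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical data, invented for this example only (not FOQA data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

slope, intercept = fit_line(x, y)
pred = [slope * xi + intercept for xi in x]

r = pearson_r(y, pred)
r2 = r_squared(y, pred)
print(f"r = {r:.4f}, r squared = {r * r:.4f}, R2 = {r2:.4f}")
```

Under this identity a correlation of 0.92 corresponds to a squared value of about 0.85. The identity is only guaranteed for least-squares linear fits with an intercept, so it may not carry over exactly to tree or neural network models.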

    Hybrid Approach for Heart Disease Detection Using Clustering and ANN

    Data mining is the process of extracting information from a data set and transforming it into an understandable structure for further use. Data mining techniques have been applied successfully in many fields, including business, science and bioinformatics, and to different types of data such as textual, visual, spatial, real-time and sensor data. Heart disease prediction is regarded as one of the most difficult tasks in the medical sciences. Heart disease detection using data mining can answer complicated queries for diagnosing heart disease and thus help healthcare practitioners make intelligent clinical decisions that traditional decision support systems cannot. By enabling effective treatment, it also helps to reduce treatment costs. The aim of this study is to develop an artificial neural network-based diagnostic model for heart disease using a combination of traditional and genetic factors of the disease.

    Pattern Classification using Artificial Neural Networks

    Classification is a data mining (machine learning) technique used to predict group membership for data instances. Pattern classification involves building a function that maps the input feature space to an output space of two or more classes. Neural networks (NN) are an effective tool in the field of pattern classification, using training and testing data to build a model. However, the success of a network depends heavily on the performance of the training process and hence on the training algorithm. Many training algorithms have been proposed to improve the performance of neural networks. In this project, we make a comparative study of training a feedforward neural network using three algorithms: the Backpropagation Algorithm, the Modified Backpropagation Algorithm and the Optical Backpropagation Algorithm. These algorithms differ only in their error functions. We train the neural networks with these algorithms on 75 instances from the iris dataset (taken from the UCI repository and then normalised), 25 from each class. The total number of epochs required to reach the desired degree of accuracy is referred to as the convergence rate. The basic criteria of comparison are the convergence rate and the classification accuracy. To check the efficiency of the three training algorithms, the mean square error (MSE) is plotted against the number of epochs. The training process continues until the MSE falls to 0.01. The effect of the momentum and the learning rate on the performance of the algorithms is also observed. The comparison is then extended to compare the performance of the multilayer feedforward network with a probabilistic network.
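The training procedure described above (train until the MSE falls to 0.01, and count the epochs needed) can be sketched as follows. This is a minimal illustration of standard backpropagation with momentum only; the modified and optical variants are not shown, and this is not the authors' code. The network size, the toy two-class data, the default hyperparameters and the function name `train` are all assumptions made for the example; the iris data is not used here.

```python
import math
import random

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow for extreme activations.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, n_hidden=3, lr=0.5, momentum=0.9,
          target_mse=0.01, max_epochs=5000, seed=0):
    """Online backpropagation with momentum for a one-hidden-layer network.

    Returns (epochs_run, mse_history). Training stops as soon as the
    epoch MSE is <= target_mse, or after max_epochs.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each weight vector carries one extra entry for the bias term.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    # Previous weight changes, needed for the momentum term.
    dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    dw_o = [0.0] * (n_hidden + 1)

    history = []
    for epoch in range(1, max_epochs + 1):
        sq_err = 0.0
        for x, t in samples:
            # Forward pass.
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_h]
            hb = h + [1.0]
            o = sigmoid(sum(w * v for w, v in zip(w_o, hb)))

            # Backward pass for the squared-error function.
            err = t - o
            sq_err += err * err
            delta_o = err * o * (1.0 - o)
            delta_h = [delta_o * w_o[j] * h[j] * (1.0 - h[j])
                       for j in range(n_hidden)]

            # Weight updates: learning-rate step plus momentum.
            for j in range(n_hidden + 1):
                dw_o[j] = lr * delta_o * hb[j] + momentum * dw_o[j]
                w_o[j] += dw_o[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    dw_h[j][i] = lr * delta_h[j] * xb[i] + momentum * dw_h[j][i]
                    w_h[j][i] += dw_h[j][i]

        # MSE accumulated over this epoch's online updates.
        mse = sq_err / len(samples)
        history.append(mse)
        if mse <= target_mse:
            break
    return epoch, history

# Hypothetical, hand-made two-class data with inputs already in [0, 1].
toy_samples = [
    ((0.1, 0.2), 0.0), ((0.2, 0.1), 0.0), ((0.3, 0.3), 0.0),
    ((0.8, 0.9), 1.0), ((0.9, 0.7), 1.0), ((0.7, 0.8), 1.0),
]

epochs, mse_history = train(toy_samples)
print(f"stopped after {epochs} epochs, final MSE = {mse_history[-1]:.4f}")
```

Plotting `mse_history` against the epoch index would give the kind of epochs-versus-MSE curve the abstract mentions. Rerunning `train` with different `lr` and `momentum` values is one way to observe their effect on the epoch count.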