
    Comparing the Performance of Predictive Models Constructed Using the Techniques of Feed-Forward and Generalized Regression Neural Networks

    Artificial Neural Networks (ANNs) are an efficient machine learning method that can be used to fit models to data for prediction purposes. They are capable of modelling the class prediction as a nonlinear combination of the inputs. However, a number of factors may affect the accuracy of a model created with this approach: the choice of network type and how the network is configured play an important role in the performance of a predictive model built with neural network techniques. This paper compares the accuracy of two typical neural network techniques used for creating a predictive model: the feed-forward neural network and the generalized regression neural network. The models created using both techniques are evaluated for correctness. The results show that the Generalized Regression Neural Network (GRNN) consistently produces more accurate results. Findings further show that the predictive model fitted using the Feed-forward Neural Network (FNN) technique records an error value 1.086 higher than that of the generalized regression network.
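    As a rough illustration of the two model families compared above (a sketch with toy data, not the paper's actual experiment), note that a GRNN needs no iterative training: its prediction is a kernel-weighted average of the training targets (the Nadaraya-Watson estimator), governed by a single smoothing parameter sigma.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    """Generalized Regression Neural Network prediction.

    Each query output is a Gaussian-kernel-weighted average of the
    training targets, which is exactly what a GRNN's pattern and
    summation layers compute."""
    preds = []
    for x in X_query:
        d2 = np.sum((X_train - x) ** 2, axis=1)       # squared distances
        w = np.exp(-d2 / (2 * sigma ** 2))            # pattern-layer activations
        preds.append(np.dot(w, y_train) / np.sum(w))  # summation / division layers
    return np.array(preds)

# Toy regression problem: y = x^2 on [0, 1]
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = (X ** 2).ravel()
yhat = grnn_predict(X, y, X, sigma=0.05)
```

    With a small sigma the fit hugs the training data; widening sigma smooths the prediction, which is the main tuning decision a GRNN user faces.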

    Neural networks optimization through genetic algorithm searches: A review

    Neural networks and genetic algorithms are two sophisticated machine learning techniques presently attracting attention from scientists, engineers, and statisticians, among others, and they have gained popularity in recent years. This paper presents a state-of-the-art review of the research conducted on the optimization of neural networks through genetic algorithm searches. Optimization aims to overcome the limitations of neural networks in order to solve complex and challenging problems. We provide an analysis and synthesis of the research published in this area according to application domain, neural network design issues addressed with genetic algorithms, types of neural networks, and optimal values of genetic algorithm operators (population size, crossover rate, and mutation rate). This study may serve as a guide for novice as well as expert researchers in the design of evolutionary neural networks, helping them choose suitable values of genetic algorithm operators for a specific problem domain. A further research direction that has not received much attention from scholars is also unveiled.
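    The three operators the review catalogues (population size, crossover rate, mutation rate) can be seen in a minimal genetic search. This is a generic sketch, not any reviewed system: the fitness function here is a stand-in one-weight "network" loss, where a real application would plug in a network's validation error.

```python
import random

random.seed(0)  # reproducible run

def genetic_search(fitness, dim, pop_size=30, crossover_rate=0.8,
                   mutation_rate=0.1, generations=100):
    # Random initial population of candidate weight vectors in [-1, 1].
    pop = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                       # lower fitness = better
        next_pop = [p[:] for p in pop[:2]]          # elitism: keep the best two
        while len(next_pop) < pop_size:
            p1, p2 = random.sample(pop[:10], 2)     # select among the fittest
            if random.random() < crossover_rate:    # one-point crossover
                cut = random.randrange(1, dim) if dim > 1 else 0
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [g + random.gauss(0, 0.1) if random.random() < mutation_rate
                     else g for g in child]         # Gaussian mutation
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=fitness)

# Stand-in fitness: squared error of a one-weight model y = w*x fitted to
# data generated by y = 0.7*x (placeholder for a network's validation loss).
data = [(x / 10, 0.7 * x / 10) for x in range(10)]
loss = lambda w: sum((w[0] * x - y) ** 2 for x, y in data)
best = genetic_search(loss, dim=1)
```

    The operator values above (30, 0.8, 0.1) are common textbook defaults; the review's point is precisely that good values are problem-dependent.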

    Non-contrast computed tomography-based radiomics for staging of connective tissue disease-associated interstitial lung disease

    Rationale and introduction: It is important to assess the severity and predict the mortality of patients with connective tissue disease-associated interstitial lung disease (CTD-ILD). In this double-center retrospective study, we developed and validated a radiomics nomogram for clinical management using the ILD-GAP (gender, age, and pulmonary physiology) index system. Materials and methods: Patients with CTD-ILD were staged using the ILD-GAP index system. A clinical factor model was built from demographics and CT features, and a radiomics signature was developed using radiomics features extracted from CT images. Combining the radiomics signature and independent clinical factors, a radiomics nomogram was constructed and evaluated by the area under the curve (AUC) from receiver operating characteristic (ROC) analyses. The models were externally validated on dataset 2 to evaluate generalization ability using ROC analysis. Results: A total of 245 patients from two clinical centers (dataset 1, n = 202; dataset 2, n = 43) were screened. Pack-years of smoking, traction bronchiectasis, and nine radiomics features were used to build the radiomics nomogram, which showed favorable calibration and discrimination in the training cohort [AUC, 0.887 (95% confidence interval (CI): 0.827–0.940)], the internal validation cohort [AUC, 0.885 (95% CI: 0.816–0.922)], and the external validation cohort [AUC, 0.85 (95% CI: 0.720–0.919)]. Decision curve analysis demonstrated that the nomogram outperformed the clinical factor model and the radiomics signature in terms of clinical usefulness. Conclusion: The CT-based radiomics nomogram showed favorable efficacy in predicting individual ILD-GAP stages.
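    The AUC figures reported above can be computed directly from the probabilistic definition of ROC AUC: the chance that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. A small generic sketch (illustrative scores, not the study's data):

```python
def roc_auc(scores, labels):
    # Probability that a random positive outranks a random negative,
    # counting ties as half (Mann-Whitney U formulation of ROC AUC).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical nomogram scores for two positive and two negative patients.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # → 0.75
```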

    The new efficient and accurate attribute-oriented clustering algorithms for categorical data

    Categorical data clustering has attracted much attention recently because much of the data in today's databases is categorical in nature. Many algorithms for clustering categorical data have been proposed. Among them, the attribute-oriented hierarchical divisive clustering algorithm Min-Min Roughness (MMR) has the highest efficiency but low clustering accuracy, while the genetic clustering algorithm Genetic-Average Normalized Mutual Information (G-ANMI) has the highest clustering accuracy but low efficiency. This work first highlights the significance of attributes in categorical data clustering, then investigates the limitations of MMR and G-ANMI, and correspondingly proposes a new attribute-oriented hierarchical divisive clustering algorithm termed Mean Gain Ratio (MGR) and an improved genetic clustering algorithm termed Improved G-ANMI (IG-ANMI) for categorical data. MGR consists of two steps: selecting a clustering attribute and selecting an equivalence class on that attribute. Two information-theoretic concepts, mean gain ratio and entropy of clusters, are used to implement these steps, respectively. MGR can be run with or without specifying the number of clusters, whereas few existing clustering algorithms for categorical data can be run without specifying it. IG-ANMI improves G-ANMI with a new attribute-oriented initialization method in which part of the initial chromosomes are generated from the attribute partitions. Four real-life data sets obtained from the University of California Irvine (UCI) machine learning repository and ten synthetically generated data sets are used to evaluate the MGR and IG-ANMI algorithms, and four other algorithms are used for comparison.
    The experimental results show that MGR overcomes the limitations of MMR, improving average clustering accuracy by 19% (from 0.696 to 0.830) while maintaining the highest efficiency. IG-ANMI greatly improves the efficiency of G-ANMI (by 31% on the Zoo data set, 74% on the Votes data set, 59% on the Breast Cancer data set, and 3428% on the Mushroom data set) as well as its clustering accuracy (average clustering accuracy on the four UCI data sets is improved by 10.6%, from 0.815 to 0.901), while maintaining the highest clustering accuracy. IG-ANMI has a clear advantage over G-ANMI on large data sets in terms of both clustering efficiency and accuracy. In addition, both MGR and IG-ANMI have good scalability: the running times of both algorithms tend to grow linearly with the number of objects as well as the number of clusters.
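    MGR's first step, selecting a clustering attribute by mean gain ratio, can be sketched as follows. This is an illustrative reconstruction from the abstract, not the authors' code: for each candidate attribute, average the gain ratio it achieves in predicting every other attribute, and pick the attribute with the highest mean.

```python
from collections import Counter
from math import log2

def entropy(values):
    # Shannon entropy of a list of categorical values.
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def cond_entropy(target, given):
    # H(target | given): entropy of target within each block of `given`.
    n = len(target)
    blocks = {}
    for t, g in zip(target, given):
        blocks.setdefault(g, []).append(t)
    return sum(len(b) / n * entropy(b) for b in blocks.values())

def mean_gain_ratio_attribute(rows):
    # Pick the attribute whose average gain ratio over all other
    # attributes is highest (sketch of MGR's selection step).
    m = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(m)]
    best, best_score = None, -1.0
    for a in range(m):
        if entropy(cols[a]) == 0:        # constant attribute: skip
            continue
        grs = [(entropy(cols[b]) - cond_entropy(cols[b], cols[a]))
               / entropy(cols[a]) for b in range(m) if b != a]
        score = sum(grs) / len(grs)
        if score > best_score:
            best, best_score = a, score
    return best

# Attribute 0 perfectly predicts attribute 1, so it should be chosen.
data = [('x', 'p', '1'), ('x', 'p', '2'), ('y', 'q', '1'), ('y', 'q', '2')]
print(mean_gain_ratio_attribute(data))  # → 0
```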

    Datasets Size: Effect on Clustering Results

    Recent advancements in the way we capture and store data pose a serious challenge for data analysis. This has given wider acceptance to data mining, an interdisciplinary field that applies algorithms to stored data with a view to discovering hidden knowledge. Most record keepers, however, are yet to reap the benefits of this tool, owing to the general notion that a large dataset is required to guarantee reliable results; this may not be applicable in all cases. In this paper, we propose a research technique that applies descriptive algorithms to numeric datasets of varied sizes. We modeled each subset of our data using the EM clustering algorithm; two different numbers of partitions (k) were estimated and used for each experiment. The clustering results were validated using an external evaluation measure in order to determine their correctness. The approach unveils the effect of dataset size on the clusters formed and the impact of the estimated number of partitions.
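    The EM clustering used in the experiments can be illustrated with a minimal two-component Gaussian mixture in one dimension. This is a sketch on synthetic data; the paper's datasets, sizes, and k values are its own.

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    # Expectation-Maximization for a two-component 1-D Gaussian mixture.
    mu = np.array([x.min(), x.max()], dtype=float)   # spread-out init
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        dens = (pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2))
                / (sigma * np.sqrt(2 * np.pi)))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, spreads, and mixing weights.
        n = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / n
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n)
        pi = n / len(x)
    return mu, sigma, pi

# Synthetic data: two well-separated clusters of 100 points each.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.5, 100), rng.normal(5.0, 0.5, 100)])
mu, sigma, pi = em_gmm_1d(x)
```

    Rerunning the fit on smaller slices of `x` is one way to probe the paper's question of how dataset size affects the clusters found.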

    Risk Status Prediction and Modelling of Students' Academic Achievement - A Fuzzy Logic Approach

    Several students fall victim to a low grade point average at the end of their first year in institutions of higher learning, and some are even withdrawn due to an unacceptable grade point average (GPA); this could be prevented if the necessary measures were taken at the appropriate time. In this paper, a model using a fuzzy logic approach to predict the risk status of students based on some predictive factors is proposed. Basic information that correlates with students' academic achievement, along with other predictive variables, was modelled; the simulated model shows the degree of risk associated with students' past academic achievement. The result of this study would enable teachers to pay more attention to students' weaknesses and could also help school management in decision making, especially for the purpose of awarding scholarships to talented students whose risk of failure is found to be very low, while students identified as having a high risk of failure could be counselled and motivated with a view to improving their learning ability.
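    As an illustration of the general approach only (the membership functions, GPA scale, and rule outputs below are made up, not the paper's), a zero-order Sugeno fuzzy system mapping GPA to a risk score might look like:

```python
def tri(x, a, b, c):
    # Triangular membership function rising from a, peaking at b, falling to c.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def risk_status(gpa):
    # Hypothetical fuzzy sets over a 0-5 GPA scale with made-up rule
    # outputs (low GPA -> high risk); none of these numbers are the paper's.
    low = tri(gpa, -1.0, 0.0, 2.5)
    med = tri(gpa, 1.5, 2.5, 3.5)
    high = tri(gpa, 2.5, 5.0, 6.0)
    strengths = [low, med, high]
    rule_outputs = [0.9, 0.5, 0.1]     # risk asserted by each rule
    return (sum(w * r for w, r in zip(strengths, rule_outputs))
            / sum(strengths))          # zero-order Sugeno defuzzification

print(risk_status(1.0))  # struggling student: high risk
print(risk_status(4.5))  # strong student: low risk
```

    The output degrades smoothly between rules, which is what lets such a model rank students by degree of risk rather than a hard pass/fail cut.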

    An Interval-Valued Fuzzy Soft Set Based Decision Making Approach Considering Weight

    An algorithm to solve fuzzy decision making problems based on interval-valued fuzzy soft sets has been proposed. However, this algorithm cannot take parameter weights into account, which are very important to the decision maker. In this paper, we propose an interval-valued fuzzy soft set based decision making algorithm that considers weights. Finally, an illustrative example is employed to show our contribution.
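    One plausible way to fold weights into such an algorithm (a hedged sketch; the paper's exact scoring scheme may differ) is to reduce each interval membership degree to its midpoint and take a weight-scaled sum per alternative:

```python
def weighted_choice(table, weights):
    # Score = sum over parameters of weight * interval midpoint;
    # the alternative with the highest score wins.
    scores = {obj: sum(w * (lo + hi) / 2 for w, (lo, hi) in zip(weights, vals))
              for obj, vals in table.items()}
    return max(scores, key=scores.get), scores

# Hypothetical alternatives rated on two parameters by interval-valued
# membership degrees, with decision-maker weights [0.7, 0.3].
table = {'h1': [(0.6, 0.8), (0.2, 0.4)],
         'h2': [(0.3, 0.5), (0.7, 0.9)]}
best, scores = weighted_choice(table, [0.7, 0.3])
print(best)  # → h1
```

    With equal weights, h2's strength on the second parameter would tip the decision; the weights are what encode the decision maker's priorities.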

    Application of a New Efficient Normal Parameter Reduction Algorithm of Soft Sets in Online Shopping

    A new efficient normal parameter reduction algorithm for soft sets in decision making was previously proposed. However, up to the present, few studies have focused on real-life applications of this algorithm. Accordingly, we apply the new efficient normal parameter reduction algorithm to real-life online shopping datasets, such as a BlackBerry mobile phone dataset. Experimental results show that this algorithm is not only suitable but also feasible for dealing with online shopping data.
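    The flavor of normal parameter reduction can be sketched on a Boolean soft set: a parameter subset that contributes the same total to every object's choice value is dispensable, so deleting it cannot change the decision ordering. An illustrative brute-force version (not the paper's efficient algorithm):

```python
from itertools import combinations

def normal_parameter_reduction(table):
    # table: list of 0/1 rows (objects x parameters).
    # A parameter subset is dispensable if every object gets the same
    # total score from it; removing it preserves the choice-value ordering.
    n_params = len(table[0])
    for size in range(n_params - 1, 0, -1):
        for subset in combinations(range(n_params), size):
            sums = {sum(row[j] for j in subset) for row in table}
            if len(sums) == 1:
                return subset    # largest dispensable subset found
    return ()

# Parameters 1 and 2 jointly add exactly 1 to every object's score,
# so they can be dropped together.
table = [[1, 0, 1],
         [0, 1, 0],
         [1, 1, 0]]
print(normal_parameter_reduction(table))  # → (1, 2)
```

    Here the full choice values [2, 1, 2] and the reduced ones [1, 0, 1] rank the objects identically, which is the defining property of a normal reduction.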

    A Novel Soft Set Approach in Selecting Clustering Attribute

    Clustering is one of the most useful tasks in the data mining process for discovering groups and identifying interesting distributions and patterns in the underlying data. One technique for data clustering works by introducing a clustering attribute. Soft set theory, initiated by Molodtsov in 1999, is a new general mathematical tool for dealing with uncertainties. In this paper, we define a soft set model on the equivalence classes of an information system, which can easily be applied to obtain the approximate sets of rough sets. Furthermore, we use it to select a clustering attribute for categorical datasets, and a heuristic algorithm is presented. Experimental results on fifteen UCI benchmark datasets show that the proposed approach selects a clustering attribute up to 14.84% faster than the maximum dependency attributes (MDA) approach. Furthermore, both the MDA and the proposed NSS approaches have good scalability, i.e., the execution times of both algorithms tend to increase linearly as the number of instances and attributes increase, respectively.
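    The central construction above, a soft set defined over the equivalence classes of an information system, can be sketched as a mapping from each (attribute, value) parameter to the set of objects carrying that value. This is a simplified reading of the abstract, assuming a plain categorical information table:

```python
def soft_set(rows, attrs):
    # Build the soft set (F, E): each parameter "attr = value" maps to
    # the equivalence class of object indices taking that value.
    F = {}
    for j, name in enumerate(attrs):
        for i, row in enumerate(rows):
            F.setdefault((name, row[j]), set()).add(i)
    return F

rows = [('sunny', 'hot'), ('sunny', 'mild'), ('rainy', 'mild')]
F = soft_set(rows, ['outlook', 'temp'])
print(F[('outlook', 'sunny')])  # → {0, 1}
```

    The equivalence classes recovered this way are exactly the building blocks rough set methods use for lower and upper approximations, which is why the soft set model plugs into clustering-attribute selection so directly.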

    MGR: An Information Theory Based Hierarchical Divisive Clustering Algorithm for Categorical Data

    Categorical data clustering has attracted much attention recently because much of the data in today's databases is categorical in nature. While many algorithms for clustering categorical data have been proposed, some have low clustering accuracy while others have high computational complexity. This research proposes Mean Gain Ratio (MGR), a new information theory based hierarchical divisive clustering algorithm for categorical data. MGR performs clustering from the attribute viewpoint: it selects a clustering attribute using mean gain ratio and then selects an equivalence class on that attribute using the entropy of clusters. It can be run with or without specifying the number of clusters, whereas few existing clustering algorithms for categorical data can be run without specifying it. Experimental results on nine University of California at Irvine (UCI) benchmark data sets and ten synthetic data sets show that MGR outperforms baseline algorithms in terms of both clustering accuracy and efficiency.
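    MGR's second step, choosing which equivalence class of the selected clustering attribute to split off, can be sketched with the entropy-of-clusters idea: prefer the class whose members are most homogeneous on the remaining attributes. This is an illustrative reading of the abstract, not the authors' code:

```python
from collections import Counter
from math import log2

def entropy(values):
    # Shannon entropy of a list of categorical values.
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def purest_class(rows, clustering_attr):
    # For each value of the clustering attribute, sum the entropies of
    # the remaining attributes within that equivalence class; the class
    # with minimum total entropy is the most homogeneous candidate cluster.
    m = len(rows[0])
    classes = {}
    for row in rows:
        classes.setdefault(row[clustering_attr], []).append(row)
    def class_entropy(members):
        return sum(entropy([r[j] for r in members])
                   for j in range(m) if j != clustering_attr)
    return min(classes, key=lambda v: class_entropy(classes[v]))

# On attribute 0, class 'x' is uniform on attribute 1 while 'y' is mixed,
# so 'x' is the natural cluster to split off first.
data = [('x', 'p'), ('x', 'p'), ('y', 'p'), ('y', 'q')]
print(purest_class(data, 0))  # → x
```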