
    Finding an Effective Classification Technique to Develop a Software Team Composition Model

    Ineffective software team composition has become recognized as a prominent cause of software project failures. Results reported from different theoretical personality models have produced contradictory fits, validity challenges, and little guidance for personnel selection during software development. It is also believed that the technique(s) used while developing a model can affect the overall results. Thus, this study aims to: 1) discover an effective classification technique to solve the problem, and 2) develop a model for composing the software development team. The model developed comprised three predictors: team role, personality type, and gender; and one outcome: team performance. The techniques used for model development were logistic regression, decision trees, and Rough Set Theory (RST). Higher prediction accuracy and reduced pattern complexity were the two parameters for selecting the effective technique. Based on the results, the Johnson Algorithm (JA) of RST appeared to be an effective technique for a team composition model. The study proposes a set of 24 decision rules for finding effective team members. These rules involve gender classification to highlight the appropriate personality profile for software developers. In the end, this study concludes that selecting an appropriate classification technique is one of the most important factors in developing effective models.
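
    The study's comparison of techniques can be illustrated with two of the candidates. The following is a minimal sketch, assuming a hypothetical team-composition dataset with team role, personality type, and gender as predictors and a binary team-performance outcome; the column names and values are illustrative, and the rough-set/Johnson-algorithm step from the study is not reproduced.

```python
# Sketch: fitting two of the candidate techniques (logistic regression and a
# decision tree) to a hypothetical team-composition dataset. Columns and values
# are illustrative only; the Johnson-algorithm rule induction is not shown here.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

data = pd.DataFrame({
    "team_role":   ["implementer", "coordinator", "shaper", "implementer", "shaper", "coordinator"],
    "personality": ["ISTJ", "ENFP", "ESTJ", "INTP", "ENTJ", "ISFJ"],
    "gender":      ["F", "M", "F", "M", "F", "M"],
    "performance": [1, 0, 1, 0, 1, 0],   # hypothetical team-performance outcome
})
X = pd.get_dummies(data[["team_role", "personality", "gender"]])
y = data["performance"]

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("decision tree", DecisionTreeClassifier(max_depth=3))]:
    scores = cross_val_score(model, X, y, cv=2)   # only 2 folds on this toy sample
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```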

    New Learning Models for Generating Classification Rules Based on Rough Set Approach

    Data sets, static or dynamic, are very important and useful for representing real-life features in different aspects of industry, medicine, economics, and other fields. Recently, different models have been used to generate knowledge from vague and uncertain data sets, such as induction decision trees, neural networks, fuzzy logic, genetic algorithms, rough set theory, and others. All of these models take a long time to learn on a huge and dynamic data set. Thus, the challenge is how to develop an efficient model that can decrease the learning time without affecting the quality of the generated classification rules. Huge information systems or data sets usually have some missing values due to unavailable data, which affects the quality of the generated classification rules, since missing values make it difficult to extract useful information from the data set. Another challenge is therefore how to solve the problem of missing data. Rough set theory is a mathematical tool for dealing with vagueness and uncertainty, and it is a useful approach for uncovering classificatory knowledge and building classification rules. So, the application of the theory as part of the learning models was proposed in this thesis. Two different models for learning in data sets were proposed, based on two different reduction algorithms. The split-condition-merge-reduct algorithm (SCMR) was performed in three modules: partitioning the data set vertically into subsets, applying rough set reduction to each subset, and merging the reducts of all subsets to form the best reduct. The enhanced split-condition-merge-reduct algorithm (E-SCMR) was performed on the above three modules followed by another module that applies rough set reduction again to the reduct generated by SCMR in order to generate the best reduct, which plays the same role as if all attributes in the subset were present. Classification rules were generated based on the best reduct. For the problem of missing data, a new approach was proposed based on data partitioning and the mode function. In this approach, the data set was partitioned horizontally into subsets, where all objects in each subset were described by only one classification value. The mode function was applied to each subset that had missing values in order to find the most frequently occurring value of each attribute, and missing values in that attribute were replaced by the mode value. The proposed approach for missing values produced better results than other approaches. The proposed models for learning in data sets also generated the classification rules faster than other methods, and the accuracy of the classification rules produced by the proposed models was high compared to other models.
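
    The missing-value treatment described above (partition the data horizontally by decision class, then replace each missing attribute value with the most frequent value of that attribute within the class) can be shown directly. The sketch below is a minimal illustration on a made-up decision table; the column names and values are assumptions, not the thesis's data.

```python
# Sketch of the class-conditional mode imputation described in the abstract:
# partition the data set horizontally by decision class, then fill each missing
# attribute value with the most frequent value of that attribute within the class.
import pandas as pd
import numpy as np

data = pd.DataFrame({
    "headache":    ["yes", np.nan, "no", "yes", np.nan, "no"],
    "temperature": ["high", "high", np.nan, "normal", "high", "normal"],
    "decision":    ["flu", "flu", "flu", "healthy", "healthy", "healthy"],
})

def impute_by_class_mode(df, decision_col):
    filled = []
    for _, subset in df.groupby(decision_col):      # one subset per decision class
        subset = subset.copy()
        for col in subset.columns:
            if col == decision_col or subset[col].isna().sum() == 0:
                continue
            mode_value = subset[col].mode().iloc[0]  # most frequent value in this class
            subset[col] = subset[col].fillna(mode_value)
        filled.append(subset)
    return pd.concat(filled).sort_index()

print(impute_by_class_mode(data, "decision"))
```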

    Twofold Integer Programming Model for Improving Rough Set Classification Accuracy in Data Mining.

    The fast-growing size of databases has resulted in great demand for tools capable of analyzing data with the aim of discovering new knowledge and patterns. These tools, known as Data Mining (DM), will hopefully close the gap between the steady growth of information and the escalating demand to understand and discover the value of such knowledge. One aim of DM is to discover decision rules for extracting meaningful knowledge. These rules consist of conditions over attribute-value pairs, called descriptions, and decision attributes. Generating a good decision or classification model is therefore a major component of much data mining research. The classification approach essentially produces a function that maps a data item into one of several predefined classes, by taking a training dataset as input and building a model of the class attribute based on the remaining attributes. This research undertakes three main tasks. The first task is to introduce a new rough set model for minimum reduct selection and default rule generation, known as Twofold Integer Programming (TIP). The second task is to enhance rule accuracy based on the first task, while the third task is to classify new objects or cases. The TIP model is based on translating the discernibility relation of a Decision System (DS) into an Integer Programming (IP) model, which is solved using the branch-and-bound search method in order to generate the full reduct of the DS. The TIP model is then applied to the reduct to generate the default rules, which in turn are used to classify unseen objects with satisfactory accuracy. Apart from introducing the TIP model, this research also addresses the issues of missing values, discretization, and extracting minimum rules. The treatment of missing values and discretization is carried out during the preprocessing stage, while the extraction of minimum rules is conducted after the default rules have been generated in order to obtain the most useful discovered rules. Eight datasets from machine learning repositories and domain theories were tested with the TIP model, and the total number of rules, rule length, and rule accuracy for the generated rules were recorded. The rule and classification accuracy resulting from the TIP method is compared with other methods such as Standard Integer Programming (SIP) and Decision Related Integer Programming (DRIP) from rough sets, Genetic Algorithm (GA), the Johnson reducer, Holte's 1R method, Multiple Regression (MR), Neural Network (NN), the Induction of Decision Tree Algorithm (ID3), and the Base Learning Algorithm (C4.5), all classifiers commonly used in classification tasks. Based on the experimental results, the classification method using the TIP approach successfully performed the rule generation and classification tasks required during a classification operation. The considerably good accuracy obtained is mainly due to the right selection of relevant attributes. This research has shown that the TIP method is able to cater for different kinds of datasets and obtains a good rough classification model with promising results compared with other commonly used classifiers. This research opens a wide range of future work to be considered, including applying the proposed method in other areas such as web mining, text mining, or multimedia mining, and extending the proposed approach to work in parallel computing for data mining.
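
    The core step of the TIP model, translating the discernibility relation of a decision system into an optimization problem whose minimum solution is a reduct, can be sketched on a toy decision table. The example below is an illustration under assumptions: it builds the discernibility entries and finds a minimum reduct by a smallest-subset-first search, standing in for the full integer-programming formulation with branch and bound used in the thesis.

```python
# Sketch: build the discernibility entries of a small decision table and find a
# minimal reduct by solving the induced set-cover problem, checking smaller
# attribute subsets first. The decision table is a made-up example.
from itertools import combinations

attributes = ["a", "b", "c"]
# Each row: (condition attribute values, decision value)
table = [
    ((0, 1, 0), "yes"),
    ((1, 1, 0), "no"),
    ((0, 0, 1), "no"),
    ((1, 0, 1), "yes"),
]

# For each pair of objects with different decisions, record the set of
# condition attributes on which the two objects differ.
entries = []
for (x, dx), (y, dy) in combinations(table, 2):
    if dx != dy:
        entries.append({attributes[i] for i in range(len(attributes)) if x[i] != y[i]})

def covers(subset, entries):
    """A subset is a reduct candidate if it intersects every discernibility entry."""
    return all(subset & e for e in entries)

# Smallest-first search: the first covering subset found is a minimum reduct.
reduct = None
for size in range(1, len(attributes) + 1):
    for cand in combinations(attributes, size):
        if covers(set(cand), entries):
            reduct = set(cand)
            break
    if reduct:
        break

print("minimum reduct:", reduct)
```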

    Coevolutionary fuzzy attribute order reduction with complete attribute-value space tree

    Since big data sets are structurally complex and high-dimensional, and their attributes contain redundant and irrelevant information, the selection, evaluation, and combination of such large-scale attributes pose huge challenges to traditional methods. Fuzzy rough sets have emerged as a powerful vehicle for dealing with uncertain and fuzzy attributes in big data problems that involve a very large number of variables to be analyzed in a very short time. In order to further overcome the inefficiency of traditional algorithms on uncertain and fuzzy big data, this paper presents a new coevolutionary fuzzy attribute order reduction algorithm (CFAOR) based on a complete attribute-value space tree. A complete attribute-value space tree model of the decision table is designed in the attribute space to adaptively prune and optimize the attribute order tree. The fuzzy similarity of multimodality attributes can be extracted to satisfy the needs of users with better convergence speed and classification performance. The decision rule sets then generate a series of rule chains to form an efficient cascade of attribute order reduction and classification with a rough entropy threshold. Finally, the performance of CFAOR is assessed on a set of benchmark problems containing complex, high-dimensional datasets with noise. The experimental results demonstrate that CFAOR achieves higher average computational efficiency and classification accuracy than state-of-the-art methods. Furthermore, CFAOR is applied to extract different tissue surfaces of the dynamically changing infant cerebral cortex, and it achieves a satisfactory consistency with the findings of medical experts, which shows its potential significance for the disorder prediction of the infant cerebrum.
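
    The fuzzy similarity underlying such fuzzy rough set methods can be illustrated with the standard definitions. The sketch below assumes a minimum t-norm over per-attribute similarities and the Kleene-Dienes implicator for the lower approximation; it shows a generic fuzzy-rough dependency computation on made-up numeric data, not the CFAOR tree, coevolution, or rough entropy threshold themselves.

```python
# Sketch: fuzzy similarity and fuzzy-rough dependency for a numeric attribute
# subset, using common fuzzy rough set definitions. Data are illustrative.
import numpy as np

X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])  # objects x attributes
labels = np.array([0, 0, 1, 1])

def fuzzy_similarity(X):
    """Pairwise similarity: minimum over attributes of 1 - normalized distance."""
    ranges = X.max(axis=0) - X.min(axis=0)
    n = len(X)
    R = np.ones((n, n))
    for i in range(n):
        for j in range(n):
            per_attr = 1.0 - np.abs(X[i] - X[j]) / ranges
            R[i, j] = per_attr.min()          # minimum t-norm across attributes
    return R

def dependency(X, labels):
    """Fuzzy-rough positive region size divided by the number of objects."""
    R = fuzzy_similarity(X)
    n = len(X)
    lower = np.empty(n)
    for i in range(n):
        same_class = (labels == labels[i]).astype(float)
        # Kleene-Dienes implicator of R(i, j) and membership of j in the class
        lower[i] = np.min(np.maximum(1.0 - R[i], same_class))
    return lower.sum() / n

print("fuzzy-rough dependency of both attributes:", round(dependency(X, labels), 3))
```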

    Modified fuzzy rough set technique with stacked autoencoder model for magnetic resonance imaging based breast cancer detection

    Breast cancer is the most common cancer in women, and early detection reduces the mortality rate. Magnetic resonance imaging (MRI) is effective for analyzing breast cancer, but the abnormalities are hard to identify. Manual breast cancer detection in MRI images is inefficient; therefore, a deep learning-based system is implemented in this manuscript. Initially, the visual quality is improved using region growing and adaptive histogram equalization (AHE), and the breast lesion is then segmented by Otsu thresholding with a morphological transform. Next, features are extracted from the segmented lesion, and a modified fuzzy rough set technique is proposed to reduce the dimensionality of the extracted features, which decreases system complexity and computational time. The active features are fed to a stacked autoencoder for classifying the benign and malignant classes. The results demonstrated that the proposed model attained 99% and 99.22% classification accuracy on the benchmark datasets, which is higher than the comparative classifiers: decision tree, naïve Bayes, random forest, and k-nearest neighbor (KNN). The obtained results indicate that the proposed model screens and detects breast lesions better, which assists clinicians in effective therapeutic intervention and timely treatment.
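
    The enhancement and segmentation stages (adaptive histogram equalization, Otsu thresholding, and morphological cleanup) can be sketched with standard image-processing primitives. The example below uses scikit-image on a synthetic image; the region-growing step, the modified fuzzy rough set feature reduction, and the stacked autoencoder are not reproduced, and all parameters are illustrative.

```python
# Sketch of the enhancement and segmentation stages described in the abstract:
# adaptive histogram equalization (AHE), Otsu thresholding, and a binary opening
# to clean up the lesion mask. A synthetic image stands in for an MRI slice.
import numpy as np
from skimage import exposure, filters, morphology

# Synthetic "MRI slice": noisy dark background with a brighter circular lesion.
rng = np.random.default_rng(0)
image = rng.normal(0.2, 0.05, size=(128, 128))
yy, xx = np.mgrid[:128, :128]
image[(yy - 64) ** 2 + (xx - 64) ** 2 < 15 ** 2] += 0.5
image = np.clip(image, 0.0, 1.0)

# 1) Contrast enhancement via adaptive histogram equalization.
enhanced = exposure.equalize_adapthist(image)

# 2) Otsu thresholding to separate the lesion from the background.
threshold = filters.threshold_otsu(enhanced)
mask = enhanced > threshold

# 3) Morphological (binary) opening to remove small spurious regions.
clean_mask = morphology.binary_opening(mask, morphology.disk(3))

print("lesion pixels before/after cleanup:", mask.sum(), clean_mask.sum())
```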

    Hybrid model using logit and nonparametric methods for predicting micro-entity failure

    Following calls from the bankruptcy literature, a parsimonious hybrid bankruptcy model is developed in this paper by combining parametric and non-parametric approaches. To this end, the variables with the highest predictive power for detecting bankruptcy are selected using logistic regression (LR). Subsequently, alternative non-parametric methods (Multilayer Perceptron, Rough Set, and Classification-Regression Trees) are applied, in turn, to firms classified as either “bankrupt” or “not bankrupt”. Our findings show that hybrid models, particularly those combining LR and the Multilayer Perceptron, offer better accuracy and interpretability and converge faster than each method implemented in isolation. Moreover, the authors demonstrate that the introduction of non-financial and macroeconomic variables complements financial ratios for bankruptcy prediction.
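
    The two-stage hybrid scheme, logistic regression for variable selection followed by a non-parametric classifier on the selected variables, can be sketched with scikit-learn. The example below uses synthetic data and an L1 penalty as the selection mechanism; the data, penalty, and hyperparameters are illustrative assumptions rather than the paper's specification.

```python
# Sketch of the hybrid approach: an L1-penalized logistic regression picks the
# most predictive variables, then a Multilayer Perceptron is trained on just
# those variables. Data and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for financial ratios plus non-financial/macro variables.
X, y = make_classification(n_samples=500, n_features=20, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Stage 1: logistic regression with an L1 penalty as the variable selector.
selector = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X_train_s, y_train)
selected = np.flatnonzero(selector.coef_.ravel() != 0)
print("selected variables:", selected)

# Stage 2: Multilayer Perceptron trained on the selected variables only.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X_train_s[:, selected], y_train)
print("hybrid LR + MLP test accuracy:", round(mlp.score(X_test_s[:, selected], y_test), 3))
```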