Finding an Effective Classification Technique to Develop a Software Team Composition Model
Ineffective software team composition is now recognized as a prominent cause of software project failures. Results reported from different theoretical personality models show contradictory fits, validity challenges, and a lack of guidance for selecting software development personnel. It is also believed that the technique(s) used in developing a model can affect the overall results. Thus, this study aims to: 1) identify an effective classification technique for the problem, and 2) develop a model for composing software development teams. The model comprised three predictors (team role, personality type, and gender) and one outcome (team performance). The techniques used for model development were logistic regression, decision trees, and Rough Set Theory (RST). Higher prediction accuracy and reduced pattern complexity were the two criteria for selecting the most effective technique. Based on the results, the Johnson Algorithm (JA) of RST proved to be an effective technique for a team composition model. The study proposes a set of 24 decision rules for finding effective team members. These rules incorporate gender to highlight the appropriate personality profile for software developers. The study concludes that selecting an appropriate classification technique is one of the most important factors in developing effective models.
New Learning Models for Generating Classification Rules Based on Rough Set Approach
Data sets, static or dynamic, are very important and useful for representing real-life features in different sectors such as industry, medicine, and economics. Recently, different models have been used to generate knowledge from vague and uncertain data sets, such as decision tree induction, neural networks, fuzzy logic, genetic algorithms, and rough set theory. All of these models take a long time to learn on a huge and dynamic data set. Thus, the challenge is to develop an efficient model that decreases the learning time without affecting the quality of the generated classification rules. Huge information systems or data sets usually have missing values due to unavailable data, which affect the quality of the generated classification rules and make it difficult to extract useful information from the data set. A further challenge is therefore how to handle missing data. Rough set theory is a mathematical tool for dealing with vagueness and uncertainty, and a useful approach for uncovering classificatory knowledge and building classification rules. The application of the theory as part of the learning models was therefore proposed in this thesis.
Two different models for learning from data sets were proposed, based on two different reduction algorithms. The split-condition-merge-reduct algorithm (SCMR) comprises three modules: partitioning the data set vertically into subsets, applying rough set reduction to each subset, and merging the reducts of all subsets to form the best reduct. The enhanced-split-condition-merge-reduct algorithm (ESCMR) runs the same three modules followed by a fourth that applies rough set reduction once more to the reduct generated by SCMR, producing a best reduct that plays the same role as if all attributes in the subset were present. Classification rules were then generated from the best reduct.
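The split-reduce-merge scheme above can be sketched in outline. This is a minimal illustration under stated assumptions, not the thesis's implementation: a greedy attribute-dropping pass stands in for the rough set reduction concept, rows are plain dictionaries, and the vertical split simply slices the attribute list; all function names here are hypothetical.

```python
def partition(rows, attrs):
    """Group row indices into indiscernibility classes on the given attributes."""
    groups = {}
    for i, r in enumerate(rows):
        groups.setdefault(tuple(r[a] for a in attrs), []).append(i)
    return list(groups.values())

def consistent(rows, attrs, decision):
    """True if every indiscernibility class maps to a single decision value."""
    return all(len({rows[i][decision] for i in block}) == 1
               for block in partition(rows, attrs))

def greedy_reduct(rows, attrs, decision):
    """Drop attributes one by one while decision consistency is preserved."""
    reduct = list(attrs)
    for a in attrs:
        trial = [x for x in reduct if x != a]
        if trial and consistent(rows, trial, decision):
            reduct = trial
    return reduct

def scmr_reduct(rows, attrs, decision, k=2):
    """SCMR sketch: split attributes vertically into k subsets, reduce each,
    merge the partial reducts, then (as in ESCMR) reduce the merged set once more."""
    merged = []
    for chunk in (attrs[i::k] for i in range(k)):
        # a chunk alone may not discern all decisions; reduce only when it does
        merged += (greedy_reduct(rows, chunk, decision)
                   if consistent(rows, chunk, decision) else chunk)
    return greedy_reduct(rows, merged, decision)
```

On a toy decision table where one attribute fully determines the decision, both passes converge to that single attribute.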
For the problem of missing data, a new approach was proposed based on data partitioning and the mode function. In this approach, the data set is partitioned horizontally into subsets, so that all objects in each subset share a single classification value. The mode function is then applied to each attribute with missing values in a subset to find its most frequently occurring value, and the missing values in that attribute are replaced by the mode. The proposed approach for missing values produced better results than other approaches. The proposed learning models also generated classification rules faster than other methods, and the accuracy of the resulting classification rules was high compared with other models.
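The class-wise mode imputation described above can be sketched as follows; a minimal illustration assuming rows are dictionaries and `None` marks a missing value (the function name is hypothetical, not taken from the thesis):

```python
from collections import Counter

def impute_by_class_mode(rows, decision, missing=None):
    """Partition rows horizontally by decision value, then replace each
    missing attribute value with the mode of that attribute in its subset."""
    by_class = {}
    for r in rows:
        by_class.setdefault(r[decision], []).append(r)
    for subset in by_class.values():
        attrs = [a for a in subset[0] if a != decision]
        for a in attrs:
            # observed (non-missing) values of this attribute within the subset
            seen = [r[a] for r in subset if r[a] is not missing]
            if not seen:
                continue  # attribute entirely missing in this subset
            mode = Counter(seen).most_common(1)[0][0]
            for r in subset:
                if r[a] is missing:
                    r[a] = mode
    return rows
```

Because the mode is taken within one classification value at a time, an imputed value never mixes statistics across decision classes.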
Twofold Integer Programming Model for Improving Rough Set Classification Accuracy in Data Mining
The fast-growing size of databases has created a great demand for tools capable of analyzing data with the aim of discovering new knowledge and patterns. Such tools, known as Data Mining (DM) tools, will hopefully close the gap between the steady growth of information and the escalating demand to understand and exploit the value of that knowledge. One aim of DM is to discover decision rules that extract meaningful knowledge; these rules consist of conditions over attribute-value pairs (the descriptions) and decision attributes. Generating a good decision or classification model is therefore a major component of much data mining research. The classification approach produces a function that maps a data item into one of several predefined classes by taking a training dataset as input and building a model of the class attribute from the remaining attributes. This research undertakes three main tasks. The first is to introduce a new rough model for minimum reduct selection and default rule generation, known as Twofold Integer Programming (TIP). The second is to enhance rule accuracy based on the first task, while the third is to classify new objects or cases. The TIP model translates the discernibility relation of a Decision System (DS) into an Integer Programming (IP) model, which is solved by branch-and-bound search to generate the full reduct of the DS. The TIP model is then applied to the reduct to generate the default rules, which in turn are used to classify unseen objects with satisfactory accuracy.
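The core translation, from a discernibility relation to a 0-1 covering problem over attributes, can be illustrated in miniature. This is a sketch of the general idea, not the TIP model itself: an exhaustive search over attribute subsets stands in for the paper's branch-and-bound IP solver, and the helper names are hypothetical.

```python
from itertools import combinations

def discernibility_pairs(rows, attrs, decision):
    """For each pair of objects with different decisions, the set of
    attributes that discerns them (one covering constraint per pair)."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if rows[i][decision] != rows[j][decision]:
                diff = frozenset(a for a in attrs
                                 if rows[i][a] != rows[j][a])
                if diff:
                    pairs.append(diff)
    return pairs

def minimum_reduct(rows, attrs, decision):
    """Smallest attribute set hitting every discernibility constraint.
    Conceptually one binary variable per attribute; here solved by
    brute force over subsets of increasing size."""
    pairs = discernibility_pairs(rows, attrs, decision)
    for size in range(1, len(attrs) + 1):
        for cand in combinations(attrs, size):
            if all(set(cand) & p for p in pairs):
                return list(cand)
    return list(attrs)
```

In IP form this is: minimize the sum of the attribute variables subject to one "at least one discerning attribute selected" constraint per object pair, which is what branch and bound would solve at scale.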
Apart from introducing the TIP model, this research also addresses missing values, discretization, and minimum-rule extraction. Missing values and discretization are handled during the preprocessing stage, while minimum-rule extraction is performed after the default rules have been generated, in order to obtain the most useful discovered rules. Eight datasets from machine learning repositories and domain theories were tested with the TIP model, recording the total number of rules, rule length, and rule accuracy for the generated rules. The rule and classification accuracy of the TIP method is compared with that of other methods: Standard Integer Programming (SIP) and Decision-Related Integer Programming (DRIP) from rough sets, Genetic Algorithm (GA), the Johnson reducer, Holte's 1R, Multiple Regression (MR), Neural Network (NN), the ID3 decision tree induction algorithm, and C4.5, all of which are commonly used classifiers.
Based on the experimental results, the classification method using the TIP approach successfully performed the rule generation and classification tasks required in a classification operation. The considerably good accuracy obtained is mainly due to the right selection of relevant attributes. This research has shown that the TIP method can cater for different kinds of datasets and yields a good rough classification model with promising results compared with other commonly used classifiers. It also opens a wide range of future work, including applying the proposed method in other areas such as web mining, text mining, or multimedia mining, and extending the proposed approach to parallel computing in data mining.
Coevolutionary fuzzy attribute order reduction with complete attribute-value space tree
Since big data sets are structurally complex and high-dimensional, and their attributes carry redundant and irrelevant information, the selection, evaluation, and combination of such large-scale attributes pose huge challenges to traditional methods. Fuzzy rough sets have emerged as a powerful vehicle for dealing with uncertain and fuzzy attributes in big data problems that involve a very large number of variables to be analyzed in a very short time. To further overcome the inefficiency of traditional algorithms on uncertain and fuzzy big data, this paper presents a new coevolutionary fuzzy attribute order reduction algorithm (CFAOR) based on a complete attribute-value space tree. A complete attribute-value space tree model of the decision table is designed in the attribute space to adaptively prune and optimize the attribute order tree. The fuzzy similarity of multimodal attributes can be extracted to meet users' needs with better convergence speed and classification performance. The decision rule sets then generate a series of rule chains that form an efficient cascade of attribute order reduction and classification with a rough entropy threshold. Finally, the performance of CFAOR is assessed on a set of benchmark problems containing complex, high-dimensional, noisy datasets. The experimental results demonstrate that CFAOR achieves higher average computational efficiency and classification accuracy than state-of-the-art methods. Furthermore, CFAOR is applied to extracting tissue surfaces of the dynamically changing infant cerebral cortex, where it achieves satisfying consistency with the judgments of medical experts, showing its potential for disorder prediction in the infant cerebrum.
Modified fuzzy rough set technique with stacked autoencoder model for magnetic resonance imaging based breast cancer detection
Breast cancer is the most common cancer in women, and early detection reduces the mortality rate. Magnetic resonance imaging (MRI) is efficient for analyzing breast cancer, but abnormalities are hard to identify, and manual detection in MRI images is inefficient; therefore, a deep learning-based system is implemented in this manuscript. Initially, visual quality is improved using region growing and adaptive histogram equalization (AHE); the breast lesion is then segmented by Otsu thresholding with a morphological transform. Next, features are extracted from the segmented lesion, and a modified fuzzy rough set technique is proposed to reduce the dimensionality of the extracted features, decreasing system complexity and computational time. The active features are fed to a stacked autoencoder to classify benign and malignant cases. The results show that the proposed model attained 99% and 99.22% classification accuracy on the benchmark datasets, higher than the comparative classifiers: decision tree, naïve Bayes, random forest, and k-nearest neighbor (KNN). These results indicate that the proposed model screens and detects breast lesions effectively, assisting clinicians in therapeutic intervention and timely treatment.
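Of the pipeline stages above, Otsu thresholding is the most self-contained: it picks the gray level that maximizes between-class variance of foreground and background. A plain-NumPy sketch of that criterion (an illustration of the standard method, not the authors' code):

```python
import numpy as np

def otsu_threshold(img):
    """Return the gray level maximizing between-class variance (Otsu's method).
    Assumes an 8-bit image with intensities in [0, 255]."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    total = img.size
    sum_all = np.dot(np.arange(256), hist)  # sum of all intensities
    w0 = 0       # background pixel count
    sum0 = 0.0   # background intensity sum
    best_t, best_var = 0, 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0                 # background mean
        m1 = (sum_all - sum0) / w1     # foreground mean
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

On a bimodal image the returned threshold falls between the two intensity modes, separating lesion from background before the morphological cleanup step.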
Hybrid model using logit and nonparametric methods for predicting micro-entity failure
Following calls in the bankruptcy literature, a parsimonious hybrid bankruptcy model is developed in this paper by combining parametric and non-parametric approaches. To this end, the variables with the highest power to predict bankruptcy are selected using logistic regression (LR). Subsequently, alternative non-parametric methods (Multilayer Perceptron, Rough Set, and Classification-Regression Trees) are applied, in turn, to firms classified as either "bankrupt" or "not bankrupt". Our findings show that hybrid models, particularly those combining LR and a Multilayer Perceptron, offer better accuracy and interpretability and converge faster than each method implemented in isolation. Moreover, the authors demonstrate that introducing non-financial and macroeconomic variables complements financial ratios in bankruptcy prediction.
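The two-stage idea, a parametric selector feeding a non-parametric classifier, can be sketched with scikit-learn. This is illustrative only: the data are synthetic, and an L1-penalized logit stands in for the paper's LR-based variable selection.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# synthetic "firms": 2 informative ratios, 3 noise ratios (hypothetical data)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 1 = "bankrupt", say

# Stage 1 (parametric): L1 logit keeps only ratios with predictive power
logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
keep = np.flatnonzero(np.abs(logit.coef_[0]) > 1e-6)

# Stage 2 (non-parametric): Multilayer Perceptron on the reduced variable set
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
mlp.fit(X[:, keep], y)
acc = mlp.score(X[:, keep], y)
```

Shrinking the input set before the MLP is what gives the hybrid its faster convergence and easier interpretation relative to running the MLP on all variables.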