
    Knowledge-based Systems and Interestingness Measures: Analysis with Clinical Datasets

    Knowledge mined from clinical data can be used for medical diagnosis and prognosis. Improving the quality of the knowledge base enhances the prediction efficiency of a knowledge-based system. Designing accurate and precise clinical decision support systems that use the mined knowledge is still a broad area of research. This work analyses the variation in classification accuracy of such knowledge-based systems under different rule lists. The purpose of this work is not to improve the prediction accuracy of a decision support system, but to analyze the factors that influence the efficiency and design of the knowledge base in a rule-based decision support system. Three benchmark medical datasets are used. Rules are extracted using a supervised machine learning algorithm (PART). Each rule in the ruleset is validated using nine frequently used rule interestingness measures. After calculating the measure values, the rule lists are used for performance evaluation. Experimental results show variation in classification accuracy across rule lists. The Confidence and Laplace measures yield relatively superior accuracy: 81.188% for the heart disease dataset and 78.255% for the diabetes dataset. The accuracy of the knowledge-based prediction system depends predominantly on the organization of the ruleset. Rule length needs to be considered when deciding the rule ordering. A subset of a rule, or a combination of rule elements, may form a new rule and sometimes be a member of the rule list. Redundant rules should be eliminated. Prior knowledge about the domain will enable knowledge engineers to design a better knowledge base.
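
    As a minimal sketch of the two measures the abstract singles out, the snippet below computes the standard Confidence and Laplace interestingness values for a classification rule from its coverage counts. The counts and the two-class assumption (k = 2) are illustrative, not taken from the paper.

```python
# Two standard rule interestingness measures, computed from coverage counts.

def confidence(n_antecedent: int, n_both: int) -> float:
    """P(consequent | antecedent) = n(A and B) / n(A)."""
    return n_both / n_antecedent

def laplace(n_antecedent: int, n_both: int, k: int = 2) -> float:
    """Laplace-corrected accuracy: (n(A and B) + 1) / (n(A) + k),
    where k is the number of class labels (assumed 2 here)."""
    return (n_both + 1) / (n_antecedent + k)

# Hypothetical rule covering 40 records, 36 of which match its class label.
print(confidence(40, 36))  # 0.90
print(laplace(40, 36))     # (36 + 1) / (40 + 2) = 0.8809...
```

    Ranking the extracted PART rules by such scores is what produces the different rule orderings whose accuracies the paper compares.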

    Test Suite Reduction Using HGS Based Heuristic Approach

    Regression testing is performed throughout the software lifecycle to uncover faults as early as possible and to ensure that changes have no adverse effect on operational software. Test suites, once developed, are reused and updated frequently. As the software evolves, test cases in the test suite may become redundant, because the requirements covered by newly added test cases may also be covered by existing test cases. This redundancy increases the cost of executing the test suite. Further, resource and time constraints make it necessary to develop techniques that minimize test suites by removing redundant test cases. A few heuristic approaches have been used to solve the test suite minimization problem, yet redundant test cases still remain. To address this problem, this paper proposes two Harrold-Gupta-Soffa (HGS) based heuristic algorithms, namely Non Redundant HGS and Enhanced HGS. The former utilizes the redundancy strategy of the Greedy, Redundant, Essential (GRE) approach to get rid of redundancy, whereas the latter selects a test case for higher cardinalities based on the overall coverage of unmarked associated testing sets, and thus arrives at a reduced, non-redundant test suite. Experiments show that the proposed algorithms always select smaller test suites than the existing HGS heuristic.
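
    To make the minimization setting concrete, here is a simplified greedy sketch of the idea underlying HGS-style reduction: given each requirement's covering test cases, repeatedly pick the test that covers the most still-unmarked requirements. This is plain greedy selection, not the exact cardinality-ordered HGS procedure or the paper's two variants, and the coverage data is hypothetical.

```python
# Greedy test suite reduction over a requirement -> covering-tests mapping.

def reduce_suite(coverage: dict[str, set[str]]) -> set[str]:
    unmarked = set(coverage)          # requirements not yet covered
    selected: set[str] = set()
    while unmarked:
        # Count, for each test, how many unmarked requirements it covers.
        score: dict[str, int] = {}
        for req in unmarked:
            for test in coverage[req]:
                score[test] = score.get(test, 0) + 1
        best = max(score, key=score.get)
        selected.add(best)
        unmarked = {r for r in unmarked if best not in coverage[r]}
    return selected

coverage = {"r1": {"t1"}, "r2": {"t1", "t2"}, "r3": {"t2", "t3"}}
print(reduce_suite(coverage))  # e.g. {'t1', 't2'} - two tests suffice
```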

    Hybrid approach using fuzzy sets and extreme learning machine for classifying clinical datasets

    Data mining techniques play a major role in developing computer aided diagnosis systems and expert systems that aid a physician in clinical decision making. In this work, a classifier that combines the relative merits of fuzzy sets and extreme learning machine (FELM) for clinical datasets is proposed. The three major subsystems in the FELM framework are the preprocessing subsystem, the fuzzification subsystem and the classification subsystem. Missing value imputation and outlier elimination are handled by the preprocessing subsystem. The fuzzification subsystem maps each feature to a fuzzy set, and the classification subsystem uses an extreme learning machine for classification. Cleveland heart disease (CHD), Statlog heart disease (SHD) and Pima Indian diabetes (PID) datasets from the University of California Irvine (UCI) machine learning repository have been used for experimentation. The CHD and SHD datasets have been experimented with two class labels, one indicating the absence and the other the presence of heart disease. The CHD dataset has also been experimented with five class labels: one indicating the absence of heart disease and the other four indicating its severity, namely low risk, medium risk, high risk and serious. The PID dataset has been experimented with two class labels, one indicating the absence and the other the presence of gestational diabetes. The classifier achieves an accuracy of 93.55% for the CHD dataset with two class labels, 73.77% for the CHD dataset with five class labels, 94.44% for the SHD dataset and 92.54% for the PID dataset. Keywords: Extreme learning machine, Fuzzification, Fuzzy set, Classification, Euclidean distance, Membership function
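
    The following is a minimal numpy sketch of the FELM idea under stated assumptions: each feature is passed through a triangular fuzzy membership function, and an extreme learning machine (random hidden layer, least-squares output weights) classifies the fuzzified data. The membership parameters, network size and synthetic data are illustrative, not the paper's exact design.

```python
import numpy as np

def triangular_membership(x, a, b, c):
    """Triangular fuzzy set: rises from a to peak at b, falls to c."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def elm_train(X, y, hidden=50, seed=0):
    """ELM training: random input weights, pseudoinverse output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))
    b = rng.standard_normal(hidden)
    H = np.tanh(X @ W + b)          # hidden-layer activations
    beta = np.linalg.pinv(H) @ y    # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Fuzzify synthetic features, then train and apply the ELM.
X = np.random.default_rng(1).uniform(0, 10, size=(100, 4))
Xf = triangular_membership(X, 0.0, 5.0, 10.0)   # one fuzzy set per feature
y = (X.sum(axis=1) > 20).astype(float)
W, b, beta = elm_train(Xf, y)
preds = elm_predict(Xf, W, b, beta) > 0.5
```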

    Computer aided diagnosis of pulmonary hamartoma from CT scan images using ant colony optimization based feature selection

    Background: Computer-aided diagnosis (CAD) systems for the detection of lung disorders play an important role in clinical decision making. CAD systems provide a second opinion to the physician in interpreting computed tomography (CT) images. In this work, a CAD system to diagnose pulmonary hamartoma nodules from chest CT images is proposed. Methods: Segmentation of the lung parenchyma from CT images is carried out using Otsu's thresholding method. Nodules are taken as the regions of interest (ROIs) in this work. Texture, shape and run length based features are extracted from the ROIs. The cosine similarity measure (CSM) and the rough dependency measure (RDM) are used independently as filter evaluation functions with ant colony optimization (ACO) to select two subsets of features. The selected subsets are used to train two classifiers, namely support vector machine (SVM) and Naive Bayes (NB), using 10-fold cross validation. All four trained classifiers are tested and the performance measures are estimated. Results: CT slices of patients affected by pulmonary cancer and hamartoma are used for experimentation. From the lung parenchymal tissues of 300 CT slices, 390 nodules are extracted. The feature selection algorithms, ACO-CSM and ACO-RDM, are run for different feature subset sizes, and the selected features are used to train the SVM and NB classifiers. From the results obtained, it is inferred that the SVM classifier with the feature subset chosen by the ACO-RDM approach yields the maximum classification accuracy of 94.36% with 38 features. Conclusion: The results clearly show that selecting relevant features to train the classifier has a definite impact on classifier performance. Keywords: Computer aided diagnosis, Pulmonary hamartoma, Ant colony optimization, Cosine similarity, Rough dependency, Support vector machine
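
    As a minimal sketch of the cosine-similarity filter criterion, the snippet below scores a candidate feature by the cosine of the angle between its value vector and the class label vector; a feature that tracks the label scores near 1, an unrelated one near 0. How the paper aggregates such scores over subsets inside the ACO search is not reproduced, and the data is synthetic.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 200).astype(float)
relevant = labels + 0.1 * rng.standard_normal(200)  # tracks the label
noise = rng.standard_normal(200)                    # unrelated feature

print(cosine_similarity(relevant, labels))  # close to 1
print(cosine_similarity(noise, labels))     # near 0
```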

    Model level code smell detection using EGAPSO based on similarity measures

    Software maintenance is an essential part of any software used in the day-to-day activities of an organization. During the maintenance phase, detected bugs must be corrected and the software must evolve with changing requirements without ripple effects. Software maintenance becomes a cumbersome process when code smells, the result of poor design, exist in the software. In code smell detection, the majority of existing approaches are rule based, where a rule represents a combination of metrics and thresholds. In rule-based approaches, defining the rules that detect code smells is time consuming because identifying the correct threshold value is a tedious task that can be settled only through trial and error. To address this issue, this work uses Euclidean distance based Genetic Algorithm and Particle Swarm Optimization (EGAPSO). Instead of relying on a threshold value, this approach detects code smells based on the similarity between the system under study and a set of defect examples, where the former is the initial model and the latter is the base example. The approach is tested on the open source projects Gantt Project and Log4j for identifying five code smells: Blob, Functional Decomposition, Spaghetti Code, Data Class and Feature Envy. Finally, the approach is compared with code smell detection using a Genetic Algorithm (GA), DEtection and CORrection (DECOR), a Parallel Evolutionary Algorithm (PEA) and Multi-Objective Genetic Programming (MOGP). EGAPSO proves to be effective compared to the other code smell detection approaches. Keywords: Software maintenance, Code smell, Software metrics, Search based software engineering, Euclidean distance, Open source software
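
    To illustrate the similarity idea at the core of the approach, the sketch below represents each class as a vector of software metrics and ranks it by its Euclidean distance to the closest known defect example; a small distance suggests the class resembles a smell. The metric choices and example vectors are hypothetical, and the GA/PSO search that EGAPSO layers on top is not reproduced.

```python
import math

# Hypothetical Blob examples: (lines of code, number of methods, coupling).
BLOB_EXAMPLES = [(850.0, 42.0, 19.0), (1200.0, 55.0, 25.0)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_example_distance(candidate, examples):
    """Distance from a class under study to its closest defect example."""
    return min(euclidean(candidate, e) for e in examples)

suspect = (900.0, 40.0, 21.0)
clean = (120.0, 8.0, 3.0)
print(nearest_example_distance(suspect, BLOB_EXAMPLES))  # small: Blob-like
print(nearest_example_distance(clean, BLOB_EXAMPLES))    # large: unlikely
```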

    Identification and Classification of Pulmonary Nodule in Lung Modality Using Digital Computer

    This paper proposes an intelligent approach for developing a new support system that improves the performance of computer aided diagnosis (CAD) for automated pulmonary nodule identification on computed tomography (CT) images in Digital Imaging and Communications in Medicine (DICOM) format. The first step in diagnosing any abnormality in the lung region is to acquire a CT image, a non-invasive procedure. The digital format of the image is highly portable, which facilitates the extraction and sharing of valuable knowledge. The large number of related images poses a challenge in maintaining coherence and consequently in arriving at a conclusion. The CAD system has been designed and developed to segment the lung tumour region and extract features from the region of interest (ROI). The detection process consists of two steps, namely lung segmentation and feature extraction. For segmentation of the lung region, K-means, watershed and histogram based algorithms are implemented. The extracted features and the label of the corresponding ROI are used to train a neural network. Finally, these properties are used to classify a lung tumour as benign or malignant. The main objectives of this method are to reduce the false positive rate, improve the access time and reduce inter-observer variability.
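
    As a minimal sketch of the K-means step in the segmentation stage, the snippet below clusters CT pixel intensities into two groups (dark lung parenchyma versus brighter surrounding tissue) and keeps the darker cluster as the lung mask. The synthetic one-dimensional "slice" and the k = 2 choice are illustrative assumptions, not the paper's full pipeline.

```python
import numpy as np

def kmeans_1d(values: np.ndarray, k: int = 2, iters: int = 20) -> np.ndarray:
    """Cluster scalar intensities with K-means; return each value's label."""
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    return labels

rng = np.random.default_rng(0)
slice_ = np.concatenate([rng.normal(-800, 50, 500),   # air/lung HU values
                         rng.normal(40, 30, 500)])    # soft-tissue HU values
labels = kmeans_1d(slice_)
# Keep the cluster with the lower mean intensity as the lung region.
lung_mask = labels == np.argmin([slice_[labels == j].mean() for j in (0, 1)])
print(lung_mask.sum())  # roughly 500 pixels assigned to the lung cluster
```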