1,057 research outputs found

    Are screening methods useful in feature selection? An empirical study

    Full text link
    Filter or screening methods are often used as a preprocessing step for reducing the number of variables used by a learning algorithm in obtaining a classification or regression model. While there are many such filter methods, there is a need for an objective evaluation of these methods. Such an evaluation is needed to compare them with each other and also to answer whether they are at all useful, or a learning algorithm could do a better job without them. For this purpose, many popular screening methods are partnered in this paper with three regression learners and five classification learners and evaluated on ten real datasets to obtain accuracy criteria such as R-square and area under the ROC curve (AUC). The obtained results are compared through curve plots and comparison tables in order to find out whether screening methods help improve the performance of learning algorithms and how they fare with each other. Our findings revealed that the screening methods were useful in improving the prediction of the best learner on two regression and two classification datasets out of the ten datasets evaluated.Comment: 29 pages, 4 figures, 21 table

    A Feature Ranking Algorithm in Pragmatic Quality Factor Model for Software Quality Assessment

    Get PDF
    Software quality is an important research area and has gain considerable attention from software engineering community in identification of priority quality attributes in software development process. This thesis describes original research in the field of software quality model by presenting a Feature Ranking Algorithm (FRA) for Pragmatic Quality Factor (PQF) model. The proposed algorithm is able to improve the weaknesses in PQF model in updating and learning the important attributes for software quality assessment. The existing assessment techniques lack of the capability to rank the quality attributes and data learning which can enhance the quality assessment process. The aim of the study is to identify and propose the application of Artificial Intelligence (AI) technique for improving quality assessment technique in PQF model. Therefore, FRA using FRT was constructed and the performance of the FRA was evaluated. The methodology used consists of theoretical study, design of formal framework on intelligent software quality, identification of Feature Ranking Technique (FRT), construction and evaluation of FRA algorithm. The assessment of quality attributes has been improved using FRA algorithm enriched with a formula to calculate the priority of attributes and followed by learning adaptation through Java Library for Multi Label Learning (MULAN) application. The result shows that the performance of FRA correlates strongly to PQF model with 98% correlation compared to the Kolmogorov-Smirnov Correlation Based Filter (KSCBF) algorithm with 83% correlation. Statistical significance test was also performed with score of 0.052 compared to the KSCBF algorithm with score of 0.048. The result shows that the FRA was more significant than KSCBF algorithm. The main contribution of this research is on the implementation of FRT with proposed Most Priority of Features (MPF) calculation in FRA for attributes assessment. Overall, the findings and contributions can be regarded as a novel effort in software quality for attributes selection

    Stability of feature selection algorithms: a study on high-dimensional spaces

    Get PDF
    With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components of the learning process. Strangely, despite extensive work on the stability of learning algorithms, the stability of feature selection algorithms has been relatively neglected. This study is an attempt to fill that gap by quantifying the sensitivity of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms based on the stability of the feature preferences that they express in the form of weights-scores, ranks, or a selected feature subset. We examine a number of measures to quantify the stability of feature preferences and propose an empirical way to estimate them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. The experiments allow us to explore the merits of each stability measure and create stability profiles of the feature selection algorithms. Finally, we show how stability profiles can support the choice of a feature selection algorith

    Bioinformatic-driven search for metabolic biomarkers in disease

    Get PDF
    The search and validation of novel disease biomarkers requires the complementary power of professional study planning and execution, modern profiling technologies and related bioinformatics tools for data analysis and interpretation. Biomarkers have considerable impact on the care of patients and are urgently needed for advancing diagnostics, prognostics and treatment of disease. This survey article highlights emerging bioinformatics methods for biomarker discovery in clinical metabolomics, focusing on the problem of data preprocessing and consolidation, the data-driven search, verification, prioritization and biological interpretation of putative metabolic candidate biomarkers in disease. In particular, data mining tools suitable for the application to omic data gathered from most frequently-used type of experimental designs, such as case-control or longitudinal biomarker cohort studies, are reviewed and case examples of selected discovery steps are delineated in more detail. This review demonstrates that clinical bioinformatics has evolved into an essential element of biomarker discovery, translating new innovations and successes in profiling technologies and bioinformatics to clinical application

    Gait-based Gender Classification Considering Resampling and Feature Selection

    Get PDF
    Two intrinsic data characteristics that arise in many domains are the class imbalance and the high dimensionality, which pose new challenges that should be addressed. When using gait for gender classification, benchmarking public databases and renowned gait representations lead to these two problems, but they have not been jointly studied in depth. This paper is a preliminary study that pursues to investigate the benefits of using several techniques to tackle the aforementioned problems either singly or in combination, and also to evaluate the order of application that leads to the best classification performance. Experimental results show the importance of jointly managing both problems for gait-based gender classification. In particular, it seems that the best strategy consists of applying resampling followed by feature selection

    Evaluation patterns and algorithm for cancer identifications using dynamic clustering

    Get PDF
    The domain of knowledge discovery and deep data extraction is quite prominent and used in assorted domains including engineering, mathematics and even in medical diagnosis. A number of benchmark datasets are available in which huge research work is going on with the enormous aspects of genomics that is associated with the medical data analytics. In this research manuscript, the work presents the evaluation patterns and the approaches which are used for the cancer identification with the use of dynamic clustering and deep data analytics. The work is having the elements with the medical datasets and their key features by which the training of data in the data mining algorithm can be integrated and then the overall predictions can be done on assorted parameters
    • 

    corecore