
    Statistical Performance Effect of Feature Selection Techniques on Eye State Prediction Using EEG

    Several recent studies have demonstrated that electrical waves recorded by electroencephalogram (EEG) can be used to predict eye state (open or closed), and all of these studies used 14 electrodes for data recording. Reducing the number of electrodes without degrading the statistical performance of an EEG device is not an easy task. Hence, this paper focuses on reducing the number of EEG electrodes by means of feature selection techniques without any consequences for the statistical performance measures of the earlier EEG devices. In this study, we compared different attribute evaluators and classifiers. The experiments showed that the ReliefF attribute evaluator was the best at identifying the two least important features (P7, P8), with 96.3% accuracy. The overall results show that two data-recording electrodes could be removed from the EEG devices while still performing well for eye state prediction. The accuracy achieved was 96.3% with the KStar (K*) classifier, which was also the best of the 21 classifiers tested in this study.
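
    As a rough illustration of the kind of feature ranking the study relies on, the sketch below computes simplified Relief-style weights on synthetic data and reports the two lowest-ranked features; neither the EEG dataset nor the exact ReliefF/KStar implementations from the paper are reproduced here, and all names and parameters are illustrative.

```python
# A minimal, simplified Relief-style feature ranking, assuming a numeric
# feature matrix X and binary labels y (the paper's EEG eye-state data and
# the exact ReliefF/KStar implementations are not reproduced here).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler

def relief_weights(X, y, n_samples=200, rng=None):
    """Simplified Relief: reward features that separate nearest miss from nearest hit."""
    rng = np.random.default_rng(rng)
    X = MinMaxScaler().fit_transform(X)           # distances need a common scale
    w = np.zeros(X.shape[1])
    idx = rng.choice(len(X), size=min(n_samples, len(X)), replace=False)
    for i in idx:
        dists = np.abs(X - X[i]).sum(axis=1)      # Manhattan distance to every sample
        dists[i] = np.inf                         # never pick the sample itself
        same, diff = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, dists, np.inf))
        miss = np.argmin(np.where(diff, dists, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / len(idx)

X, y = make_classification(n_samples=500, n_features=14, random_state=0)
weights = relief_weights(X, y, rng=0)
ranking = np.argsort(weights)                     # lowest-weight features first
print("two least important feature indices:", ranking[:2])
```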

    Reconstructing dynamical networks via feature ranking

    Empirical data on real complex systems are becoming increasingly available. Parallel to this is the need for new methods of reconstructing (inferring) the topology of networks from time-resolved observations of their node dynamics. Methods based on physical insights often rely on strong assumptions about the properties and dynamics of the scrutinized network. Here, we use insights from machine learning to design a new method of network reconstruction that essentially makes no such assumptions. Specifically, we interpret the available trajectories (data) as features and use two independent feature ranking approaches -- Random Forest and RReliefF -- to rank the importance of each node for predicting the value of every other node, which yields the reconstructed adjacency matrix. We show that our method is fairly robust to coupling strength, system size, trajectory length, and noise. We also find that the reconstruction quality strongly depends on the dynamical regime.
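
    A minimal sketch of the feature-ranking idea described above, assuming a toy coupled-map system: for every node, a random forest is trained to predict that node's next state from all other nodes' current states, and the forest's feature importances are read off as that node's row of the inferred adjacency matrix. The dynamics, network size, and parameters below are illustrative, not the paper's.

```python
# Sketch of network reconstruction by feature ranking: for every node we
# predict its next state from the other nodes' current states and read the
# random-forest importances as that node's row of the adjacency estimate.
# The coupled-map dynamics and all parameters below are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, T = 8, 600
A_true = (rng.random((n, n)) < 0.25).astype(float)    # ground-truth coupling
np.fill_diagonal(A_true, 0)

# simulate noisy coupled logistic maps on the ground-truth network
x = np.empty((T, n))
x[0] = rng.random(n)
f = lambda v: 4.0 * v * (1.0 - v)
for t in range(T - 1):
    coupling = A_true @ f(x[t]) / np.maximum(A_true.sum(axis=1), 1)
    x[t + 1] = np.clip(0.7 * f(x[t]) + 0.3 * coupling + 0.01 * rng.normal(size=n), 0, 1)

A_hat = np.zeros((n, n))
for j in range(n):
    others = [k for k in range(n) if k != j]
    rf = RandomForestRegressor(n_estimators=200, random_state=0)
    rf.fit(x[:-1][:, others], x[1:, j])               # predict node j's next value
    A_hat[j, others] = rf.feature_importances_        # importance ~ coupling strength
print(np.round(A_hat, 2))
```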

    Predictive based hybrid ranker to yield significant features in writer identification

    The contribution of writer identification (WI) to personal identification among biometric traits is well known, because it is easily accessible, cheaper, more reliable, and more acceptable than other methods such as identification based on DNA, iris, or fingerprint. However, the production of high-dimensional datasets has resulted in many irrelevant or redundant features. These unnecessary features increase the size of the search space and decrease identification performance. The main problem is to identify the most significant features and select the best subset of features that can precisely predict the authors. Therefore, this study proposed the hybridization of GRA Features Ranking and Feature Subset Selection (GRAFeSS) to develop the best subsets of highest-ranking features, and developed a discretization model with the hybrid method (Dis-GRAFeSS) to improve classification accuracy. Experimental results showed that the methods improved identification accuracy for authorship based on feature ranking with invariant discretization by substantially reducing redundant features.
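
    Assuming GRA here stands for grey relational analysis, the sketch below scores each feature by its grey relational grade against a numerically encoded class label and ranks features by that grade; the data, the distinguishing coefficient zeta, and the function names are illustrative and do not reproduce GRAFeSS or Dis-GRAFeSS.

```python
# Hedged sketch of grey-relational-analysis (GRA) style feature ranking,
# assuming GRA here means grey relational analysis: each feature is scored
# by its grey relational grade against the (numerically encoded) class label.
import numpy as np

def gra_grades(X, y, zeta=0.5):
    """Grey relational grade of every feature column against the target y."""
    Xn = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)   # normalise to [0, 1]
    yn = (y - y.min()) / (np.ptp(y) + 1e-12)
    delta = np.abs(Xn - yn[:, None])                         # deviation sequences
    d_min, d_max = delta.min(), delta.max()
    coeff = (d_min + zeta * d_max) / (delta + zeta * d_max)  # grey relational coefficient
    return coeff.mean(axis=0)                                # grade = mean coefficient

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=300).astype(float)
X = np.column_stack([y + 0.1 * rng.normal(size=300),         # informative feature
                     rng.normal(size=300),                   # noise feature
                     0.5 * y + rng.normal(size=300)])         # weakly informative
grades = gra_grades(X, y)
print("feature ranking (best first):", np.argsort(-grades))
```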

    Analysis of Rank Aggregation Techniques for Rank Based on the Feature Selection Technique

    In order to improve classification accuracy and lower future computation and data collection costs, feature selection is the process of choosing the most important features from a set of attributes and removing the less important or redundant ones. To narrow down the features that need to be analyzed, a variety of feature selection procedures have been described in the literature. Chi-Square (CS), IG, Relief, GR, Symmetrical Uncertainty (SU), and MI are the six feature selection methods used in this study. The given dataset is aggregated using four rank aggregation strategies based on the outcomes of the six feature selection methods: rank aggregation, the Borda Count (BC) methodology, score and rank combination, and unified feature scoring (UFS). These four procedures by themselves were unable to generate a clear selection rank for the features. This ensemble of aggregated ranks is carried out to produce different rankings of the features; for this, the bagging method of majority voting was applied.
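
    The Borda Count step can be illustrated with a short sketch: several individual feature rankings are combined by awarding each feature points according to its position in every ranking. The scorers below (chi-square, mutual information, and ANOVA F from scikit-learn) merely stand in for the six selectors used in the study.

```python
# Minimal Borda-count aggregation of several feature rankings; the scorers
# below stand in for the study's six feature selection methods.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, mutual_info_classif, f_classif

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X = X - X.min(axis=0)                        # chi2 requires non-negative inputs

scores = [chi2(X, y)[0], mutual_info_classif(X, y, random_state=0), f_classif(X, y)[0]]
rankings = [np.argsort(-s) for s in scores]  # each ranking: best feature first

# Borda count: a feature in position p of a ranking earns (n_features - p) points
n_features = X.shape[1]
borda = np.zeros(n_features)
for ranking in rankings:
    for position, feature in enumerate(ranking):
        borda[feature] += n_features - position

consensus = np.argsort(-borda)               # aggregated rank, best first
print("consensus feature order:", consensus)
```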

    Confident Feature Ranking

    Interpretation of feature importance values often relies on the relative order of the features rather than on the values themselves, referred to as a ranking. However, the order may be unstable due to the small sample sizes used in calculating the importance values. We propose that post-hoc importance methods produce a ranking together with simultaneous confidence intervals for the ranks. Based on pairwise comparisons of the feature importance values, our method is guaranteed to include the "true" (infinite-sample) ranking with high probability and allows for selecting top-k sets.
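
    A simplified, hedged illustration of the idea: bootstrap the importance values, compare every pair of features, and bound each feature's rank by how many features beat it (or lose to it) consistently. This is only a sketch of pairwise-comparison rank intervals, not the paper's exact statistical procedure.

```python
# Simplified illustration of ranking with rank intervals from pairwise
# comparisons of bootstrapped importance values (a sketch of the idea,
# not the paper's exact method).
import numpy as np

rng = np.random.default_rng(0)
n_boot, n_features = 500, 5
true_importance = np.array([0.40, 0.35, 0.15, 0.06, 0.04])
# pretend each bootstrap re-estimates the importances with noise
boot = true_importance + 0.05 * rng.normal(size=(n_boot, n_features))

alpha = 0.05
lower = np.ones(n_features, dtype=int)               # best possible rank (1 = most important)
upper = np.full(n_features, n_features, dtype=int)   # worst possible rank
for i in range(n_features):
    for j in range(n_features):
        if i == j:
            continue
        frac_j_beats_i = (boot[:, j] > boot[:, i]).mean()
        if frac_j_beats_i > 1 - alpha:      # j is confidently more important than i
            lower[i] += 1
        if frac_j_beats_i < alpha:          # j is confidently less important than i
            upper[i] -= 1

for i in range(n_features):
    print(f"feature {i}: rank interval [{lower[i]}, {upper[i]}]")
```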

    Clustering of match running and performance indicators to assess between- and within-playing position similarity in professional rugby league

    This study aimed to determine the similarity between and within positions in professional rugby league in terms of technical performance and match displacement. Here, the analyses were repeated on 3 different datasets which consisted of technical features only, displacement features only, and a combined dataset including both. Each dataset contained 7617 observations from the 2018 and 2019 Super League seasons, including 366 players from 11 teams. For each dataset, feature selection was initially used to rank features regarding their importance for predicting a player’s position for each match. Subsets of 12, 11, and 27 features were retained for technical, displacement, and combined datasets for subsequent analyses. Hierarchical cluster analyses were then carried out on the positional means to find logical groupings. For the technical dataset, 3 clusters were found: (1) props, loose forwards, second-row, hooker; (2) halves; (3) wings, centres, fullback. For displacement, 4 clusters were found: (1) second-rows, halves; (2) wings, centres; (3) fullback; (4) props, loose forward, hooker. For the combined dataset, 3 clusters were found: (1) halves, fullback; (2) wings and centres; (3) props, loose forward, hooker, second-rows. These positional clusters can be used to standardise positional groups in research investigating either technical, displacement, or both constructs within rugby league.
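
    The clustering step can be sketched with standard tooling: hierarchical (Ward) clustering of per-position mean feature vectors, cut into a fixed number of clusters. The positions and feature values below are made up to illustrate the SciPy calls, not taken from the study's data.

```python
# A small sketch of hierarchical (Ward) clustering of per-position mean
# feature vectors; positions and feature values are illustrative only.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

positions = ["prop", "hooker", "second-row", "loose forward",
             "halfback", "five-eighth", "centre", "wing", "fullback"]
rng = np.random.default_rng(0)
# rows = positions, columns = z-scored technical/displacement features
position_means = rng.normal(size=(len(positions), 12))

Z = linkage(position_means, method="ward")        # agglomerative tree
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
for cluster in sorted(set(labels)):
    members = [p for p, c in zip(positions, labels) if c == cluster]
    print(f"cluster {cluster}: {', '.join(members)}")
```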

    Bootstrapped-ensemble-based Sensitivity Analysis of a trace thermal-hydraulic model based on a limited number of PWR large break loca simulations

    The safety verification of nuclear systems can be done by analyzing the outputs of Best-Estimate Thermal-Hydraulic (BE-TH) codes, which allow predicting the system response under safe and accidental conditions with greater realism than conservative TH codes. In this case, it is necessary to quantify and control the uncertainties in the analysis, which affect the estimated safety margins. This can be achieved by Sensitivity Analysis (SA) and Uncertainty Analysis (UA) techniques tailored to handle the large computational costs of TH codes. This work presents an Ensemble-Based Sensitivity Analysis (EBSA) based on a Finite Mixture Model (FMM) as an effective solution that keeps the number of code runs low and handles the uncertainty in the SA methods. The proposed approach is challenged against a situation with a very low number of code runs, with the Bootstrap method used in support. Three different strategies based on EBSA and Bootstrap are set forth (i.e., bottom-up, all-out and filter strategies). An application is provided with respect to a Large Break Loss of Coolant Accident (LBLOCA) simulated by a TRACE model of the Zion 1 Nuclear Power Plant (NPP).
    Di Maio, F.; Bandini, A.; Zio, E.; Carlos Alberola, S.; Sanchez Saez, F.; Martorell Alsina, S. S. (2016). Bootstrapped-ensemble-based sensitivity analysis of a TRACE thermal-hydraulic model based on a limited number of PWR large break LOCA simulations. Reliability Engineering & System Safety, 153, 122-134. doi:10.1016/j.ress.2016.04.013
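
    A heavily hedged sketch of the bootstrapping idea: resample a small set of code runs, recompute a sensitivity measure per resample, and report the spread. A simple rank-correlation index and a toy analytic "code" stand in for the FMM-based ensemble and the TRACE model.

```python
# Sketch of bootstrapping a sensitivity measure over a small set of code
# runs; a rank-correlation index and a toy analytic function replace the
# paper's FMM-based ensemble and the actual TH code.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_runs, n_inputs = 60, 4                         # deliberately few runs
X = rng.random((n_runs, n_inputs))               # sampled uncertain inputs
y = 3 * X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=n_runs)   # toy output

n_boot = 1000
indices = np.empty((n_boot, n_inputs))
for b in range(n_boot):
    pick = rng.integers(0, n_runs, size=n_runs)  # resample the limited runs
    indices[b] = [abs(spearmanr(X[pick, i], y[pick])[0]) for i in range(n_inputs)]

mean_s = indices.mean(axis=0)
lo, hi = np.percentile(indices, [2.5, 97.5], axis=0)
for i in range(n_inputs):
    print(f"input {i}: sensitivity {mean_s[i]:.2f}  (95% CI {lo[i]:.2f}-{hi[i]:.2f})")
```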

    Personalized medicine support system : resolving conflict in allocation to risk groups and predicting patient molecular response to targeted therapy

    Treatment management in cancer patients is largely based on the use of a standardized set of predictive and prognostic factors. The former are used to evaluate specific clinical interventions, and they can be useful for selecting treatments because they directly predict the response to a treatment. The latter are used to evaluate a patient’s overall outcomes, and can be used to identify the risks or recurrence of a disease. Current intelligent systems can be a solution for transferring advancements in molecular biology into practice, especially for predicting the molecular response to molecular targeted therapy and the prognosis of risk groups in cancer medicine. This framework primarily focuses on the importance of integrating domain knowledge in predictive and prognostic models for personalized treatment. Our personalized medicine support system provides the needed support in complex decisions and can be incorporated into a treatment guide for selecting molecular targeted therapies.
    Haneen Banjar, David Adelson, Fred Brown, and Tamara Leclerc

    Modelling predictors of molecular response to frontline imatinib for patients with chronic myeloid leukaemia

    BACKGROUND: Treatment of patients with chronic myeloid leukaemia (CML) has become increasingly difficult in recent years due to the variety of treatment options available and the challenge of deciding on the most appropriate treatment strategy for an individual patient. To facilitate the treatment strategy decision, disease assessment should involve the molecular response to initial treatment for an individual patient. Patients predicted not to achieve a major molecular response (MMR) at 24 months on frontline imatinib may be better treated with alternative frontline therapies, such as nilotinib or dasatinib. The aims of this study were to i) understand the clinical prediction 'rules' for predicting MMR at 24 months for CML patients treated with imatinib using clinical, molecular, and cell-count observations (predictive factors collected at diagnosis and categorised based on available knowledge) and ii) develop a predictive model for CML treatment management. This predictive model was developed by framing the challenge as a machine learning problem, based on CML patients undergoing imatinib therapy enrolled in the TIDEL II clinical trial, with experimentally identified MMR-achieving and non-achieving groups. The recommended model was validated externally using an independent dataset from King Faisal Specialist Hospital and Research Centre, Saudi Arabia. PRINCIPAL FINDINGS: The common prognostic scores yielded similar sensitivity performance in the testing and validation datasets and are therefore good predictors of the positive group. The G-mean and F-score values of our models outperformed the common prognostic scores in the testing and validation datasets and are therefore good predictors for both the positive and negative groups. Furthermore, a high PPV above 65% indicated that our models are appropriate for making decisions at diagnosis and pre-therapy. Study limitations include that prior knowledge may change based on varying expert opinions; hence, representing the category boundaries of each predictive factor could dramatically change the performance of the models.
    Haneen Banjar, Damith Ranasinghe, Fred Brown, David Adelson, Trent Kroger, Tamara Leclercq, Deborah White, Timothy Hughes, Naeem Chaudhr
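
    The reported metrics relate to a binary confusion matrix in a standard way; the short sketch below computes sensitivity, specificity, PPV, F-score, and G-mean from illustrative counts (not trial data).

```python
# Sensitivity, specificity, PPV, F-score and G-mean for a binary
# MMR / non-MMR prediction task; the counts are illustrative only.
import math

tp, fn, fp, tn = 42, 8, 15, 35          # hypothetical prediction outcomes

sensitivity = tp / (tp + fn)            # recall on the MMR (positive) group
specificity = tn / (tn + fp)            # recall on the non-MMR group
ppv         = tp / (tp + fp)            # positive predictive value (precision)
f_score     = 2 * ppv * sensitivity / (ppv + sensitivity)
g_mean      = math.sqrt(sensitivity * specificity)

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"PPV={ppv:.2f} F-score={f_score:.2f} G-mean={g_mean:.2f}")
```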
