
    SOAP: Efficient Feature Selection of Numeric Attributes

    Attribute selection techniques for supervised learning, applied in the preprocessing phase to emphasize the most relevant attributes, make classification models simpler and easier to understand. Depending on the method applied (its starting point, search organization, evaluation strategy, and stopping criterion), attribute selection adds a cost to the chosen classification algorithm. This cost is normally offset, to a greater or lesser extent, by the reduction in the number of attributes in the classification model. The algorithm presented here, SOAP (Selection of Attributes by Projection), has some interesting characteristics. Its computational cost, O(mn log n) for m attributes and n examples in the data set, is lower than that of other typical algorithms because it performs no distance or statistical calculations. It also needs no transformation of the data. The performance of SOAP is analysed in two ways: percentage of attribute reduction and classification results. SOAP has been compared to CFS [6] and ReliefF [11]. The results are generated with C4.5 and 1NN, before and after the application of the algorithms.
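    The abstract does not spell out the projection criterion. The following is a minimal sketch, assuming a projection-based score in which the examples are sorted by one attribute and the changes of class label along that order are counted, with fewer changes meaning a more relevant attribute. Such a score needs no distances or statistics and costs O(n log n) per attribute, which is consistent with the stated O(mn log n) bound. The function names, the naive tie handling, and the toy data are hypothetical illustrations, not SOAP's published code.

```python
from typing import List, Sequence, Tuple


def label_changes(values: Sequence[float], labels: Sequence[str]) -> int:
    """Count class-label changes along the projection of the data onto one attribute."""
    # Sort example indices by attribute value: O(n log n).
    # Assumption: tied values keep their input order (Python's sort is stable);
    # no special tie-handling rule is applied.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Scan adjacent examples in projected order: O(n).
    return sum(1 for a, b in zip(order, order[1:]) if labels[a] != labels[b])


def rank_attributes(
    columns: Sequence[Sequence[float]], labels: Sequence[str]
) -> Tuple[List[int], List[int]]:
    """Rank attributes from fewest to most label changes (fewer = more relevant).

    Scoring all m attributes this way costs O(m n log n) in total.
    """
    scores = [label_changes(col, labels) for col in columns]
    ranking = sorted(range(len(columns)), key=lambda j: scores[j])
    return ranking, scores


if __name__ == "__main__":
    # Two numeric attributes over four examples (hypothetical toy data).
    cols = [[1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]]
    labs = ["a", "a", "b", "b"]
    print(rank_attributes(cols, labs))  # ([0, 1], [1, 3])
```

    Attribute 0 separates the classes with a single label change, while attribute 1 alternates labels three times. The sketch only ranks attributes; how many of the top-ranked attributes to keep is left open here.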

    Heuristic Search over a Ranking for Feature Selection

    In this work, we propose a new feature selection technique that lets us use the wrapper approach to find a well-suited feature set for distinguishing experiment classes in high-dimensional data sets. Our method is based on the ideas of relevance and redundancy, in the sense that a ranked feature is chosen only if adding it yields additional information. On twelve well-known data sets, this heuristic gives considerably better accuracy than both the full feature set and other representative feature selection algorithms, together with a notable reduction in dimensionality.
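    The search over a ranking can be sketched as a single pass that walks the ranked features in order and keeps each one only if the wrapper score of the enlarged subset improves. This is a minimal sketch under stated assumptions. `evaluate` is a hypothetical callable standing in for the wrapper (in practice, training a classifier on the subset and returning, for example, cross-validated accuracy). The acceptance rule, a strict score gain, is a simplification of "additional information is gained" and not necessarily the authors' exact test.

```python
from typing import Callable, List, Sequence


def incremental_ranked_search(
    ranking: Sequence[int], evaluate: Callable[[List[int]], float]
) -> List[int]:
    """Walk a feature ranking best-first and keep each feature only if it helps."""
    selected: List[int] = []
    best = float("-inf")
    for feature in ranking:
        candidate = selected + [feature]
        # Wrapper step: score the candidate subset with the learner.
        score = evaluate(candidate)
        # Assumption: "additional information" is read as a strict score gain.
        if score > best:
            selected, best = candidate, score
    return selected


def toy_evaluate(subset: List[int]) -> float:
    """Hypothetical stand-in for cross-validated accuracy: features 0 and 2
    are informative, and every extra feature carries a small penalty."""
    informative = {0, 2}
    return len(informative & set(subset)) - 0.1 * len(subset)


if __name__ == "__main__":
    print(incremental_ranked_search([2, 1, 0, 3], toy_evaluate))  # [2, 0]
```

    Because each feature is tested once, in ranking order, the wrapper is called once per feature rather than once per candidate subset as in a full forward search. The trade-off is that the result depends on the quality of the initial ranking.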

    A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

    Background: Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods. Results: The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM) in the presence and absence of filter-based feature selection was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5) were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA. Conclusion: We have developed a novel simulation model to evaluate machine learning methods for the analysis of data sets in which in vitro bioassay data is being used to predict in vivo chemical toxicology. From our analysis, we can recommend that several ML methods, most notably SVM and ANN, are good candidates for use in real world applications in this area.

    Hybrid Correlation and Causal Feature Selection for Ensemble Classifiers

    PC and TPDA are robust, well-known prototype algorithms that use constraint-based approaches for causal discovery. However, neither algorithm scales up to high-dimensional data, that is, data with more than a few hundred features. To deal with this problem, this chapter presents hybrid correlation and causal feature selection for ensemble classifiers. Redundant features are first removed by correlation-based feature selection, and irrelevant features are then eliminated by causal feature selection. The proposed algorithms are compared with correlation-based feature selection algorithms (FCBF and CFS) and causal feature selection algorithms (PC, TPDA, GS, IAMB) in terms of the number of eliminated features, accuracy, the area under the receiver operating characteristic curve (AUC), and the false negative rate (FNR).
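    The first, correlation-based stage can be illustrated with the redundancy rule of FCBF, one of the algorithms named above. FCBF scores discrete features by symmetric uncertainty (SU), keeps those correlated with the class, and drops any feature that is at least as correlated with an already-kept, stronger feature as it is with the class. This is a minimal sketch of that stage only: the causal second stage (PC, TPDA, GS, IAMB) is not shown. The function names, the relevance threshold `delta` and its default, and the toy data are illustrative assumptions, not the chapter's implementation.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * [H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)], in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables constant: treat as uninformative
    hxy = entropy(list(zip(xs, ys)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def fcbf(
    features: Sequence[Sequence[Hashable]],
    target: Sequence[Hashable],
    delta: float = 0.0,
) -> List[int]:
    """Return indices of relevant, non-redundant features (FCBF-style)."""
    su_c = [symmetric_uncertainty(f, target) for f in features]
    # Relevance: keep features whose SU with the class exceeds delta,
    # ordered from most to least correlated with the class.
    remaining = sorted(
        (j for j in range(len(features)) if su_c[j] > delta),
        key=lambda j: -su_c[j],
    )
    selected: List[int] = []
    while remaining:
        p = remaining.pop(0)  # current predominant feature
        selected.append(p)
        # Redundancy: drop q if it correlates with p at least as much as with the class.
        remaining = [
            q for q in remaining
            if symmetric_uncertainty(features[p], features[q]) < su_c[q]
        ]
    return selected


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    f0 = [0, 0, 1, 1]  # predicts y perfectly
    f1 = [1, 1, 0, 0]  # same information as f0 (redundant)
    f2 = [0, 1, 0, 1]  # independent of y (irrelevant)
    print(fcbf([f0, f1, f2], y))  # [0]
```

    In the toy run, f2 is discarded as irrelevant (SU with the class is 0) and f1 as redundant with f0. The causal stage would then operate on this reduced set, which is what lets constraint-based methods cope with the original dimensionality.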

    Simplified Method to Predict Mutual Interactions of Human Transcription Factors Based on Their Primary Structure

    Background: Physical interactions between transcription factors (TFs) are necessary for forming regulatory protein complexes and thus play a crucial role in gene regulation. Currently, knowledge about the mechanisms of these TF interactions is incomplete, and the number of known TF interactions is limited. Computational prediction of such interactions can help identify potential new TF interactions and contribute to a better understanding of the complex machinery involved in gene regulation. Methodology: We propose such a method for predicting TF interactions. The method uses only the primary sequence information of the interacting TFs, which makes the prediction algorithm much simpler. Through an advanced feature selection process, we determined a subset of 97 model features that constitutes the optimized model within the set of features we considered. The model, based on quadratic discriminant analysis, achieves a prediction accuracy of 85.39% on a blind set of interactions. This result is achieved even though the negative data set was restricted to TFs of the same type of proteins, i.e., TFs that function in the same cellular compartment (nucleus) and in the same type of molecular process (transcription initiation). Such a selection makes it significantly harder to develop models with high specificity, but at the same time it better reflects real-world problems. Conclusions: The performance of our predictor compares well with that of much more complex approaches for predicting TF and general protein-protein interactions, particularly when the reduced complexity of using the model is taken into account.
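    The abstract does not give the model's equations. For reference, the standard quadratic discriminant analysis rule assigns a feature vector x to the class k with the largest discriminant score. Each class has its own mean, covariance, and prior, which is what makes the decision boundary quadratic. Reading x as the 97 selected sequence features of a TF pair and k as interacting versus non-interacting is our assumption about how the rule applies here.

```latex
\delta_k(\mathbf{x}) = -\tfrac{1}{2}\log\lvert\Sigma_k\rvert
  - \tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^{\top}\Sigma_k^{-1}(\mathbf{x}-\boldsymbol{\mu}_k)
  + \log\pi_k,
\qquad
\hat{y}(\mathbf{x}) = \arg\max_k \delta_k(\mathbf{x})
```

    Here μ_k, Σ_k and π_k are the mean vector, covariance matrix and prior probability of class k, all estimated from the training data; the constant term -(d/2) log 2π, shared by all classes, is omitted.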

    Combinatorial Approach for Data Binarization


    Systematic review of the role of rituximab in treatment of antineutrophil cytoplasmic autoantibody-associated vasculitis, hepatitis C virus-related cryoglobulinemic vasculitis, Henoch–Schönlein purpura, ankylosing spondylitis, and Raynaud's phenomenon

    Rbab Taha (1), Hadeel El-Haddad (1), Abdulqader Almuallim (2), Fatma Alshaiki (3), Elaf Obaid (2), Hani Almoallim (1, 2, 4)
    (1) Department of Medicine, Dr Soliman Fakeeh Hospital, Jeddah; (2) Department of Medicine, Faculty of Medicine, Umm Al-Qura University, Mecca; (3) Department of Medicine, East Jeddah Hospital, Jeddah; (4) Rheumatic Diseases, Umm Al-Qura University, Mecca; Saudi Arabia
    Abstract: Rituximab (RTX) is established for the treatment of rheumatoid arthritis. This systematic review of the literature since 2006 summarizes evidence for the use of RTX in the treatment of additional rheumatological diseases: antineutrophil cytoplasmic autoantibody-associated vasculitis (AAV), hepatitis C virus-related cryoglobulinemic vasculitis, Henoch–Schönlein purpura, ankylosing spondylitis, and Raynaud's phenomenon. Data from randomized controlled trials are available only for AAV, confirming efficacy for remission induction, including in disease resistant to conventional treatment, and for maintenance of remission. Further studies are required to confirm optimal maintenance regimens in AAV. Important questions to be addressed include protocol administration versus treatment in response to clinical relapse, and the importance of maintaining B-cell depletion. For the other diseases, sufficient data are available to suggest that RTX is useful and that randomized controlled trials should be conducted.
    Keywords: anti-CD20 monoclonal antibody, anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis, refractory ankylosing spondylitis, resistant cryoglobulinemic vasculitis, refractory rheumatological diseases