4,253 research outputs found

    The effect of locality based learning on software defect prediction

    Get PDF
    Software defect prediction poses many problems during classification. A common solution used to improve software defect prediction is to train on similar, or local, data to the testing data. Prior work [12, 64] shows that locality improves the performance of classifiers. This approach has been commonly applied to the field of software defect prediction. In this thesis, we compare the performance of many classifiers, both locality based and non-locality based. We propose a novel classifier called Clump, with the goals of improving classification while providing an explanation as to how the decisions were reached. We also explore the effects of standard clustering and relevancy filtering algorithms.;Through experimentation, we show that locality does not improve classification performance when applied to software defect prediction. The performance of the algorithms is impacted more by the datasets used than by the algorithmic choices made. More research is needed to explore locality based learning and the impact of the datasets chosen

    KACST Arabic Text Classification Project: Overview and Preliminary Results

    No full text
    Electronically formatted Arabic free-texts can be found in abundance these days on the World Wide Web, often linked to commercial enterprises and/or government organizations. Vast tracts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the correct intelligent tools have been identified and applied. For example, text mining may help with text classification and categorization. Text classification aims to automatically assign text to a predefined category based on identifiable linguistic features. Such a process has different useful applications including, but not restricted to, E-Mail spam detection, web pages content filtering, and automatic message routing. In this paper an overview of King Abdulaziz City for Science and Technology (KACST) Arabic Text Classification Project will be illustrated along with some preliminary results. This project will contribute to the better understanding and elaboration of Arabic text classification techniques

    Local feature weighting in nearest prototype classification

    Get PDF
    The distance metric is the corner stone of nearest neighbor (NN)-based methods, and therefore, of nearest prototype (NP) algorithms. That is because they classify depending on the similarity of the data. When the data is characterized by a set of features which may contribute to the classification task in different levels, feature weighting or selection is required, sometimes in a local sense. However, local weighting is typically restricted to NN approaches. In this paper, we introduce local feature weighting (LFW) in NP classification. LFW provides each prototype its own weight vector, opposite to typical global weighting methods found in the NP literature, where all the prototypes share the same one. Providing each prototype its own weight vector has a novel effect in the borders of the Voronoi regions generated: They become nonlinear. We have integrated LFW with a previously developed evolutionary nearest prototype classifier (ENPC). The experiments performed both in artificial and real data sets demonstrate that the resulting algorithm that we call LFW in nearest prototype classification (LFW-NPC) avoids overfitting on training data in domains where the features may have different contribution to the classification task in different areas of the feature space. This generalization capability is also reflected in automatically obtaining an accurate and reduced set of prototypes.Publicad

    Toward Optimal Feature Selection in Naive Bayes for Text Categorization

    Full text link
    Automated feature selection is important for text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on the Information Theory, which aims to rank the features with their discriminative capacity for classification. We first revisit two information measures: Kullback-Leibler divergence and Jeffreys divergence for binary hypothesis testing, and analyze their asymptotic properties relating to type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination (MDMD) and MDχ2MD-\chi^2 methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data Engineering. 14 pages, 5 figure

    Locally weighted learning: How and when does it work in Bayesian networks?

    Full text link
    © 2016, Taylor and Francis Ltd. All rights reserved. Bayesian network (BN), a simple graphical notation for conditional independence assertions, is promised to represent the probabilistic relationships between diseases and symptoms. Learning the structure of a Bayesian network classifier (BNC) encodes conditional independence assumption between attributes, which may deteriorate the classification performance. One major approach to mitigate the BNC’s primary weakness (the attributes independence assumption) is the locally weighted approach. And this type of approach has been proved to achieve good performance for naive Bayes, a BNC with simple structure. However, we do not know whether or how effective it works for improving the performance of the complex BNC. In this paper, we first do a survey on the complex structure models for BNCs and their improvements, then carry out a systematically experimental analysis to investigate the effectiveness of locally weighted method for complex BNCs, e.g., tree-augmented naive Bayes (TAN), averaged one-dependence estimators AODE and hidden naive Bayes (HNB), measured by classification accuracy (ACC) and the area under the ROC curve ranking (AUC). Experiments and comparisons on 36 benchmark data sets collected from University of California, Irvine (UCI) in Weka system demonstrate that locally weighting technologies just slightly outperforms unweighted complex BNCs on ACC and AUC. In other words, although locally weighting could significantly improve the performance of NB (a BNC with simple structure), it could not work well on BNCs with complex structures. This is because the performance improvements of BNCs are attributed to their structures not the locally weighting

    Predicting Peptide Structures in Native Proteins from Physical Simulations of Fragments

    Get PDF
    It has long been proposed that much of the information encoding how a protein folds is contained locally in the peptide chain. Here we present a large-scale simulation study designed to examine the extent to which conformations of peptide fragments in water predict native conformations in proteins. We perform replica exchange molecular dynamics (REMD) simulations of 872 8-mer, 12-mer, and 16-mer peptide fragments from 13 proteins using the AMBER 96 force field and the OBC implicit solvent model. To analyze the simulations, we compute various contact-based metrics, such as contact probability, and then apply Bayesian classifier methods to infer which metastable contacts are likely to be native vs. non-native. We find that a simple measure, the observed contact probability, is largely more predictive of a peptide's native structure in the protein than combinations of metrics or multi-body components. Our best classification model is a logistic regression model that can achieve up to 63% correct classifications for 8-mers, 71% for 12-mers, and 76% for 16-mers. We validate these results on fragments of a protein outside our training set. We conclude that local structure provides information to solve some but not all of the conformational search problem. These results help improve our understanding of folding mechanisms, and have implications for improving physics-based conformational sampling and structure prediction using all-atom molecular simulations

    A comparative study of machine learning methods for verbal autopsy text classification

    Get PDF
    A Verbal Autopsy is the record of an interview about the circumstances of an uncertified death. In developing countries, if a death occurs away from health facilities, a field-worker interviews a relative of the deceased about the circumstances of the death; this Verbal Autopsy can be reviewed offsite. We report on a comparative study of the processes involved in Text Classification applied to classifying Cause of Death: feature value representation; machine learning classification algorithms; and feature reduction strategies in order to identify the suitable approaches applicable to the classification of Verbal Autopsy text. We demonstrate that normalised term frequency and the standard TFiDF achieve comparable performance across a number of classifiers. The results also show Support Vector Machine is superior to other classification algorithms employed in this research. Finally, we demonstrate the effectiveness of employing a ’locally-semisupervised’ feature reduction strategy in order to increase performance accuracy

    Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks

    Full text link
    Future wireless networks have a substantial potential in terms of supporting a broad range of complex compelling applications both in military and civilian fields, where the users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning (ML) algorithms have great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning. Furthermore, we investigate their employment in the compelling applications of wireless networks, including heterogeneous networks (HetNets), cognitive radios (CR), Internet of things (IoT), machine to machine networks (M2M), and so on. This article aims for assisting the readers in clarifying the motivation and methodology of the various ML algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks.Comment: 46 pages, 22 fig

    Use of Machine Learning for Early Detection of Knee Osteoarthritis and Quantifying Effectiveness of Treatment Using Force Platform

    Get PDF
    Knee osteoarthritis is one of the most prevalent chronic diseases. It leads to pain, stiffness, decreased participation in activities of daily living and problems with balance recognition. Force platforms have been one of the tools used to analyse balance in patients. However, identification in early stages and assessing the severity of osteoarthritis using parameters derived from a force plate are yet unexplored to the best of our knowledge. Combining artificial intelligence with medical knowledge can provide a faster and more accurate diagnosis. The aim of our study is to present a novel algorithm to classify the occurrence and severity of knee osteoarthritis based on the parameters derived from a force plate. Forty-four sway movements graphs were measured. The different machine learning algorithms, such as K-Nearest Neighbours, Logistic Regression, Gaussian Naive Bayes, Support Vector Machine, Decision Tree Classifier and Random Forest Classifier, were implemented on the dataset. The proposed method achieves 91% accuracy in detecting sway variation and would help the rehabilitation specialist to objectively identify the patient’s condition in the initial stage and educate the patient about disease progression
    corecore