2,573 research outputs found

    Learning cost-sensitive Bayesian networks via direct and indirect methods

    Cost-sensitive learning has become an increasingly important area, recognizing that real-world classification problems need to take both misclassification costs and accuracy into account. Much work has been done on cost-sensitive decision tree learning, but, despite significant research on Bayesian networks in general, relatively little has been done on learning cost-sensitive Bayesian networks. Hence, this paper explores whether it is possible to develop algorithms that learn cost-sensitive Bayesian networks by taking (i) an indirect approach that changes the data distribution to reflect the costs of misclassification; and (ii) a direct approach that amends an existing accuracy-based algorithm for learning Bayesian networks. The new approaches are compared empirically with cost-sensitive decision tree learning algorithms on 33 data sets, and the results show that the new algorithms perform better both in terms of misclassification cost and in maintaining accuracy.
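
The indirect approach mentioned in this abstract, changing the data distribution to reflect misclassification costs, can be illustrated with a minimal sketch. It assumes a simple per-class cost (the cost of misclassifying an instance of that class) and uses instance reweighting; the function names and the weighting scheme are illustrative assumptions, not the algorithm from the paper.

```python
from collections import Counter


def cost_weights(labels, cost):
    """Weight each instance by the cost of misclassifying its true class.

    `cost` maps class -> misclassification cost (an assumed, simplified
    cost model). Weights are rescaled so they sum to len(labels).
    """
    raw = [cost[y] for y in labels]
    scale = len(labels) / sum(raw)
    return [w * scale for w in raw]


def weighted_class_priors(labels, weights):
    """Class priors under the reweighted (cost-adjusted) distribution."""
    totals = Counter()
    for y, w in zip(labels, weights):
        totals[y] += w
    n = sum(totals.values())
    return {c: t / n for c, t in totals.items()}


# Nine cheap-to-miss negatives and one expensive-to-miss positive.
labels = ["neg"] * 9 + ["pos"]
weights = cost_weights(labels, {"neg": 1, "pos": 9})
priors = weighted_class_priors(labels, weights)
print(priors)
```

A learner trained on the reweighted data sees the rare but costly class as if it were as frequent as the common one, which is the intuition behind distribution-changing methods.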

    Backward Sequential Feature Elimination And Joining Algorithms In Machine Learning

    The Naïve Bayes model is a special case of Bayesian networks with strong independence assumptions. It is typically used for classification problems. The model is trained on the given data to estimate the parameters necessary for classification. This classification model is very popular because it is simple yet efficient and accurate. While the Naïve Bayes model is considered accurate on most problem instances, there is a set of problems for which it does not give accurate results when compared to other classifiers such as decision tree algorithms. One reason could be the strong independence assumption of the Naïve Bayes model. This project aims at searching for dependencies between the features and studying the consequences of applying these dependencies when classifying instances. We propose two different algorithms, Backward Sequential Joining and Backward Sequential Elimination, that can be applied to improve the accuracy of the Naïve Bayes model. We then compare the accuracies of the different algorithms and draw conclusions based on the results.
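
As a rough illustration of the elimination idea named in this abstract, the sketch below wraps a small categorical Naïve Bayes classifier in a greedy backward search that drops one feature at a time while validation accuracy strictly improves. The Laplace smoothing, the validation-accuracy criterion, and all function names are assumptions made for the example; the abstract does not specify these details, and the joining variant is not shown.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, features):
    """Count class and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, class) -> value counts
    values = defaultdict(set)            # feature -> distinct values seen
    for row, y in zip(rows, labels):
        for f in features:
            value_counts[(f, y)][row[f]] += 1
            values[f].add(row[f])
    return class_counts, value_counts, values


def predict_nb(model, row, features):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, values = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)
        for f in features:
            count = value_counts[(f, c)][row[f]]
            score += math.log((count + 1) / (nc + len(values[f])))
        if score > best_score:
            best, best_score = c, score
    return best


def accuracy(train, train_y, valid, valid_y, features):
    model = train_nb(train, train_y, features)
    hits = sum(predict_nb(model, r, features) == y
               for r, y in zip(valid, valid_y))
    return hits / len(valid)


def backward_elimination(train, train_y, valid, valid_y, features):
    """Greedily drop the feature whose removal most improves accuracy."""
    current = list(features)
    best = accuracy(train, train_y, valid, valid_y, current)
    while len(current) > 1:
        candidates = [
            (accuracy(train, train_y, valid, valid_y,
                      [f for f in current if f != drop]), drop)
            for drop in current
        ]
        score, drop = max(candidates, key=lambda t: t[0])
        if score <= best:  # stop when no removal strictly helps
            break
        current.remove(drop)
        best = score
    return current, best
```

Requiring a strict improvement guarantees termination and keeps features whose removal would make no measurable difference.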

    Applied Computational Techniques on Schizophrenia Using Genetic Mutations

    Schizophrenia is a complex disease, with both genetic and environmental influences. Machine learning techniques can be used to associate different genetic variations at different genes with a (schizophrenic or non-schizophrenic) phenotype. Several machine learning techniques were applied to schizophrenia data to obtain the results presented in this study. Considering these data, Quantitative Genotype – Disease Relationships (QDGRs) can be used for disease prediction. One of the best machine learning-based models obtained after this exhaustive comparative study was implemented online; this model is an artificial neural network (ANN). Thus, the tool offers the possibility to enter Single Nucleotide Polymorphism (SNP) sequences in order to classify a patient with schizophrenia. Besides this comparative study, a method for variable selection, based on ANNs and evolutionary computation (EC), is also presented. This method uses half as many variables as the original ANN, and the variables obtained are among those found in other publications. In the future, QDGR models based on nucleic acid information could be expanded to other diseases.
    Funding: Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo, 209RT-0366; Xunta de Galicia, 10SIN105004PR; Instituto de Salud Carlos III, RD07/0067/0005; Xunta de Galicia, Ref. 2009/5

    Virus-Host Coevolution: Common Patterns of Nucleotide Motif Usage in Flaviviridae and Their Hosts

    Virus-host biological interaction is a continuous coevolutionary process involving both the host immune system and viral escape mechanisms. The Flaviviridae family is composed of fast-evolving RNA viruses that infect vertebrate (mammals and birds) and/or invertebrate (ticks and mosquitoes) organisms. These host groups are very distinct life forms separated by a long evolutionary time, so lineage-specific anti-viral mechanisms are likely to have evolved. Flaviviridae viruses which infect a single host lineage would be subjected to specific host-induced pressures and, therefore, selected by them. In this work we compare the genomic evolutionary patterns of Flaviviridae viruses and their hosts in an attempt to uncover coevolutionary processes inducing common features in such disparate groups. Specifically, we have analyzed dinucleotide and codon usage patterns in the coding regions of vertebrate and invertebrate organisms as well as in Flaviviridae viruses which specifically infect one or both host types. The two host groups possess very distinctive dinucleotide and codon usage patterns. A pronounced CpG under-representation was found in the vertebrate group, possibly induced by the methylation-deamination process, as well as a prominent TpA decrease. The invertebrate group displayed only a TpA frequency reduction bias. Flaviviridae viruses mimicked host nucleotide motif usage in a host-specific manner. Vertebrate-infecting viruses possessed under-representation of CpG and TpA, and insect-only viruses displayed only a TpA under-representation bias. Single-host Flaviviridae members which persistently infect mammals or insect hosts (Hepacivirus and insect-only Flavivirus, respectively) were found to possess a codon usage profile more similar to that of their hosts than to related Flaviviridae. We demonstrated that vertebrate and mosquito genomes are under very distinct lineage-specific constraints, and Flaviviridae viruses which specifically infect these lineages appear to be subject to the same evolutionary pressures that shaped their host coding regions, evidencing the lineage-specific coevolutionary processes between the viral and host groups.
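
Dinucleotide under-representation of the kind described in this abstract is commonly quantified with an observed/expected odds ratio, rho_XY = f_XY / (f_X * f_Y), where values well below 1 indicate under-representation. The sketch below computes that ratio for a single sequence; whether the study used exactly this statistic is an assumption, and the function name is illustrative.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Observed/expected ratio for every dinucleotide present in `seq`.

    rho_XY = f_XY / (f_X * f_Y), using overlapping dinucleotides and
    single-sequence frequencies (no strand symmetrization, an assumed
    simplification).
    """
    seq = seq.upper()
    n_mono = len(seq)
    n_di = len(seq) - 1
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n_di))
    return {
        pair: (count / n_di)
        / ((mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono))
        for pair, count in di.items()
    }


ratios = dinucleotide_odds_ratios("ATATATAT")
print(ratios)
```

Dinucleotides that never occur in the sequence are absent from the result, so `ratios.get("CG", 0.0)` is a convenient way to read a possibly missing pair.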

    Mining Predictive Patterns and Extension to Multivariate Temporal Data

    An important goal of knowledge discovery is the search for patterns in the data that can help explain its underlying structure. To be practically useful, the discovered patterns should be novel (unexpected) and easy for humans to understand. In this thesis, we study the problem of mining patterns (defining subpopulations of data instances) that are important for predicting and explaining a specific outcome variable. An example is the task of identifying groups of patients that respond better to a certain treatment than the rest of the patients. We propose and present efficient methods for mining predictive patterns for both atemporal and temporal (time series) data. Our first method relies on frequent pattern mining to explore the search space. It applies a novel evaluation technique for extracting a small set of frequent patterns that are highly predictive and have low redundancy. We show the benefits of this method on several synthetic and public datasets. Our temporal pattern mining method works on complex multivariate temporal data, such as electronic health records, for the event detection task. It first converts time series into time-interval sequences of temporal abstractions and then mines temporal patterns backwards in time, starting from patterns related to the most recent observations. We show the benefits of our temporal pattern mining method on two real-world clinical tasks.
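
The conversion of a time series into a time-interval sequence of temporal abstractions, mentioned in this abstract, might look roughly like the sketch below. It assumes a simple value abstraction with three states (low, normal, high) and two hypothetical thresholds; the thesis may use different abstractions, and the function name is illustrative.

```python
def to_state_intervals(series, low, high):
    """Turn (time, value) points into (state, start, end) intervals.

    Values below `low` map to "L", above `high` to "H", otherwise "N".
    Consecutive points with the same state are merged into one interval.
    `series` is assumed to be sorted by time.
    """
    intervals = []
    for t, v in series:
        state = "L" if v < low else "H" if v > high else "N"
        if intervals and intervals[-1][0] == state:
            # Extend the current interval to the latest time point.
            intervals[-1] = (state, intervals[-1][1], t)
        else:
            intervals.append((state, t, t))
    return intervals


readings = [(0, 5), (1, 6), (2, 12), (3, 13), (4, 7)]
print(to_state_intervals(readings, low=6, high=10))
```

Mining "backwards in time" would then start from the last interval in such a sequence and extend patterns toward earlier ones.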

    Improving the Interpretability of Classification Rules Discovered by an Ant Colony Algorithm: Extended Results

    The vast majority of Ant Colony Optimization (ACO) algorithms for inducing classification rules use an ACO-based procedure to create rules in a one-at-a-time fashion. An improved search strategy has been proposed in the cAnt-MinerPB algorithm, where an ACO-based procedure is used to create a complete list of rules (ordered rules); i.e., the ACO search is guided by the quality of a list of rules instead of an individual rule. In this paper we propose an extension of the cAnt-MinerPB algorithm to discover a set of rules (unordered rules). The main motivations for this work are to improve the interpretation of individual rules by discovering a set of rules and to evaluate the impact on the predictive accuracy of the algorithm. We also propose a new measure to evaluate the interpretability of the discovered rules, to mitigate the fact that the commonly used model size measure ignores how the rules are used to make a class prediction. Comparisons with state-of-the-art rule induction algorithms, support vector machines, and the cAnt-MinerPB producing ordered rules are also presented.