5 research outputs found

    Supervised Classification Using Probabilistic Decision Graphs

    A new model for supervised classification based on probabilistic decision graphs is introduced. A probabilistic decision graph (PDG) is a graphical model that efficiently captures certain context-specific independencies that are not easily represented by other graphical models traditionally used for classification, such as Naïve Bayes (NB) or Classification Trees (CT). This means that the PDG model can capture some distributions using fewer parameters than classical models. Two approaches for constructing a PDG for classification are proposed. The first is to construct the model directly from a dataset of labelled data, while the second is to transform a previously obtained Bayesian classifier into a PDG model that can then be refined. These two approaches are compared with a wide range of classical approaches to the supervised classification problem on both real-world databases and artificially generated data.

    Hybrid Automated Machine Learning System for Big Data

    Many machine learning (ML) models and algorithms exist, and in designing classification systems it is often a challenge to find and select the best-performing ML algorithm(s) for a dataset in a short period of time. Often, one must learn thoroughly about the dataset's structure and content, decide whether to use a supervised, semi-supervised or unsupervised learning strategy, and then investigate, select or design, via trial and error, a classification or clustering algorithm that would work most accurately for that specific dataset. This can be a time-consuming and tedious process. Additionally, a classification algorithm may not perform as well on a dataset as a clustering algorithm. Meta-learning (learning to learn) and automatic ML (autoML) are data mining-based formalisms for modelling evolving conventional ML functions and toolkit systems. The concept of modelling a decision tree-based combination of both formalisms as a Hybrid-AutoML toolkit extends that of traditional complex autoML systems. In hybrid-autoML, single or multiple predictive models are built by combining a three-layered decision learning architecture for automatic learning mode and model selection, by engaging formalisms for selecting from a variety of supervised or unsupervised ML algorithms and generic meta-information obtained from varying multi-datasets. The work presented in this thesis aims to study, conceptualize, design and develop this hybrid-autoML toolkit by extending, in the simplest form, some existing methodologies for the model training aspect of autoML systems. The theoretical and experimental development focuses on the extension of autoWeka and the use of existing meta-learning, algorithm selection and decision tree concepts. It addresses the issue of efficient ML mode (supervised or unsupervised) and model selection for varying multi-datasets, learning methods representations of practical alternative use cases and structuring of layered decision ML unfolding, and algorithms for constructing the unfolding. The implementation aims to develop tools for hybrid-autoML-based model visualization or evaluation, use case simulations and analysis on single or multi-varying datasets. An open source tool called hybrid-autoML has been developed to support these functionalities. Hybrid-autoML provides a user-friendly graphical interface that facilitates single or multi-varying dataset entry, supports automatic learning mode or strategy selection and automatic model selection on single or multi-varying datasets, supports predictive testing, and allows the automatic visualization and use of a set of analytical tools for model evaluation. It is highly extensible and saves a lot of time.

    Supervised classification using probabilistic decision graphs

    A new model for supervised classification based on probabilistic decision graphs is introduced. A probabilistic decision graph (PDG) is a graphical model that efficiently captures certain context-specific independencies that are not easily represented by other graphical models traditionally used for classification, such as Naïve Bayes (NB) or Classification Trees (CT). This means that the PDG model can capture some distributions using fewer parameters than classical models. Two approaches for constructing a PDG for classification are proposed. The first is to construct the model directly from a dataset of labelled data, while the second is to transform a previously obtained Bayesian classifier into a PDG model that can then be refined. These two approaches are compared with a wide range of classical approaches to the supervised classification problem on both real-world databases and artificially generated data.