
    Ensemble learning method for hidden Markov models.

    For complex classification systems, data are gathered from various sources and potentially have different representations. Thus, data may have large intra-class variations. In fact, modeling each data class with a single model might lead to poor generalization. The classification error can be more severe for temporal data, where each sample is represented by a sequence of observations. Thus, there is a need for building a classification system that takes into account the variations within each class in the data. This dissertation introduces an ensemble learning method for temporal data that uses a mixture of Hidden Markov Model (HMM) classifiers. We hypothesize that the data are generated by K models, each of which reflects a particular trend in the data. Model identification could be achieved through clustering in the feature space or in the parameter space. However, this approach is inappropriate in the context of sequential data. The proposed approach is based on clustering in the log-likelihood space, and has two main steps. First, one HMM is fit to each of the N individual sequences. For each fitted model, we evaluate the log-likelihood of each sequence. This results in an N-by-N log-likelihood distance matrix that is partitioned into K groups using a relational clustering algorithm. In the second step, we learn the parameters of one HMM per group. We propose using and optimizing various training approaches for the different K groups depending on their size and homogeneity. In particular, we investigate the maximum likelihood (ML), the minimum classification error (MCE) based discriminative, and the variational Bayesian (VB) training approaches. Finally, to test a new sequence, its likelihood is computed under all K models, and a final confidence value is assigned by combining the multiple models' outputs using a decision-level fusion method such as an artificial neural network or a hierarchical mixture of experts. Our approach was evaluated on two real-world applications: (1) identification of Cardio-Pulmonary Resuscitation (CPR) scenes in videos simulating medical crises; and (2) landmine detection using Ground Penetrating Radar (GPR). Results on both applications show that the proposed method can identify meaningful and coherent HMM mixture components that describe different properties of the data. Each HMM mixture component models a group of data that share common attributes. The results indicate that the proposed method outperforms the baseline HMM that uses one model for each class in the data.
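
    A minimal sketch of the clustering step may help make the pipeline concrete. It assumes hmmlearn's GaussianHMM for the per-sequence models and substitutes scikit-learn's SpectralClustering for the relational clustering algorithm used in the dissertation; all names and settings are illustrative, not the author's code.

```python
# Sketch of clustering N sequences in the log-likelihood space.
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.cluster import SpectralClustering

def cluster_sequences(sequences, K, n_states=3):
    """Partition sequences into K groups by log-likelihood-space clustering."""
    # Step 1: fit one HMM to each individual sequence.
    models = []
    for seq in sequences:                # each seq: (T_i, n_features) array
        m = GaussianHMM(n_components=n_states, n_iter=50)
        m.fit(seq)
        models.append(m)
    # Step 2: N-by-N matrix of log-likelihoods L[i, j] = log p(seq_j | model_i).
    L = np.array([[m.score(seq) for seq in sequences] for m in models])
    D = 0.5 * (L + L.T)                  # symmetrize; higher = more similar
    A = np.exp((D - D.max()) / max(D.std(), 1e-9))   # nonnegative affinity
    return SpectralClustering(n_clusters=K,
                              affinity='precomputed').fit_predict(A)
```

    One HMM per group would then be refit (with ML, MCE, or VB training chosen by group size and homogeneity), and a test sequence scored under all K group models before decision-level fusion.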

    Generalized multi-stream hidden Markov models.

    For complex classification systems, data are usually gathered from multiple sources of information that have varying degrees of reliability. In fact, assuming that the different sources have the same relevance in describing all the data might lead to erroneous behavior. The classification error accumulates and can be more severe for temporal data, where each sample is represented by a sequence of observations. Thus, there is compelling evidence that learning algorithms should include a relevance weight for each source of information (stream) as a parameter that needs to be learned. In this dissertation, we assume that the multi-stream temporal data are generated by independent and synchronous streams. Using this assumption, we develop, implement, and test multi-stream continuous and discrete hidden Markov model (HMM) algorithms. For the discrete case, we propose two new approaches to generalize the baseline discrete HMM. The first one combines unsupervised learning, feature discrimination, standard discrete HMMs, and weighted distances to learn the codebook with feature-dependent weights for each symbol. The second approach consists of modifying the HMM structure to include stream relevance weights, generalizing the standard discrete Baum-Welch learning algorithm, and deriving the necessary conditions to optimize all model parameters simultaneously. We also generalize the minimum classification error (MCE) discriminative training algorithm to include stream relevance weights. For the continuous HMM, we introduce a new approach that integrates the stream relevance weights into the objective function. Our approach is based on the linearization of the probability density function. Two variations are proposed: the mixture-level and state-level variations. As in the discrete case, we generalize the continuous Baum-Welch learning algorithm to accommodate these changes, and we derive the necessary conditions for updating the model parameters. We also generalize the MCE learning algorithm to derive the necessary conditions for the parameter updates. The proposed discrete and continuous HMMs are tested on synthetic data sets. They are also validated on various applications including Australian Sign Language, audio classification, face classification, and, more extensively, the problem of landmine detection using ground penetrating radar data. For all applications, we show that considerable improvement can be achieved compared to the baseline HMM and the existing multi-stream HMM algorithms.
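
    As a rough illustration of the linearized continuous emission described above (the state-level variation), the sketch below mixes per-stream Gaussian densities with relevance weights. The names, shapes, and use of SciPy are assumptions made for illustration, not the dissertation's code.

```python
# Sketch of a state-level multi-stream emission: one HMM state mixes
# per-stream Gaussian densities with stream relevance weights w,
# where w >= 0 and sum(w) == 1 (the linearized form).
import numpy as np
from scipy.stats import multivariate_normal

def multistream_emission(obs_streams, means, covs, w):
    """obs_streams: list of K per-stream observation vectors.
    means, covs: per-stream Gaussian parameters for one HMM state.
    Returns b_j(o_t) = sum_k w_k * N(o_t^k; mean_k, cov_k)."""
    return sum(
        w_k * multivariate_normal.pdf(o_k, mean=mu_k, cov=cov_k)
        for w_k, o_k, mu_k, cov_k in zip(w, obs_streams, means, covs))

# Example: two synchronous streams with 2-D and 3-D observations.
obs = [np.array([0.1, -0.2]), np.array([1.0, 0.5, 0.0])]
means = [np.zeros(2), np.array([1.0, 0.4, 0.1])]
covs = [np.eye(2), np.eye(3)]
print(multistream_emission(obs, means, covs, w=[0.7, 0.3]))
```

    In Baum-Welch, an emission of this form would replace the single-stream b_j(o_t) in the forward-backward recursions, with additional re-estimation equations derived for the stream weights.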

    A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts

    The mixture-of-experts (MoE) model combines the power of multiple submodels via gating functions to achieve greater performance in numerous regression and classification applications. From a theoretical perspective, while there have been previous attempts to understand the behavior of this model in regression settings through the convergence analysis of maximum likelihood estimation in the Gaussian MoE model, such an analysis in the classification setting has remained missing from the literature. We close this gap by establishing the convergence rates of density estimation and parameter estimation in the softmax gating multinomial logistic MoE model. Notably, when part of the expert parameters vanish, these rates are shown to be slower than polynomial rates owing to an inherent interaction between the softmax gating and expert functions via partial differential equations. To address this issue, we propose using a novel class of modified softmax gating functions which transform the input values before delivering them to the gating functions. As a result, the previous interaction disappears and the parameter estimation rates are significantly improved.
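
    For concreteness, the following is a minimal NumPy sketch of a softmax-gated multinomial logistic MoE forward pass. The input transform M stands in for the proposed modified gating functions, whose exact form the abstract does not specify, so it is left here as an assumption.

```python
# Softmax-gated multinomial logistic mixture of experts (forward pass).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_predict(x, gate_W, gate_b, expert_W, expert_b, M=lambda x: x):
    """x: (d,) input; K experts over C classes.
    gate_W: (K, d), gate_b: (K,)          softmax gating parameters
    expert_W: (K, C, d), expert_b: (K, C) multinomial logistic experts
    M: input transform fed to the gate (identity = standard softmax gating).
    Returns p(y | x) = sum_k gate_k(M(x)) * expert_k(y | x)."""
    g = softmax(gate_W @ M(x) + gate_b)              # (K,) gating weights
    experts = softmax(expert_W @ x + expert_b, -1)   # (K, C) expert posteriors
    return g @ experts                               # (C,) mixture posterior

rng = np.random.default_rng(0)
d, K, C = 5, 3, 4
p = moe_predict(rng.normal(size=d),
                rng.normal(size=(K, d)), np.zeros(K),
                rng.normal(size=(K, C, d)), np.zeros((K, C)))
print(p, p.sum())                                    # a distribution over C classes
```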

    Lifelong Machine Learning of Functionally Compositional Structures

    A hallmark of human intelligence is the ability to construct self-contained chunks of knowledge and reuse them in novel combinations for solving different yet structurally related problems. Learning such compositional structures has been a significant challenge for artificial systems, due to the underlying combinatorial search. To date, research into compositional learning has largely proceeded separately from work on lifelong or continual learning. This dissertation integrated these two lines of work to present a general-purpose framework for lifelong learning of functionally compositional structures. The framework separates the learning into two stages: learning how to best combine existing components to assimilate a novel problem, and learning how to adapt the set of existing components to accommodate the new problem. This separation explicitly handles the trade-off between the stability required to remember how to solve earlier tasks and the flexibility required to solve new tasks. This dissertation instantiated the framework into various supervised and reinforcement learning (RL) algorithms. Empirical evaluations on a range of supervised learning benchmarks compared the proposed algorithms against well-established techniques, and found that 1) compositional models enable improved lifelong learning when the tasks are highly diverse by balancing the incorporation of new knowledge and the retention of past knowledge, 2) the separation of the learning into stages permits lifelong learning of compositional knowledge, and 3) the components learned by the proposed methods represent self-contained and reusable functions. Similar evaluations on existing and new RL benchmarks demonstrated that 1) algorithms under the framework accelerate the discovery of high-performing policies in a variety of domains, including robotic manipulation, and 2) these algorithms retain, and often improve, knowledge that enables them to solve tasks learned in the past. The dissertation extended one lifelong compositional RL algorithm to the nonstationary setting, where the distribution over tasks varies over time, and found that modularity permits individually tracking changes to different elements in the environment. The final contribution of this dissertation was a new benchmark for evaluating approaches to compositional RL, which exposed that existing methods struggle to discover the compositional properties of the environment.
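
    A toy instantiation may help fix ideas. Below, components are columns of a shared matrix, assimilation fits a task-specific combination with the components frozen, and accommodation takes a small gradient step on the components themselves. This is an illustrative sketch on linear regression tasks, not the dissertation's supervised or RL algorithms.

```python
# Two-stage lifelong compositional learning on a stream of toy tasks.
import numpy as np

rng = np.random.default_rng(0)
d, K = 10, 4
C = rng.normal(size=(d, K))             # shared components (one per column)

def assimilate(X, y, C):
    """Stage 1: freeze the components, fit a task-specific combination s."""
    s, *_ = np.linalg.lstsq(X @ C, y, rcond=None)
    return s

def accommodate(X, y, C, s, lr=1e-3):
    """Stage 2: fix the structure s, take a small gradient step on the
    shared components; the small step protects earlier-task knowledge."""
    resid = X @ C @ s - y               # (n,) residual
    grad = X.T @ np.outer(resid, s)     # d/dC of 0.5 * ||X C s - y||^2
    return C - lr * grad

for t in range(5):                      # a stream of linear regression tasks
    X = rng.normal(size=(50, d))
    y = X @ rng.normal(size=d)
    s = assimilate(X, y, C)             # combine existing knowledge first
    C = accommodate(X, y, C, s)         # then slowly adapt the components
```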

    Nonlinear classification using mixtures of discriminative models

    This thesis addresses nonlinear classification using hierarchical mixtures of discriminative models. In this context, we proposed a general model for the hierarchical mixture of discriminative models. It combines a set of discriminative models by means of functions that we call selection functions. Two instances were derived from this general model to show that it is possible to use a reduced number of discriminative models compared to previous models, while maintaining high classification performance. This is achieved through a choice of selection functions that yields an efficient division of labor among the models. We then proposed two discriminative models for the classification of proportional data. These models are based on the generalized Dirichlet distribution. In order to properly estimate the parameters of these two models, we established an upper bound for the generalized Dirichlet mixture. Experiments demonstrated the value of these models as well as their limitations.
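
    Since the two proposed discriminative models build on the generalized Dirichlet distribution, a sketch of its log-density is given below, under the common Connor-Mosimann parameterization; that parameterization is an assumption here, and the thesis may use a different form.

```python
# Log-density of the generalized Dirichlet distribution.
import numpy as np
from scipy.special import betaln

def gen_dirichlet_logpdf(x, a, b):
    """Evaluate log GD(x; a, b) at a proportional vector x with x_i > 0
    and sum(x) < 1; a and b are positive shape parameters (one pair per
    dimension), following Connor and Mosimann."""
    x, a, b = map(np.asarray, (x, a, b))
    cum = np.cumsum(x)                  # partial sums x_1 + ... + x_i
    gamma = np.empty(len(x))
    gamma[:-1] = b[:-1] - a[1:] - b[1:]
    gamma[-1] = b[-1] - 1.0
    return np.sum((a - 1.0) * np.log(x)
                  + gamma * np.log1p(-cum)
                  - betaln(a, b))

print(gen_dirichlet_logpdf([0.2, 0.3, 0.1],
                           a=[2.0, 3.0, 1.5], b=[4.0, 2.5, 2.0]))
```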

    Feature and Decision Level Fusion Using Multiple Kernel Learning and Fuzzy Integrals

    The work collected in this dissertation addresses the problem of data fusion. In other words, this is the problem of making decisions (also known as the problem of classification in the machine learning and statistics communities) when data from multiple sources are available, or when decisions/confidence levels from a panel of decision-makers are accessible. This problem has become increasingly important in recent years, especially with the ever-increasing popularity of autonomous systems outfitted with suites of sensors and the dawn of the "age of big data." While data fusion is a very broad topic, the work in this dissertation considers two very specific techniques: feature-level fusion and decision-level fusion. In general, the fusion methods proposed throughout this dissertation rely on kernel methods and fuzzy integrals. Both are very powerful tools; however, they also come with challenges, some of which are summarized below. I address these challenges in this dissertation. Kernel methods for classification are a well-studied area in which data are implicitly mapped from a lower-dimensional space to a higher-dimensional space to improve classification accuracy. However, for most kernel methods, one must still choose a kernel to use for the problem. Since there is, in general, no way of knowing which kernel is the best, multiple kernel learning (MKL) is a technique used to learn the aggregation of a set of valid kernels into a single (ideally) superior kernel. The aggregation can be done using weighted sums of the pre-computed kernels, but determining the summation weights is not a trivial task. Furthermore, MKL does not work well with large datasets because of limited storage space and prediction speed. These challenges are tackled by the introduction of many new algorithms in the following chapters. I also address MKL's storage and speed drawbacks, allowing MKL-based techniques to be applied to big data efficiently. Some algorithms in this work are based on the Choquet fuzzy integral, a powerful nonlinear aggregation operator parameterized by the fuzzy measure (FM). These decision-level fusion algorithms learn a fuzzy measure by minimizing a sum of squared error (SSE) criterion based on a set of training data. The flexibility of the Choquet integral comes with a cost, however: given a set of N decision makers, the size of the FM the algorithm must learn is 2^N. This means that the training data must be diverse enough to include 2^N independent observations, though this is rarely encountered in practice. I address this in the following chapters via many different regularization functions, a popular technique in machine learning and statistics used to prevent overfitting and increase model generalization. Finally, it is worth noting that the aggregation behavior of the Choquet integral is not intuitive. I tackle this by proposing a quantitative visualization strategy allowing the FM and Choquet integral behavior to be shown simultaneously.
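
    A short sketch of the discrete Choquet integral makes the 2^N growth concrete: the fuzzy measure must assign a value to every subset of the N sources. The measure values below are illustrative, and learning the FM from data via the SSE criterion is not shown.

```python
# Discrete Choquet integral of N decision values w.r.t. a fuzzy measure.
import numpy as np

def choquet_integral(h, fm):
    """h: length-N decision values. fm: dict mapping frozensets of source
    indices to measure values, with fm[frozenset()] == 0, fm[all] == 1,
    and fm monotone with respect to set inclusion."""
    h = np.asarray(h, dtype=float)
    order = np.argsort(-h)              # sources by decreasing value
    total, g_prev, A = 0.0, 0.0, frozenset()
    for idx in order:
        A = A | {int(idx)}              # grow the coalition
        total += h[idx] * (fm[A] - g_prev)   # h_(i) * [g(A_i) - g(A_{i-1})]
        g_prev = fm[A]
    return total

# N = 3 sources: the measure rewards sources 0 and 1 acting together
# more than the sum of their individual worths (all 2^3 subsets listed).
fm = {frozenset(): 0.0,
      frozenset({0}): 0.3, frozenset({1}): 0.3, frozenset({2}): 0.2,
      frozenset({0, 1}): 0.8, frozenset({0, 2}): 0.5,
      frozenset({1, 2}): 0.5, frozenset({0, 1, 2}): 1.0}
print(choquet_integral([0.9, 0.6, 0.2], fm))   # fused confidence: 0.61
```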

    Task Relationship Modeling in Lifelong Multitask Learning

    Multitask learning is a learning framework which explores the concept of sharing training information among multiple related tasks to improve the generalization error of each task. The benefits of multitask learning have been shown both empirically and theoretically. A number of fields benefit from multitask learning, such as toxicology, image annotation, and compressive sensing. However, the majority of multitask learning algorithms make the key assumption that all tasks are related to each other in a similar fashion. Users often do not know which tasks are related and train all tasks together. This results in sharing training information even among unrelated tasks. Training unrelated tasks together can cause negative transfer and deteriorate the performance of multitask learning. For example, consider the case of predicting in vivo toxicity of chemicals at various endpoints from the chemical structure. Toxicity is not related across all endpoints. Since biological networks are highly complex, it is also not possible to predetermine which endpoints are related. Training all the endpoints together may therefore degrade the overall performance. It is thus important to model the task relationships in multitask learning. Multitask learning with task relationship modeling may be explored in three different settings: static learning, online learning with a fixed set of tasks, and, most recently, lifelong learning. Multitask learning algorithms in the static setting have existed for more than a decade, and there is a large literature in this field. However, the use of task relationships in the multitask learning framework has been studied in detail only in the past several years. The literature that combines feature selection with task relationship modeling is even more limited. For online and lifelong learning, task relationship modeling becomes a challenge. In online learning, all the tasks are known before training starts, and the samples arrive in an online fashion. In lifelong multitask learning, however, the tasks themselves also arrive in an online fashion. Therefore, modeling the task relationships is an even greater challenge in the lifelong multitask learning framework than in online multitask learning. The main contribution of this thesis is a framework for modeling task relationships in lifelong multitask learning. The initial algorithms are preliminary studies which focus on the static setting and learn clusters of related tasks with feature selection. These algorithms enforce that all related tasks select a common set of features. The later part of the thesis shifts gears to the lifelong multitask learning setting. Here, we propose learning functions to represent the relationships between tasks. Learning functions is faster and computationally less expensive than the traditional approach of learning fixed-size matrices to represent the task relationship models.
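
    As a generic illustration of joint feature selection across tasks (a stand-in for the thesis' clustered algorithms, which first group the related tasks), the sketch below solves an L2,1-penalized multitask least squares by proximal gradient; rows of the weight matrix that survive the penalty correspond to features selected jointly by the tasks. Everything here is assumed for illustration.

```python
# Joint feature selection across tasks via an L2,1 penalty (ISTA).
import numpy as np

def prox_l21(W, t):
    """Row-wise soft-thresholding: rows whose norm falls below t are
    zeroed, dropping that feature for every task simultaneously."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))

def multitask_feature_selection(Xs, ys, lam=0.1, lr=1e-3, iters=500):
    """Xs, ys: per-task design matrices and targets. Returns W with one
    column per task; nonzero rows are the jointly selected features."""
    W = np.zeros((Xs[0].shape[1], len(Xs)))
    for _ in range(iters):
        grad = np.column_stack(
            [X.T @ (X @ W[:, t] - y) for t, (X, y) in enumerate(zip(Xs, ys))])
        W = prox_l21(W - lr * grad, lr * lam)
    return W

rng = np.random.default_rng(0)
Xs = [rng.normal(size=(40, 8)) for _ in range(3)]
w = np.zeros(8); w[:3] = [1.0, -2.0, 0.5]        # only 3 informative features
ys = [X @ w + 0.01 * rng.normal(size=40) for X in Xs]
W = multitask_feature_selection(Xs, ys, lam=5.0)
print(np.linalg.norm(W, axis=1).round(2))        # near-zero rows were dropped
```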

    Opinion Mining of Sociopolitical Comments from Social Media
