2 research outputs found

    Domain Adaption via Feature Selection on Explicit Feature Map

    © 2018 IEEE. Most domain adaption approaches use all features for adaption. Often, however, not every feature is beneficial, and indiscriminately using all features can degrade performance. In other words, for a model trained on the source domain to work well on the target domain, it is desirable to find invariant features rather than use all features. However, invariant features across domains may lie in a higher-order space instead of the original feature space. Moreover, some invariant features, such as shared background information, have weak discriminative ability and need to be filtered out. Therefore, in this paper, we propose a novel domain adaption algorithm based on an explicit feature map and feature selection. The data are first represented by a kernel-induced explicit feature map, so that high-order invariant features can be revealed. Then, by minimizing the marginal distribution difference, the conditional distribution difference, and the model error, the invariant discriminative features are selected. The resulting problem is NP-hard, so we relax it and solve it with a cutting plane algorithm. Experimental results on six real-world benchmarks demonstrate the effectiveness and efficiency of the proposed algorithm, which outperforms many state-of-the-art domain adaption approaches.
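
The two-stage idea in this abstract (map the data explicitly, then keep features whose distributions match across domains) can be illustrated with a rough sketch. This is not the paper's algorithm: it assumes a degree-2 polynomial map as the explicit feature map, uses only the squared difference of per-feature means as a stand-in for the marginal distribution difference, and ignores the conditional distribution term, the model error, and the cutting plane solver. All function names are hypothetical.

```python
from itertools import combinations_with_replacement


def poly2_feature_map(x):
    """Assumed explicit map: original features plus all degree-2 products."""
    pairs = combinations_with_replacement(range(len(x)), 2)
    return list(x) + [x[i] * x[j] for i, j in pairs]


def mean_gap_per_feature(source, target):
    """Squared difference of per-feature means between the two domains."""
    gaps = []
    for j in range(len(source[0])):
        mu_s = sum(row[j] for row in source) / len(source)
        mu_t = sum(row[j] for row in target) / len(target)
        gaps.append((mu_s - mu_t) ** 2)
    return gaps


def select_invariant(source, target, k):
    """Indices of the k mapped features with the smallest cross-domain gap."""
    phi_s = [poly2_feature_map(x) for x in source]
    phi_t = [poly2_feature_map(x) for x in target]
    gaps = mean_gap_per_feature(phi_s, phi_t)
    # sorted() is stable, so ties keep their original feature order.
    return sorted(range(len(gaps)), key=lambda j: gaps[j])[:k]


# Toy data: the first raw feature is shared, the second shifts across domains.
source = [[1.0, 0.0], [3.0, 0.0]]
target = [[1.0, 5.0], [3.0, 5.0]]
selected = select_invariant(source, target, k=2)
```

In this toy case the selected indices correspond to the first raw feature and its square, a second-order feature that is invariant across the two domains.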

    Improving Outcomes in Machine Learning and Data-Driven Learning Systems using Structural Causal Models

    The field of causal inference has experienced rapid growth and development in recent years. Its significance in addressing a diverse array of problems and its relevance across various research and application domains are increasingly being acknowledged. However, the current state-of-the-art approaches to causal inference have not yet gained widespread adoption in mainstream data science practices. This research begins by motivating contemporary approaches to causal investigation using observational data. It explores existing applications and future prospects for employing causal inference methods to enhance desired outcomes in data-driven learning applications across various domains, with a particular focus on their relevance in artificial intelligence (AI). Following this motivation, this dissertation offers a broad review of fundamental concepts, theoretical frameworks, methodological advancements, and existing techniques pertaining to causal inference. The research advances by investigating the problem of data-driven root cause analysis through the lens of causal structure modeling. Data-driven approaches to root cause analysis (RCA) have received attention recently due to their ability to exploit increasing data availability for more effective root cause identification in complex processes. Advancements in the field of causal inference enable unbiased causal investigations using observational data. This study proposes a data-driven RCA method and a time-to-event (TTE) data simulation procedure built on the structural causal model (SCM) framework. A novel causality-based method is introduced for learning a representation of root cause mechanisms, termed root cause graphs (RCGs) in this work, from observational TTE data. Three case scenarios are used to generate TTE datasets for evaluating the proposed method. 
The utility of the proposed RCG recovery method is demonstrated by using recovered RCGs to guide the estimation of root cause treatment effects. In the presence of mediation, RCG-guided models produce superior estimates of root cause total effects compared to models that adjust for all covariates. The dissertation then examines the integration of causal inference and machine learning. Incorporating causal inference into machine learning offers many benefits, including enhanced model interpretability and robustness to changes in data distributions. This work considers the task of feature selection for prediction model development in the context of potentially changing environments. First, a filter feature selection approach that improves on the select k-best method and prioritizes causal features is introduced and compared to the standard select k-best algorithm. Second, a causal feature selection algorithm that adapts to covariate shifts in the target domain is proposed for domain adaptation. Causal approaches to feature selection are demonstrated to be capable of yielding optimal prediction performance when modeling assumptions are met. Additionally, they can mitigate the degrading effects of some forms of dataset shift on prediction performance.
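
For orientation, the standard select k-best baseline mentioned in this abstract can be sketched as a simple filter: score each feature against the target and keep the top k. The sketch below assumes absolute Pearson correlation as the scoring function, which is only one common choice. The abstract does not describe how the dissertation's variant prioritizes causal features, so that part is not attempted here. Function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_k_best(rows, y, k):
    """Indices of the k features with the highest absolute correlation to y."""
    d = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], y)) for j in range(d)]
    return sorted(range(d), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: feature 0 tracks y exactly, feature 1 is constant, feature 2 is weakly related.
rows = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
y = [1, 2, 3, 4]
best = select_k_best(rows, y, k=2)
```

A purely correlational filter like this has no notion of which features are causes of the target, which is the gap the abstract's causal variant is said to address.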