    Learning Large-Scale Bayesian Network Classifiers with Augmented Naive Bayes

    A Bayesian network classifier (BNC) is a probabilistic classifier for discrete variables. Structure learning of BNCs has traditionally relied on approximate search over candidate structures for the structure maximizing a discriminative learning score. Recently, Sugahara et al. (2020) reported that BNCs learned exactly with a generative Bayesian network score do not necessarily have low classification accuracy. Furthermore, they proposed a method that exactly learns a BNC as a generative model under an Augmented Naive Bayes (ANB) structural constraint, and showed that it learns BNCs with high classification accuracy even from small data. However, because the computational cost of exact learning grows exponentially in the number of variables, their method is limited to ANB learning with at most a few dozen variables. This paper therefore proposes a method that can learn BNCs over large numbers of variables. In the causal-model literature, computationally efficient structure learning methods based on conditional independence (CI) tests and edge orientation have been proposed; these are called constraint-based approaches. Natori et al. used Bayes factors as CI tests to realize structure learning with more than 1,000 variables while retaining asymptotic consistency for the true structure, and Honda et al. incorporated transitivity into that method to realize structure learning with 3,500 variables (the RAI algorithm with transitivity). Building on Honda et al.'s method, this paper proposes a method that learns larger ANB structures than previously possible, and proves that the proposed method converges asymptotically to the true joint probability distribution with the minimum number of parameters among ANB structures. Experimental results demonstrate that the proposed method outperforms the other methods for large-scale BNC structure learning.
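    The constraint-based step described above replaces classical significance tests with a Bayes factor: two marginal likelihoods are compared, one where an edge is absent and one where it is present. A minimal sketch in Python, assuming discrete data and a BDeu prior; the function names and the dict-of-rows data format are illustrative, not taken from the paper:

    ```python
    from collections import Counter
    from math import lgamma

    def bdeu_local_score(data, child, parents, r, ess=1.0):
        """Log marginal likelihood (BDeu) of `child` given `parents`.
        data: list of dicts mapping variable -> value
        r:    dict mapping variable -> cardinality
        ess:  equivalent sample size of the Dirichlet prior."""
        counts = {}  # counts of child values per observed parent configuration
        for row in data:
            pa = tuple(row[p] for p in parents)
            counts.setdefault(pa, Counter())[row[child]] += 1
        q = 1
        for p in parents:
            q *= r[p]
        a_j = ess / q                  # prior mass per parent configuration
        a_jk = ess / (q * r[child])    # prior mass per table cell
        score = 0.0
        # Unobserved parent configurations contribute lgamma(a)-lgamma(a)=0,
        # so iterating only over observed configurations is exact.
        for pa_counts in counts.values():
            n_j = sum(pa_counts.values())
            score += lgamma(a_j) - lgamma(a_j + n_j)
            for n_jk in pa_counts.values():
                score += lgamma(a_jk + n_jk) - lgamma(a_jk)
        return score

    def bayes_factor_ci_test(data, x, y, z, r, ess=1.0):
        """Declare X independent of Y given Z when the marginal likelihood
        without the edge Y -> X is at least as high as the one with it,
        i.e. when the log Bayes factor is non-negative."""
        return (bdeu_local_score(data, x, list(z), r, ess)
                >= bdeu_local_score(data, x, list(z) + [y], r, ess))
    ```

    Unlike a chi-squared test, this comparison needs no significance level, which is one reason the Bayes-factor test retains asymptotic consistency for the true structure.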

    Bayesian Network Model Averaging Classifiers by Subbagging

    When applied to classification problems, Bayesian networks are often used to infer a class variable given feature variables. Earlier reports have described that the classification accuracy of Bayesian network structures achieved by maximizing the marginal likelihood (ML) is lower than that achieved by maximizing the conditional log likelihood (CLL) of a class variable given the feature variables. Nevertheless, because ML has asymptotic consistency, the performance of Bayesian network structures achieved by maximizing ML is not necessarily worse than that achieved by maximizing CLL for large data. However, the error of learning structures by maximizing ML becomes much larger for small sample sizes, and that large error degrades the classification accuracy. As a method to resolve this shortcoming, model averaging has been proposed to marginalize the class variable posterior over all structures. However, the posterior standard error of each structure in the model averaging becomes large as the sample size becomes small, which in turn degrades the classification accuracy. The main idea of this study is to improve the classification accuracy using subbagging, a modification of bagging that uses random sampling without replacement, to reduce the posterior standard error of each structure in model averaging. Moreover, to guarantee asymptotic consistency, we use the K-best method with the ML score. The experimentally obtained results demonstrate that our proposed method provides more accurate classification than earlier BNC methods and other state-of-the-art ensemble methods do.
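    The subbagging idea above can be sketched as follows. This is a simplified illustration only: a naive Bayes base learner stands in for the paper's K-best Bayesian network model averaging, and all names and parameters here are hypothetical. The essential point it shows is that each ensemble member is fit on a subsample drawn without replacement, and class posteriors are averaged across members:

    ```python
    import numpy as np

    class DiscreteNB:
        """Minimal naive Bayes over discrete features (Laplace smoothing),
        standing in for a Bayesian network classifier base learner."""
        def fit(self, X, y, n_vals, n_classes):
            self.log_prior = np.log(np.bincount(y, minlength=n_classes) + 1.0)
            self.log_prior -= np.log(len(y) + n_classes)
            self.log_cpt = []
            for j in range(X.shape[1]):
                cpt = np.ones((n_classes, n_vals[j]))   # Laplace prior counts
                np.add.at(cpt, (y, X[:, j]), 1.0)       # accumulate data counts
                self.log_cpt.append(np.log(cpt / cpt.sum(axis=1, keepdims=True)))
            return self

        def predict_log_proba(self, X):
            lp = np.tile(self.log_prior, (len(X), 1))
            for j, cpt in enumerate(self.log_cpt):
                lp += cpt[:, X[:, j]].T                 # add per-feature log CPTs
            return lp

    def subbagging_predict(X_train, y_train, X_test, n_vals, n_classes,
                           n_models=10, frac=0.5, seed=0):
        """Fit `n_models` base learners on subsamples drawn WITHOUT
        replacement (subbagging), average their class posteriors,
        and predict the argmax class."""
        rng = np.random.default_rng(seed)
        m = int(frac * len(X_train))
        proba = np.zeros((len(X_test), n_classes))
        for _ in range(n_models):
            idx = rng.choice(len(X_train), size=m, replace=False)
            model = DiscreteNB().fit(X_train[idx], y_train[idx],
                                     n_vals, n_classes)
            proba += np.exp(model.predict_log_proba(X_test))
        return (proba / n_models).argmax(axis=1)
    ```

    Sampling without replacement yields subsamples with less overlap variance than bootstrap resamples, which is the mechanism the abstract credits for reducing the posterior standard error of each structure.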