    RPA: Learning Interpretable Input-Output Relationships by Counting Samples

    This work proposes a fast algorithm for a fundamental data science problem: identifying Boolean rules in disjunctive normal form (DNF) that classify samples based on binary features. The algorithm is an explainable machine learning method in that it yields an explicit input-output relationship. It relies on hypothesis tests built from confidence intervals, and the test statistic requires nothing more than counting how many cases and how many controls possess a given feature or set of features, each set corresponding to a potential AND clause of the Boolean formula. Extensive experiments on simulated data demonstrate the algorithm's effectiveness and efficiency. The efficiency stems from the fact that the bottleneck operation is the multiplication of the input matrix with itself. Beyond the algorithm itself, this paper offers a flexible and transparent theoretical framework, including a statistical analysis of the problem and many entry points for future adjustments and improvements. Among other things, this framework makes it possible to assess whether the input-output relationships can be identified, given certain easily obtained characteristics of the data.
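    To illustrate why counting reduces to a matrix product, here is a minimal sketch, not the authors' RPA implementation. For a 0/1 input matrix X (samples x features), the product X^T X holds, on its diagonal, how many samples have each feature and, off the diagonal, how many have each pair of features, i.e. each two-literal AND clause. Computing it separately for cases and controls gives all the counts at once. The helper names (gram, wilson, candidate_clauses), the restriction to clauses of at most two literals, and the use of a Wilson score interval as the confidence-interval test are all assumptions made for illustration; the abstract does not specify the exact statistic. The Gram matrix is built as a sum of per-sample outer products in pure Python; in practice an optimized matrix multiply would do this step.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def gram(X: Sequence[Sequence[int]], p: int) -> Matrix:
    """Return X^T X for a 0/1 matrix X (samples x p features).

    Computed as a sum of per-sample outer products r r^T. Entry [i][j] counts
    the samples having both feature i and feature j; the diagonal [i][i]
    counts the samples having feature i.
    """
    G = [[0] * p for _ in range(p)]
    for row in X:
        active = [k for k, v in enumerate(row) if v]
        for i in active:
            for j in active:
                G[i][j] += 1
    return G


def wilson(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (assumed test)."""
    if n == 0:
        return (0.0, 1.0)  # no information: the whole unit interval
    phat = k / n
    denom = 1 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def candidate_clauses(X, y, z: float = 1.96) -> List[Tuple[int, int]]:
    """Flag clauses x_i AND x_j (i == j means the single literal x_i) whose
    frequency among cases is significantly above that among controls."""
    p = len(X[0])
    cases = [r for r, label in zip(X, y) if label == 1]
    controls = [r for r, label in zip(X, y) if label == 0]
    # Two Gram matrices give the case and control counts for every clause.
    G_case, G_ctrl = gram(cases, p), gram(controls, p)
    flagged = []
    for i in range(p):
        for j in range(i, p):
            case_lo, _ = wilson(G_case[i][j], len(cases), z)
            _, ctrl_hi = wilson(G_ctrl[i][j], len(controls), z)
            if case_lo > ctrl_hi:  # intervals do not overlap
                flagged.append((i, j))
    return flagged


if __name__ == "__main__":
    # Toy data: all 8 patterns of 3 features, 10 copies each,
    # labelled by the DNF rule y = (x0 AND x1) OR x2.
    X, y = [], []
    for _ in range(10):
        for x0, x1, x2 in product((0, 1), repeat=3):
            X.append([x0, x1, x2])
            y.append(int((x0 and x1) or x2))
    print(candidate_clauses(X, y))  # [(0, 1), (0, 2), (1, 2), (2, 2)]
```

    On this toy rule the sketch flags x0 AND x1 and x2, but also x0 AND x2 and x1 AND x2, which are implied by x2 and would need to be pruned by a fuller method; the single literals x0 and x1 are not flagged at this sample size.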