RPA: Learning Interpretable Input-Output Relationships by Counting Samples
This work proposes a fast solution algorithm for a fundamental data science problem: identifying Boolean rules in disjunctive normal form (DNF) that classify samples based on binary features. The algorithm is an explainable machine learning method: it provides an explicit input-output relationship. It is based on hypothesis tests through confidence intervals, where the test statistic used requires nothing more than counting the number of cases and the number of controls that possess a certain feature or set of features, reflecting the potential AND clauses of the Boolean formula. Extensive experiments on simulated data demonstrate the algorithm's effectiveness and efficiency. The efficiency of the algorithm relies on the fact that the bottleneck operation is a matrix multiplication of the input matrix with itself. Beyond the solution algorithm itself, this paper offers a flexible and transparent theoretical framework with a statistical analysis of the problem and many entry points for future adjustments and improvements. Among other things, this framework allows one to assess the feasibility of identifying the input-output relationships given certain easily obtained characteristics of the data.
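The counting idea described in the abstract can be illustrated with a minimal sketch (not the authors' implementation; the data, labels, and threshold below are purely hypothetical): single-feature and pairwise AND-clause counts for cases and controls reduce to sums and to the product of the binary input matrix with itself.

```python
import numpy as np

# Minimal illustrative sketch of the counting statistic, not the paper's code.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 20))   # binary feature matrix: samples x features (hypothetical)
y = rng.integers(0, 2, size=1000)         # 1 = case, 0 = control (hypothetical labels)

cases, controls = X[y == 1], X[y == 0]

# Single-feature counts: how many cases/controls possess each feature.
case_counts = cases.sum(axis=0)
control_counts = controls.sum(axis=0)

# Pairwise counts: entry (i, j) is the number of samples possessing both
# feature i and feature j -- the "matrix multiplication of the input matrix
# with itself" that the abstract names as the bottleneck operation.
case_pair_counts = cases.T @ cases
control_pair_counts = controls.T @ controls

# These counts feed the hypothesis tests; as a stand-in, retain feature pairs
# whose case proportion exceeds the control proportion by an arbitrary margin.
case_rate = case_pair_counts / max(len(cases), 1)
control_rate = control_pair_counts / max(len(controls), 1)
candidate_pairs = np.argwhere(case_rate - control_rate > 0.2)  # threshold is illustrative only
```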
A Renewed Take on Weighted Sum in Sandwich Algorithms: Modification of the Criterion Space
Sandwich algorithms are commonly used to approximate the Pareto front of a multiobjective (MO) convex problem by enclosing it between an inner and an outer approximation. By iteratively improving the approximations, the distance between them is minimized, which gives an estimate of how well the Pareto front is approximated. A well-explainable type of sandwich algorithm is based on weighted sum scalarization (WSS), where the next set of weights is determined by the most promising inner normal of the inner approximation. As these normals can contain negative values, not every optimization will result in finding an efficient point. In order to reduce the number of searches towards the dominated part, we propose an elegant modification of the criterion space, which is an advancement on the formulation of Solanki et al. In addition to being well-explainable and easy to integrate within an existing optimization procedure, this modification is theoretically able to obtain all nondominated points of an MO linear programming problem in a finite number of expansions of the inner approximation. Furthermore, we propose two heuristic approaches to determine the distance between the inner and outer approximation that can be used as an alternative to the distance calculation of Solanki et al. These heuristics incorporate the ideas of Solanki et al. and Craft et al. to obtain straightforward and faster methods.
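A single weighted sum scalarization step, as described in the abstract, can be sketched for a bi-objective LP (this is an illustrative toy example, not the paper's algorithm or its criterion-space modification; the objective and constraint data are hypothetical):

```python
import numpy as np
from scipy.optimize import linprog

# Toy bi-objective LP:  min (c1 @ x, c2 @ x)  s.t.  A_ub @ x <= b_ub, x >= 0.
c1 = np.array([1.0, 4.0])      # objective 1 (hypothetical)
c2 = np.array([3.0, 1.0])      # objective 2 (hypothetical)
A_ub = np.array([[-1.0, -1.0]])
b_ub = np.array([-1.0])        # encodes x1 + x2 >= 1

def wss_step(weights):
    """Solve the weighted-sum scalarization for a given set of weights."""
    res = linprog(weights[0] * c1 + weights[1] * c2,
                  A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
    x = res.x
    return x, np.array([c1 @ x, c2 @ x])   # image of x in the criterion space

# A strictly positive weight vector yields a nondominated point. An inner
# normal of the inner approximation may contain negative components, in which
# case the scalarized LP can steer towards the dominated part -- the situation
# the abstract's modification of the criterion space aims to avoid.
x, z = wss_step(np.array([0.5, 0.5]))
print("efficient point:", x, "criterion-space image:", z)
```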
Bi-objective goal programming for balancing costs vs. nutritional adequacy
Introduction: Linear programming (LP) is often used within diet optimization to find, from a set of available food commodities, the most affordable diet that meets the nutritional requirements of an individual or (sub)population. It is, however, not always possible to create a feasible diet, as certain nutritional requirements are difficult to meet. In that case, goal programming (GP) can be used to minimize deviations from the nutritional requirements in order to obtain a near-feasible diet. With GP, the cost of the diet is often overlooked or taken into account using the ε-constraint method. This method is not guaranteed to find all possible trade-offs between costs and nutritional deficiency without solving many uninformative LPs.
Methods: We present a method to find all trade-offs between any two linear objectives in a dietary LP context that is simple, does not solve uninformative LPs, and does not need prior input from the decision maker (DM). This method is a bi-objective algorithm based on the Non-Inferior Set Estimation (NISE) method that finds all efficient trade-offs between two linear objectives.
Results: To show what type of insights can be gained from this approach, two analyses are presented that investigate the relation between cost and nutritional adequacy. In the first analysis, a diet with a restriction on the exact energy intake is considered, where all nutrient intakes except energy are allowed to deviate from their prescription. This analysis is especially helpful in case of a restrictive budget or when a nutritionally adequate diet is either unaffordable or unattainable. The second analysis relaxes only the exact energy intake, keeping the other nutrients within their requirements, to investigate how the energy intake affects the cost of a diet. Here, we describe in what situations the so-called more-for-less paradox takes place, which can be induced by requiring an exact energy intake.
Conclusion: To the best of our knowledge, we are the first to address how to obtain all efficient trade-offs of two linear objectives in a dietary LP context and how this can be used for analyses.
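The NISE idea the abstract builds on can be sketched as follows (a toy illustration under assumed data, not the paper's dietary model): starting from the two single-objective optima, each pair of known nondominated points is refined by solving a weighted-sum LP whose weights are normal to the segment connecting them; if a point strictly below that segment is found, both halves are refined recursively.

```python
import numpy as np
from scipy.optimize import linprog

# Toy bi-objective LP:  min (cost @ x, deficiency @ x)  s.t.  A_ub @ x <= b_ub, x >= 0.
cost = np.array([2.0, 5.0, 3.0])        # hypothetical cost per food unit
deficiency = np.array([4.0, 1.0, 2.0])  # hypothetical nutritional-deviation score
A_ub = np.array([[-1.0, -1.0, -1.0]])   # at least 10 units of food in total
b_ub = np.array([-10.0])
bounds = [(0, None)] * 3

def solve(w1, w2):
    """Weighted-sum LP; returns the image of the optimum in the criterion space."""
    res = linprog(w1 * cost + w2 * deficiency, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    x = res.x
    return np.array([cost @ x, deficiency @ x])

def nise(zA, zB, tol=1e-6, points=None):
    """Recursively collect nondominated points between zA and zB (both objectives minimized)."""
    points = [] if points is None else points
    # Weights normal to the segment zA--zB in the criterion space.
    w1, w2 = abs(zA[1] - zB[1]), abs(zB[0] - zA[0])
    zC = solve(w1, w2)
    # If zC lies strictly below the segment, keep it and refine both halves.
    if w1 * zC[0] + w2 * zC[1] < w1 * zA[0] + w2 * zA[1] - tol:
        nise(zA, zC, tol, points)
        points.append(zC)
        nise(zC, zB, tol, points)
    return points

# Endpoints: single-objective optima (in practice lexicographic optima are used
# to avoid weakly dominated endpoints; omitted here for brevity).
z_cost = solve(1.0, 0.0)        # cheapest diet, ignoring deficiency
z_adequate = solve(0.0, 1.0)    # most adequate diet, ignoring cost
front = [z_cost] + nise(z_cost, z_adequate) + [z_adequate]
print(front)
```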