95,785 research outputs found

    Improved Boosted Decision Tree Algorithms by Adaptive Apriori and Post-Pruning for Predicting Obstructive Sleep Apnea

    Get PDF
    The improved version of Boosted Decision Tree algorithm, named as Boosted Adaptive Apriori post-Pruned Decision Tree (Boosted AApoP-DT), was developed by referring to Adaptive Apriori (AA) properties and by using post-pruning technique. The post-pruning technique used is mainly the error-complexity pruning for the decision trees categorized under Classification and Regression Trees. This technique estimates the re-substitution, cross-validation and generalization error rates before and after the post-pruning. The novelty of the post-pruning technique applied is that it is augmented by AA properties and these depend on the data characteristics in the dataset(s) being accessed. This algorithm is then boosted by using AdaBoost ensemble method. After comparing and contrasting this developed algorithm with the algorithm without being augmented by AA, i.e., Boosted post-Pruned Decision Tree (Boosted poP-DT), and the classical boosted decision tree algorithm, i.e., Boosted DT, there is a stepwise improvement shown when comparison proceeds from Boosted DT to Boosted poP-DT and to Boosted AApoP-DT

    Stratification Trees for Adaptive Randomization in Randomized Controlled Trials

    Full text link
    This paper proposes an adaptive randomization procedure for two-stage randomized controlled trials. The method uses data from a first-wave experiment in order to determine how to stratify in a second wave of the experiment, where the objective is to minimize the variance of an estimator for the average treatment effect (ATE). We consider selection from a class of stratified randomization procedures which we call stratification trees: these are procedures whose strata can be represented as decision trees, with differing treatment assignment probabilities across strata. By using the first wave to estimate a stratification tree, we simultaneously select which covariates to use for stratification, how to stratify over these covariates, as well as the assignment probabilities within these strata. Our main result shows that using this randomization procedure with an appropriate estimator results in an asymptotic variance which is minimal in the class of stratification trees. Moreover, the results we present are able to accommodate a large class of assignment mechanisms within strata, including stratified block randomization. In a simulation study, we find that our method, paired with an appropriate cross-validation procedure ,can improve on ad-hoc choices of stratification. We conclude by applying our method to the study in Karlan and Wood (2017), where we estimate stratification trees using the first wave of their experiment

    Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach

    Get PDF
    While it is well-known and acknowledged that the performance of graph algorithms is heavily dependent on the input data, there has been surprisingly little research to quantify and predict the impact the graph structure has on performance. Parallel graph algorithms, running on many-core systems such as GPUs, are no exception: most research has focused on how to efficiently implement and tune different graph operations on a specific GPU. However, the performance impact of the input graph has only been taken into account indirectly as a result of the graphs used to benchmark the system. In this work, we present a case study investigating how to use the properties of the input graph to improve the performance of the breadth-first search (BFS) graph traversal. To do so, we first study the performance variation of 15 different BFS implementations across 248 graphs. Using this performance data, we show that significant speed-up can be achieved by combining the best implementation for each level of the traversal. To make use of this data-dependent optimization, we must correctly predict the relative performance of algorithms per graph level, and enable dynamic switching to the optimal algorithm for each level at runtime. We use the collected performance data to train a binary decision tree, to enable high-accuracy predictions and fast switching. We demonstrate empirically that our decision tree is both fast enough to allow dynamic switching between implementations, without noticeable overhead, and accurate enough in its prediction to enable significant BFS speedup. We conclude that our model-driven approach (1) enables BFS to outperform state of the art GPU algorithms, and (2) can be adapted for other BFS variants, other algorithms, or more specific datasets

    Approximation Algorithms for Stochastic Boolean Function Evaluation and Stochastic Submodular Set Cover

    Full text link
    Stochastic Boolean Function Evaluation is the problem of determining the value of a given Boolean function f on an unknown input x, when each bit of x_i of x can only be determined by paying an associated cost c_i. The assumption is that x is drawn from a given product distribution, and the goal is to minimize the expected cost. This problem has been studied in Operations Research, where it is known as "sequential testing" of Boolean functions. It has also been studied in learning theory in the context of learning with attribute costs. We consider the general problem of developing approximation algorithms for Stochastic Boolean Function Evaluation. We give a 3-approximation algorithm for evaluating Boolean linear threshold formulas. We also present an approximation algorithm for evaluating CDNF formulas (and decision trees) achieving a factor of O(log kd), where k is the number of terms in the DNF formula, and d is the number of clauses in the CNF formula. In addition, we present approximation algorithms for simultaneous evaluation of linear threshold functions, and for ranking of linear functions. Our function evaluation algorithms are based on reductions to the Stochastic Submodular Set Cover (SSSC) problem. This problem was introduced by Golovin and Krause. They presented an approximation algorithm for the problem, called Adaptive Greedy. Our main technical contribution is a new approximation algorithm for the SSSC problem, which we call Adaptive Dual Greedy. It is an extension of the Dual Greedy algorithm for Submodular Set Cover due to Fujito, which is a generalization of Hochbaum's algorithm for the classical Set Cover Problem. We also give a new bound on the approximation achieved by the Adaptive Greedy algorithm of Golovin and Krause

    Automated user modeling for personalized digital libraries

    Get PDF
    Digital libraries (DL) have become one of the most typical ways of accessing any kind of digitalized information. Due to this key role, users welcome any improvements on the services they receive from digital libraries. One trend used to improve digital services is through personalization. Up to now, the most common approach for personalization in digital libraries has been user-driven. Nevertheless, the design of efficient personalized services has to be done, at least in part, in an automatic way. In this context, machine learning techniques automate the process of constructing user models. This paper proposes a new approach to construct digital libraries that satisfy user’s necessity for information: Adaptive Digital Libraries, libraries that automatically learn user preferences and goals and personalize their interaction using this information
    • …
    corecore