
    ANALYZING BIG DATA WITH DECISION TREES


    Optimal Sparse Decision Trees

    Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic, and sometimes produce unquestionably suboptimal models. The hardness of decision tree optimization is both a theoretical and a practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees for binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including data structures and a custom bit-vector library. Our experiments highlight advantages in scalability, speed, and proof of optimality. Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
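
    The gap between greedy and optimal trees mentioned in this abstract can be illustrated with a small sketch. It is not the paper's algorithm (which relies on analytical bounds and a bit-vector library); it is a brute-force recursion under an assumed objective, misclassification rate plus a penalty `lam` per leaf. The function names, the objective, and the XOR data set are all illustrative assumptions.

```python
from typing import List, Tuple

Row = Tuple[Tuple[int, ...], int]  # (binary feature vector, binary label)


def leaf_errors(rows: List[Row]) -> int:
    """Errors made by a leaf that predicts the majority label."""
    ones = sum(label for _, label in rows)
    return min(ones, len(rows) - ones)


def split(rows: List[Row], j: int) -> Tuple[List[Row], List[Row]]:
    """Partition rows on binary feature j."""
    left = [r for r in rows if r[0][j] == 0]
    right = [r for r in rows if r[0][j] == 1]
    return left, right


def optimal_cost(rows: List[Row], n_total: int, lam: float, depth: int) -> float:
    """Exhaustive search: best (errors / n_total + lam * leaves) up to `depth`."""
    best = leaf_errors(rows) / n_total + lam  # cost of stopping here as a leaf
    if depth == 0 or not rows:
        return best
    for j in range(len(rows[0][0])):
        left, right = split(rows, j)
        if left and right:
            cost = (optimal_cost(left, n_total, lam, depth - 1)
                    + optimal_cost(right, n_total, lam, depth - 1))
            best = min(best, cost)
    return best


def greedy_cost(rows: List[Row], n_total: int, lam: float, depth: int) -> float:
    """Myopic search: split only if one split improves the objective right away."""
    here = leaf_errors(rows) / n_total + lam
    if depth == 0 or not rows:
        return here
    best_val, best_split = here, None
    for j in range(len(rows[0][0])):
        left, right = split(rows, j)
        if left and right:
            val = (leaf_errors(left) + leaf_errors(right)) / n_total + 2 * lam
            if val < best_val:
                best_val, best_split = val, (left, right)
    if best_split is None:
        return here  # no single split helps, so the greedy search stops
    left, right = best_split
    return (greedy_cost(left, n_total, lam, depth - 1)
            + greedy_cost(right, n_total, lam, depth - 1))


# Hypothetical XOR data: no single split reduces the error, but two levels do.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
greedy = greedy_cost(XOR, len(XOR), 0.01, 2)
optimal = optimal_cost(XOR, len(XOR), 0.01, 2)
print(f"greedy objective:  {greedy:.2f}")
print(f"optimal objective: {optimal:.2f}")
```

    On this toy data the myopic search stops at a single leaf, while the exhaustive search finds the four-leaf tree with zero errors. The exhaustive recursion grows exponentially with depth, which is why bounds that shrink the search space matter in practice.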

    Efficient Database Generation for Data-driven Security Assessment of Power Systems

    Power system security assessment methods require large datasets of operating points to train or test their performance. As historical data often contain a limited number of abnormal situations, simulation data are necessary to accurately determine the security boundary. Generating such a database is an extremely demanding task, which becomes intractable even for small system sizes. This paper proposes a modular and highly scalable algorithm for computationally efficient database generation. Using convex relaxation techniques and complex network theory, we discard large infeasible regions and drastically reduce the search space. We explore the remaining space with a highly parallelizable algorithm and substantially decrease computation time. Our method accommodates numerous definitions of power system security. Here we focus on the combination of N-k security and small-signal stability. Demonstrating our algorithm on the IEEE 14-bus and NESTA 162-bus systems, we show that it outperforms existing approaches, requiring less than 10% of the time other methods require. Comment: Database publicly available at: https://github.com/johnnyDEDK/OPs_Nesta162Bus - Paper accepted for publication at IEEE Transactions on Power Systems.

    Bidirectional branch and bound for controlled variable selection. Part III: local average loss minimization

    The selection of controlled variables (CVs) from available measurements through exhaustive search is computationally prohibitive for large-scale processes. We have recently proposed novel bidirectional branch and bound (B-3) approaches for CV selection using the minimum singular value (MSV) rule and the local worst-case loss criterion in the framework of self-optimizing control. However, the MSV rule is approximate, and the worst-case scenario may not occur frequently in practice. Thus, CV selection by minimizing the local average loss can be deemed the most reliable. In this work, the B-3 approach is extended to CV selection based on the local average loss metric. Lower bounds on the local average loss, and fast pruning and branching algorithms, are derived for the efficient B-3 algorithm. Random matrices and a binary distillation column case study are used to demonstrate the computational efficiency of the proposed method.
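
    The pruning idea behind branch and bound for subset selection can be sketched as follows. This is a simplified, one-directional (downward) search, not the bidirectional B-3 algorithm of the paper, and it does not use the paper's bounds. The loss function `avg_loss`, the matrix `GAINS`, and all other names are hypothetical; the only property the sketch relies on is that removing a measurement never decreases the loss, so the loss of a candidate set is a lower bound for every subset of it.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def avg_loss(subset: Tuple[int, ...], gains: List[List[float]]) -> float:
    """Toy average loss over disturbances; never decreases when a row is dropped."""
    n_dist = len(gains[0])
    return sum(1.0 / sum(gains[i][d] for i in subset) for d in range(n_dist)) / n_dist


def branch_and_bound(gains: List[List[float]], k: int) -> Dict:
    """Pick k rows of `gains` minimizing avg_loss, pruning with a lower bound."""
    best = {"loss": float("inf"), "subset": None, "evaluated": 0}

    def recurse(candidates: Tuple[int, ...], start: int) -> None:
        best["evaluated"] += 1
        # Lower bound: every subset of `candidates` has a loss at least this large.
        bound = avg_loss(candidates, gains)
        if bound >= best["loss"]:
            return  # prune the whole branch
        if len(candidates) == k:
            best["loss"], best["subset"] = bound, candidates
            return
        # Remove positions in increasing order so each subset is visited once;
        # positions beyond k could not lead to a subset of size k.
        for pos in range(start, k + 1):
            recurse(candidates[:pos] + candidates[pos + 1:], pos)

    recurse(tuple(range(len(gains))), 0)
    return best


# Hypothetical positive gain matrix: 5 candidate measurements, 3 disturbances.
GAINS = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.5, 0.5, 0.1],
    [0.3, 0.3, 0.3],
]
result = branch_and_bound(GAINS, 2)
exhaustive = min(avg_loss(c, GAINS) for c in combinations(range(len(GAINS)), 2))
print(result["subset"], result["loss"], exhaustive)
```

    The comparison against the exhaustive minimum is only a sanity check on this small instance. How much work the pruning saves depends on how tight the lower bound is.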