ANALYZING BIG DATA WITH DECISION TREES

Author: Leong Lok Kei
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2014

SJSU ScholarWorks

Optimal Sparse Decision Trees

Authors: Hu Xiyang; Rudin Cynthia; Seltzer Margo
Publication date: 17/09/2020

Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic, and sometimes produce unquestionably suboptimal models. Hardness of decision tree optimization is both a theoretical and practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees for binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including data structures and a custom bit-vector library. Our experiments highlight advantages in scalability, speed, and proof of optimality. Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
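
The abstract's point about greedy, myopic tree growth can be illustrated with a toy case. The sketch below is not the paper's algorithm: it uses plain exhaustive search with no analytical bounds or bit vectors, and the function names and the XOR dataset are assumptions made for illustration. The simplified greedy rule splits only when a single split lowers training error, so it stalls on XOR, while an exhaustive search over depth-2 trees finds a perfect one.

```python
from itertools import product

def leaf_errors(labels):
    """Misclassifications when a leaf predicts its majority label."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)

def split(rows, labels, feature):
    """Partition (rows, labels) on one binary feature; returns two (rows, labels) pairs."""
    parts = ([], []), ([], [])
    for row, y in zip(rows, labels):
        part_rows, part_labels = parts[row[feature]]
        part_rows.append(row)
        part_labels.append(y)
    return parts

def optimal_errors(rows, labels, depth):
    """Fewest training errors over all trees with at most `depth` levels of splits."""
    best = leaf_errors(labels)
    if depth == 0 or best == 0:
        return best
    for f in range(len(rows[0])):
        left, right = split(rows, labels, f)
        if not left[0] or not right[0]:
            continue  # the feature is constant here, so the split is useless
        total = (optimal_errors(left[0], left[1], depth - 1)
                 + optimal_errors(right[0], right[1], depth - 1))
        best = min(best, total)
    return best

def greedy_errors(rows, labels, depth):
    """Training errors of a tree grown by a simplified greedy rule:
    take the split with the largest immediate error reduction, stop if none helps."""
    here = leaf_errors(labels)
    if depth == 0 or here == 0:
        return here
    best_feature, best_now = None, here
    for f in range(len(rows[0])):
        left, right = split(rows, labels, f)
        if not left[0] or not right[0]:
            continue
        now = leaf_errors(left[1]) + leaf_errors(right[1])
        if now < best_now:
            best_feature, best_now = f, now
    if best_feature is None:
        return here  # myopic stop: no single split looks useful
    left, right = split(rows, labels, best_feature)
    return (greedy_errors(left[0], left[1], depth - 1)
            + greedy_errors(right[0], right[1], depth - 1))

# Toy dataset (assumed for illustration): the label is the XOR of two binary features.
rows = list(product([0, 1], repeat=2))
labels = [a ^ b for a, b in rows]
print("greedy:", greedy_errors(rows, labels, 2))
print("optimal:", optimal_errors(rows, labels, 2))
```

Exhaustive search like this grows exponentially with depth and feature count, which is the cost that search-space bounds of the kind the abstract mentions are meant to cut.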

arXiv.org e-Print Archive

Efficient Database Generation for Data-driven Security Assessment of Power Systems

Authors: Chatzivasileiadis Spyros; Eriksson Robert; Thams Florian; Venzke Andreas
Publication date: 01/02/2019

Power system security assessment methods require large datasets of operating points to train or test their performance. As historical data often contain a limited number of abnormal situations, simulation data are necessary to accurately determine the security boundary. Generating such a database is an extremely demanding task, which becomes intractable even for small system sizes. This paper proposes a modular and highly scalable algorithm for computationally efficient database generation. Using convex relaxation techniques and complex network theory, we discard large infeasible regions and drastically reduce the search space. We explore the remaining space by a highly parallelizable algorithm and substantially decrease computation time. Our method accommodates numerous definitions of power system security. Here we focus on the combination of N-k security and small-signal stability. Demonstrating our algorithm on the IEEE 14-bus and NESTA 162-bus systems, we show how it outperforms existing approaches, requiring less than 10% of the time other methods require. Comment: Database publicly available at: https://github.com/johnnyDEDK/OPs_Nesta162Bus - Paper accepted for publication at IEEE Transactions on Power Systems

arXiv.org e-Print Archive

Online Research Database In Technology

Bidirectional branch and bound for controlled variable selection. Part III: local average loss minimization

Authors: Cao Yi; Kariwala Vinay
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

The selection of controlled variables (CVs) from available measurements through exhaustive search is computationally forbidding for large-scale processes. We have recently proposed novel bidirectional branch and bound (B-3) approaches for CV selection using the minimum singular value (MSV) rule and the local worst-case loss criterion in the framework of self-optimizing control. However, the MSV rule is approximate and the worst-case scenario may not occur frequently in practice. Thus, CV selection by minimizing local average loss can be deemed the most reliable. In this work, the B-3 approach is extended to CV selection based on the local average loss metric. Lower bounds on local average loss and fast pruning and branching algorithms are derived for the efficient B-3 algorithm. Random matrices and a binary distillation column case study are used to demonstrate the computational efficiency of the proposed method.
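
As a rough illustration of how a lower bound lets branch and bound skip most of an exhaustive subset search, the sketch below selects k items under a toy monotone loss. It is an assumption-laden stand-in, not the B-3 method: it prunes in one direction only, and the loss function, the `cost` matrix, and all function names are hypothetical. The bound relies on one property, that adding items never increases the loss, so the loss of "everything still available" cannot exceed the loss of any subset in that branch.

```python
from itertools import combinations

def loss(subset, cost):
    """Toy monotone loss: each column is served by its cheapest selected row.
    Adding rows to the subset can never increase this value."""
    return sum(min(cost[i][j] for i in subset) for j in range(len(cost[0])))

def branch_and_bound(cost, k):
    """Find the k-row subset with minimum loss, pruning with a lower bound."""
    best = {"loss": float("inf"), "subset": None, "evaluated": 0}

    def visit(fixed, candidates):
        if len(fixed) + len(candidates) < k:
            return  # not enough rows left to reach size k
        if len(fixed) == k:
            best["evaluated"] += 1
            value = loss(fixed, cost)
            if value < best["loss"]:
                best["loss"], best["subset"] = value, fixed
            return
        # Lower bound for this branch: the loss if every remaining row were kept.
        if loss(fixed + tuple(candidates), cost) >= best["loss"]:
            return  # no subset in this branch can beat the incumbent
        head, rest = candidates[0], candidates[1:]
        visit(fixed + (head,), rest)  # branch 1: include the next row
        visit(fixed, rest)            # branch 2: exclude it

    visit((), list(range(len(cost))))
    return best

def brute_force(cost, k):
    """Reference answer by enumerating every k-row subset."""
    return min(loss(c, cost) for c in combinations(range(len(cost)), k))

# Hypothetical cost matrix: 5 candidate rows, 4 columns to serve.
cost = [
    [4, 9, 7, 3],
    [8, 2, 6, 5],
    [5, 7, 1, 9],
    [6, 6, 6, 6],
    [9, 3, 8, 2],
]
result = branch_and_bound(cost, 2)
print(result["subset"], result["loss"], "subsets evaluated:", result["evaluated"])
```

Because the bound is valid for every subset in a branch, pruning never discards the optimum, so the result matches `brute_force(cost, 2)` while typically evaluating fewer complete subsets.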

Cranfield CERES

DR-NTU (Digital Repository of NTU)