5,285 research outputs found
Mixed-Integer Convex Nonlinear Optimization with Gradient-Boosted Trees Embedded
Decision trees usefully represent sparse, high dimensional and noisy data.
Having learned a function from this data, we may want to thereafter integrate
the function into a larger decision-making problem, e.g., for picking the best
chemical process catalyst. We study a large-scale, industrially-relevant
mixed-integer nonlinear nonconvex optimization problem involving both
gradient-boosted trees and penalty functions mitigating risk. This
mixed-integer optimization problem with convex penalty terms broadly applies to
optimizing pre-trained regression tree models. Decision makers may wish to
optimize discrete models to repurpose legacy predictive models, or they may
wish to optimize a discrete model that particularly well-represents a data set.
We develop several heuristic methods to find feasible solutions, and an exact,
branch-and-bound algorithm leveraging structural properties of the
gradient-boosted trees and penalty functions. We computationally test our
methods on concrete mixture design instance and a chemical catalysis industrial
instance
Applications of Machine Learning to Threat Intelligence, Intrusion Detection and Malware
Artificial Intelligence (AI) and Machine Learning (ML) are emerging technologies with applications to many fields. This paper is a survey of use cases of ML for threat intelligence, intrusion detection, and malware analysis and detection. Threat intelligence, especially attack attribution, can benefit from the use of ML classification. False positives from rule-based intrusion detection systems can be reduced with the use of ML models. Malware analysis and classification can be made easier by developing ML frameworks to distill similarities between the malicious programs. Adversarial machine learning will also be discussed, because while ML can be used to solve problems or reduce analyst workload, it also introduces new attack surfaces
Fast Supervised Hashing with Decision Trees for High-Dimensional Data
Supervised hashing aims to map the original features to compact binary codes
that are able to preserve label based similarity in the Hamming space.
Non-linear hash functions have demonstrated the advantage over linear ones due
to their powerful generalization capability. In the literature, kernel
functions are typically used to achieve non-linearity in hashing, which achieve
encouraging retrieval performance at the price of slow evaluation and training
time. Here we propose to use boosted decision trees for achieving non-linearity
in hashing, which are fast to train and evaluate, hence more suitable for
hashing with high dimensional data. In our approach, we first propose
sub-modular formulations for the hashing binary code inference problem and an
efficient GraphCut based block search method for solving large-scale inference.
Then we learn hash functions by training boosted decision trees to fit the
binary codes. Experiments demonstrate that our proposed method significantly
outperforms most state-of-the-art methods in retrieval precision and training
time. Especially for high-dimensional data, our method is orders of magnitude
faster than many methods in terms of training time.Comment: Appearing in Proc. IEEE Conf. Computer Vision and Pattern
Recognition, 2014, Ohio, US
- …