Comparing machine learning models to choose the variable ordering for cylindrical algebraic decomposition
There has been recent interest in the use of machine learning (ML) approaches
within mathematical software to make choices that impact on the computing
performance without affecting the mathematical correctness of the result. We
address the problem of selecting the variable ordering for cylindrical
algebraic decomposition (CAD), an important algorithm in Symbolic Computation.
Prior work applying ML to this problem used a Support Vector Machine
(SVM) to select between three existing human-made heuristics, which did better
than any one heuristic alone. The present work extends this to have ML select the
variable ordering directly, and to try a wider variety of ML techniques.
We experimented with the NLSAT dataset and the Regular Chains Library CAD
function for Maple 2018. For each problem, the variable ordering leading to the
shortest computing time was selected as the target class for ML. Features were
generated from the polynomial input and used to train the following ML models:
k-nearest neighbours (KNN) classifier, multi-layer perceptron (MLP), decision
tree (DT) and SVM, as implemented in the Python scikit-learn package. We also
compared these with the two leading human constructed heuristics for the
problem: Brown's heuristic and sotd. On this dataset all of the ML approaches
outperformed the human-made heuristics, some by a large margin.
Comment: Accepted into CICM 2019.
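The mechanics of the classifier comparison can be sketched with scikit-learn. Everything numeric below is a synthetic stand-in: the paper's features are computed from the input polynomials and its labels are the orderings with the shortest CAD computing time, whereas here both are random.

```python
# Sketch: train the four classifier types named above to pick one of the
# 3! = 6 variable orderings for a 3-variable problem. Features and labels
# are random placeholders, NOT the paper's dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((300, 11))            # stand-in feature vectors
y = rng.integers(0, 6, size=300)     # stand-in target: fastest ordering index

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "KNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "predicts ordering", model.predict(X_te[:1])[0])
```

With random labels the accuracies are meaningless; the point is only the shared fit/predict workflow across the four model types.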
A machine learning based software pipeline to pick the variable ordering for algorithms with polynomial inputs
We are interested in the application of Machine Learning (ML) technology to
improve mathematical software. It may seem that the probabilistic nature of ML
tools would invalidate the exact results prized by such software; however, the
algorithms which underpin the software often come with a range of choices which
are good candidates for ML application. We refer to choices which have no
effect on the mathematical correctness of the software, but do impact its
performance.
In the past we experimented with one such choice: the variable ordering to
use when building a Cylindrical Algebraic Decomposition (CAD). We used the
Python library Scikit-Learn (sklearn) to experiment with different ML models,
and developed new techniques for feature generation and hyper-parameter
selection.
These techniques could easily be adapted for making decisions other than our
immediate application of CAD variable ordering. Hence in this paper we present
a software pipeline to use sklearn to pick the variable ordering for an
algorithm that acts on a polynomial system. The code described is freely
available online.
Comment: Accepted into Proc ICMS 2020.
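One component of such a pipeline, cross-validated hyper-parameter selection, can be sketched as follows. The parameter grid and the random data are hypothetical examples, not the ranges or dataset used in the paper:

```python
# Sketch of the hyper-parameter selection step: grid search with
# cross-validation in scikit-learn. Data and grid are placeholders.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((200, 11))            # stand-in feature vectors
y = rng.integers(0, 6, size=200)     # stand-in ordering labels

grid = {"max_depth": [3, 5, 10, None], "min_samples_leaf": [1, 2, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid, cv=5)
search.fit(X, y)
print(search.best_params_)           # parameters chosen by 5-fold CV
```

In practice one would widen or refine the grid around the winning values and repeat until the cross-validation score stops improving.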
Algorithmically generating new algebraic features of polynomial systems for machine learning
There are a variety of choices to be made in both computer algebra systems
(CASs) and satisfiability modulo theory (SMT) solvers which can impact
performance without affecting mathematical correctness. Such choices are
candidates for machine learning (ML) approaches, however, there are
difficulties in applying standard ML techniques, such as the efficient
identification of ML features from input data which is typically a polynomial
system. Our focus is selecting the variable ordering for cylindrical algebraic
decomposition (CAD), an important algorithm implemented in several CASs, and
now also SMT-solvers. We created a framework to describe all the previously
identified ML features for the problem and then enumerated all options in this
framework to automatically generate many more features. We validate the
usefulness of these with an experiment which shows that an ML choice for CAD
variable ordering is superior to those made by human-created heuristics, and
further improved with these additional features. We expect that this technique
of feature generation could be useful for other choices related to CAD, or even
choices for other algorithms with polynomial systems for input.
Comment: To appear in Proc SC-Square Workshop 2019. arXiv admin note: substantial text overlap with arXiv:1904.11061.
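A greatly simplified version of this enumeration idea: fix a small set of aggregation functions and apply every (inner, outer) pair of them to per-monomial degree data for each variable. The polynomial encoding and the aggregator set below are illustrative only; the paper's feature framework is richer.

```python
# Sketch: enumerate features as aggregations of variable degrees.
# A polynomial is a list of monomials, each an exponent tuple
# (e_x1, e_x2, e_x3); e.g. x1^2*x3 -> (2, 0, 1).
from itertools import product
from statistics import mean

system = [[(2, 0, 1), (0, 1, 0)],    # x1^2*x3 + x2
          [(1, 1, 1), (0, 0, 2)]]    # x1*x2*x3 + x3^2

aggregators = {"max": max, "sum": sum, "avg": mean}

def feature(system, var, inner, outer):
    """Aggregate degrees of `var` within each polynomial (inner),
    then across the system (outer)."""
    return outer(inner(m[var] for m in poly) for poly in system)

# Enumerating all options in this mini-framework gives 3*3*3 features.
for var, (iname, inner), (oname, outer) in product(
        range(3), aggregators.items(), aggregators.items()):
    print(f"x{var + 1} {oname}-of-{iname}:",
          feature(system, var, inner, outer))
```

Each combination of variable, inner aggregator and outer aggregator yields one candidate ML feature, mirroring how the paper enumerates its framework to multiply the feature count.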
Integrating linear ordinary fourth-order differential equations in the MAPLE programming environment
This paper reports a method to solve ordinary fourth-order differential equations in the form of ordinary power series and, for the case of regular singular points, in the form of generalized power series. An algorithm has been constructed and a program has been developed in the MAPLE environment (Waterloo, Ontario, Canada) in order to solve the fourth-order differential equation
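The coefficient-recurrence idea behind such power-series solvers can be illustrated in a few lines. This is a toy stand-in for the MAPLE program, worked for the specific test equation y'''' = y at an ordinary point, where substituting y = sum a_n x^n gives a[n+4] = a[n] / ((n+1)(n+2)(n+3)(n+4)):

```python
# Toy power-series solver for y'''' = y about x = 0 (an ordinary point):
# the four initial coefficients a0..a3 encode y(0), y'(0), y''(0)/2!,
# y'''(0)/3!, and the recurrence generates the rest.
from math import exp

def series_solution(a0, a1, a2, a3, terms=20):
    a = [a0, a1, a2, a3]
    for n in range(terms - 4):
        a.append(a[n] / ((n + 1) * (n + 2) * (n + 3) * (n + 4)))
    return a

def evaluate(coeffs, x):
    """Evaluate the truncated power series sum a_k x^k."""
    return sum(c * x**k for k, c in enumerate(coeffs))

# Initial data of y = exp(x), i.e. a_k = 1/k!; the recurrence must
# reproduce the exponential series.
coeffs = series_solution(1, 1, 1/2, 1/6)
print(evaluate(coeffs, 0.5))  # approximately exp(0.5)
```

Generalized power series around a regular singular point would additionally carry a factor x^r with r found from the indicial equation; that case is omitted here.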
Deciding the consistency of non-linear real arithmetic constraints with a conflict driven search using cylindrical algebraic coverings
We present a new algorithm for determining the satisfiability of conjunctions
of non-linear polynomial constraints over the reals, which can be used as a
theory solver for satisfiability modulo theory (SMT) solving for non-linear
real arithmetic. The algorithm is a variant of Cylindrical Algebraic
Decomposition (CAD) adapted for satisfiability, where solution candidates
(sample points) are constructed incrementally, either until a satisfying sample
is found or sufficient samples have been constructed to conclude unsatisfiability.
The choice of samples is guided by the input constraints and previous
conflicts.
The key idea behind our new approach is to start with a partial sample;
demonstrate that it cannot be extended to a full sample; and from the reasons
for that rule out a larger space around the partial sample. These excluded
regions build up incrementally into a cylindrical algebraic covering of the space. There are
similarities with the incremental variant of CAD, the NLSAT method of Jovanovic
and de Moura, and the NuCAD algorithm of Brown; but we present worked examples
and experimental results on a preliminary implementation to demonstrate the
differences to these, and the benefits of the new approach.
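The covering loop can be caricatured in one dimension. In this toy analogue each constraint's solution set is simply an interval, and a failed sample rules out the component of a violated constraint's complement that contains it; in the real algorithm the conflicts and excluded cells come from CAD-style projection, which is not modelled here:

```python
# Toy 1-D "covering" search: sample a point, and if some constraint is
# violated, exclude the interval around the sample that the conflict
# justifies. SAT when a sample survives; UNSAT when the exclusions
# cover the (bounded stand-in for the) whole line.
def solve(constraints):
    """constraints: list of (lo, hi) intervals to be satisfied jointly."""
    low, high = -1e9, 1e9          # stand-in for the whole real line
    while low <= high:             # uncovered region is [low, high]
        x = low                    # sample from the uncovered region
        violated = [(lo, hi) for lo, hi in constraints
                    if not lo <= x <= hi]
        if not violated:
            return ("SAT", x)
        lo, hi = violated[0]       # the conflict guides the exclusion
        if x < lo:
            low = lo               # exclude (-inf, lo)
        else:
            high = min(high, hi)   # exclude (hi, +inf)
    return ("UNSAT", None)         # exclusions cover everything

print(solve([(1, 3), (2, 5)]))    # -> ('SAT', 2)
print(solve([(0, 1), (2, 3)]))    # -> ('UNSAT', None)
```

Even in this caricature the two hallmarks survive: samples are guided by previous conflicts, and unsatisfiability is concluded exactly when the excluded intervals form a covering.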
Software supporting the paper: "Comparing machine learning models to choose the variable ordering for cylindrical algebraic decomposition"
This toolbox supports the results in the following publication:

M. England and D. Florescu. Comparing machine learning models to choose the variable ordering for cylindrical algebraic decomposition. To appear in Proc CICM 2019, Springer LNCS, 2019. Preprint: arXiv:1904.11061

The authors are supported by EPSRC Project EP/R019622/1: Embedding Machine Learning within Quantifier Elimination Procedures.

This toolbox requires the NLSAT database, which can be downloaded at https://cs.nyu.edu/~dejan/nonlinear/. The first file in this toolbox was run in Matlab R2018b. The CAD routine was run in Maple 2018, with an updated version of the RegularChains Library downloaded in February 2019 from http://www.regularchains.org; this updated library contains bug fixes and additional functionality. The training and evaluation of the machine learning models was done using the scikit-learn package v0.20.2 for Python 2.7.

I. << Load_NLSAT.m >>
   Converts the redlog files corresponding to all problems with 3 variables to .txt files:
   NLSAT database =>> \poly\poly1.txt - \poly\poly7200.txt

II. > Input: > Output: >
   # observation: comp_times is the folder with the computation times for all problems, which are further on split into training and testing
   Description:
   - for each problem in \poly, the script copies the corresponding file to \poly\pol.txt and calls the following Maple script
     > Input: > Output: >
   - for each problem, the time limit starts from 4s and doubles if all of the orderings timed out
   - other files used/generated by the script:
     > # pre-generated file in pickle format containing a fixed randomised order of the 7200 file names in poly\; used only for generating the testing data

III. > Input: > Output: >
   Description:
   - for each problem in poly_test, the script copies the corresponding file to \poly_test\pol.txt and calls the following Maple script
     > Input: > Output: >
   - for each problem, the time limit is fixed at 128s.

IV. > Input: > Output: >
   Description:
   - generates the input/output testing data for ML
   - generates formula_numbers_test.txt, which is later used to work out the training data
   - other files used by the script:
   - all formula numbers from poly_test are written in > to be used later for ML

V. > Input: > Output: >
   Description:
   - generates the Brown heuristic predictions on the testing data.

VI. > Input: > Output: >
   Description:
   - generates the sotd predictions on the testing dataset
   - observation: the numbers in formula_numbers_sotd.txt are in a different order than in formula_numbers_test

VII. > Input: > Output: >
   Description:
   - generates the input/output data for ML (the heuristics have nothing to do with this step)
   - formula_numbers_train.txt includes {1,...,7200}\formula_numbers_test.txt

VIII. > Input: > Output: >
   Description:
   - generates ML predictions on the testing dataset for: DT, MLP, SVC and KNN
   Steps for training a model:
   - the ranges for performing CV are stored in a dictionary coded in models_dictionary.py, in the part corresponding to model_class_name
   - after the ranges are defined, the model is selected in ML_training_and_prediction.py, e.g. model_class_name='DecisionTreeClassifier'
   - to train the model, the user should select the choice to perform grid-search CV on the training dataset
   - the best parameters returned by grid search should be manually entered in models_dictionary using the 'def' key of the dictionary
   - the ranges should be reset and the process repeated until the user is confident that the CV performance could not be improved by extending the range or using a finer resolution
   - after training, ML_training_and_prediction.py should be run again; this time the user should skip training and select 'y' when asked 'Continue with the previous default parameters from "models_dictionary.py"?'
   - this will generate the model predictions and save them in 'y_'+model_class_name+'_test.txt'

IX. > Input: > => time_ ...(all methods) & accuracy ...(all methods)
   Output: Prints the performance of each method
   Description:
   - computes the accuracy and total time using the final prediction on the testing dataset with all methods
   - plots a histogram showing the percentage increase in computing time with every method, compared to the minimum time.
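The comparison in step IX amounts to the following computation. The timing table and the method predictions below are made-up placeholders, not the toolbox's real data files:

```python
# Sketch of step IX: per-method accuracy (did it pick the fastest
# ordering?) and total computing time. Two problems, six orderings,
# two methods; all numbers are invented placeholders.
timings = [
    [3.1, 0.9, 4.0, 2.2, 5.5, 1.0],   # problem 1: time per ordering
    [0.5, 0.7, 0.4, 1.9, 0.6, 0.8],   # problem 2: time per ordering
]
predictions = {"DT": [1, 2], "Brown": [5, 0]}  # chosen ordering index

best = [min(range(6), key=row.__getitem__) for row in timings]
for method, picks in predictions.items():
    accuracy = sum(p == b for p, b in zip(picks, best)) / len(best)
    total = sum(row[p] for p, row in zip(picks, timings))
    print(f"{method}: accuracy {accuracy:.0%}, total time {total:.2f}s")
```

The histogram mentioned in the description would then plot, per method, the percentage by which its total time exceeds the minimum achievable total (the sum of the per-problem best times).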