6 research outputs found

    Comparing machine learning models to choose the variable ordering for cylindrical algebraic decomposition

    There has been recent interest in the use of machine learning (ML) approaches within mathematical software to make choices that impact computing performance without affecting the mathematical correctness of the result. We address the problem of selecting the variable ordering for cylindrical algebraic decomposition (CAD), an important algorithm in Symbolic Computation. Prior work applying ML to this problem implemented a Support Vector Machine (SVM) to select between three existing human-made heuristics, which did better than any one heuristic alone. The present work extends this to have ML select the variable ordering directly, and to try a wider variety of ML techniques. We experimented with the NLSAT dataset and the Regular Chains Library CAD function for Maple 2018. For each problem, the variable ordering leading to the shortest computing time was selected as the target class for ML. Features were generated from the polynomial input and used to train the following ML models: k-nearest neighbours (KNN) classifier, multi-layer perceptron (MLP), decision tree (DT) and SVM, as implemented in the Python scikit-learn package. We also compared these with the two leading human-constructed heuristics for the problem: Brown's heuristic and sotd. On this dataset all of the ML approaches outperformed the human-made heuristics, some by a large margin. Comment: Accepted into CICM 2019.
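As a rough illustration of the experimental setup described above (not the paper's code), the sketch below trains the four named scikit-learn classifiers to predict the best of the six orderings of a three-variable problem from a feature vector; the random feature matrix, its dimensions and the default hyper-parameters are placeholder assumptions.

```python
# A minimal sketch (not the paper's code) of training the four scikit-learn
# classifiers named above to pick one of the 6 orderings of 3 variables.
# X and y are random placeholders; in the paper the features come from the
# input polynomials and y is the index of the fastest ordering.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((500, 11))                 # 500 problems, 11 features (placeholder sizes)
y = rng.integers(0, 6, size=500)          # target class: index of the fastest ordering

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    'KNN': KNeighborsClassifier(n_neighbors=5),
    'MLP': MLPClassifier(max_iter=1000),
    'DT':  DecisionTreeClassifier(),
    'SVM': SVC(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))   # classification accuracy on held-out data
```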

    A machine learning based software pipeline to pick the variable ordering for algorithms with polynomial inputs

    We are interested in the application of Machine Learning (ML) technology to improve mathematical software. It may seem that the probabilistic nature of ML tools would invalidate the exact results prized by such software; however, the algorithms which underpin the software often come with a range of choices which are good candidates for ML application. We refer to choices which have no effect on the mathematical correctness of the software, but do impact its performance. In the past we experimented with one such choice: the variable ordering to use when building a Cylindrical Algebraic Decomposition (CAD). We used the Python library Scikit-Learn (sklearn) to experiment with different ML models, and developed new techniques for feature generation and hyper-parameter selection. These techniques could easily be adapted for making decisions other than our immediate application of CAD variable ordering. Hence in this paper we present a software pipeline that uses sklearn to pick the variable ordering for an algorithm that acts on a polynomial system. The code described is freely available online. Comment: Accepted into Proc ICMS 2020.
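A minimal sketch of the hyper-parameter selection stage such a pipeline might contain, assuming a standard scaler, an SVC model and an illustrative parameter grid (none of these settings are taken from the paper):

```python
# A minimal sketch, under assumptions, of hyper-parameter selection inside a
# scikit-learn pipeline: standardise the features, then grid-search the model
# parameters by cross-validation.  The grid and data are placeholders.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.random((300, 11))                  # placeholder feature matrix
y = rng.integers(0, 6, size=300)           # placeholder best-ordering labels

pipe = Pipeline([('scale', StandardScaler()), ('clf', SVC())])
grid = {'clf__C': [0.1, 1, 10], 'clf__gamma': ['scale', 0.01, 0.1]}
search = GridSearchCV(pipe, grid, cv=5)    # 5-fold cross-validation over the grid
search.fit(X, y)
print(search.best_params_, search.best_score_)
```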

    Algorithmically generating new algebraic features of polynomial systems for machine learning

    There are a variety of choices to be made in both computer algebra systems (CASs) and satisfiability modulo theory (SMT) solvers which can impact performance without affecting mathematical correctness. Such choices are candidates for machine learning (ML) approaches; however, there are difficulties in applying standard ML techniques, such as the efficient identification of ML features from input data which is typically a polynomial system. Our focus is selecting the variable ordering for cylindrical algebraic decomposition (CAD), an important algorithm implemented in several CASs, and now also SMT solvers. We created a framework to describe all the previously identified ML features for the problem and then enumerated all options in this framework to automatically generate many more features. We validate the usefulness of these with an experiment which shows that an ML choice for CAD variable ordering is superior to those made by human-created heuristics, and is further improved with these additional features. We expect that this technique of feature generation could be useful for other choices related to CAD, or even for other algorithms that take polynomial systems as input. Comment: To appear in Proc SC-Square Workshop 2019. arXiv admin note: substantial text overlap with arXiv:1904.11061.
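As a rough, hypothetical illustration of this style of algorithmic feature generation (the specific measures, aggregation functions and example system are assumptions, not the paper's framework), one can enumerate combinations of per-variable monomial degrees with aggregations over monomials and over polynomials:

```python
# A rough sketch of algorithmic feature generation for a polynomial system
# (illustrative only): take the degree of each variable in each monomial, then
# aggregate over monomials and over polynomials with every combination of
# aggregation functions, giving one feature per variable/aggregation pair.
from statistics import mean
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
variables = [x1, x2, x3]
system = [x1**2*x2 - x3, x2**3 + x1*x3 - 1]          # placeholder input system

aggregations = {'max': max, 'sum': sum, 'avg': mean}

def monomial_degrees(poly, var):
    """Degree of var in each monomial of poly."""
    p = sp.Poly(poly, *variables)
    idx = variables.index(var)
    return [monom[idx] for monom in p.monoms()]

features = {}
for var in variables:
    for name_p, agg_p in aggregations.items():       # aggregate over polynomials
        for name_m, agg_m in aggregations.items():   # aggregate over monomials
            per_poly = [agg_m(monomial_degrees(p, var)) for p in system]
            features[f'{name_p}_{name_m}_deg_{var}'] = float(agg_p(per_poly))

print(len(features), 'features generated')
print(features)
```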

    Integrating linear ordinary fourth-order differential equations in the MAPLE programming environment

    This paper reports a method to solve linear ordinary fourth-order differential equations in the form of ordinary power series and, in the case of regular singular points, in the form of generalized power series. An algorithm has been constructed and a program has been developed in the MAPLE environment (Waterloo, Ontario, Canada) to solve such fourth-order differential equations.
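To illustrate the kind of computation involved, here is a minimal sketch in Python/SymPy rather than the paper's Maple implementation: it computes a truncated ordinary power-series solution of a sample fourth-order linear ODE about an ordinary point. The equation y'''' + x*y = 0 and the truncation order are illustrative assumptions.

```python
# A minimal sketch (not the paper's Maple code): compute a truncated ordinary
# power-series solution of a sample fourth-order linear ODE about the ordinary
# point x = 0.  The ODE y'''' + x*y = 0 and the truncation order N are
# illustrative choices.
import sympy as sp

x = sp.symbols('x')
N = 12                                   # truncation order of the series
a = sp.symbols('a0:%d' % N)              # unknown series coefficients a0..a11
y = sum(a[n] * x**n for n in range(N))   # candidate power series

# Residual of the ODE y'''' + x*y = 0; only keep powers of x whose coefficient
# is complete at this truncation order.
residual = sp.expand(sp.diff(y, x, 4) + x * y)
eqs = [sp.Eq(residual.coeff(x, k), 0) for k in range(N - 4)]

# Solve for the higher coefficients in terms of a0..a3, the four free constants
# fixed by the initial conditions y(0), y'(0), y''(0), y'''(0).
sol = sp.solve(eqs, a[4:])
series_solution = sp.expand(y.subs(sol))
print(series_solution)
```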

    Deciding the consistency of non-linear real arithmetic constraints with a conflict driven search using cylindrical algebraic coverings

    We present a new algorithm for determining the satisfiability of conjunctions of non-linear polynomial constraints over the reals, which can be used as a theory solver for satisfiability modulo theory (SMT) solving for non-linear real arithmetic. The algorithm is a variant of Cylindrical Algebraic Decomposition (CAD) adapted for satisfiability, in which solution candidates (sample points) are constructed incrementally, either until a satisfying sample is found or enough samples have been taken to conclude unsatisfiability. The choice of samples is guided by the input constraints and previous conflicts. The key idea behind our new approach is to start with a partial sample; demonstrate that it cannot be extended to a full sample; and, from the reasons for that failure, rule out a larger space around the partial sample. These excluded regions build up incrementally into a cylindrical algebraic covering of the space. There are similarities with the incremental variant of CAD, the NLSAT method of Jovanovic and de Moura, and the NuCAD algorithm of Brown; we present worked examples and experimental results on a preliminary implementation to demonstrate the differences from these, and the benefits of the new approach.
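As a rough, one-variable illustration of the covering idea (the algorithm itself works recursively over many variables and with exact real algebraic arithmetic; the sketch below uses floating point, handles only strict constraints, and its helper names are invented for the illustration):

```python
# One-variable sketch of the covering idea, not the paper's implementation:
# sample a point outside the excluded intervals; if a constraint fails there,
# exclude the interval around the sample bounded by the real roots of the
# failing polynomial, on which that polynomial cannot change sign; stop when a
# satisfying sample is found or the excluded intervals cover the real line.
from math import inf
import sympy as sp

x = sp.symbols('x')

def holds(poly, rel, s):
    """Check the strict constraint 'poly rel 0' at the float sample s."""
    val = float(poly.subs(x, s))
    return val > 0 if rel == '>' else val < 0

def merge(intervals):
    """Merge closed intervals given as (lo, hi) pairs."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def sample_outside(intervals):
    """Return a float not covered by the closed intervals, or None if they cover R."""
    merged = merge(intervals)
    if not merged:
        return 0.0
    if merged[0][0] > -inf:
        return merged[0][0] - 1.0
    for (_, hi), (lo, _) in zip(merged, merged[1:]):
        return (hi + lo) / 2.0          # midpoint of the first uncovered gap
    return merged[-1][1] + 1.0 if merged[-1][1] < inf else None

def excluded_interval(poly, s):
    """Closed interval around s, bounded by real roots of poly, where poly keeps its sign."""
    roots = sorted(float(r) for r in sp.real_roots(poly))
    lo = max((r for r in roots if r <= s), default=-inf)
    hi = min((r for r in roots if r >= s), default=inf)
    return (lo, hi)

def check(constraints):
    """constraints: list of (sympy expression, '>' or '<') meaning expr rel 0."""
    covering = []
    while True:
        s = sample_outside(covering)
        if s is None:
            return 'UNSAT', covering
        failing = next((p for p, rel in constraints if not holds(p, rel, s)), None)
        if failing is None:
            return 'SAT', s
        covering.append(excluded_interval(failing, s))

print(check([(x**2 - 2, '>'), (x - 1, '<')]))   # satisfiable, e.g. around x = -2.4
print(check([(x**2 + 1, '<')]))                 # unsatisfiable, covering is all of R
```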

    Software supporting the paper: "Comparing machine learning models to choose the variable ordering for cylindrical algebraic decomposition"

    This toolbox supports the results in the following publication: M. England and D. Florescu. Comparing machine learning models to choose the variable ordering for cylindrical algebraic decomposition. To appear in Proc CICM 2019, Springer LNCS, 2019. Preprint: arXiv:1904.11061. The authors are supported by EPSRC Project EP/R019622/1: Embedding Machine Learning within Quantifier Elimination Procedures.

    This toolbox requires the NLSAT database, which can be downloaded at https://cs.nyu.edu/~dejan/nonlinear/. The first file in this toolbox was run in Matlab R2018b. The CAD routine was run in Maple 2018, with an updated version of the RegularChains Library downloaded in February 2019 from http://www.regularchains.org; this updated library contains bug fixes and additional functionality. The training and evaluation of the machine learning models was done using the scikit-learn package v0.20.2 for Python 2.7.

    I. << Load_NLSAT.m >>
    Converts the redlog files corresponding to all problems with 3 variables to .txt files: NLSAT database =>> \poly\poly1.txt - \poly\poly7200.txt.

    II. Input: / Output:
    Observation: comp_times is the folder with the computation times for all problems, which are later split into training and testing.
    Description:
    - For each problem in \poly, the script copies the corresponding file to \poly\pol.txt and calls the following Maple script (Input: / Output: ).
    - For each problem, the time limit starts from 4s and doubles if all of the orderings timed out.
    - Other files used/generated by the script: a pre-generated file in pickle format containing a fixed randomised order of the 7200 file names in poly\; it is used only for generating the testing data.

    III. Input: / Output:
    Description:
    - For each problem in poly_test, the script copies the corresponding file to \poly_test\pol.txt and calls the following Maple script (Input: / Output: ).
    - For each problem, the time limit is fixed at 128s.

    IV. Input: / Output:
    Description:
    - Generates the input/output testing data for ML.
    - Generates formula_numbers_test.txt, which is later used to work out the training data.
    - Other files used by the script: all formula numbers from poly_test are written out, to be used later for ML.

    V. Input: / Output:
    Description:
    - Generates the Brown heuristic predictions on the testing data.

    VI. Input: / Output:
    Description:
    - Generates the sotd predictions on the testing dataset.
    - Observation: the numbers in formula_numbers_sotd.txt are in a different order than in formula_numbers_test.

    VII. Input: / Output:
    Description:
    - Generates the input/output data for ML (the heuristics play no role in this step).
    - formula_numbers_train.txt includes {1,...,7200}\formula_numbers_test.txt.

    VIII. Input: / Output:
    Description:
    - Generates ML predictions on the testing dataset for DT, MLP, SVC and KNN.
    Steps for training a model:
    - The ranges for performing CV are stored in a dictionary coded in models_dictionary.py, in the part corresponding to model_class_name.
    - After the ranges are defined, the model is selected in ML_training_and_prediction.py, e.g. model_class_name='DecisionTreeClassifier'.
    - To train the model, the user should select the option to perform grid search CV on the training dataset.
    - The best parameters returned by grid search should be manually entered in models_dictionary using the 'def' key of the dictionary.
    - The ranges should be reset and the process repeated until the user is confident that the CV performance could not be improved by extending the range or using a finer resolution.
    - After training, ML_training_and_prediction.py should be run again; this time the user should skip training and select 'y' when asked 'Continue with the previous default parameters from "models_dictionary.py"?'.
    - This generates the model predictions and saves them in 'y_'+model_class_name+'_test.txt'.

    IX. Input: time_ ...(all methods) & accuracy ...(all methods). Output: prints the performance of each method.
    Description:
    - Computes the accuracy and total time using the final prediction on the testing dataset with all methods.
    - Plots a histogram showing the percentage increase in computing time with every method, compared to the minimum time.
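Step IX above reduces each method's predictions to accuracy and total-time figures. The following is a minimal, hypothetical sketch of that kind of evaluation in Python 3 with placeholder data and names; it is not the toolbox script.

```python
# Hypothetical sketch of a step IX style evaluation: given, for each test
# problem, the computing time of every variable ordering and each method's
# predicted ordering, report accuracy, total time and the percentage increase
# over the virtual-best (minimum) time.
import numpy as np

def evaluate(times, predictions):
    """times: array of shape (n_problems, n_orderings) of computing times.
    predictions: dict mapping method name -> array of predicted ordering indices."""
    best = times.argmin(axis=1)               # index of the fastest ordering per problem
    min_total = times.min(axis=1).sum()       # virtual-best total time
    for method, pred in predictions.items():
        chosen = times[np.arange(len(times)), pred]
        accuracy = np.mean(pred == best)
        increase = 100.0 * (chosen.sum() - min_total) / min_total
        print(f"{method}: accuracy {accuracy:.1%}, total time {chosen.sum():.1f}s "
              f"(+{increase:.1f}% over the minimum)")

# Toy usage with made-up numbers: 3 problems, 6 orderings, two methods.
rng = np.random.default_rng(0)
times = rng.uniform(1, 10, size=(3, 6))
evaluate(times, {'KNN': np.array([0, 2, 1]), 'Brown': np.array([1, 1, 1])})
```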