An Evolutionary Optimization Algorithm for Automated Classical Machine Learning
Machine learning is an evolving branch of computational algorithms that allow computers to learn from experience, make predictions, and solve different problems without being explicitly programmed. However, building a useful machine learning model is a challenging process that requires human expertise to carry out several tasks and ensure that machine learning's primary objective, determining the best and most predictive model, is achieved. These tasks include pre-processing, feature selection, and model selection. Many machine learning models developed by experts are designed manually and by trial and error. In other words, even experts need time and resources to create good predictive machine learning models. The idea of automated machine learning (AutoML) is to automate a machine learning pipeline to relieve the burden of substantial development costs and manual processes. The algorithms leveraged in these systems have different hyper-parameters, and different input datasets have different features. In both cases, the final performance of the model is closely related to the selected configuration of features and hyper-parameters, which is why feature selection and hyper-parameter tuning are considered crucial tasks in AutoML. The computationally expensive nature of tuning hyper-parameters and optimally selecting features creates significant opportunities for filling the research gaps in the AutoML field. This dissertation explores how to select the features and tune the hyper-parameters of conventional machine learning algorithms efficiently and automatically. To address the challenges in the AutoML area, novel algorithms for hyper-parameter tuning and feature selection are proposed. The hyper-parameter tuning algorithm aims to provide the optimal set of hyper-parameters for three conventional machine learning models (Random Forest, XGBoost and Support Vector Machine) to obtain the best performance scores.
The feature selection algorithm, in turn, looks for the optimal subset of features to achieve the highest performance. Afterward, a hybrid framework is designed for both hyper-parameter tuning and feature selection. The proposed framework can discover a close-to-optimal configuration of features and hyper-parameters. It includes the following components: (1) an automatic feature selection component based on artificial bee colony algorithms and machine learning training, and (2) an automatic hyper-parameter tuning component based on artificial bee colony algorithms and machine learning training for faster training and convergence of the learning models. The whole framework has been evaluated using four real-world datasets in different applications. This framework is an attempt to alleviate the challenges of hyper-parameter tuning and feature selection by using efficient algorithms. However, distributed processing, distributed learning, parallel computing, and other big data solutions are not taken into consideration in this framework
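The abstract above does not describe its artificial bee colony variant in detail, so the following is only a minimal sketch of the textbook employed/onlooker/scout scheme, applied to a hypothetical two-parameter loss. The names `abc_minimize` and `toy_loss`, the bounds, and all default settings are assumptions made for illustration; in a real tuning setup the objective would be a cross-validated model score rather than a quadratic.

```python
import random


def abc_minimize(objective, bounds, n_sources=10, limit=5, n_iter=50, seed=0):
    """Simplified artificial bee colony search over a box-bounded space."""
    rng = random.Random(seed)
    dim = len(bounds)
    best = {"x": None, "cost": float("inf")}

    def record(x, cost):
        # Remember the best point seen so far, even if a scout later drops it.
        if cost < best["cost"]:
            best["x"], best["cost"] = list(x), cost

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    sources = [random_point() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources
    for s, c in zip(sources, costs):
        record(s, c)

    def try_improve(i):
        # Move one coordinate of source i relative to a random other source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = list(sources[i])
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        cand[d] = min(max(cand[d], lo), hi)
        c = objective(cand)
        if c < costs[i]:  # greedy acceptance
            sources[i], costs[i], trials[i] = cand, c, 0
            record(cand, c)
        else:
            trials[i] += 1

    for _ in range(n_iter):
        # Employed bees: one local move per source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: revisit sources, favouring those with lower cost.
        worst = max(costs)
        weights = [worst - c + 1e-12 for c in costs]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_improve(i)
        # Scout bees: abandon sources that stopped improving.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_point()
                costs[i] = objective(sources[i])
                trials[i] = 0
                record(sources[i], costs[i])
    return best["x"], best["cost"]


def toy_loss(params):
    # Hypothetical stand-in for a validation loss over two hyper-parameters.
    x, y = params
    return (x - 3.0) ** 2 + (y + 1.0) ** 2


bounds = [(-10.0, 10.0), (-10.0, 10.0)]
best, best_cost = abc_minimize(toy_loss, bounds)
print(best, best_cost)
```

For feature selection the same loop could in principle run over binary inclusion masks instead of real-valued vectors, but that encoding is not specified in the abstract.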
Automatic Debiased Machine Learning of Causal and Structural Effects
Many causal and structural effects depend on regressions. Examples include
average treatment effects, policy effects, average derivatives, regression
decompositions, economic average equivalent variation, and parameters of
economic structural models. The regressions may be high dimensional. Plugging
machine learners into identifying equations can lead to poor inference due to
bias and/or model selection. This paper gives automatic debiasing for
estimating equations and valid asymptotic inference for the estimators of
effects of interest. The debiasing is automatic in that its construction uses
the identifying equations without the full form of the bias correction and is
performed by machine learning. Novel results include convergence rates for
Lasso and Dantzig learners of the bias correction, primitive conditions for
asymptotic inference for important examples, and general conditions for GMM. A
variety of regression learners and identifying equations are covered. Automatic
debiased machine learning (Auto-DML) is applied to estimating the average
treatment effect on the treated for the NSW job training data and to estimating
demand elasticities from Nielsen scanner data while allowing preferences to be
correlated with prices and income
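As a hedged illustration of the debiasing idea in the abstract above, the estimator can be written in the following generic form. The notation ($m$, $\gamma$, $\alpha$, $W$, $X$, $Y$) is an assumption introduced here and is not taken from the abstract, and sample splitting is omitted for brevity.

```latex
% Parameter of interest, with regression \gamma_0(X) = E[Y \mid X]:
\theta_0 = \mathbb{E}\big[ m(W, \gamma_0) \big]

% Debiased estimator: plug-in term plus a learned bias correction \hat{\alpha}:
\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n}
  \Big\{ m(W_i, \hat{\gamma}) + \hat{\alpha}(X_i)\,\big[ Y_i - \hat{\gamma}(X_i) \big] \Big\}

% \alpha_0 is the Riesz representer of the functional \gamma \mapsto E[m(W,\gamma)]:
\mathbb{E}\big[ m(W, \gamma) \big] = \mathbb{E}\big[ \alpha_0(X)\, \gamma(X) \big]
\quad \text{for all } \gamma

% Example: average treatment effect with X = (D, Z) and propensity \pi_0(Z) = P(D = 1 \mid Z):
m(W, \gamma) = \gamma(1, Z) - \gamma(0, Z), \qquad
\alpha_0(X) = \frac{D}{\pi_0(Z)} - \frac{1 - D}{1 - \pi_0(Z)}
```

The last display is what makes the procedure "automatic": $\hat{\alpha}$ can be learned from the representer identity alone, without deriving the closed form of $\alpha_0$ case by case.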
Anomaly Detection Based on Aggregation of Indicators
Automatic anomaly detection is a major issue in various areas. Beyond mere
detection, the identification of the origin of the problem that produced the
anomaly is also essential. This paper introduces a general methodology that can
assist human operators who aim at classifying monitoring signals. The main idea
is to leverage expert knowledge by generating a very large number of
indicators. A feature selection method is used to keep only the most
discriminant indicators which are used as inputs of a Naive Bayes classifier.
The parameters of the classifier have been optimized indirectly by the
selection process. The methodology is illustrated on simulated data designed to
reproduce some of the anomaly types observed in real world engines.
Comment: 23rd annual Belgian-Dutch Conference on Machine Learning (Benelearn
2014), Bruxelles, Belgium (2014)
GUIDER: a GUI for semiautomatic, physiologically driven EEG feature selection for a rehabilitation BCI
GUIDER is a graphical user interface developed in MATLAB software environment to identify electroencephalography (EEG)-based brain computer interface (BCI) control features for a rehabilitation application (i.e. post-stroke motor imagery training). In this context, GUIDER aims to combine physiological and machine learning approaches. Indeed, GUIDER allows therapists to set parameters and constraints according to the rehabilitation principles (e.g. affected hemisphere, sensorimotor relevant frequencies) and foresees an automatic method to select the features among the defined subset. As a proof of concept, we compared offline performances between manual, just based on operator’s expertise and experience, and GUIDER semiautomatic features selection on BCI data collected from stroke patients during BCI-supported motor imagery training. Preliminary results suggest that this semiautomatic approach could be successfully applied to support the human selection reducing operator dependent variability in view of future multi-centric clinical trials
Kernel learning at the first level of inference
Kernel learning methods, whether Bayesian or frequentist, typically involve multiple levels of inference, with the coefficients of the kernel expansion being determined at the first level and the kernel and regularisation parameters carefully tuned at the second level, a process known as model selection. Model selection for kernel machines is commonly performed via optimisation of a suitable model selection criterion, often based on cross-validation or theoretical performance bounds. However, if there are a large number of kernel parameters, as for instance in the case of automatic relevance determination (ARD), there is a substantial risk of over-fitting the model selection criterion, resulting in poor generalisation performance. In this paper we investigate the possibility of learning the kernel, for the Least-Squares Support Vector Machine (LS-SVM) classifier, at the first level of inference, i.e. parameter optimisation. The kernel parameters and the coefficients of the kernel expansion are jointly optimised at the first level of inference, minimising a training criterion with an additional regularisation term acting on the kernel parameters. The key advantage of this approach is that the values of only two regularisation parameters need be determined in model selection, substantially alleviating the problem of over-fitting the model selection criterion. The benefits of this approach are demonstrated using a suite of synthetic and real-world binary classification benchmark problems, where kernel learning at the first level of inference is shown to be statistically superior to the conventional approach, to improve on our previous work (Cawley and Talbot, 2007), and to be competitive with Multiple Kernel Learning approaches, but with reduced computational expense
Which Surrogate Works for Empirical Performance Modelling? A Case Study with Differential Evolution
It is not uncommon that meta-heuristic algorithms contain some intrinsic
parameters, the optimal configuration of which is crucial for achieving their
peak performance. However, evaluating the effectiveness of a configuration is
expensive, as it involves many costly runs of the target algorithm. Perhaps
surprisingly, it is possible to build a cheap-to-evaluate surrogate that models
the algorithm's empirical performance as a function of its parameters. Such
surrogates constitute an important building block for understanding algorithm
performance, algorithm portfolio/selection, and the automatic algorithm
configuration. In principle, many off-the-shelf machine learning techniques can
be used to build surrogates. In this paper, we take the differential evolution
(DE) as the baseline algorithm for proof-of-concept study. Regression models
are trained to model the DE's empirical performance given a parameter
configuration. In particular, we evaluate and compare four popular regression
algorithms both in terms of how well they predict the empirical performance
with respect to a particular parameter configuration, and also how well they
approximate the parameter versus the empirical performance landscapes
Over-Fitting in Model Selection with Gaussian Process Regression
Model selection in Gaussian Process Regression (GPR) seeks to determine the optimal values of the hyper-parameters governing the covariance function, which allows flexible customization of the GP to the problem at hand. An oft-overlooked issue in the model selection process is over-fitting the model selection criterion, typically the marginal likelihood. Over-fitting in this setting refers to fitting the random noise present in the model selection criterion in addition to the features that improve the generalisation performance of the statistical model. In this paper, we construct several Gaussian process regression models for a range of high-dimensional datasets from the UCI machine learning repository. Afterwards, we compare both the MSE on the test dataset and the negative log marginal likelihood (nlZ), used as the model selection criterion, to find whether the problem of over-fitting in model selection also affects GPR. We found that the squared exponential covariance function with Automatic Relevance Determination (SEard) is better than other kernels, including the squared exponential covariance function with isotropic distance measure (SEiso), according to the nlZ, but it is clearly not the best according to MSE on the test data, which indicates an over-fitting problem in model selection
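For reference, the standard textbook definitions behind the quantities named in the abstract above are sketched below. The symbols ($K_\theta$, $\sigma_f$, $\sigma_n$, $\ell$, $\ell_d$, $D$) are conventional notation assumed here, not taken from the abstract.

```latex
% Negative log marginal likelihood for n training targets y,
% with K_\theta = K_f + \sigma_n^2 I the noisy training covariance matrix:
\mathrm{nlZ}(\theta) = -\log p(y \mid X, \theta)
  = \tfrac{1}{2}\, y^{\top} K_\theta^{-1} y
  + \tfrac{1}{2} \log \lvert K_\theta \rvert
  + \tfrac{n}{2} \log 2\pi

% Isotropic squared exponential (SEiso): a single length-scale \ell
k_{\mathrm{SEiso}}(x, x') = \sigma_f^2
  \exp\!\Big( -\frac{\lVert x - x' \rVert^2}{2 \ell^2} \Big)

% Squared exponential with ARD (SEard): one length-scale \ell_d per input dimension
k_{\mathrm{SEard}}(x, x') = \sigma_f^2
  \exp\!\Big( -\frac{1}{2} \sum_{d=1}^{D} \frac{(x_d - x'_d)^2}{\ell_d^2} \Big)
```

SEard carries $D$ length-scales where SEiso has one, so on high-dimensional data it offers many more degrees of freedom with which to lower nlZ, consistent with the over-fitting of the selection criterion that the abstract reports.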
Meta-level learning for the effective reduction of model search space.
The exponential growth in the volume, variety and velocity of data is raising the need to investigate intelligent ways of extracting useful patterns from it. Finding the mapping of learning methods that leads to optimized performance on a given task requires deep expert knowledge and extensive computational resources. Moreover, the numerous configurations of these learning algorithms add another level of complexity. This creates the need for an intelligent recommendation engine that can advise on the best learning algorithm and its configuration for a given task. The techniques commonly used by experts include trial and error and reliance on prior experience in the specific domain. These techniques sometimes work for less complex tasks that require learning thousands of parameters. However, state-of-the-art models, e.g. deep learning models, require well-tuned hyper-parameters to learn millions of parameters, which demands specialized skills and numerous computationally expensive and time-consuming trials. In that scenario, meta-level learning can be a potential solution that recommends the most appropriate options efficiently and effectively regardless of the complexity of the data. However, meta-learning itself raises several challenges, the most critical ones being model selection and hyper-parameter optimization. The goal of this research is to investigate model selection and hyper-parameter optimization approaches of automatic machine learning in general and the challenges associated with them. In a machine learning pipeline there are several phases where meta-learning can be used to facilitate the best recommendations, including 1) pre-processing steps, 2) learning algorithms or their combination, 3) adaptivity mechanism parameters, 4) recurring concept extraction, and 5) concept drift detection.
The scope of this research is limited to feature engineering for problem representation, and learning strategy for recommending an algorithm and its hyper-parameters at the meta-level. Three studies are conducted around the two different approaches of automatic machine learning, which are model selection using meta-learning and hyper-parameter optimization. The first study evaluates the situation in which the use of additional data from a different domain can improve the performance of a meta-learning system for time-series forecasting, with a focus on cross-domain meta-knowledge transfer. Although the experiments revealed limited room for improvement over the overall best base-learner, the meta-learning approach turned out to be a safe choice, minimizing the risk of selecting the least appropriate base-learner: in only 2% of cases does meta-learning recommend the worst-performing base-learning method. The second study proposes another efficient and accurate domain adaptation approach, but using a different meta-learning approach. This study empirically confirms the intuition that there exists a relationship between the similarity of two different tasks and the depth of network that needs to be fine-tuned in order to achieve accuracy comparable with that of a model trained from scratch. However, the approach is limited to a single hyper-parameter, namely the network depth to fine-tune based on task similarity. The final study of this research expands the set of hyper-parameters while implicitly considering task similarity through the intrinsic dynamics of the training process. The study presents a framework to automatically find a good set of hyper-parameters resulting in reasonably good accuracy, by framing hyper-parameter selection and tuning within the reinforcement learning regime. The effectiveness of a recommended tuple can be tested very quickly rather than waiting for the network to converge.
This approach produces accuracy close to the state-of-the-art approach and is found to be 20% less computationally expensive than previous approaches. The methods proposed in these studies, which belong to different areas of automatic machine learning, have been thoroughly evaluated on a number of benchmark datasets, which confirmed the great potential of these methods