1,152 research outputs found

    Learning Surrogate Models of Document Image Quality Metrics for Automated Document Image Processing

    Computation of document image quality metrics often depends upon the availability of a ground truth image corresponding to the document. This limits the applicability of quality metrics in applications such as hyperparameter optimization of image processing algorithms that operate on-the-fly on unseen documents. This work proposes the use of surrogate models to learn the behavior of a given document quality metric on existing datasets where ground truth images are available. The trained surrogate model can later be used to predict the metric value on previously unseen document images without requiring access to ground truth images. The surrogate model is empirically evaluated on the Document Image Binarization Competition (DIBCO) and the Handwritten Document Image Binarization Competition (H-DIBCO) datasets.
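
    A minimal sketch of the surrogate idea described in this abstract, assuming a scikit-learn workflow: a regressor is trained to map cheap, ground-truth-free image features to a quality metric that normally requires a reference image. The hand-crafted features, the random stand-in data, and the random forest model are illustrative choices, not the paper's actual setup.

```python
# Sketch: learn a surrogate for a ground-truth-based document quality metric.
# Features, data, and metric here are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

def image_features(img: np.ndarray) -> np.ndarray:
    """Cheap, ground-truth-free features of a grayscale document image."""
    gx, gy = np.gradient(img.astype(float))
    return np.array([
        img.mean(),               # overall brightness
        img.std(),                # contrast
        (img < 128).mean(),       # ink-pixel ratio
        np.hypot(gx, gy).mean(),  # mean gradient magnitude (stroke/noise proxy)
    ])

# Stand-in for a dataset where ground truth exists (e.g., DIBCO): random images
# paired with a metric value that would normally be computed against ground truth.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(200, 64, 64))
metric_values = rng.uniform(0.5, 1.0, size=200)   # e.g., a binarization F-measure

X = np.stack([image_features(im) for im in images])
X_train, X_test, y_train, y_test = train_test_split(X, metric_values, random_state=0)

surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(X_train, y_train)

# On unseen documents the surrogate predicts the metric without a ground truth image.
pred = surrogate.predict(X_test)
print("Surrogate MAE on held-out documents:", mean_absolute_error(y_test, pred))
```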

    Super learner implementation in corrosion rate prediction

    This thesis proposes a new machine learning model for predicting the corrosion rate of 3C steel in seawater. The corrosion rate of a material depends not just on the nature of the material but also on the material's environmental conditions. The proposed machine learning model comes with a selection framework based on a hyperparameter optimization method and a performance evaluation metric to determine which models qualify for inclusion in the proposed model's ensemble architecture. The major aim of the selection framework is to select the smallest number of (already hyperparameter-optimized) models that fit efficiently into the architecture of the proposed model. Subsequently, the proposed predictive model is fitted on a portion of a dataset generated from an experiment on corrosion rates in five different seawater conditions; the remaining portion of this dataset is used to estimate the corrosion rate. Furthermore, the performance of the proposed model's predictions was evaluated using three major performance evaluation metrics. These metrics were also used to evaluate two hyperparameter-optimized models, the Smart Firefly Algorithm with Least Squares Support Vector Regression (SFA-LSSVR) and Support Vector Regression integrating Leave-One-Out Cross-Validation (SVR-LOOCV), to facilitate their comparison with the proposed predictive model and its constituent models. The test results show that the proposed model performs slightly below the SFA-LSSVR model and above the SVR-LOOCV model, with an RMSE score difference of 0.305 and an RMSE score of 0.792. Despite trailing the SFA-LSSVR model, the super learner model outperforms both hyperparameter-optimized models in memory utilization and computation time (graphically presented in this thesis).
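
    A rough sketch of the super learner (stacked ensemble) pattern the thesis describes: candidate base models are hyperparameter-optimized before being admitted to the ensemble, a meta-learner combines their predictions, and RMSE is reported on a hold-out set. The synthetic regression data, the specific base learners, and the grid of hyperparameters are assumptions for illustration; the thesis's SFA-LSSVR and SVR-LOOCV baselines are not reproduced here.

```python
# Sketch of a super learner (stacked ensemble) for a regression target such as corrosion rate.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hyperparameter-optimize each candidate base model before admitting it to the ensemble.
svr = GridSearchCV(SVR(), {"C": [1, 10, 100], "gamma": ["scale", 0.1]}, cv=5)
rf = GridSearchCV(RandomForestRegressor(random_state=0), {"n_estimators": [100, 300]}, cv=5)

super_learner = StackingRegressor(
    estimators=[("svr", svr), ("rf", rf)],
    final_estimator=Ridge(),   # meta-learner that combines the base predictions
    cv=5,
)
super_learner.fit(X_train, y_train)

rmse = np.sqrt(mean_squared_error(y_test, super_learner.predict(X_test)))
print("Super learner RMSE on held-out data:", rmse)
```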

    Adapting Least-Square Support Vector Regression Models to Forecast the Outcome of Horseraces

    This paper introduces an improved approach for forecasting the outcome of horseraces. Building upon previous literature, a state-of-the-art modelling paradigm is developed which integrates least-square support vector regression and conditional logit procedures to predict horses' winning probabilities. In order to adapt the least-square support vector regression model to this task, some free parameters have to be determined within a model selection step. Traditionally, this is accomplished by assessing candidate settings in terms of the mean-squared error between estimated and actual finishing positions. This paper proposes an augmented approach to organising model selection for horserace forecasting using the concept of ranking borrowed from internet search engine evaluation. In particular, it is shown that the performance of forecasting models can be improved significantly if parameter settings are chosen on the basis of their normalised discounted cumulative gain (i.e. their ability to accurately rank the first few finishers of a race), rather than according to general-purpose performance indicators which weight the ability to predict the finishing position of all horses equally.
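
    A minimal sketch of the model-selection idea described above: candidate SVR hyperparameter settings are scored by NDCG over the predicted race ranking (rewarding accuracy on the first few finishers) rather than by mean-squared error. The synthetic race, the relevance grading of the top three finishers, and the two candidate settings are invented for illustration; the paper's conditional logit stage is not shown.

```python
# Sketch: choose SVR hyperparameters by NDCG over the predicted ranking, not by MSE.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import ndcg_score, mean_squared_error

rng = np.random.default_rng(1)
n_horses, n_features = 12, 6
X = rng.normal(size=(n_horses, n_features))        # horse/race covariates
true_strength = X @ rng.normal(size=n_features)    # latent ability
finish_order = np.argsort(-true_strength)          # strongest horse finishes first

# Graded relevance: emphasise getting the first few finishers right.
relevance = np.zeros(n_horses)
relevance[finish_order[:3]] = [3, 2, 1]

candidates = [{"C": 0.1, "gamma": "scale"}, {"C": 10.0, "gamma": "scale"}]
for params in candidates:
    model = SVR(**params).fit(X, true_strength + rng.normal(scale=0.5, size=n_horses))
    scores = model.predict(X)
    print(params,
          "NDCG@3:", round(ndcg_score([relevance], [scores], k=3), 3),
          "MSE:", round(mean_squared_error(true_strength, scores), 3))
# Model selection then picks the candidate with the highest NDCG@3 rather than the lowest MSE.
```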

    Which Surrogate Works for Empirical Performance Modelling? A Case Study with Differential Evolution

    It is not uncommon for meta-heuristic algorithms to contain intrinsic parameters whose optimal configuration is crucial for achieving peak performance. However, evaluating the effectiveness of a configuration is expensive, as it involves many costly runs of the target algorithm. Perhaps surprisingly, it is possible to build a cheap-to-evaluate surrogate that models the algorithm's empirical performance as a function of its parameters. Such surrogates constitute an important building block for understanding algorithm performance, algorithm portfolio/selection, and automatic algorithm configuration. In principle, many off-the-shelf machine learning techniques can be used to build surrogates. In this paper, we take differential evolution (DE) as the baseline algorithm for a proof-of-concept study. Regression models are trained to model DE's empirical performance given a parameter configuration. In particular, we evaluate and compare four popular regression algorithms, both in terms of how well they predict the empirical performance for a particular parameter configuration and how well they approximate the parameter versus empirical performance landscapes.
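
    A minimal sketch of the workflow this abstract describes: sample DE parameter configurations, run the (costly) target algorithm once per configuration to record its empirical performance, and fit a cheap regression surrogate over the configuration space. The benchmark function, the two parameters sampled, their ranges, and the gradient-boosting regressor are illustrative assumptions, not the paper's experimental design.

```python
# Sketch: a surrogate of DE's empirical performance as a function of its control parameters.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.ensemble import GradientBoostingRegressor

def sphere(x):
    return float(np.sum(x ** 2))

bounds = [(-5.0, 5.0)] * 5
rng = np.random.default_rng(0)

configs, performances = [], []
for _ in range(30):                                # 30 costly target-algorithm runs
    F = rng.uniform(0.3, 1.0)                      # mutation factor
    CR = rng.uniform(0.1, 1.0)                     # crossover rate
    result = differential_evolution(sphere, bounds, mutation=F, recombination=CR,
                                    maxiter=50, popsize=10, seed=0, polish=False)
    configs.append([F, CR])
    performances.append(result.fun)                # empirical performance of this configuration

# The surrogate is cheap to evaluate for any new (F, CR) configuration.
surrogate = GradientBoostingRegressor(random_state=0).fit(configs, performances)
print("Predicted performance at F=0.8, CR=0.9:", surrogate.predict([[0.8, 0.9]])[0])
```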

    Review of automated time series forecasting pipelines

    Time series forecasting is fundamental for various use cases in different domains such as energy systems and economics. Creating a forecasting model for a specific use case requires an iterative and complex design process. The typical design process comprises five sections, (1) data pre-processing, (2) feature engineering, (3) hyperparameter optimization, (4) forecasting method selection, and (5) forecast ensembling, which are commonly organized in a pipeline structure. One promising approach to handling the ever-growing demand for time series forecasts is to automate this design process. The present paper therefore analyzes the existing literature on automated time series forecasting pipelines to investigate how the design of forecasting models can be automated, considering both Automated Machine Learning (AutoML) and automated statistical forecasting methods within a single forecasting pipeline. For this purpose, we first present and compare the proposed automation methods for each pipeline section, and then analyze the automation methods with regard to their interaction, combination, and coverage of the five pipeline sections. For both, we discuss the literature, identify problems, give recommendations, and suggest future research. The review reveals that the majority of papers cover only two or three of the five pipeline sections. We conclude that future research has to consider the automation of the forecasting pipeline holistically to enable the large-scale application of time series forecasting.
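
    A schematic sketch of the five pipeline sections named in this review, expressed as a single scikit-learn pipeline with time-series-aware cross-validation. The toy series, the lag features, the two candidate methods, and the search grid are assumptions chosen for brevity; no specific surveyed system is being reproduced.

```python
# Sketch: the five-section forecasting pipeline as one automatable scikit-learn object.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# (1) data pre-processing on a toy series, with lag features as (2) feature engineering
rng = np.random.default_rng(0)
series = np.sin(np.arange(300) / 10) + rng.normal(scale=0.1, size=300)
lags = 5
X = np.column_stack([series[i:-(lags - i)] for i in range(lags)])
y = series[lags:]

# (5) forecast ensembling over two candidate methods, which also stands in for
# (4) forecasting-method selection when one member dominates
ensemble = VotingRegressor([("ridge", Ridge()), ("rf", RandomForestRegressor(random_state=0))])
pipeline = Pipeline([("scale", StandardScaler()), ("forecast", ensemble)])

# (3) hyperparameter optimization with time-series-aware cross-validation
search = GridSearchCV(pipeline,
                      {"forecast__ridge__alpha": [0.1, 1.0, 10.0]},
                      cv=TimeSeriesSplit(n_splits=3))
search.fit(X, y)
print("Best configuration:", search.best_params_)
```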

    Modelling tourism demand to Spain with machine learning techniques. The impact of forecast horizon on model selection

    This study assesses the influence of the forecast horizon on the forecasting performance of several machine learning techniques. We compare the forecast accuracy of Support Vector Regression (SVR) to Neural Network (NN) models, using a linear model as a benchmark. We focus on international tourism demand to all seventeen regions of Spain. The SVR with a Gaussian radial basis function kernel outperforms the rest of the models for the longest forecast horizons. We also find that machine learning methods improve their forecasting accuracy with respect to linear models as forecast horizons increase. This result shows the suitability of SVR for medium- and long-term forecasting.
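
    A minimal sketch of the kind of comparison described above: SVR with a Gaussian (RBF) kernel, a neural network, and a linear benchmark, each fitted as a direct forecaster for several horizons. The toy seasonal series, the lag window, and the hyperparameters are illustrative assumptions; the tourism demand data are not reproduced.

```python
# Sketch: compare SVR (RBF), a neural network, and a linear benchmark across forecast horizons.
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(0)
series = 100 + 10 * np.sin(np.arange(240) * 2 * np.pi / 12) + rng.normal(scale=2, size=240)
lags = 12

def supervised(h):
    """Lag matrix X and target y for a direct h-step-ahead forecast."""
    X = np.column_stack([series[i:len(series) - lags - h + 1 + i] for i in range(lags)])
    y = series[lags + h - 1:]
    return X, y

models = {"SVR-RBF": SVR(kernel="rbf", C=100, gamma="scale"),
          "NN": MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
          "Linear": LinearRegression()}

for h in (1, 6, 12):                               # short, medium, and long horizons
    X, y = supervised(h)
    split = len(y) - 24                            # hold out the last two years
    for name, model in models.items():
        model.fit(X[:split], y[:split])
        mape = mean_absolute_percentage_error(y[split:], model.predict(X[split:]))
        print(f"h={h:2d} {name:8s} MAPE={mape:.3f}")
```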

    Machine learning bias in predicting high school grades: A knowledge perspective

    Costa-Mendes, R., Cruz-Jesus, F., Oliveira, T., & Castelli, M. (2021). Machine learning bias in predicting high school grades: A knowledge perspective. Emerging Science Journal, 5(5), 576-597. https://doi.org/10.28991/esj-2021-01298
    This study focuses on machine learning bias when predicting teacher-assigned grades. The experimental phase consists of predicting the 11th and 12th grade marks of Portuguese high school students and computing the bias and variance decomposition. In the base implementation, only the critical academic achievement factors are considered. In the second implementation, the preceding year's grade is appended as an input variable. The machine learning algorithms in use are random forest, support vector machine, and extreme boosting machine. The reasons behind the poor performance of the machine learning algorithms are either the poor preciseness of the input space or the lack of a sound record of student performance. We introduce the new concept of knowledge bias and a new predictive model classification. Precision education would reduce bias by providing low-bias, intensive-knowledge models. To avoid bias, it is not necessary to add knowledge to the input space. Low-bias, extensive-knowledge models are achievable simply by appending the student's earlier performance record to the model. The low-bias, intensive-knowledge learning models promoted by precision education are suited to designing new policies and actions toward academic attainment. If the aim is solely prediction, opting for a low-bias, knowledge-extensive model can be appropriate and correct.
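
    A rough sketch of the experiment this abstract describes: estimate the bias and variance of a regressor predicting a final grade, once from base achievement factors only and once with the previous year's grade appended as an input. The synthetic data, the coefficients, and the bootstrap-based decomposition are assumptions for illustration, not the study's Portuguese high school records or its exact procedure.

```python
# Sketch: bias/variance of a grade predictor with and without the prior-year grade as input.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
base = rng.normal(size=(n, 4))                         # academic-achievement factors
prior_grade = base @ [2, 1, 0.5, 0] + rng.normal(scale=1.0, size=n)
final_grade = 0.7 * prior_grade + base @ [1, 0, 0, 0.5] + rng.normal(scale=1.0, size=n)

def bias_variance(X, y, n_boot=30, n_test=100):
    """Bootstrap estimate of squared bias and variance on a held-out test set."""
    X_test, y_test, X_pool, y_pool = X[:n_test], y[:n_test], X[n_test:], y[n_test:]
    preds = []
    for b in range(n_boot):
        idx = rng.integers(0, len(X_pool), len(X_pool))
        model = RandomForestRegressor(n_estimators=100, random_state=b)
        preds.append(model.fit(X_pool[idx], y_pool[idx]).predict(X_test))
    preds = np.array(preds)
    bias_sq = np.mean((preds.mean(axis=0) - y_test) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

for name, X in [("base factors only", base),
                ("base factors + prior grade", np.column_stack([base, prior_grade]))]:
    b, v = bias_variance(X, final_grade)
    print(f"{name}: bias^2={b:.2f}, variance={v:.2f}")
```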
    • …