7 research outputs found

    Genetic algorithms for hyperparameter optimization in predictive business process monitoring

    Predictive business process monitoring exploits event logs to predict how ongoing (uncompleted) traces will unfold up to their completion. A predictive process monitoring framework collects a range of techniques that allow users to get accurate predictions about the achievement of a goal for a given ongoing trace. These techniques can be combined and their parameters configured in different framework instances. Unfortunately, a unique framework instance that is general enough to outperform others for every dataset, goal or type of prediction is elusive. Thus, the selection and configuration of a framework instance needs to be done for a given dataset. This paper presents a predictive process monitoring framework armed with a hyperparameter optimization method to select a suitable framework instance for a given dataset.
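As a rough illustration of the kind of genetic search the abstract above refers to, the following minimal sketch evolves hyperparameter configurations by selection, crossover and mutation. It is not the paper's method: the search space, the parameter names (`n_clusters`, `encoding`, `max_depth`) and the `toy_fitness` function are hypothetical stand-ins, and a real run would replace the fitness with a model's validation score.

```python
import random

# Hypothetical search space: each hyperparameter maps to its candidate values.
SEARCH_SPACE = {
    "n_clusters": [2, 4, 8, 16],
    "encoding": ["frequency", "sequence"],
    "max_depth": [3, 5, 8, 12],
}


def toy_fitness(config):
    """Stand-in for a validation score (higher is better); purely illustrative."""
    score = 0.0
    score += 1.0 if config["n_clusters"] == 8 else 0.0
    score += 1.0 if config["encoding"] == "sequence" else 0.0
    score += 1.0 if config["max_depth"] == 5 else 0.0
    return score


def random_config(rng):
    """Draw one configuration uniformly at random from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: rng.choice([parent_a[name], parent_b[name]]) for name in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    """Resample each gene from its candidate values with probability `rate`."""
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in config.items()
    }


def genetic_search(fitness, pop_size=12, generations=15, elite=2, seed=0):
    """Evolve configurations; the `elite` best survive each generation unchanged."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - elite:
            parent_a, parent_b = rng.sample(parents, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = ranked[:elite] + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = genetic_search(toy_fitness)
    print(best, toy_fitness(best))
```

Because the top configurations are carried over unchanged, the best fitness never decreases from one generation to the next. The population size, mutation rate and selection scheme here are arbitrary choices for the sketch.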

    On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice

    Machine learning algorithms have been used widely in various applications and areas. To fit a machine learning model into different problems, its hyper-parameters must be tuned. Selecting the best hyper-parameter configuration for machine learning models has a direct impact on the model's performance. It often requires deep knowledge of machine learning algorithms and appropriate hyper-parameter optimization techniques. Although several automatic optimization techniques exist, they have different strengths and drawbacks when applied to different types of problems. In this paper, optimizing the hyper-parameters of common machine learning models is studied. We introduce several state-of-the-art optimization techniques and discuss how to apply them to machine learning algorithms. Many available libraries and frameworks developed for hyper-parameter optimization problems are provided, and some open challenges of hyper-parameter optimization research are also discussed in this paper. Moreover, experiments are conducted on benchmark datasets to compare the performance of different optimization methods and provide practical examples of hyper-parameter optimization. This survey paper will help industrial users, data analysts, and researchers to better develop machine learning models by identifying the proper hyper-parameter configurations effectively. Comment: 69 pages, 10 tables, accepted in Neurocomputing, Elsevier. GitHub link: https://github.com/LiYangHart/Hyperparameter-Optimization-of-Machine-Learning-Algorithm
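To make the comparison of optimization methods mentioned in the abstract above more concrete, here is a minimal sketch contrasting grid search and random search, two common baselines. It is not taken from the survey or its repository: the `validation_error` objective, the two hyperparameters (`learning_rate`, `n_layers`) and their ranges are assumptions chosen only for illustration.

```python
import itertools
import math
import random


def validation_error(learning_rate, n_layers):
    """Toy stand-in objective (lower is better), minimal at lr=0.01, n_layers=3."""
    return (math.log10(learning_rate) + 2) ** 2 + (n_layers - 3) ** 2


def grid_search():
    """Evaluate every combination on a fixed grid and return (error, config)."""
    learning_rates = [1e-4, 1e-3, 1e-2, 1e-1]
    layer_counts = [1, 2, 3, 4]
    candidates = itertools.product(learning_rates, layer_counts)
    return min((validation_error(lr, n), (lr, n)) for lr, n in candidates)


def random_search(n_trials=16, seed=0):
    """Sample configurations at random and return the best (error, config)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-4, -1)  # log-uniform learning rate
        n = rng.randint(1, 4)           # inclusive integer range
        trial = (validation_error(lr, n), (lr, n))
        if best is None or trial < best:
            best = trial
    return best


if __name__ == "__main__":
    print("grid:  ", grid_search())
    print("random:", random_search())
```

Both functions spend the same budget of 16 evaluations here. Grid search only ever tries the listed values, while random search can land anywhere in the continuous range.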

    Ensemble based on randomised neural networks for online data stream regression in presence of concept drift

    The big data paradigm has posed new challenges for Machine Learning algorithms, such as analysing continuous flows of data, in the form of data streams, and dealing with the evolving nature of the data, which causes a phenomenon often referred to in the literature as concept drift. Concept drift is caused by inconsistencies between the optimal hypotheses in two subsequent chunks of data, whereby the concept underlying a given process evolves over time, which can happen due to several factors including change in consumer preference, economic dynamics, or environmental conditions. This thesis explores the problem of data stream regression in the presence of concept drift. This problem requires computationally efficient algorithms that are able to adapt to the various types of drift that may affect the data. The development of effective algorithms for data streams with concept drift requires several steps that are discussed in this research. The first one is related to the datasets required to assess the algorithms. In general, it is not possible to determine the occurrence of concept drift on real-world datasets; therefore, synthetic datasets where the various types of concept drift can be simulated are required. The second issue is related to the choice of the algorithm. Ensemble algorithms show many advantages in dealing with concept-drifting data streams, which include flexibility, computational efficiency and high accuracy. For the design of an effective ensemble, this research analyses the use of randomised Neural Networks as base models, along with their optimisation. The optimisation of the randomised Neural Networks involves designing and tuning hyperparameters, which may substantially affect their performance. The optimisation of the base models is an important aspect of building highly accurate and computationally efficient ensembles.
To cope with concept drift, the existing methods either require setting fixed updating points, which may result in unnecessary computations or slow reaction to concept drift, or rely on drift detection mechanisms, which may be ineffective due to the difficulty of detecting drift in real applications. Therefore, the research contributions of this thesis include the development of a new approach for synthetic dataset generation, the development of a new hyperparameter optimisation algorithm that reduces the search effort and the need for prior assumptions compared to existing methods, the analysis of the effects of randomised Neural Network hyperparameters, and the development of a new ensemble algorithm based on a bagging meta-model that reduces the computational effort over existing methods and uses an innovative updating mechanism to cope with concept drift. The algorithms have been tested on synthetic datasets and validated on four real-world datasets from various application domains.

    Optimization of convolutional neural networks for image classification using genetic algorithms and bayesian optimization

    Notwithstanding the recent successes of deep convolutional neural networks for classification tasks, they are sensitive to the selection of their hyperparameters, which impose an exponentially large search space on modern convolutional models. Traditional hyperparameter selection methods include manual, grid, or random search, but these require expert knowledge or are computationally burdensome. In contrast, Bayesian optimization and evolutionary-inspired techniques have surfaced as viable alternatives for the hyperparameter problem. Thus, an alternative hybrid approach that combines the advantages of these techniques is proposed. Specifically, the search space is partitioned into discrete-architectural, and continuous and categorical hyperparameter subspaces, which are respectively traversed by a stochastic genetic search, followed by a genetic-Bayesian search. Simulations on a prominent image classification task reveal that the proposed method results in an overall classification accuracy improvement of 0.87% over unoptimized baselines, and a greater than 97% reduction in computational costs compared to a commonly employed brute force approach. Electrical and Mining Engineering. M. Tech. (Electrical Engineering)

    Predicting areas of potential conflicts between bearded vultures (Gypaetus barbatus) and wind turbines in the Swiss Alps

    The alarming increase in global temperature observed over the last hundred years, driven by the use of fossil fuels, has prompted a shift towards “greener” energy production. An extensive expansion of wind power exploitation is expected in the coming years, which makes its effect on vulnerable species an issue of growing conservation concern. Among the wildlife affected by wind turbines, vultures are probably the most vulnerable avian ecological guild. They have experienced a sharp decline during the last decades and their survival in many areas is the result of targeted recovery and conservation actions. The bearded vulture (Gypaetus barbatus) represents an emblematic example. After having been extirpated from the European Alps, the species once again inhabits its former habitat, thanks to the massive long-lasting effort of a dedicated reintroduction programme. There are concerns, however, that the sprawl of wind turbines in the Alpine massif will jeopardise this successful population recovery. The main goal of this PhD thesis was therefore to predict areas in the Swiss Alps where conflicts between bearded vulture conservation and wind energy development are likely to occur, thus allowing for a more biodiversity-friendly spatial planning of wind turbines. Using a spatially explicit modelling framework with combined information of casual observations and GPS data, I predicted the species’ potential distribution as well as its flight behaviour in relation to landscape, wind, and foraging conditions. First, I investigated the species’ ecological requirements in relation to season and age and translated these into distribution maps covering the whole Swiss Alpine arc. Here the focus was on evaluating the ability of the models to predict the possible future expansion of the species, a crucial point for anticipating potential conflicts arising from the spread of wind energy.
Second, during this process I had to delve into methodological challenges, especially with regard to making objective decisions for model tuning. Based on the example of modelling the distribution of the bearded vulture, I introduced a new genetic algorithm for hyperparameter tuning, which drastically reduces computation time while achieving a model performance comparable or equal to that obtained with standard methods. Moreover, I generalised the developed routines so as to make them applicable to the most common species distribution modelling techniques and compiled the solutions in an R package now available to the scientific community. Third, I explored the flight height patterns of bearded vultures to identify key factors driving low-height flight activity and delineated areas where the species is likely to fly within the critical height range that is typically swept by the blades of modern wind turbines. Overall, I found that food availability is an important driver of both distribution and low-height flight activity of bearded vultures. Habitat selection differed between seasons and between age classes during the cold season. While food availability and geological substrates were the main drivers of the distribution during the warm season, I observed a shift in the requirement of adult birds in the cold season, where habitat selection was mainly influenced by climatic conditions. This suggests that adult birds may be constrained by favourable winter conditions for the selection of breeding territories. Combining the ecological requirements of both age classes and seasons, I found that 40% of the Swiss Alps offers suitable habitat for the species.
The model trained with species data collected between 2004 and 2014 was able to accurately predict new breeding territories established in 2015–2019, and thus adequately delineated areas where the spreading population will likely occur in the future and where conflicts with wind energy development might arise. The flight-height analysis of the GPS-tagged birds revealed that bearded vultures mainly fly within the critical height range swept by the turbine blades (77.5% of GPS locations), which puts the species at high risk of collision. Flying at low heights most frequently occurred along south-exposed mountainsides and in areas with a high probability of ibex (Capra ibex) presence, a key food source for bearded vultures. Synthesising the information on bearded vulture distribution with the flight height behaviour made it possible to identify and map areas where the species is likely to fly at risky heights within its habitat. This high-resolution, spatially explicit information represents a valuable tool for planners involved in wind energy development as well as a first basis for detailed impact assessments, while the methodological framework I developed represents a transferable approach for scientists studying potential conflicts between the development of aerial infrastructure and other target organisms.