
    Canonical Representation Genetic Programming

    Search spaces sampled by Genetic Programming often contain many programs that represent the same function. As the space is explored, it is therefore highly likely that different programs representing the same function will be evaluated repeatedly, which wastes resources. We argue that a search over a space in which each function has exactly one permitted representation will be more successful than one over a space with multiple representations. Such a space of canonical representations is called a canonical search space, and Genetic Programming applied to it is called Canonical Representation Genetic Programming. The challenge lies in constructing these search spaces. For some function sets this is trivial, for some it is impossible, and for others it is not clear how it can be achieved. In this paper, we examine the search space defined by the function set {+, −, ∗, /} and the terminal set {x, 1}. Drawing on the fundamental theorem of arithmetic and results related to the fundamental theorem of algebra, we construct a representation in which each function expressible with this primitive set has a unique representation.
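
    To illustrate the redundancy the paper targets, here is a minimal sympy-based sketch (not the paper's construction; sympy's cancel is used as a stand-in canonical form for rational functions over this primitive set):

```python
# A minimal sketch: two syntactically different GP trees over {+, -, *, /}
# with terminals {x, 1} denote the same function, detected by reducing both
# to a canonical rational form p(x)/q(x) in lowest terms.
import sympy as sp

x = sp.symbols('x')

tree_a = (x + 1) * (x - 1)   # the tree (* (+ x 1) (- x 1))
tree_b = x * x - 1           # the tree (- (* x x) 1)

# Both cancel to x**2 - 1, so evaluating both would waste fitness evaluations.
print(sp.cancel(tree_a) == sp.cancel(tree_b))  # True: duplicate semantics

# A canonical search space would admit only one representative per such class.
```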

    A set-covering model for a bidirectional multi-shift full truckload vehicle routing problem

    This paper introduces a bidirectional multi-shift full truckload transportation problem with operation-dependent service times. The problem differs from previous container transport problems, and existing approaches for container transport and for vehicle routing with pickup and delivery are either unsuitable or inefficient for it. We develop a set-covering model for the problem based on a novel route representation and a container-flow mapping, and demonstrate that the model can solve real-life, medium-sized instances of the container transport problem at a large international port. A lower bound on the problem is also obtained by relaxing the time-window constraints to the nearest shifts and transforming the problem into a service network design problem. Implications and managerial insights derived from the lower-bound results are also provided.
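
    A minimal sketch of a generic set-covering route-selection model, using the PuLP library with toy jobs and routes (the paper's route representation and container-flow mapping are not reproduced here):

```python
# Toy set-covering sketch: choose a min-cost subset of candidate truck routes
# so that every container job is served by at least one selected route.
import pulp

jobs = ["j1", "j2", "j3", "j4"]
# Candidate routes: (cost, set of jobs the route covers) -- illustrative only.
routes = {
    "r1": (5, {"j1", "j2"}),
    "r2": (4, {"j2", "j3"}),
    "r3": (6, {"j3", "j4"}),
    "r4": (3, {"j1", "j4"}),
}

prob = pulp.LpProblem("truckload_set_cover", pulp.LpMinimize)
use = {r: pulp.LpVariable(f"use_{r}", cat="Binary") for r in routes}

# Objective: total routing cost of the selected routes.
prob += pulp.lpSum(routes[r][0] * use[r] for r in routes)
# Coverage: each job must appear on at least one selected route.
for j in jobs:
    prob += pulp.lpSum(use[r] for r in routes if j in routes[r][1]) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([r for r in routes if use[r].value() == 1])
```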

    Forecasting stock market return with nonlinearity: a genetic programming approach

    Whether stock market returns are predictable remains an open question. This paper develops new return-forecasting models to help address it. In contrast to the existing literature, we first show that forecasting accuracy can be improved through better model specification without adding any new variables. Rather than a single unified return-forecasting model, we argue that stock markets in different countries should have different forecasting models. We adopt an evolutionary procedure, genetic programming (GP), to develop new models with nonlinearity, and these models prove more accurate than traditional AR-family models. More importantly, the trading strategy we propose based on these forecasting models is shown to be highly profitable in different types of stock markets in terms of stock index futures trading.
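
    A hedged sketch of GP-based return forecasting using the third-party gplearn library (not necessarily the authors' implementation); the return series, lag depth, and GP settings below are placeholders:

```python
# Evolve a nonlinear expression mapping lagged returns to the next return,
# in the spirit of symbolic regression by genetic programming.
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(0)
returns = rng.normal(0, 0.01, 500)   # placeholder return series

# Build lagged features r_{t-3}, r_{t-2}, r_{t-1} to predict r_t.
lags = 3
X = np.column_stack([returns[i:len(returns) - lags + i] for i in range(lags)])
y = returns[lags:]

gp = SymbolicRegressor(population_size=500, generations=20,
                       function_set=('add', 'sub', 'mul', 'div'),
                       random_state=0)
gp.fit(X, y)
print(gp._program)   # the evolved nonlinear forecasting expression
```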

    Fuzzy C-means-based scenario bundling for stochastic service network design

    Stochastic service network designs with uncertain demand represented by a set of scenarios can be modelled as a large-scale two-stage stochastic mixed-integer program (SMIP). The progressive hedging algorithm (PHA) is a decomposition method for solving the resulting SMIP, and its computational performance can be greatly enhanced by decomposing according to scenario bundles instead of individual scenarios. At the heart of bundle-based decomposition is the method for grouping scenarios into bundles. In this paper, we present a fuzzy c-means-based scenario bundling method for this purpose. Rather than having full membership of a single bundle, as in existing scenario bundling strategies such as k-means, in our method a scenario has partial membership in each bundle and can be assigned to more than one. Since multiple bundle membership induces overlap between bundles, we empirically investigate whether and how the amount of overlap, controlled by a fuzzy exponent, affects the performance of the PHA. Experimental results on a less-than-truckload transportation network optimization problem show that large fuzzy exponents dramatically reduce the number of PHA iterations required for convergence but significantly increase computation time. Further experiments identify a fuzzy exponent that strikes a good trade-off between solution quality and computation time.
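
    A minimal fuzzy c-means sketch with toy scenario vectors; the membership cut-off of 0.3 is a hypothetical choice, used only to show how partial membership induces overlapping bundles:

```python
# Standard fuzzy c-means: each scenario gets a partial membership in every
# bundle, then joins all bundles where membership exceeds a threshold.
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Return (centers, memberships); m is the fuzzy exponent (> 1)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # membership rows sum to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))     # u_ik proportional to d^(-2/(m-1))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

scenarios = np.random.default_rng(1).random((30, 5))   # 30 toy demand scenarios
_, U = fuzzy_c_means(scenarios, c=4, m=2.0)

# Overlapping bundles: a scenario joins every bundle it belongs to "enough".
bundles = [np.where(U[:, k] > 0.3)[0] for k in range(4)]
print([len(b) for b in bundles])   # bundle sizes sum to more than 30: overlap
```

    Larger fuzzy exponents m flatten the memberships, so more scenarios clear the cut-off in several bundles at once, which is the overlap the paper studies.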

    Boosting the Discriminant Power of Naive Bayes

    Naive Bayes has been widely used in many applications because of its simplicity and its ability to handle both numerical and categorical data. However, its lack of modeling of correlations between features limits its performance, and noise and outliers in real-world datasets further degrade classification accuracy. In this paper, we propose a feature augmentation method employing a stack auto-encoder to reduce noise in the data and boost the discriminant power of naive Bayes. The proposed stack auto-encoder consists of two auto-encoders serving different purposes. The first shrinks the initial features to a compact representation in order to remove noise and redundant information. The second boosts the discriminant power of the features by expanding them into a higher-dimensional space, where different classes of samples can be better separated. Integrating the proposed feature augmentation method with regularized naive Bayes greatly enhances the discrimination power of the model. The proposed method is evaluated on a set of machine-learning benchmark datasets, and the experimental results show that it significantly and consistently outperforms state-of-the-art naive Bayes classifiers. Comment: Accepted by the 2022 International Conference on Pattern Recognition.
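
    A minimal sketch of the two-stage idea in PyTorch; the layer sizes, activations, and reconstruction losses are assumptions, not the paper's architecture:

```python
# Two-stage auto-encoder: stage one compresses features to strip noise,
# stage two expands them to a higher-dimensional space; the expanded
# features would then be fed to a naive Bayes classifier.
import torch
import torch.nn as nn

class StackAE(nn.Module):
    def __init__(self, d_in, d_small=8, d_large=64):
        super().__init__()
        self.shrink = nn.Sequential(nn.Linear(d_in, d_small), nn.ReLU())
        self.expand = nn.Sequential(nn.Linear(d_small, d_large), nn.ReLU())
        # Decoders are used only for the reconstruction losses during training.
        self.dec1 = nn.Linear(d_small, d_in)
        self.dec2 = nn.Linear(d_large, d_small)

    def forward(self, x):
        z = self.shrink(x)    # compact, denoised representation
        h = self.expand(z)    # augmented, higher-dimensional features
        return h, self.dec1(z), self.dec2(h)

x = torch.randn(32, 20)       # a batch of 20-dimensional samples
model = StackAE(d_in=20)
h, rec1, rec2 = model(x)
loss = (nn.functional.mse_loss(rec1, x)
        + nn.functional.mse_loss(rec2, model.shrink(x)))
print(h.shape, loss.item())   # h: the augmented features for naive Bayes
```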

    A Max-relevance-min-divergence Criterion for Data Discretization with Applications on Naive Bayes

    In many classification models, data is discretized to better estimate its distribution. Existing discretization methods often aim to maximize the discriminant power of the discretized data, overlooking the fact that the primary goal of discretization in classification is to improve generalization performance. As a result, data tend to be over-split into many small bins, since undiscretized data retain the maximal discriminant information. We therefore propose a Max-Dependency-Min-Divergence (MDmD) criterion that maximizes both the discriminant information and the generalization ability of the discretized data. More specifically, the Max-Dependency criterion maximizes the statistical dependency between the discretized data and the classification variable, while the Min-Divergence criterion explicitly minimizes the JS-divergence between the training data and the validation data for a given discretization scheme. The proposed MDmD criterion is technically appealing, but it is difficult to reliably estimate the high-order joint distributions of attributes and the classification variable. We hence propose a more practical solution, the Max-Relevance-Min-Divergence (MRmD) discretization scheme, in which each attribute is discretized separately by simultaneously maximizing the discriminant information and the generalization ability of the discretized data. The proposed MRmD is compared with state-of-the-art discretization algorithms under the naive Bayes classification framework on 45 machine-learning benchmark datasets, and it significantly outperforms all compared methods on most of them. Comment: Under major revision at Pattern Recognition.
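
    A hedged sketch of the MRmD scoring idea for a single attribute (the full discretization search is omitted); the trade-off weight lam, the quantile binning, and the train/validation split are illustrative assumptions:

```python
# Score a candidate binning as: mutual information with the class (relevance)
# minus the JS-divergence between train and validation bin distributions
# (a proxy for generalization).
import numpy as np
from scipy.stats import entropy

def js_divergence(p, q):
    m = 0.5 * (p + q)
    return 0.5 * entropy(p, m) + 0.5 * entropy(q, m)

def mrmd_score(x_train, y_train, x_val, bins, lam=1.0):
    d_train = np.digitize(x_train, bins)
    d_val = np.digitize(x_val, bins)
    # Relevance: empirical mutual information I(binned attribute; class).
    joint, _, _ = np.histogram2d(d_train, y_train, bins=[len(bins) + 1, 2])
    joint /= joint.sum()
    px, py = joint.sum(1), joint.sum(0)
    mi = np.nansum(joint * np.log(joint / (px[:, None] * py[None, :] + 1e-12)
                                  + 1e-12))
    # Divergence: shift of the bin distribution between train and validation.
    n = len(bins) + 1
    pt = np.bincount(d_train, minlength=n) / len(d_train)
    pv = np.bincount(d_val, minlength=n) / len(d_val)
    return mi - lam * js_divergence(pt + 1e-12, pv + 1e-12)

rng = np.random.default_rng(0)
x, y = rng.normal(size=400), rng.integers(0, 2, 400)
print(mrmd_score(x[:300], y[:300], x[300:],
                 bins=np.quantile(x, [0.25, 0.5, 0.75])))
```

    Finer binnings raise the relevance term but also the divergence term, so the score penalizes the over-splitting the abstract describes.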

    A regularized attribute weighting framework for naive bayes

    The Bayesian classification framework has been widely used in many fields, but the covariance matrix is usually difficult to estimate reliably. To alleviate this problem, many naive Bayes (NB) approaches with good performance have been developed. However, the assumption of conditional independence between attributes in NB rarely holds in reality, and various attribute-weighting schemes have been developed to address it. Among them, class-specific attribute-weighted naive Bayes (CAWNB) has recently achieved good performance by using classification feedback to optimize the attribute weights of each class. However, the derived model may overfit the training dataset, especially when the dataset is too small to train a model with good generalization performance. This paper proposes a regularization technique to improve the generalization capability of CAWNB by balancing the trade-off between discrimination power and generalization capability. More specifically, by introducing a regularization term, the proposed method, regularized naive Bayes (RNB), captures the data characteristics well when the dataset is large and exhibits good generalization performance when the dataset is small. RNB is compared with state-of-the-art naive Bayes methods, and experiments on 33 machine-learning benchmark datasets demonstrate that it significantly outperforms them.
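
    A minimal sketch of the regularization idea (the notation and objective below are assumptions, not the paper's exact formulation): class-specific attribute weights W are pulled toward the all-ones weights of plain NB, so small datasets fall back to unweighted NB while large ones can learn discriminative weights:

```python
# Regularized objective: weighted-NB negative log-likelihood plus an L2
# penalty that shrinks the class-specific weights W toward 1 (plain NB).
import numpy as np

def rnb_objective(W, log_cond, log_prior, y, lam):
    """W: (C, D) class-specific attribute weights;
    log_cond: (N, C, D) per-sample log P(x_d | c) terms;
    log_prior: (C,) log class priors; y: (N,) true labels."""
    scores = log_prior + (W[None] * log_cond).sum(axis=2)        # (N, C)
    log_post = scores - np.logaddexp.reduce(scores, axis=1, keepdims=True)
    nll = -log_post[np.arange(len(y)), y].mean()
    return nll + lam * np.mean((W - 1.0) ** 2)  # pull toward plain NB

# Toy check: large lam forces W toward 1 (plain NB); lam = 0 recovers
# unregularized CAWNB-style fitting.
rng = np.random.default_rng(0)
N, C, D = 50, 2, 4
log_cond = np.log(rng.random((N, C, D)))
y = rng.integers(0, C, N)
print(rnb_objective(np.ones((C, D)), log_cond, np.log([0.5, 0.5]), y, lam=0.1))
```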

    Geographical and Temporal Huff Model Calibration using Taxi Trajectory Data
