Canonical Representation Genetic Programming
The search spaces sampled by Genetic Programming often contain many different programs that represent the same function. When such a space is explored, it is therefore highly likely that distinct programs computing the same function will be evaluated repeatedly, which is an undesirable waste of resources. It is argued that a search restricted to a space in which each function has exactly one permitted representation will be more successful than one that admits multiple representations. A search space consisting only of canonical representations is called a canonical search space, and Genetic Programming applied to such a space is called Canonical Representation Genetic Programming. The challenge lies in constructing these search spaces: for some function sets the task is trivial, for others it is impossible, and for others still it is not clear how the goal can be achieved. In this paper we examine the search space defined by the function set {+, −, ∗, /} and the terminal set {x, 1}. Drawing inspiration from the fundamental theorem of arithmetic, and from results related to the fundamental theorem of algebra, we construct a representation in which every function expressible with this primitive set has a unique representation.
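The abstract does not spell out the construction, but the underlying idea can be sketched: every program built from {+, −, ∗, /} over {x, 1} computes a rational function p(x)/q(x), and a rational function has exactly one form with p and q coprime and q monic. The sketch below (an assumption about the flavour of the construction, not the paper's actual method) reduces a numerator/denominator pair of polynomials to that unique form:

```python
from fractions import Fraction

# A polynomial is a tuple of coefficients, lowest degree first, with no
# trailing zeros; the zero polynomial is the empty tuple ().

def trim(coeffs):
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return tuple(coeffs)

def p_divmod(a, b):
    """Polynomial long division: a = q*b + r with deg r < deg b."""
    q = [Fraction(0)] * max(0, len(a) - len(b) + 1)
    r = list(a)
    while len(trim(r)) >= len(b):
        r = list(trim(r))
        k = len(r) - len(b)
        c = r[-1] / b[-1]
        q[k] += c
        for i, cb in enumerate(b):
            r[i + k] -= c * cb
    return trim(q), trim(r)

def p_gcd(a, b):
    """Euclidean algorithm; the result is made monic so it is unique."""
    while b:
        a, b = b, p_divmod(a, b)[1]
    return tuple(c / a[-1] for c in a) if a else a

def canonical(p, q):
    """Unique representative of p/q: coprime numerator and denominator,
    monic denominator."""
    p = trim(Fraction(c) for c in p)
    q = trim(Fraction(c) for c in q)
    assert q, "division by the zero polynomial"
    g = p_gcd(p, q)
    if g:
        p, q = p_divmod(p, g)[0], p_divmod(q, g)[0]
    lead = q[-1]
    return tuple(c / lead for c in p), tuple(c / lead for c in q)
```

For instance, `canonical((-1, 0, 1), (-1, 1))` and `canonical((1, 1), (1,))` yield the same pair, showing that the syntactically different programs (x·x − 1)/(x − 1) and x + 1 collapse, as formal rational functions, to one canonical representative.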
A set-covering model for a bidirectional multi-shift full truckload vehicle routing problem
This paper introduces a bidirectional multi-shift full-truckload transportation problem with operation-dependent service times. The problem differs from previously studied container transport problems, and existing approaches for container transport and for vehicle routing with pickup and delivery are either unsuitable or inefficient for it. A set-covering model is developed for the problem based on a novel route representation and a container-flow mapping. It is demonstrated that the model can solve real-life, medium-sized instances of the container transport problem at a large international port. A lower bound on the problem is also obtained by relaxing the time-window constraints to the nearest shifts and transforming the problem into a service network design problem. Implications and managerial insights drawn from the lower-bound results are also provided.
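The set-covering view can be illustrated on a toy instance: each candidate route covers a set of container orders at some cost, and the model picks the cheapest collection of routes covering every order. The instance below is invented for illustration; the paper's model rests on a route representation and container-flow mapping not reproduced here, and real instances would be solved with an ILP solver rather than enumeration.

```python
from itertools import combinations

def best_cover(orders, routes):
    """Brute-force minimum-cost cover; routes maps id -> (orders covered, cost)."""
    ids = list(routes)
    best_cost, best_pick = float("inf"), None
    for k in range(1, len(ids) + 1):
        for combo in combinations(ids, k):
            covered = set().union(*(routes[r][0] for r in combo))
            if covered >= orders:                      # every order is served
                cost = sum(routes[r][1] for r in combo)
                if cost < best_cost:
                    best_cost, best_pick = cost, sorted(combo)
    return best_cost, best_pick

# Hypothetical toy instance: four orders, four candidate routes.
orders = {1, 2, 3, 4}
routes = {
    "r1": ({1, 2}, 5.0),
    "r2": ({2, 3}, 4.0),
    "r3": ({3, 4}, 5.0),
    "r4": ({1, 4}, 3.0),
}
```

Here `best_cover(orders, routes)` selects routes r2 and r4, covering all four orders at total cost 7.0.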
Forecasting stock market return with nonlinearity: a genetic programming approach
Whether returns in the stock market are predictable remains an open question. This paper develops new return-forecasting models to contribute to addressing it. In contrast to the existing literature, we first show that forecasting accuracy can be improved through better model specification without adding any new variables. Rather than a single unified return-forecasting model, we argue that stock markets in different countries should have different forecasting models. We then adopt an evolutionary procedure, Genetic Programming (GP), to develop new models that capture nonlinearity. The newly developed forecasting models prove more accurate than traditional AR-family models. More importantly, the trading strategy we propose based on these forecasts is shown to be highly profitable, in terms of stock index futures trading, across different types of stock markets.
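The abstract names GP but not its configuration, so the following is only a minimal, hypothetical sketch: candidate models are expression trees over lagged returns, fitness is one-step-ahead mean squared error, and a simple (1+1)-style improvement loop stands in for a full GP population with crossover.

```python
import random

def evaluate(tree, history):
    """Evaluate a model tree given the return history up to time t-1.
    Trees are nested tuples: ("+", a, b), ("*", a, b), ("lag", k), ("const", c)."""
    op = tree[0]
    if op == "const":
        return tree[1]
    if op == "lag":                          # return k steps back
        return history[-tree[1]]
    a, b = (evaluate(sub, history) for sub in tree[1:])
    return a + b if op == "+" else a * b

def mse(tree, returns, max_lag=3):
    """One-step-ahead forecast error over the sample."""
    errs = [(returns[t] - evaluate(tree, returns[:t])) ** 2
            for t in range(max_lag, len(returns))]
    return sum(errs) / len(errs)

def mutate(tree, rng):
    """Perturb constants; a real GP would also use subtree crossover."""
    if tree[0] == "const":
        return ("const", tree[1] + rng.uniform(-0.1, 0.1))
    if tree[0] == "lag":
        return tree
    return (tree[0],) + tuple(mutate(sub, rng) for sub in tree[1:])

def evolve(returns, generations=200, seed=0):
    """Hill-climb from an AR(1)-shaped seed tree, keeping only improvements."""
    rng = random.Random(seed)
    best = ("+", ("*", ("const", 0.0), ("lag", 1)), ("const", 0.0))
    for _ in range(generations):
        cand = mutate(best, rng)
        if mse(cand, returns) < mse(best, returns):
            best = cand
    return best
```

By construction the evolved tree's in-sample error never exceeds that of the AR(1)-shaped seed; the nonlinearity the paper exploits would enter through richer primitives and crossover.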
Fuzzy C-means-based scenario bundling for stochastic service network design
Stochastic service network designs with uncertain demand represented by a set of scenarios can be modelled as a large-scale two-stage stochastic mixed-integer program (SMIP). The progressive hedging algorithm (PHA) is a decomposition method for solving the resulting SMIP, and its computational performance can be greatly enhanced by decomposing over scenario bundles instead of individual scenarios. At the heart of bundle-based decomposition is the method used to group scenarios into bundles. In this paper, we present a fuzzy c-means-based scenario bundling method to address this problem. Rather than the full bundle membership typical of existing strategies such as k-means, in our method a scenario has partial membership in each bundle and can be assigned to more than one. Since multiple bundle membership induces overlap between the bundles, we empirically investigate whether and how the amount of overlap, controlled by a fuzzy exponent, affects the performance of the PHA. Experimental results on a less-than-truckload transportation network optimization problem show that large fuzzy exponents dramatically reduce the number of iterations the PHA needs to converge, whereas the computation time increases significantly. Further experiments identify a fuzzy exponent that strikes a good trade-off between solution quality and computation time.
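The bundling step can be sketched as follows. The membership formula is the standard fuzzy c-means one; the centroids are taken as given here (full FCM updates them iteratively), and the threshold rule for multi-bundle assignment is an illustrative assumption, since the abstract does not state the exact rule.

```python
def fuzzy_memberships(scenario, centroids, m):
    """Fuzzy c-means membership of one scenario in each bundle.  A larger
    fuzzy exponent m (> 1) flattens the memberships, so more scenarios clear
    the assignment threshold for several bundles, i.e. more overlap."""
    d = [sum((s - c) ** 2 for s, c in zip(scenario, ctr)) ** 0.5
         for ctr in centroids]
    if any(di == 0.0 for di in d):       # scenario coincides with a centroid
        return [1.0 if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** p for j in range(len(d)))
            for i in range(len(d))]

def bundle_assignments(scenarios, centroids, m, threshold=0.3):
    """Unlike k-means, a scenario joins every bundle whose membership
    exceeds the threshold, so bundles may share scenarios."""
    bundles = [[] for _ in centroids]
    for k, sc in enumerate(scenarios):
        for i, u in enumerate(fuzzy_memberships(sc, centroids, m)):
            if u >= threshold:
                bundles[i].append(k)
    return bundles
```

With centroids at 1 and 3, a scenario at 0 has memberships (0.9, 0.1) for m = 2 but the flatter (0.75, 0.25) for m = 3, illustrating how raising the exponent pushes scenarios into multiple bundles.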
Boosting the Discriminant Power of Naive Bayes
Naive Bayes has been widely used in many applications because of its simplicity and its ability to handle both numerical and categorical data. However, its lack of modeling of correlations between features limits its performance, and noise and outliers in real-world datasets further degrade classification accuracy. In this paper, we propose a feature-augmentation method employing a stack auto-encoder to reduce the noise in the data and boost the discriminant power of naive Bayes. The proposed stack auto-encoder consists of two auto-encoders serving different purposes. The first encoder shrinks the initial features into a compact feature representation, removing noise and redundant information. The second encoder boosts the discriminant power of the features by expanding them into a higher-dimensional space in which different classes of samples can be better separated. By integrating the proposed feature augmentation with regularized naive Bayes, the discrimination power of the model is greatly enhanced. The proposed method is evaluated on a set of machine-learning benchmark datasets, and the experimental results show that it significantly and consistently outperforms state-of-the-art naive Bayes classifiers.
Comment: Accepted by the 2022 International Conference on Pattern Recognition.
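The pipeline can be illustrated as follows. The two encoders in the paper are learned from data; here fixed linear maps stand in for them purely to show the shape of the pipeline (shrink, then expand nonlinearly, then feed the augmented vector to naive Bayes). All weights and shapes below are illustrative assumptions.

```python
import math

def augment(x, W_shrink, W_expand):
    """Stand-in for the two learned auto-encoders: a linear shrink map
    followed by a nonlinear expand map; returns the augmented features."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W_shrink]
    h = [math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in W_expand]
    return z + h

def fit_gaussian_nb(X, y):
    """Per-class feature means and variances plus class priors."""
    stats = {}
    for c in sorted(set(y)):
        Xc = [x for x, yi in zip(X, y) if yi == c]
        mu = [sum(col) / len(Xc) for col in zip(*Xc)]
        var = [max(sum((v - m) ** 2 for v in col) / len(Xc), 1e-9)
               for col, m in zip(zip(*Xc), mu)]
        stats[c] = (len(Xc) / len(X), mu, var)
    return stats

def predict(stats, x):
    """Maximum a-posteriori class under the Gaussian naive Bayes model."""
    def log_post(c):
        prior, mu, var = stats[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, mu, var))
    return max(stats, key=log_post)
```

In the paper the augmented vectors, not the raw features, are what the (regularized) naive Bayes classifier is trained on.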
A Max-relevance-min-divergence Criterion for Data Discretization with Applications on Naive Bayes
In many classification models, data is discretized to better estimate its distribution. Existing discretization methods often aim to maximize the discriminant power of the discretized data, overlooking the fact that the primary goal of discretization in classification is to improve generalization performance. As a result, the data tend to be over-split into many small bins, since undiscretized data retain the maximal discriminant information. We therefore propose a Max-Dependency-Min-Divergence (MDmD) criterion that maximizes both the discriminant information and the generalization ability of the discretized data. More specifically, the Max-Dependency term maximizes the statistical dependency between the discretized data and the classification variable, while the Min-Divergence term explicitly minimizes the Jensen-Shannon (JS) divergence between the training data and the validation data under a given discretization scheme. The proposed MDmD criterion is technically appealing, but the high-order joint distributions of attributes and the classification variable are difficult to estimate reliably. We hence propose a more practical solution, the Max-Relevance-Min-Divergence (MRmD) discretization scheme, in which each attribute is discretized separately while simultaneously maximizing the discriminant information and the generalization ability of the discretized data. The proposed MRmD is compared with state-of-the-art discretization algorithms under the naive Bayes classification framework on 45 machine-learning benchmark datasets, and it significantly outperforms all compared methods on most of them.
Comment: Under major revision at Pattern Recognition.
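The two quantities the criterion names can be sketched directly: mutual information between the discretized attribute and the class (the relevance term), and the JS divergence between training and validation bin distributions (the divergence term). The weighting `lam` combining them is my notation, since the abstract does not give the exact trade-off.

```python
import math
from collections import Counter

def mutual_info(xs, ys):
    """I(X; Y) between a discretized attribute and the class variable."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def js_divergence(p, q):
    """Jensen-Shannon divergence between two bin-probability dicts."""
    bins = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in bins}
    def kl(a):
        return sum(a.get(k, 0.0) * math.log(a.get(k, 0.0) / m[k])
                   for k in bins if a.get(k, 0.0) > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def hist(values, bins):
    """Empirical bin probabilities under the discretization map `bins`."""
    c = Counter(bins(v) for v in values)
    return {k: v / len(values) for k, v in c.items()}

def mrmd_score(train_x, train_y, val_x, bins, lam=1.0):
    """Score one single-attribute discretization: discriminant information
    minus lam times the train/validation divergence it induces."""
    bx = [bins(v) for v in train_x]
    return (mutual_info(bx, train_y)
            - lam * js_divergence(hist(train_x, bins), hist(val_x, bins)))
```

A discretization that over-splits into many tiny bins raises the first term but also raises the divergence between training and validation histograms, so the combined score penalizes exactly the over-splitting the abstract describes.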
A regularized attribute weighting framework for naive Bayes
The Bayesian classification framework has been widely used in many fields, but the covariance matrix is usually difficult to estimate reliably. To alleviate this problem, many naive Bayes (NB) approaches with good performance have been developed. However, the assumption of conditional independence between attributes in NB rarely holds in reality, and various attribute-weighting schemes have been developed to address this. Among them, class-specific attribute-weighted naive Bayes (CAWNB) has recently achieved good performance by using classification feedback to optimize the attribute weights of each class. The derived model may, however, overfit the training dataset, especially when the dataset is too small to train a model with good generalization performance. This paper proposes a regularization technique that improves the generalization capability of CAWNB and balances the trade-off between discrimination power and generalization capability. More specifically, by introducing a regularization term, the proposed method, regularized naive Bayes (RNB), captures the data characteristics well when the dataset is large and exhibits good generalization performance when the dataset is small. RNB is compared with state-of-the-art naive Bayes methods, and experiments on 33 machine-learning benchmark datasets demonstrate that RNB significantly outperforms the compared methods.
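The abstract names the idea, a penalty balancing discrimination against generalization, but not the formula. A plausible minimal sketch is a quadratic penalty pulling each class-specific attribute weight toward 1, i.e. back toward plain naive Bayes; both the penalty form and the function signatures below are illustrative assumptions.

```python
import math

def weighted_nb_logpost(prior, cond, weights, x):
    """Class-specific attribute-weighted naive Bayes score for one class:
    log P(c) + sum_j w_j * log P(x_j | c), where cond[j] maps the value of
    attribute j to its conditional probability under this class."""
    return math.log(prior) + sum(
        w * math.log(cond[j][x[j]]) for j, w in enumerate(weights))

def regularized_loss(data_loss, weights, alpha):
    """RNB-style objective: data fit plus alpha times the squared distance
    of the per-class attribute weights from 1 (uniform weighting).  With
    alpha = 0 the model is free to fit (risking overfitting on small data);
    large alpha shrinks it toward unweighted naive Bayes."""
    penalty = sum((w - 1.0) ** 2 for cls_w in weights for w in cls_w)
    return data_loss + alpha * penalty
```

The balance the abstract describes corresponds to tuning alpha: small datasets favour a larger alpha (closer to plain NB), large datasets a smaller one (closer to fully optimized CAWNB weights).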