Ant colony optimization approach for stacking configurations

Author: CHEN Yijun
Publication venue: Digital Commons @ Lingnan University
Publication date: 01/01/2011

In data mining, classifiers are generated to predict the class labels of instances. An ensemble is a decision-making system that applies certain strategies to combine the predictions of different classifiers and generate a collective decision. Previous research has empirically and theoretically demonstrated that an ensemble classifier can be more accurate and stable than its component classifiers in most cases. Stacking is a well-known ensemble method that adopts a two-level structure: base-level classifiers generate predictions, and a meta-level classifier makes the collective decision. A consequential problem is: which learning algorithms should be used to generate the base-level and meta-level classifiers in the Stacking configuration? It is not easy to find a suitable configuration for a specific dataset. In some early works, the selection of a meta classifier and its training data was the major concern. Recently, researchers have tried to apply metaheuristic methods to optimize the configuration of the base classifiers and the meta classifier. Ant Colony Optimization (ACO), which is inspired by the foraging behavior of real ant colonies, is one of the most popular metaheuristics. In this work, we propose a novel ACO-Stacking approach that uses ACO to tackle the Stacking configuration problem; it is the first work to apply ACO to this problem. Different implementations of the ACO-Stacking approach are developed. The first version identifies appropriate learning algorithms for generating the base-level classifiers while using a specific algorithm to create the meta-level classifier. The second version simultaneously finds suitable learning algorithms for both the base-level classifiers and the meta-level classifier. Moreover, we study how different kinds of local information about the classifiers affect the classification results. Several pieces of local information collected in the initial phase of ACO-Stacking are considered, such as the precision and F-measure of each classifier and the correlative differences of paired classifiers. A series of experiments is performed to compare the ACO-Stacking approach with other ensembles on a number of datasets of different domains and sizes. The experiments show that the new approach can achieve promising results and gain advantages over other ensembles. The correlative differences of the classifiers could be the best local information in this approach. Under the agile ACO-Stacking framework, an application to a direct marketing problem is explored. A real-world database from a US-based catalog company, containing more than 100,000 customer marketing records, is used in the experiments. The results indicate that our approach achieves higher cumulative response lifts and cumulative profit lifts in the top deciles. In conclusion, it is competitive with some well-known conventional and ensemble data mining methods.
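
To make the idea of searching for base-level learners with ACO more concrete, here is a minimal sketch. It is not the thesis's algorithm: the per-candidate pheromone trail, the inclusion rule tau / (tau + 1), the parameter defaults, the learner names, and the `toy_score` objective are all assumptions made for illustration. In a real setting `evaluate` would be something like the validation accuracy of a Stacking ensemble built from the chosen learners.

```python
import random


def aco_select(candidates, evaluate, n_ants=10, n_iters=30, rho=0.1, seed=0):
    """Toy ACO search over subsets of candidate base-level learners.

    candidates: names of learning algorithms (hypothetical labels).
    evaluate:   maps a tuple of names to a score; a stand-in for the
                validation accuracy of a Stacking ensemble built from them.
    """
    rng = random.Random(seed)
    pheromone = {c: 1.0 for c in candidates}  # one trail per candidate
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iters):
        iter_subset, iter_score = None, float("-inf")
        for _ in range(n_ants):
            # An ant includes candidate c with probability tau / (tau + 1).
            subset = tuple(
                c for c in candidates
                if rng.random() < pheromone[c] / (pheromone[c] + 1.0)
            )
            if not subset:  # an ensemble needs at least one base learner
                subset = (rng.choice(candidates),)
            score = evaluate(subset)
            if score > iter_score:
                iter_subset, iter_score = subset, score

        # Evaporate all trails, then reinforce the iteration-best subset.
        for c in candidates:
            pheromone[c] *= 1.0 - rho
        for c in iter_subset:
            pheromone[c] += max(iter_score, 0.0)

        if iter_score > best_score:
            best_subset, best_score = iter_subset, iter_score

    return best_subset, best_score


def toy_score(subset):
    """Made-up objective: reward two 'useful' learners, penalize extras."""
    useful = {"decision_tree", "naive_bayes"}
    chosen = set(subset)
    return len(chosen & useful) / len(useful) - 0.1 * len(chosen - useful)


names = ["decision_tree", "naive_bayes", "knn", "svm"]
subset, score = aco_select(names, toy_score)
print(subset, score)
```

Evaporation keeps early choices from dominating, while the deposit step makes learners that appear in good subsets more likely to be picked again. Local information such as per-classifier precision could plausibly enter as an extra factor in the inclusion probability, but that extension is not shown here.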

Digital Commons @ Lingnan University

Tobacco Taxation and Cigarette Consumption: Do Cigarette Tax Hikes Reduce Smoking Participation and Cigarette Consumption?

Author: Chen Yijun
Publication venue: Creative Matter
Publication date: 01/01/2019

This paper uses two waves of data from the Current Population Survey Tobacco Use Supplement, together with data from the Tax Burden on Tobacco, to analyze the impact of tobacco tax hikes on both smoking participation and daily cigarette consumption. Using a difference-in-differences approach, I find that cigarette excise tax hikes have a positive but statistically insignificant effect on the probability of smoking and a negative but statistically insignificant effect on daily cigarette consumption. My empirical results suggest that raising cigarette taxes as a tobacco control strategy seems only to generate tax revenue and is not an effective tool for reducing either smoking participation rates or cigarette consumption. Possible explanations for these findings include inelastic cigarette demand, tax salience effects, substitution effects, and inefficient allocation of tax revenues.
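
As a reminder of what a difference-in-differences estimate computes, the sketch below shows the simplest two-group, two-period version on group means. It is not the paper's specification, which may well be regression-based with controls. The function name and the numbers are made up for illustration.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up daily cigarette counts, for illustration only.
effect = did_estimate(
    treated_pre=[10, 12], treated_post=[9, 11],
    control_pre=[8, 10], control_post=[8, 10],
)
print(effect)
```

Subtracting the change in the control group removes trends common to both groups, under the usual assumption that the two groups would have moved in parallel without the tax hike.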

Skidmore College: Creative Matter

Nonparametric Regression In Natural Exponential Families: A Simulation Study

Author: CHEN YIJUN
Publication venue: Clemson University Libraries
Publication date: 01/08/2015

Nonparametric regression has been particularly well developed. Based on asymptotic equivalence theory, there are procedures that can turn more complicated nonparametric estimation problems into a standard nonparametric regression, especially in natural exponential families. This procedure is described in detail with a wavelet thresholding estimator for Gaussian nonparametric regression, and a simulation study sheds light on the behavior of this method under different sample sizes and parameterizations of the exponential distribution. The resulting estimators have a high degree of adaptivity in [2]
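
The wavelet thresholding step for Gaussian nonparametric regression can be sketched as follows. This is one common textbook variant, not necessarily the estimator studied in the thesis: the Haar basis, soft thresholding, the universal threshold sigma * sqrt(2 log n), and a known noise level `sigma` are all assumptions. The transformation from an exponential-family problem to Gaussian regression is not shown.

```python
import math


def haar_forward(x):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        d = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        s = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        details.append(d)  # finest level first
        approx = s
    return approx[0], details


def haar_inverse(a, details):
    """Invert haar_forward."""
    approx = [a]
    for d in reversed(details):  # rebuild from coarsest to finest
        finer = []
        for s_i, d_i in zip(approx, d):
            finer.append((s_i + d_i) / math.sqrt(2))
            finer.append((s_i - d_i) / math.sqrt(2))
        approx = finer
    return approx


def soft(c, t):
    """Soft thresholding: shrink c toward zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def denoise(y, sigma):
    """Shrink detail coefficients with the universal threshold."""
    t = sigma * math.sqrt(2.0 * math.log(len(y)))
    a, details = haar_forward(y)
    shrunk = [[soft(c, t) for c in level] for level in details]
    return haar_inverse(a, shrunk)
```

Small detail coefficients are treated as noise and set to zero, while large ones are kept (shrunk by the threshold), which is what lets the estimator adapt to local features of the regression function.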

Clemson University: TigerPrints