214 research outputs found

    Topics in imbalanced data classification : AdaBoost and Bayesian relevance vector machine

    This research has three parts addressing classification, especially the imbalanced data problem, which is one of the most widely studied and essential issues in classification. The first part studies the Adaptive Boosting (AdaBoost) algorithm. AdaBoost is an effective solution for classification, but it still needs improvement on imbalanced data. This part proposes a method to improve AdaBoost using new weighted vote parameters for the weak classifiers. The proposed weighted vote parameters are determined not only by the global error rate but also by the classification accuracy rate of the positive class, which is our primary interest. The imbalanced index of the data is also a factor in constructing our algorithms. The numerical studies show that our proposed algorithms outperform the traditional ones, especially on the F1 measure. Theoretical proofs of the advantages of our proposed algorithms are presented. The second part treats the Relevance Vector Machine (RVM), a supervised learning algorithm extended from the Support Vector Machine (SVM) based on the Bayesian sparsity model. Compared with the regression problem, RVM classification is challenging because there is no closed-form solution for the weight parameter posterior. The original RVM classification algorithm uses Newton's method to obtain the mode of the weight parameter posterior, then approximates the posterior by a Gaussian distribution using Laplace's method. This original model works, but it merely applies frequentist methods within a Bayesian framework. This part first proposes a Generic Bayesian RVM classification, which is a pure Bayesian model. We conjecture that our algorithm achieves convergent estimates of the quantities of interest, in contrast to the nonconvergent estimates of the original RVM classification algorithm. Furthermore, a Fully Bayesian approach with a hierarchical hyperprior structure for RVM classification is proposed, which improves the classification performance, especially in the imbalanced data problem. The third part extends the second. The original RVM classification model uses the logistic link function to build the likelihood, which makes the model hard to fit because the posterior of the weight parameter has no closed-form solution. This part proposes the probit link function instead of the logistic one for the likelihood function in RVM classification, namely PRVM (RVM with the Probit link function). We show that the posterior of the weight parameter in our model follows the multivariate normal distribution and achieves a closed-form solution. A latent variable is needed in our algorithm to simplify the Bayesian computation greatly, and its conditional posterior follows a truncated normal distribution. Compared with the original RVM classification model, our proposed one is another pure Bayesian approach, and it has a more efficient computation process. For the prior structure, we first consider the Normal-Gamma independent prior to propose a Generic Bayesian PRVM algorithm. Furthermore, the Fully Bayesian PRVM algorithm with a hierarchical hyperprior structure is proposed, which improves the classification performance, especially in the imbalanced data problem.
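The abstract above does not give the proposed vote parameters, so the following is only a sketch of the classical AdaBoost baseline that the first part sets out to improve, together with the F1 measure used for evaluation. The function names are hypothetical, and the positive-class accuracy and imbalance index that the proposed method adds are deliberately not modelled here.

```python
import math

def vote_weight(error_rate):
    """Classical AdaBoost vote weight for a weak classifier with weighted error in (0, 1)."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)

def ensemble_predict(votes, weights):
    """Combine weak-classifier outputs in {-1, +1} by weighted majority vote."""
    score = sum(w * v for v, w in zip(votes, weights))
    return 1 if score >= 0 else -1

def f1_measure(tp, fp, fn):
    """F1 measure: harmonic mean of precision and recall on the positive class."""
    return 2 * tp / (2 * tp + fp + fn)
```

Because the classical weight depends only on the global error rate, a weak classifier that ignores a rare positive class can still receive a large vote, which is the limitation the abstract points to.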

    Fully Bayesian Analysis of the Relevance Vector Machine Classification for Imbalanced Data

    The Relevance Vector Machine (RVM) is a supervised learning algorithm extended from the Support Vector Machine (SVM) based on the Bayesian sparsity model. Compared with the regression problem, RVM classification is difficult because there is no closed-form solution for the weight parameter posterior. The original RVM classification algorithm uses Newton's method to obtain the mode of the weight parameter posterior, then approximates the posterior by a Gaussian distribution using Laplace's method. This works, but it merely applies frequentist methods within a Bayesian framework. This paper proposes a Generic Bayesian approach for RVM classification. We conjecture that our algorithm achieves convergent estimates of the quantities of interest, in contrast to the nonconvergent estimates of the original RVM classification algorithm. Furthermore, a Fully Bayesian approach with a hierarchical hyperprior structure for RVM classification is proposed, which improves the classification performance, especially in the imbalanced data problem. In the numerical studies, our proposed algorithms obtain high classification accuracy rates. The Fully Bayesian hierarchical hyperprior method outperforms the Generic one for imbalanced data classification. Comment: 24 pages, 3 figures, preprint to submit to Electronic Journal of Statistics
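The Newton-plus-Laplace step that the abstract attributes to the original RVM classifier can be illustrated in one dimension. This is a minimal sketch, assuming a single weight `w`, labels in {0, 1}, a logistic likelihood, and a fixed Gaussian prior precision `alpha`; the actual RVM uses one precision per weight and re-estimates them, which is omitted here. All names are hypothetical.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def grad_hess(w, xs, ys, alpha):
    """First and second derivative of the log posterior of w (up to a constant)."""
    g = -alpha * w          # derivative of the Gaussian log prior
    h = -alpha
    for x, y in zip(xs, ys):
        s = sigmoid(w * x)
        g += (y - s) * x                # logistic log-likelihood gradient
        h -= s * (1.0 - s) * x * x      # always negative: the objective is concave
    return g, h

def laplace_approx(xs, ys, alpha, iters=50, tol=1e-10):
    """Newton's method for the posterior mode, then a Gaussian fitted at that mode."""
    w = 0.0
    for _ in range(iters):
        g, h = grad_hess(w, xs, ys, alpha)
        step = g / h
        w -= step
        if abs(step) < tol:
            break
    _, h = grad_hess(w, xs, ys, alpha)
    # Laplace's method: the variance is the negative inverse curvature at the mode.
    return w, -1.0 / h
```

The result is a point estimate of the mode with a local Gaussian around it rather than a sample from the posterior, which is the sense in which the abstract contrasts it with a sampling-based Bayesian treatment.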

    THE EFFECT OF TRUST ON INFORMATION DIFFUSION IN ONLINE SOCIAL NETWORKS

    Online social networks have grown explosively in recent years and provide a natural platform for information diffusion. Many models have been proposed to explore the information diffusion process and its dynamics, but they ignore trust relationships and the memory effect. Based on complex network theory, an information diffusion model is proposed in which the network users, treated as agents, are classified as susceptible, infected, or recovered individuals. The users’ behaviour rules and the diffusion process are designed. The proposed agent-based model is tested by simulation experiments on four different complex networks: a regular network, a small-world network, a random network, and a scale-free network. Moreover, the effects of four immunization strategies are explored. The results show that the influence of users’ trust relationships varies across networks, and that the vertex-weight priority immunization strategy performs best in all four networks.
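As a rough illustration of the kind of agent-based susceptible/infected/recovered simulation the abstract describes, the sketch below runs a synchronous SIR process on a regular ring network. The trust rule (transmission probability multiplied by a trust value), the parameters `beta` and `gamma`, and all function names are assumptions made for illustration; the paper's actual behaviour rules, memory effect, and immunization strategies are not specified in the abstract and are not modelled.

```python
import random

def ring_lattice(n, k):
    """Regular network: each node links to its k nearest neighbours on each side."""
    return {v: [(v + d) % n for d in range(-k, k + 1) if d != 0] for v in range(n)}

def simulate_sir(graph, trust, beta, gamma, seeds, steps, rng):
    """Return one (S, I, R) count tuple per step, starting from the infected `seeds`."""
    state = {v: "S" for v in graph}
    for v in seeds:
        state[v] = "I"
    history = []
    for _ in range(steps):
        new_state = dict(state)  # synchronous update: read old state, write new
        for v, s in state.items():
            if s != "I":
                continue
            for u in graph[v]:
                # Assumed rule: spreading chance scales with how much u trusts v.
                if state[u] == "S" and rng.random() < beta * trust(u, v):
                    new_state[u] = "I"
            if rng.random() < gamma:
                new_state[v] = "R"
        state = new_state
        history.append(tuple(sum(1 for s in state.values() if s == c) for c in "SIR"))
    return history
```

A run such as `simulate_sir(ring_lattice(30, 2), lambda u, v: 0.8, 0.5, 0.2, [0], 40, random.Random(1))` yields the per-step counts; swapping `ring_lattice` for another graph generator would allow the same comparison across network types.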

    Fully Bayesian Analysis of Relevance Vector Machine Classification With Probit Link Function for Imbalanced Data Problem

    The original RVM classification model uses the logistic link function to build the likelihood function, which makes the model hard to fit because the posterior of the weight parameter has no closed-form solution. This article proposes the probit link function instead of the logistic one for the likelihood function in the RVM classification model, namely PRVM (RVM with the probit link function). We show that the posterior of the weight parameter in PRVM follows the multivariate normal distribution and achieves a closed-form solution. A latent variable is needed in our algorithms to simplify the Bayesian computation greatly, and its conditional posterior follows a truncated normal distribution. Compared with the original RVM classification model, our proposed one is a Fully Bayesian approach, and it has a more efficient computation process. For the prior structure, we first consider the Normal-Gamma independent prior to propose a Generic Bayesian PRVM algorithm. Furthermore, the Fully Bayesian PRVM algorithm with a hierarchical hyperprior structure is proposed, which improves the classification performance, especially in the imbalanced data problem.
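The truncated normal conditional for the latent variable can be sketched with the standard data-augmentation construction for probit models, which is assumed here because the abstract does not spell out its exact scheme: a latent `z` is drawn from a unit-variance normal centred on the linear predictor, restricted to the positive half-line when the label is 1 and to the non-positive half-line when it is 0. The function name and the inverse-CDF sampling approach are illustrative choices.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for cdf and inverse cdf

def draw_latent(mean, label, rng):
    """Draw z ~ N(mean, 1) truncated to z > 0 if label == 1, else to z <= 0."""
    # Mass that N(mean, 1) places at or below zero.
    p_neg = _STD.cdf(-mean)
    if label == 1:
        u = rng.uniform(p_neg, 1.0)   # upper tail only
    else:
        u = rng.uniform(0.0, p_neg)   # lower tail only
    # inv_cdf needs 0 < u < 1; the clamp can bias draws when |mean| is very large.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + _STD.inv_cdf(u)
```

Given the latent values, the likelihood is Gaussian in the weights, which is why a Gaussian prior on the weights leads to the closed-form multivariate normal conditional mentioned in the abstract.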

    Bitstream-Corrupted Video Recovery: A Novel Benchmark Dataset and Method

    The past decade has witnessed great strides in video recovery by specialist technologies such as video inpainting, completion, and error concealment. However, these methods typically simulate the missing content with manually designed error masks, and thus fail to handle the realistic video loss that occurs in video communication (e.g., telepresence, live streaming, and internet video) and multimedia forensics. To address this, we introduce the bitstream-corrupted video (BSCV) benchmark, the first benchmark dataset with more than 28,000 video clips, which can be used for bitstream-corrupted video recovery in the real world. The BSCV is a collection of 1) a proposed three-parameter corruption model for video bitstream, 2) a large-scale dataset containing rich error patterns, multiple corruption levels, and flexible dataset branches, and 3) a plug-and-play module in a video recovery framework that serves as a benchmark. We evaluate state-of-the-art video inpainting methods on the BSCV dataset, demonstrating existing approaches' limitations and our framework's advantages in solving the bitstream-corrupted video recovery problem. The benchmark and dataset are released at https://github.com/LIUTIGHE/BSCV-Dataset. Comment: Accepted by NeurIPS Dataset and Benchmark Track 202

    Genetic variation along an altitudinal gradient in the Phytophthora infestans effector gene Pi02860

    Effector genes, together with climatic and other environmental factors, play multifaceted roles in the development of plant diseases. Understanding the role of environmental factors, particularly climate conditions affecting the evolution of effector genes, is important for predicting the long-term value of the genes in controlling agricultural diseases. Here, we collected Phytophthora infestans populations from five locations along a mountainous hill in China and sequenced the effector gene Pi02860 from more than 300 isolates. To minimize the influence of other ecological factors, isolates were sampled from the same potato cultivar on the same day. We also expressed the gene to visualise its cellular location, assayed its pathogenicity, and evaluated its response to experimental temperatures. We found that Pi02860 exhibited moderate genetic variation at the nucleotide level, which was mainly generated by point mutation. The mutations did not change the cellular location of the effector gene but significantly modified the fitness of P. infestans. Genetic variation and pathogenicity of the effector gene were positively associated with the altitude of sample sites, possibly due to an increased mutation rate induced by the vertical distribution of environmental factors such as UV radiation and temperature. We further found that Pi02860 expression was regulated by experimental temperature, with reduced expression as experimental temperature increased. Together, these results indicate that UV radiation and temperature are important environmental factors regulating the evolution of effector genes and provide us with considerable insight into their future sustainable action under climate and other environmental change.