Topics in imbalanced data classification : AdaBoost and Bayesian relevance vector machine
This research has three parts addressing classification, especially the imbalanced data problem, which is one of the most studied and essential issues in the domain of classification. The first part studies the Adaptive Boosting (AdaBoost) algorithm. AdaBoost is an effective solution for classification, but it still needs improvement on imbalanced data. This part proposes a method to improve the AdaBoost algorithm using new weighted vote parameters for the weak classifiers. Our proposed weighted vote parameters are determined not only by the global error rate but also by the classification accuracy rate of the positive class, which is our primary interest. The imbalanced index of the data is also a factor in constructing our algorithms. The numerical studies show that our proposed algorithms outperform the traditional ones, especially regarding the evaluation criterion of the F1 measure. Theoretical proofs of the advantages of our proposed algorithms are presented. The second part treats the Relevance Vector Machine (RVM), which is a supervised learning algorithm extended from the Support Vector Machine (SVM) based on the Bayesian sparsity model. Compared with the regression problem, RVM classification is challenging to carry out because there is no closed-form solution for the weight parameter posterior. The original RVM classification algorithm uses Newton's method to obtain the mode of the weight parameter posterior, then approximates the posterior by a Gaussian distribution using Laplace's method. This original model works, but it merely applies frequentist methods within a Bayesian framework. This part first proposes a Generic Bayesian RVM classification, which is a pure Bayesian model. We conjecture that our algorithm achieves convergent estimates of the quantities of interest compared with the nonconvergent estimates of the original RVM classification algorithm. Furthermore, a fully Bayesian approach with the hierarchical hyperprior structure for RVM classification is proposed, which improves the classification performance, especially in the imbalanced data problem. The third part extends the second one. The original RVM classification model uses the logistic link function to build the likelihood, which makes the model hard to work with since the posterior of the weight parameter has no closed-form solution. This part proposes the probit link function approach instead of the logistic one for the likelihood function in RVM classification, namely PRVM (RVM with the Probit link function). We show that the posterior of the weight parameter in our model follows the multivariate normal distribution and achieves a closed-form solution. A latent variable is needed in our algorithm to simplify the Bayesian computation greatly, and its conditional posterior follows a truncated normal distribution. Compared with the original RVM classification model, our proposed one is another pure Bayesian approach and it has a more efficient computation process. For the prior structure, we first consider the Normal-Gamma independent prior to propose a Generic Bayesian PRVM algorithm. Furthermore, the Fully Bayesian PRVM algorithm with a hierarchical hyperprior structure is proposed, which improves the classification performance, especially in the imbalanced data problem.
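To illustrate why the first part of the abstract above looks beyond the global error rate, the following minimal sketch computes the textbook AdaBoost vote weight and the F1 measure for a weak classifier on imbalanced labels. It does not reproduce the proposed weighted vote parameters, which the abstract does not specify; the function names `vote_weight` and `f1_score` and the toy labels are assumptions made for illustration.

```python
import math

def vote_weight(error_rate):
    """Textbook AdaBoost vote weight: alpha = 0.5 * ln((1 - err) / err)."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)

def f1_score(y_true, y_pred, positive=1):
    """F1 measure for the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical imbalanced sample: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A weak classifier that always predicts the majority (negative) class.
y_pred = [0] * 100

error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
print("global error rate:", error)
print("standard vote weight:", vote_weight(error))
print("F1 on the positive class:", f1_score(y_true, y_pred))
```

In this toy case the global error rate is low, so the standard vote weight is large, even though the classifier never detects a positive example. A vote weight that also depends on positive-class accuracy, as the abstract describes, would penalise such a classifier.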
Fully Bayesian Analysis of the Relevance Vector Machine Classification for Imbalanced Data
Relevance Vector Machine (RVM) is a supervised learning algorithm extended
from Support Vector Machine (SVM) based on the Bayesian sparsity model.
Compared with the regression problem, RVM classification is difficult to carry
out because there is no closed-form solution for the weight parameter
posterior. The original RVM classification algorithm used Newton's method to
obtain the mode of the weight parameter posterior and then approximated the
posterior by a Gaussian distribution using Laplace's method. This works, but it
merely applies frequentist methods within a Bayesian framework. This paper proposes a
Generic Bayesian approach for the RVM classification. We conjecture that our
algorithm achieves convergent estimates of the quantities of interest compared
with the nonconvergent estimates of the original RVM classification algorithm.
Furthermore, a Fully Bayesian approach with the hierarchical hyperprior
structure for RVM classification is proposed, which improves the classification
performance, especially in the imbalanced data problem. In numerical studies,
our proposed algorithms obtain high classification accuracy rates. The Fully
Bayesian hierarchical hyperprior method outperforms the Generic one for
imbalanced data classification.
Comment: 24 pages, 3 figures, preprint to submit to Electronic Journal of
Statistics
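The Newton-plus-Laplace step that the abstract attributes to the original RVM classification algorithm can be sketched as follows. This is a simplified illustration, not the RVM itself: it assumes a single weight `w`, a logistic likelihood, and a fixed Gaussian prior precision `alpha`. The names `grad_hess` and `laplace_logistic_1d` are hypothetical.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def grad_hess(w, x, y, alpha):
    """Gradient and Hessian of the log posterior
    sum_i [y_i log s_i + (1 - y_i) log(1 - s_i)] - alpha * w^2 / 2,
    where s_i = sigmoid(w * x_i)."""
    s = [sigmoid(w * xi) for xi in x]
    grad = sum((yi - si) * xi for xi, yi, si in zip(x, y, s)) - alpha * w
    hess = -sum(si * (1.0 - si) * xi * xi for xi, si in zip(x, s)) - alpha
    return grad, hess

def laplace_logistic_1d(x, y, alpha=1.0, tol=1e-10, max_iter=50):
    """Newton's method finds the posterior mode; Laplace's method then
    approximates the posterior by N(mode, -1 / Hessian at the mode)."""
    w = 0.0
    for _ in range(max_iter):
        grad, hess = grad_hess(w, x, y, alpha)
        step = grad / hess
        w -= step                      # Newton update
        if abs(step) < tol:
            break
    _, hess = grad_hess(w, x, y, alpha)
    return w, -1.0 / hess

# Hypothetical toy data with one feature and binary labels.
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [0, 0, 1, 0, 1, 1]
mode, variance = laplace_logistic_1d(x, y)
print("posterior mode:", mode, "Laplace variance:", variance)
```

The result is a point estimate plus a local Gaussian curvature, not draws from the exact posterior. That gap is what the sampling-based approaches described in the abstract aim to close.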
THE EFFECT OF TRUST ON INFORMATION DIFFUSION IN ONLINE SOCIAL NETWORKS
Online social networks have grown explosively in recent years, and they provide a natural platform for information diffusion. Many models have been proposed to explore the information diffusion process and its dynamics, but they ignore the trust relationship and the memory effect. Based on complex network theory, an information diffusion model is proposed in which the network users, treated as agents, are classified into susceptible, infected and recovered individuals. The users’ behaviour rules and the diffusion process are designed. The proposed agent-based model is tested by simulation experiments on four different complex networks: a regular network, a small-world network, a random network and a scale-free network. Moreover, the effects of four immunization strategies are explored. The results show that the influence of users’ trust relationships varies across networks, and that the vertex weight priority immunization strategy is the best one in all four networks.
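A susceptible/infected/recovered agent model of the kind described above can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the authors' model: the `trust(u, v)` function, the parameters `p_spread` and `p_recover`, and the synchronous update rule are all hypothetical, and the memory effect and immunization strategies are omitted.

```python
import random

def ring_lattice(n, k):
    """Regular network: each node links to its k nearest neighbours on each side."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)}

def simulate_sir(neighbors, trust, seeds, p_spread=0.5, p_recover=0.2,
                 steps=50, seed=0):
    """Synchronous SIR diffusion. An infected user v passes the information to a
    susceptible neighbour u with probability p_spread * trust(u, v), where
    trust(u, v) in [0, 1] is the (assumed) trust u places in v."""
    rng = random.Random(seed)
    state = {v: "S" for v in neighbors}
    for v in seeds:
        state[v] = "I"
    history = []
    for _ in range(steps):
        new_state = dict(state)
        for v, s in state.items():
            if s != "I":
                continue
            for u in neighbors[v]:
                if state[u] == "S" and rng.random() < p_spread * trust(u, v):
                    new_state[u] = "I"
            if rng.random() < p_recover:
                new_state[v] = "R"     # v stops spreading the information
        state = new_state
        values = list(state.values())
        history.append((values.count("S"), values.count("I"), values.count("R")))
    return history

# Usage: uniform full trust on a regular network of 30 users, one initial spreader.
graph = ring_lattice(30, 2)
history = simulate_sir(graph, lambda u, v: 1.0, seeds=[0])
print("final (S, I, R) counts:", history[-1])
```

Swapping in a different `neighbors` dictionary (small-world, random, or scale-free) or a non-uniform `trust` function is enough to compare how trust changes diffusion across network types.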
Fully Bayesian Analysis of Relevance Vector Machine Classification With Probit Link Function for Imbalanced Data Problem
The original RVM classification model uses the logistic link function to build the likelihood function, which makes the model hard to work with since the posterior of the weight parameter has no closed-form solution. This article proposes the probit link function approach instead of the logistic one for the likelihood function in the RVM classification model, namely PRVM (RVM with the probit link function). We show that the posterior of the weight parameter in PRVM follows the multivariate normal distribution and achieves a closed-form solution. A latent variable is needed in our algorithms to simplify the Bayesian computation greatly, and its conditional posterior follows a truncated normal distribution. Compared with the original RVM classification model, our proposed one is a Fully Bayesian approach, and it has a more efficient computation process. For the prior structure, we first consider the Normal-Gamma independent prior to propose a Generic Bayesian PRVM algorithm. Furthermore, the Fully Bayesian PRVM algorithm with a hierarchical hyperprior structure is proposed, which improves the classification performance, especially in the imbalanced data problem.
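The latent-variable idea in the abstract above can be illustrated with the standard data-augmentation Gibbs sampler for probit regression. This is a minimal sketch, not the PRVM algorithm: it assumes a single weight `w` and a fixed prior precision `alpha` in place of the Normal-Gamma or hierarchical hyperpriors. The function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, provides cdf and inv_cdf

def sample_truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive=True) or z <= 0,
    using the inverse-CDF method."""
    p0 = STD_NORMAL.cdf(-mean)                 # P(z <= 0)
    lo, hi = (p0, 1.0) if positive else (0.0, p0)
    u = rng.uniform(lo, hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)        # keep inv_cdf inside (0, 1)
    z = mean + STD_NORMAL.inv_cdf(u)
    # Numerical guard: clamping u can flip the sign for extreme means.
    return abs(z) if positive else -abs(z)

def gibbs_probit_1d(x, y, alpha=1.0, n_iter=2000, burn_in=500, seed=0):
    """Alternate between the two closed-form conditionals:
    z_i | w, y_i  ~ truncated N(w * x_i, 1)
    w   | z       ~ N(m, v), v = 1 / (sum x_i^2 + alpha), m = v * sum x_i z_i
    """
    rng = random.Random(seed)
    post_var = 1.0 / (sum(xi * xi for xi in x) + alpha)
    w, draws = 0.0, []
    for it in range(n_iter):
        z = [sample_truncated_normal(w * xi, yi == 1, rng) for xi, yi in zip(x, y)]
        post_mean = post_var * sum(xi * zi for xi, zi in zip(x, z))
        w = rng.gauss(post_mean, post_var ** 0.5)
        if it >= burn_in:
            draws.append(w)
    return draws

# Usage on synthetic data generated with a hypothetical true weight of 1.5.
data_rng = random.Random(1)
x = [data_rng.uniform(-2, 2) for _ in range(150)]
y = [1 if 1.5 * xi + data_rng.gauss(0, 1) > 0 else 0 for xi in x]
draws = gibbs_probit_1d(x, y, n_iter=1500)
print("posterior mean of w:", sum(draws) / len(draws))
```

Because both conditionals are standard distributions, each sweep needs no inner optimisation. This is the computational advantage the abstract claims for the probit link over the logistic one.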
Bitstream-Corrupted Video Recovery: A Novel Benchmark Dataset and Method
The past decade has witnessed great strides in video recovery by specialist
technologies, like video inpainting, completion, and error concealment.
However, they typically simulate the missing content with manually designed error
masks, thus failing to fill in the realistic video loss in video communication
(e.g., telepresence, live streaming, and internet video) and multimedia
forensics. To address this, we introduce the bitstream-corrupted video (BSCV)
benchmark, the first benchmark dataset with more than 28,000 video clips, which
can be used for bitstream-corrupted video recovery in the real world. The BSCV
is a collection of 1) a proposed three-parameter corruption model for video
bitstream, 2) a large-scale dataset containing rich error patterns, multiple
corruption levels, and flexible dataset branches, and 3) a plug-and-play module
in video recovery framework that serves as a benchmark. We evaluate
state-of-the-art video inpainting methods on the BSCV dataset, demonstrating
existing approaches' limitations and our framework's advantages in solving the
bitstream-corrupted video recovery problem. The benchmark and dataset are
released at https://github.com/LIUTIGHE/BSCV-Dataset.
Comment: Accepted by NeurIPS Dataset and Benchmark Track 202
Genetic variation along an altitudinal gradient in the Phytophthora infestans effector gene Pi02860
Effector genes, together with climatic and other environmental factors, play multifaceted roles in the development of plant diseases. Understanding the role of environmental factors, particularly climate conditions affecting the evolution of effector genes, is important for predicting the long-term value of the genes in controlling agricultural diseases. Here, we collected Phytophthora infestans populations from five locations along a mountainous hill in China and sequenced the effector gene Pi02860 from >300 isolates. To minimize the influence of other ecological factors, isolates were sampled from the same potato cultivar on the same day. We also expressed the gene to visualise its cellular location, assayed its pathogenicity and evaluated its response to experimental temperatures. We found that Pi02860 exhibited moderate genetic variation at the nucleotide level, which was mainly generated by point mutation. The mutations did not change the cellular location of the effector gene but significantly modified the fitness of P. infestans. Genetic variation and pathogenicity of the effector gene were positively associated with the altitude of sample sites, possibly due to an increased mutation rate induced by the vertical distribution of environmental factors such as UV radiation and temperature. We further found that Pi02860 expression was regulated by experimental temperature, with reduced expression as experimental temperature increased. Together, these results indicate that UV radiation and temperature are important environmental factors regulating the evolution of effector genes and provide us with considerable insight as to their future sustainable action under climate and other environmental change.