44,235 research outputs found
Posterior Consistency of Semi-Supervised Regression on Graphs
Graph-based semi-supervised regression (SSR) is the problem of estimating the value of a function on a weighted graph from its values (labels) on a small subset of the vertices. This paper is concerned with the consistency of SSR in the context of classification, in the setting where the labels have small noise and the underlying graph weighting is consistent with well-clustered nodes. We present a Bayesian formulation of SSR in which the weighted graph defines a Gaussian prior, using a graph Laplacian, and the labeled data defines a likelihood. We analyze the rate of contraction of the posterior measure around the ground truth in terms of parameters that quantify the small label error and inherent clustering in the graph. We obtain bounds on the rates of contraction and illustrate their sharpness through numerical experiments. The analysis also gives insight into the choice of hyperparameters that enter the definition of the prior.
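The Bayesian formulation described in this abstract can be illustrated with a minimal sketch. It is not the paper's construction: it assumes an unnormalized Laplacian, a prior precision of the form L + tau^2 I, and Gaussian label noise with standard deviation gamma. The names graph_laplacian, posterior_mean, tau and gamma are illustrative choices, not taken from the source.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def posterior_mean(W, labeled_idx, y, tau=0.1, gamma=0.1):
    """Posterior mean of u under the assumed model
         prior:      u ~ N(0, (L + tau^2 I)^{-1})
         likelihood: y_j = u[j] + noise,  noise ~ N(0, gamma^2)  (labeled j only)
    Both are Gaussian, so the posterior mean solves one linear system."""
    L = graph_laplacian(W)
    n = L.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    y = np.asarray(y, dtype=float)
    prior_precision = L + tau**2 * np.eye(n)
    # The likelihood adds 1/gamma^2 to the precision at each labeled vertex.
    data_precision = np.zeros(n)
    data_precision[labeled_idx] = 1.0 / gamma**2
    rhs = np.zeros(n)
    rhs[labeled_idx] = y / gamma**2
    return np.linalg.solve(prior_precision + np.diag(data_precision), rhs)

# Toy graph (hypothetical): two 3-node cliques joined by one weak edge.
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
W[2, 3] = W[3, 2] = 0.01

# One label per cluster: +1 on vertex 0, -1 on vertex 5.
m = posterior_mean(W, labeled_idx=[0, 5], y=[1.0, -1.0])
print(np.sign(m))
```

Taking the sign of the posterior mean gives a class prediction for every vertex. In this toy setting the labels propagate within each cluster because the cross-cluster edge is weak, which is the kind of clustering structure the abstract refers to.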
Development of early prediction model for pregnancy-associated hypertension with graph-based semi-supervised learning
Clinical guidelines recommend several risk factors to identify women in early pregnancy at high risk of developing pregnancy-associated hypertension. However, these variables result in low predictive accuracy. Here, we developed a prediction model for pregnancy-associated hypertension using graph-based semi-supervised learning (SSL). This is a secondary analysis of a prospective study of healthy pregnant women. To develop the prediction model, we compared the prediction performances across five machine learning methods (semi-supervised learning with both labeled and unlabeled data, semi-supervised learning with labeled data only, logistic regression, support vector machine, and random forest) using three different variable sets: [a] variables from clinical guidelines, [b] selected important variables from the feature selection, and [c] all routine variables. Additionally, the proposed prediction model was compared with placental growth factor, a predictive biomarker for pregnancy-associated hypertension. The study population consisted of 1404 women, including 1347 women with complete follow-up (labeled data) and 57 women with incomplete follow-up (unlabeled data). Among the 1347 with complete follow-up, 2.4% (33/1347) developed pregnancy-associated hypertension. Graph-based semi-supervised learning using the top 11 variables achieved the best average prediction performance (mean area under the curve (AUC) of 0.89 in the training set and 0.81 in the test set), with higher sensitivity (72.7% vs 45.5% in the test set) and similar specificity (80.0% vs 80.5% in the test set) compared to risk factors from clinical guidelines. In addition, our proposed model with graph-based SSL performed better than placental growth factor for the total study population (AUC, 0.71 vs. 0.80, p < 0.001). In conclusion, we could accurately predict the development of pregnancy-associated hypertension in early pregnancy through the use of routine clinical variables with the help of graph-based SSL.
On Consistency of Graph-based Semi-supervised Learning
Graph-based semi-supervised learning is one of the most popular methods in
machine learning. Some of its theoretical properties such as bounds for the
generalization error and the convergence of the graph Laplacian regularizer
have been studied in the computer science and statistics literature. However, a
fundamental statistical property, the consistency of the estimator from this
method, has not been proved. In this article, we study the consistency problem
under a non-parametric framework. We prove the consistency of graph-based
learning in the case that the estimated scores are enforced to be equal to the
observed responses for the labeled data. The sample sizes of both labeled and
unlabeled data are allowed to grow in this result. When the estimated scores
are not required to be equal to the observed responses, a tuning parameter is
used to balance the loss function and the graph Laplacian regularizer. We give
a counterexample demonstrating that the estimator for this case can be
inconsistent. The theoretical findings are supported by numerical studies.
Comment: This paper is accepted by 2019 IEEE 39th International Conference on
Distributed Computing Systems (ICDCS).
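The two estimators contrasted in this abstract can be sketched as follows. This is a minimal illustration under assumptions, not the paper's exact setup: it uses an unnormalized Laplacian, a squared loss, a connected graph, and a tuning parameter named lam. The function names harmonic_hard and laplacian_soft are hypothetical, and the example does not reproduce the paper's counterexample.

```python
import numpy as np

def _laplacian(W):
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def harmonic_hard(W, labeled_idx, y):
    """Hard constraint: minimize f^T L f subject to f[labeled] = y.
    The unlabeled scores solve L_uu f_u = -L_ul y."""
    L = _laplacian(W)
    n = L.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    y = np.asarray(y, dtype=float)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    f = np.zeros(n)
    f[labeled_idx] = y
    f[unlabeled_idx] = np.linalg.solve(L_uu, -L_ul @ y)
    return f

def laplacian_soft(W, labeled_idx, y, lam=1.0):
    """Soft constraint: minimize sum over labeled i of (f[i] - y[i])^2
    plus lam * f^T L f. The minimizer solves (J + lam L) f = J y, where J
    is the diagonal 0/1 indicator of labeled vertices."""
    L = _laplacian(W)
    n = L.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    y = np.asarray(y, dtype=float)
    J = np.zeros(n)
    J[labeled_idx] = 1.0
    rhs = np.zeros(n)
    rhs[labeled_idx] = y
    return np.linalg.solve(np.diag(J) + lam * L, rhs)

# Toy graph (hypothetical): a path 0 - 1 - 2 - 3 with labels at both ends.
W_path = np.zeros((4, 4))
for i in range(3):
    W_path[i, i + 1] = W_path[i + 1, i] = 1.0

f_hard = harmonic_hard(W_path, [0, 3], [0.0, 3.0])
f_soft = laplacian_soft(W_path, [0, 3], [0.0, 3.0], lam=1.0)
print(f_hard)
print(f_soft)
```

On this path graph the hard-constraint estimate interpolates the two labels exactly, while the soft-constraint estimate is pulled away from the observed responses by an amount that depends on lam.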
Bayesian Semi-supervised Learning with Graph Gaussian Processes
We propose a data-efficient Gaussian process-based Bayesian approach to the
semi-supervised learning problem on graphs. The proposed model shows extremely
competitive performance when compared to the state-of-the-art graph neural
networks on semi-supervised learning benchmark experiments, and outperforms the
neural networks in active learning experiments where labels are scarce.
Furthermore, the model does not require a validation data set for early
stopping to control over-fitting. Our model can be viewed as an instance of
empirical distribution regression weighted locally by network connectivity. We
further motivate the intuitive construction of the model with a Bayesian linear
model interpretation where the node features are filtered by an operator
related to the graph Laplacian. The method can be easily implemented by
adapting off-the-shelf scalable variational inference algorithms for Gaussian
processes.
Comment: To appear in NIPS 2018. Fixed an error in Figure 2. The previous arXiv
version contains two identical sub-figures.