
    Predicting the labelling of a graph via minimum p-seminorm interpolation

    We study the problem of predicting the labelling of a graph. The graph is given and a trial sequence of (vertex, label) pairs is then incrementally revealed to the learner. On each trial a vertex is queried and the learner predicts a boolean label. The true label is then returned. The learner’s goal is to minimise mistaken predictions. We propose minimum p-seminorm interpolation to solve this problem. To this end we give a p-seminorm on the space of graph labellings. Thus on every trial we predict using the labelling which minimises the p-seminorm and is also consistent with the revealed (vertex, label) pairs. When p = 2 this is the harmonic energy minimisation procedure of [22], also called (Laplacian) interpolated regularisation in [1]. In the limit as p → 1 this is equivalent to predicting with a label-consistent mincut. We give mistake bounds relative to a label-consistent mincut and a resistive cover of the graph. We say an edge is cut with respect to a labelling if the connected vertices have disagreeing labels. We find that minimising the p-seminorm with p = 1 + ε, where ε → 0 as the graph diameter D → ∞, gives a bound of O(Φ^2 log D), versus a bound of O(ΦD) when p = 2, where Φ is the number of cut edges.
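
    The prediction rule is simple to state: on each trial, fix the revealed labels and choose the remaining values to minimise the p-seminorm Σ_{(u,v)∈E} |y_u − y_v|^p, then predict the sign at the queried vertex. Below is a minimal sketch of this idea on a toy graph using a generic numerical optimiser; it illustrates the procedure described above and is not the authors' implementation.

```python
# A minimal sketch of minimum p-seminorm interpolation on a toy graph,
# assuming labels in {-1, +1}. Generic numerical optimisation; an
# illustration of the idea, not the authors' implementation.
import numpy as np
from scipy.optimize import minimize

def p_seminorm(y, edges, p):
    """Sum over edges (u, v) of |y_u - y_v|^p."""
    return sum(abs(y[u] - y[v]) ** p for u, v in edges)

def predict(edges, n, revealed, query, p=1.5):
    """Label `query` by the minimum p-seminorm labelling consistent
    with the revealed {vertex: label} pairs."""
    free = [v for v in range(n) if v not in revealed]

    def fill(z):
        y = np.zeros(n)
        for v, lab in revealed.items():
            y[v] = lab
        y[free] = z
        return y

    res = minimize(lambda z: p_seminorm(fill(z), edges, p),
                   np.zeros(len(free)))  # objective is smooth for p > 1
    return 1 if fill(res.x)[query] >= 0 else -1

# Two triangles joined by a bridge; one label revealed per cluster.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(predict(edges, 6, {0: +1, 5: -1}, query=1))  # -> +1
```

    For p = 2 the minimiser instead solves a sparse linear (harmonic) system, so no general-purpose optimiser is needed; the sketch trades efficiency for brevity.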

    Exploiting structure defined by data in machine learning: some new analyses

    This thesis offers some new analyses and presents some new methods for learning in the context of exploiting structure defined by data – for example, when a data distribution has a submanifold support, exhibits cluster structure or exists as an object such as a graph.
    1. We present a new PAC-Bayes analysis of learning in this context, which is sharp and in some ways presents a better solution than uniform convergence methods. The PAC-Bayes prior over a hypothesis class is defined in terms of the unknown true risk and smoothness of hypotheses w.r.t. the unknown data-generating distribution. The analysis is “localized” in the sense that the complexity of the model enters not as the complexity of an entire hypothesis class, but focused on the functions of ultimate interest. Such bounds are derived for various algorithms, including SVMs.
    2. We consider an idea similar to the p-norm Perceptron for building classifiers on graphs. We define p-norms on the space of functions over graph vertices and consider interpolation using the p-norm as a smoothness measure. The method exploits cluster structure and attains a mistake bound logarithmic in the diameter, compared to a linear lower bound for standard methods.
    3. Rademacher complexity is related to cluster structure in data, quantifying the notion that when data clusters we can learn well with fewer examples. In particular, we relate transductive learning to cluster structure in the empirical resistance metric (see the sketch below).
    4. Typical methods for learning over a graph do not scale well in the number of data points – often a graph Laplacian must be inverted, which becomes computationally intractable for large data sets. We present online algorithms which, by simplifying the graph in a principled way, are able to exploit the structure while remaining computationally tractable for large datasets. We prove state-of-the-art performance guarantees.
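
    Item 3 refers to the empirical resistance metric; as a point of reference, the effective resistance between two vertices can be computed from the Moore–Penrose pseudoinverse of the graph Laplacian. The sketch below is this standard textbook computation, not code from the thesis.

```python
# Effective resistance metric via the Moore-Penrose pseudoinverse of
# the graph Laplacian (unweighted graph); textbook material, not code
# from the thesis.
import numpy as np

def resistance_metric(edges, n):
    """Return the n x n matrix R with R[u, v] the effective resistance."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    # R(u, v) = L+[u, u] + L+[v, v] - 2 L+[u, v]
    return d[:, None] + d[None, :] - 2 * Lp

# Two triangles joined by a bridge: resistances within a cluster are
# much smaller than across the bridge -- the cluster structure that
# the Rademacher analysis quantifies.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
R = resistance_metric(edges, 6)
print(R[0, 1], R[0, 5])
```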

    Multi-class Graph Clustering via Approximated Effective p-Resistance

    This paper develops an approximation to the (effective) p-resistance and applies it to multi-class clustering. Spectral methods based on the graph Laplacian and its generalization to the graph p-Laplacian have been a backbone of non-Euclidean clustering techniques. The advantage of the p-Laplacian is that the parameter p induces a controllable bias on cluster structure. The drawback of p-Laplacian eigenvector based methods is that the third and higher eigenvectors are difficult to compute. Thus, instead, we are motivated to use the p-resistance induced by the p-Laplacian for clustering. For p-resistance, small p biases towards clusters with high internal connectivity while large p biases towards clusters of small “extent,” that is, a preference for smaller shortest-path distances between vertices in the cluster. However, the p-resistance is expensive to compute. We overcome this by developing an approximation to the p-resistance. We prove upper and lower bounds on this approximation and observe that it is exact when the graph is a tree. We also provide theoretical justification for the use of p-resistance for clustering. Finally, we provide experiments comparing our approximated p-resistance clustering to other p-Laplacian based methods.
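
    For orientation, one common flow-based definition of effective p-resistance is r_p(u, v) = min { Σ_e |f_e|^p : f a unit flow from u to v }, which recovers the usual effective resistance at p = 2 (Thomson's principle). The sketch below solves this small convex programme directly; the paper's precise definition and its fast approximation may differ in normalisation, so this is only a hedged illustration of the quantity being approximated.

```python
# A hedged sketch of a flow-based effective p-resistance,
#   r_p(u, v) = min { sum_e |f_e|^p : f a unit flow from u to v },
# solved as a small convex programme. The paper's definition and its
# fast approximation may differ in normalisation.
import numpy as np
from scipy.optimize import minimize

def p_resistance(edges, n, u, v, p=3.0):
    m = len(edges)
    B = np.zeros((n, m))  # signed vertex-edge incidence matrix
    for j, (a, b) in enumerate(edges):
        B[a, j], B[b, j] = 1.0, -1.0
    demand = np.zeros(n)
    demand[u], demand[v] = 1.0, -1.0                # unit flow u -> v
    f0 = np.linalg.lstsq(B, demand, rcond=None)[0]  # feasible start
    # Conservation at n - 1 vertices implies it at the last one.
    cons = {"type": "eq", "fun": lambda f: B[:-1] @ f - demand[:-1]}
    res = minimize(lambda f: np.sum(np.abs(f) ** p), f0,
                   constraints=[cons], method="SLSQP")
    return res.fun

# Two triangles joined by a bridge: the bridge inflates the
# across-cluster p-resistance relative to the within-cluster one.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(p_resistance(edges, 6, 0, 1), p_resistance(edges, 6, 0, 5))
```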

    Sketch-based Randomized Algorithms for Dynamic Graph Regression

    A well-known problem in data science and machine learning is linear regression, which has recently been extended to dynamic graphs. Existing exact algorithms for updating the solution of the dynamic graph regression problem require at least linear time (in terms of n, the size of the graph). However, this time complexity might be intractable in practice. In the current paper, we utilize the subsampled randomized Hadamard transform and CountSketch to propose the first randomized algorithms. Suppose that we are given an n × m matrix embedding M of the graph, where m ≪ n. Let r be the number of samples required for a guaranteed approximation error, which is a sublinear function of n. Our first algorithm reduces the time complexity of pre-processing to O(n(m + 1) + 2n(m + 1) log_2(r + 1) + rm^2). Then, after an edge insertion or an edge deletion, it updates the approximate solution in O(rm) time. Our second algorithm reduces the time complexity of pre-processing to O(nnz(M) + (m^3/ε^2) log^7(m/ε)), where nnz(M) is the number of nonzero elements of M. Then, after an edge insertion, an edge deletion, a node insertion or a node deletion, it updates the approximate solution in O(qm) time, with q = O((m^2/ε^2) log^6(m/ε)). Finally, we show that under some assumptions, if ln n < 1/ε our first algorithm outperforms our second algorithm, and if ln n ≥ 1/ε our second algorithm outperforms our first algorithm.
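
    The sketch-and-solve pattern behind such algorithms is easy to illustrate: apply a CountSketch matrix (one random bucket and sign per row, so the product costs O(nnz(M)) time) and solve the compressed least-squares problem. The snippet below is a generic static illustration of CountSketch regression, not the paper's dynamic update procedure; all sizes are made up.

```python
# Generic "sketch-and-solve" with CountSketch: compress the n x m
# design matrix to r rows in O(nnz(M)) time, then solve the small
# least-squares problem. A static toy example, not the paper's
# dynamic update procedure; all sizes are made up.
import numpy as np

rng = np.random.default_rng(0)

def countsketch(M, b, r):
    """Apply an r x n CountSketch (random bucket + sign per row)."""
    n, m = M.shape
    buckets = rng.integers(0, r, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    SM, Sb = np.zeros((r, m)), np.zeros(r)
    for i in range(n):
        SM[buckets[i]] += signs[i] * M[i]
        Sb[buckets[i]] += signs[i] * b[i]
    return SM, Sb

n, m, r = 10_000, 20, 500
M = rng.standard_normal((n, m))
b = M @ rng.standard_normal(m) + 0.01 * rng.standard_normal(n)
SM, Sb = countsketch(M, b, r)
x_sketch = np.linalg.lstsq(SM, Sb, rcond=None)[0]
x_exact = np.linalg.lstsq(M, b, rcond=None)[0]
print(np.linalg.norm(x_sketch - x_exact))  # small when r >> m
```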

    Efficient First Order Methods for Linear Composite Regularizers

    A wide class of regularization problems in machine learning and statistics employ a regularization term which is obtained by composing a simple convex function ω with a linear transformation. This setting includes Group Lasso methods, the Fused Lasso and other total variation methods, multi-task learning methods and many more. In this paper, we present a general approach for computing the proximity operator of this class of regularizers, under the assumption that the proximity operator of the function ω is known in advance. Our approach builds on a recent line of research on optimal first-order optimization methods and uses fixed-point iterations for numerically computing the proximity operator. It is more general than current approaches and, as we show with numerical simulations, computationally more efficient than available first-order methods which do not achieve the optimal rate. In particular, our method outperforms state-of-the-art O(1/T) methods for the overlapping Group Lasso and matches optimal O(1/T^2) methods for the Fused Lasso and tree-structured Group Lasso.
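
    A concrete instance of the composite setting is the Fused Lasso penalty λ‖Dx‖_1 with D the first-order difference operator. Its proximity operator has no closed form, but can be computed by a fixed-point iteration on a dual variable, in the spirit of the approach above. The sketch below uses one standard projected-gradient fixed point; it is a hedged illustration, not the paper's faster and more general scheme.

```python
# A hedged sketch of the fixed-point idea for the Fused Lasso /
# 1-D total-variation prox, prox_{lam * ||D.||_1}(x), with D the
# difference operator: a standard projected-gradient fixed point on
# the dual, not the paper's (faster, more general) scheme.
import numpy as np

def prox_tv1d(x, lam, iters=2000):
    n = len(x)
    D = np.diff(np.eye(n), axis=0)   # (n - 1) x n difference matrix
    v = np.zeros(n - 1)              # dual variable
    tau = 0.25                       # step size <= 1 / ||D D^T||
    for _ in range(iters):
        # fixed-point map: gradient step, then project onto |v_i| <= lam
        v = np.clip(v + tau * (D @ (x - D.T @ v)), -lam, lam)
    return x - D.T @ v               # the prox, recovered from the dual

x = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.9])
print(prox_tv1d(x, lam=0.5))         # nearly piecewise-constant output
```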

    Efficient algorithms for online learning over graphs

    In this thesis we consider the problem of online learning with labelled graphs, in particular designing algorithms that can perform this task quickly and with low memory requirements. We consider the tasks of Classification (in which we are asked to predict the labels of vertices) and Similarity Prediction (in which we are asked to predict whether two given vertices have the same label). The first half of the thesis considers non-probabilistic online learning, where there is no probability distribution on the labelling and we bound the number of mistakes of an algorithm by a function of the labelling’s complexity (i.e. its “naturalness”), often the cut-size. The second half of the thesis considers probabilistic machine learning, in which we have a known probability distribution on the labelling. Before considering probabilistic online learning we first analyse the junction tree algorithm, on which we base our online algorithms, and design a new version of it, superior to the previous state of the art. Explicitly, the novel contributions of this thesis are as follows:
    • A new algorithm for online prediction of the labelling of a graph, which has better performance than previous algorithms on certain graph and labelling families.
    • Two algorithms for online similarity prediction on a graph (a novel problem solved in this thesis). One performs very well, whilst the other performs less well but runs exponentially faster.
    • A new state-of-the-art junction tree algorithm (better than before in terms of time and space complexity), as well as an application of it to the problem of online learning in an Ising model.
    • An algorithm that, in linear time, finds the optimal junction tree for online inference in tree-structured Ising models, the resulting online junction tree algorithm being far superior to the previous state of the art.
    All claims in this thesis are supported by mathematical proofs.
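
    To make the online protocol concrete, the sketch below runs the trial sequence with a deliberately simple predictor: label each queried vertex by the nearest previously revealed vertex in shortest-path distance. This baseline is only illustrative; the thesis algorithms are more refined and carry mistake bounds.

```python
# A deliberately simple online baseline to make the trial protocol
# concrete: predict each queried vertex with the label of the nearest
# previously revealed vertex (BFS distance). Illustrative only; the
# thesis algorithms are more refined and carry mistake bounds.
from collections import deque

def bfs_nearest(adj, start, labelled):
    """Label of the labelled vertex closest to `start` (BFS order)."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        if u in labelled:
            return labelled[u]
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return +1  # no labelled vertex reachable: fall back to a default

def online_mistakes(adj, trials):
    """Run the (vertex, label) trial sequence; count prediction mistakes."""
    labelled, mistakes = {}, 0
    for v, true_label in trials:
        prediction = bfs_nearest(adj, v, labelled) if labelled else +1
        mistakes += prediction != true_label
        labelled[v] = true_label   # true label revealed after predicting
    return mistakes

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(online_mistakes(adj, [(0, +1), (5, -1), (1, +1), (4, -1)]))  # -> 1
```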

    On Sparsity Inducing Regularization Methods for Machine Learning

    Full text link
    During the past years there has been an explosion of interest in learning methods based on sparsity regularization. In this paper, we discuss a general class of such methods, in which the regularizer can be expressed as the composition of a convex function ω with a linear function. This setting includes several methods such as the group Lasso, the Fused Lasso, multi-task learning and many more. We present a general approach for solving regularization problems of this kind, under the assumption that the proximity operator of the function ω is available. Furthermore, we comment on the application of this approach to support vector machines, a technique pioneered by the groundbreaking work of Vladimir Vapnik.
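
    The assumption that the proximity operator of ω is available is mild in practice: for the non-overlapping group Lasso, for instance, the prox is block soft-thresholding in closed form. The sketch below shows that operator; the grouping and numbers are made up for illustration.

```python
# For the (non-overlapping) group Lasso, omega(u) = lam * sum_g ||u_g||_2
# and prox_omega is block soft-thresholding in closed form -- a sketch
# with made-up groups and numbers.
import numpy as np

def prox_group_lasso(u, groups, lam):
    """Shrink each group of coordinates towards zero; drop small groups."""
    out = u.copy()
    for g in groups:
        norm = np.linalg.norm(u[g])
        out[g] = 0.0 if norm <= lam else (1 - lam / norm) * u[g]
    return out

u = np.array([3.0, 4.0, 0.1, -0.2, 1.0, -1.0])
groups = [[0, 1], [2, 3], [4, 5]]
print(prox_group_lasso(u, groups, lam=0.5))
# first and third groups shrunk, second group set exactly to zero
```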