10 research outputs found

    Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?

A rising trend in theoretical deep learning is to understand why deep learning works through the Neural Tangent Kernel (NTK) [jgh18], a kernel method that is equivalent to using gradient descent to train a multi-layer infinitely wide neural network. NTK is a major step forward in theoretical deep learning because it allows researchers to use traditional mathematical tools to analyze properties of deep neural networks and to explain various neural network techniques from a theoretical view. A natural extension of NTK to graph learning is the \textit{Graph Neural Tangent Kernel (GNTK)}, and researchers have already provided a GNTK formulation for graph-level regression and shown empirically that this kernel method can achieve similar accuracy to GNNs on various bioinformatics datasets [dhs+19]. The remaining question is whether solving GNTK regression is equivalent to training an infinitely wide multi-layer GNN using gradient descent. In this paper, we provide three new theoretical results. First, we formally prove this equivalence for graph-level regression. Second, we present the first GNTK formulation for node-level regression. Finally, we prove the equivalence for node-level regression.
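As a minimal sketch of the kind of equivalence at stake (written in generic kernel-regression notation rather than the paper's exact statement; the symbols $f_{\mathrm{GNTK}}$, $\mathbf{K}$, $\mathbf{y}$ are introduced here only for illustration): solving GNTK regression means computing the kernel predictor below, and the equivalence results say that an infinitely wide GNN trained by gradient descent converges to the same predictor.

```latex
% Kernel (ridgeless) regression with the GNTK on training graphs G_1,\dots,G_n
% with labels y \in \mathbb{R}^n.
\[
  f_{\mathrm{GNTK}}(G_{\mathrm{test}})
    = \mathbf{K}(G_{\mathrm{test}}, \mathbf{G})\,
      \mathbf{K}(\mathbf{G}, \mathbf{G})^{-1}\, \mathbf{y},
  \qquad
  \mathbf{K}(\mathbf{G}, \mathbf{G})_{ij} = K_{\mathrm{GNTK}}(G_i, G_j).
\]
```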

    Online Adaptive Mahalanobis Distance Estimation

Mahalanobis metrics are widely used in machine learning in conjunction with methods like $k$-nearest neighbors, $k$-means clustering, and $k$-medians clustering. Despite their importance, there has not been any prior work on applying sketching techniques to speed up algorithms for Mahalanobis metrics. In this paper, we initiate the study of dimension reduction for Mahalanobis metrics. In particular, we provide efficient data structures for solving the Approximate Distance Estimation (ADE) problem for Mahalanobis distances. We first provide a randomized Monte Carlo data structure. Then, we show how to adapt it into our main data structure, which can handle sequences of \textit{adaptive} queries as well as online updates to both the Mahalanobis metric matrix and the data points, making it amenable to use in conjunction with prior algorithms for online learning of Mahalanobis metrics.
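The objects involved are standard, so a small static sketch may help fix ideas; the paper's data structure additionally handles adaptive queries and online updates, which the snippet below does not attempt. With $A = L^\top L$, the Mahalanobis distance $d_A(x,y) = \sqrt{(x-y)^\top A (x-y)} = \|L(x-y)\|_2$ is a Euclidean distance after mapping through $L$, so applying a Johnson-Lindenstrauss sketch $S$ to $L$ gives cheap approximate estimates. Function names here are illustrative, not the paper's API.

```python
import numpy as np

def mahalanobis(x, y, L):
    """Exact Mahalanobis distance for A = L^T L: ||L (x - y)||_2."""
    return np.linalg.norm(L @ (x - y))

def make_sketch(L, m, rng):
    """Compose a random JL sketch S (m x d) with L; m << d trades accuracy for speed."""
    d = L.shape[1]
    S = rng.normal(size=(m, d)) / np.sqrt(m)
    return S @ L          # precompute once; each query then costs O(m d) instead of O(d^2)

def sketched_estimate(x, y, SL):
    """Approximate distance: ||S L (x - y)||_2 concentrates around ||L (x - y)||_2."""
    return np.linalg.norm(SL @ (x - y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 512
    L = rng.normal(size=(d, d))
    x, y = rng.normal(size=d), rng.normal(size=d)
    SL = make_sketch(L, m=64, rng=rng)
    print(mahalanobis(x, y, L), sketched_estimate(x, y, SL))
```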

    Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification

Deep learning has been widely used in many fields, but the model training process usually consumes massive computational resources and time. Therefore, designing an efficient neural network training method with a provable convergence guarantee is a fundamental and important research question. In this paper, we present a static half-space reporting data structure for a fully connected two-layer neural network with shifted ReLU activation that enables activated neuron identification in sublinear time via geometric search. We also prove that our algorithm can converge in $O(M^2/\epsilon^2)$ time with network size quadratic in the coefficient norm upper bound $M$ and error term $\epsilon$.
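To make "activated neuron identification" concrete: with shifted ReLU $\sigma_b(t) = \max(t - b, 0)$, neuron $r$ fires on input $x$ exactly when $\langle w_r, x \rangle > b$, i.e. when $x$ lies in the half-space defined by $w_r$. The paper's contribution is answering this query in sublinear time via a half-space reporting structure; the sketch below is only the naive $O(md)$ baseline it improves on, with illustrative names.

```python
import numpy as np

def activated_neurons(W, x, b):
    """Return indices r with <w_r, x> > b (shifted ReLU sigma_b(t) = max(t - b, 0)).

    Naive O(m d) scan over all m neurons; a half-space reporting structure
    answers the same query geometrically without touching every neuron.
    """
    scores = W @ x                      # m inner products
    return np.flatnonzero(scores > b)   # only these neurons contribute to the output

def forward(W, a, x, b):
    """Two-layer network f(x) = sum_r a_r * max(<w_r, x> - b, 0), using only fired neurons."""
    fired = activated_neurons(W, x, b)
    return a[fired] @ (W[fired] @ x - b)
```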

    Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training

In this paper, we consider a heavy inner product identification problem, which generalizes the Light Bulb problem~(\cite{prr89}): given two sets $A \subset \{-1,+1\}^d$ and $B \subset \{-1,+1\}^d$ with $|A| = |B| = n$, if there are exactly $k$ pairs whose inner product passes a certain threshold, i.e., $\{(a_1, b_1), \cdots, (a_k, b_k)\} \subset A \times B$ such that $\forall i \in [k], \langle a_i, b_i \rangle \geq \rho \cdot d$ for a threshold $\rho \in (0,1)$, the goal is to identify those $k$ heavy inner products. We provide an algorithm that runs in $O(n^{2\omega/3 + o(1)})$ time and finds the $k$ inner product pairs that surpass the $\rho \cdot d$ threshold with high probability, where $\omega$ is the current matrix multiplication exponent. By solving this problem, our method speeds up the training of neural networks with ReLU activation functions. Comment: IEEE BigData 202
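For concreteness, the quadratic-time baseline that the result improves on looks like the following (function and variable names are illustrative); the paper's algorithm finds the same $k$ pairs in $O(n^{2\omega/3+o(1)})$ time using fast matrix multiplication instead of a full Gram matrix.

```python
import numpy as np

def heavy_pairs_bruteforce(A, B, rho):
    """Find all (i, j) with <a_i, b_j> >= rho * d for sign vectors in {-1,+1}^d.

    O(n^2 d) baseline: compute the full Gram matrix A B^T and threshold it.
    """
    d = A.shape[1]
    G = A @ B.T                          # all n^2 inner products
    idx = np.argwhere(G >= rho * d)      # the k "heavy" pairs
    return [tuple(p) for p in idx]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 200, 64
    A = rng.choice([-1, 1], size=(n, d))
    B = rng.choice([-1, 1], size=(n, d))
    B[7] = A[3]                          # plant one perfectly correlated pair
    print(heavy_pairs_bruteforce(A, B, rho=0.9))   # expect [(3, 7)]
```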

    Adore: Differentially Oblivious Relational Database Operators

There has been a recent effort in applying differential privacy to memory access patterns to enhance data privacy. This is called differential obliviousness. Differential obliviousness is a promising direction because it provides a principled trade-off between performance and the desired level of privacy. To date, it is still an open question whether differential obliviousness can speed up database processing with respect to full obliviousness. In this paper, we present the design and implementation of three new major database operators: selection with projection, grouping with aggregation, and foreign key join. We prove that they satisfy the notion of differential obliviousness. Our differentially oblivious operators have reduced cache complexity, runtime complexity, and output size compared to their state-of-the-art fully oblivious counterparts. We also demonstrate that our implementation of these differentially oblivious operators can outperform their state-of-the-art fully oblivious counterparts by up to $7.4\times$. Comment: VLDB 202
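One standard building block in the differential obliviousness literature (not necessarily the exact mechanism these operators use) is padding each intermediate output with a differentially private number of dummy rows, so that observable sizes and access patterns leak only a noised count rather than the true one. A rough sketch, with hypothetical names and a shifted, truncated Laplace noise choice assumed purely for illustration:

```python
import math
import random

def dp_padded_count(true_count, epsilon, delta):
    """Return a padded output size that is roughly (epsilon, delta)-DP in the
    true count while (almost) never under-counting.

    Shifted, truncated Laplace mechanism: add Laplace(1/epsilon) noise plus a
    shift of about ln(1/delta)/epsilon, then clamp the padding at zero.
    """
    shift = math.ceil(math.log(1.0 / delta) / epsilon)
    laplace = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + max(0, shift + int(round(laplace)))

def oblivious_group_sizes(groups, epsilon, delta):
    """Pad every group of a group-by to a noised size with dummy rows."""
    return {key: dp_padded_count(len(rows), epsilon, delta)
            for key, rows in groups.items()}
```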

    ZEN: An Optimizing Compiler for Verifiable, Zero-Knowledge Neural Network Inferences

We present ZEN, the first optimizing compiler that generates efficient verifiable, zero-knowledge neural network inference schemes. ZEN generates two schemes: ZEN$_{acc}$ and ZEN$_{infer}$. ZEN$_{acc}$ proves the accuracy of a committed neural network model; ZEN$_{infer}$ proves a specific inference result. Used in combination, these verifiable computation schemes ensure both the privacy of sensitive user data and the confidentiality of the neural network models. However, directly realizing these schemes with zkSNARKs requires prohibitive computational cost. As an optimizing compiler, ZEN introduces two kinds of optimizations to address this issue: first, ZEN incorporates a new neural network quantization algorithm with two R1CS-friendly optimizations, which allows the model to be expressed in zkSNARKs with fewer constraints and minimal accuracy loss; second, ZEN introduces a SIMD-style optimization, namely stranded encoding, that can encode multiple 8-bit integers in large finite field elements without overwhelming extraction cost. Combining these optimizations, ZEN produces verifiable neural network inference schemes with ${\bf 5.43}\sim{\bf 22.19}\times$ ($15.35\times$ on average) fewer R1CS constraints.
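The packing idea behind stranded encoding can be illustrated with plain integer limbs (ZEN's actual encoding is more involved, arranging operands so that one big-field multiplication yields several partial inner products at once); the limb width and function names below are assumptions for illustration only.

```python
def pack(values, limb_bits=32):
    """Pack quantized 8-bit values into one large integer ("field element"),
    leaving slack bits per limb so limb-wise sums/products cannot spill over."""
    acc = 0
    for i, v in enumerate(values):
        assert 0 <= v < 256
        acc |= v << (i * limb_bits)
    return acc

def unpack(acc, count, limb_bits=32):
    """Recover the packed values by masking out each limb."""
    mask = (1 << limb_bits) - 1
    return [(acc >> (i * limb_bits)) & mask for i in range(count)]

assert unpack(pack([7, 200, 31]), 3) == [7, 200, 31]
```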

    GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks

Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments, due to limitations in accounting for fine-grained details. Although GPT-4V has shown promising results in various multi-modal tasks, leveraging GPT-4V as a generalist evaluator for these tasks has not yet been systematically explored. We comprehensively validate GPT-4V's capabilities for evaluation purposes, addressing tasks ranging from foundational image-to-text and text-to-image synthesis to high-level image-to-image translation and multi-image-to-text alignment. We employ two evaluation methods, single-answer grading and pairwise comparison, using GPT-4V. Notably, GPT-4V shows promising agreement with humans across various tasks and evaluation methods, demonstrating immense potential for multi-modal LLMs as evaluators. Despite limitations such as restricted visual clarity grading and real-world complex reasoning, its ability to provide human-aligned scores enriched with detailed explanations is promising for building universal automatic evaluators.
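The two evaluation protocols mentioned can be sketched as prompt templates; the wording below is hypothetical, not the paper's prompts, and no model call is made.

```python
def single_answer_prompt(instruction, response):
    """Single-answer grading: ask the evaluator for a 1-10 score plus a rationale."""
    return (
        "You are shown an image, an instruction, and a model response.\n"
        f"Instruction: {instruction}\nResponse: {response}\n"
        "Rate the response from 1 to 10 and explain your rating."
    )

def pairwise_prompt(instruction, response_a, response_b):
    """Pairwise comparison: ask the evaluator to pick the better of two responses."""
    return (
        "You are shown an image, an instruction, and two model responses.\n"
        f"Instruction: {instruction}\nResponse A: {response_a}\nResponse B: {response_b}\n"
        "Answer 'A', 'B', or 'Tie', then justify your choice."
    )
```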

    Fast Submodular Function Maximization

Submodular functions have many real-world applications, such as document summarization, sensor placement, and image segmentation. For all these applications, the key building block is computing the maximum value of a submodular function efficiently. We consider both the online and offline versions of the problem: in each iteration, the data set either changes incrementally or stays fixed, and a user can issue a query to maximize the function on a given subset of the data. The user can be malicious, issuing queries based on previous query results in order to break the competitive ratio of the online algorithm. Today, the best-known algorithm for online submodular function maximization has a running time of $O(nkd^2)$, where $n$ is the total number of elements, $d$ is the feature dimension, and $k$ is the number of elements to be selected. We propose a new method based on a novel search tree data structure. Our algorithm takes only $\widetilde{O}(nk + kd^2 + nd)$ time.
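For context, the classic greedy baseline for selecting $k$ elements under a monotone submodular objective looks as follows; this is the textbook $(1 - 1/e)$-approximation, not the tree-based data structure proposed here, and `f` is any user-supplied set function.

```python
def greedy_max(f, elements, k):
    """Pick k elements greedily by marginal gain f(S + {e}) - f(S).

    Textbook baseline using O(n k) evaluations of f; the paper's search tree
    targets the online/adaptive setting with faster per-query time.
    """
    selected = []
    remaining = set(elements)
    for _ in range(k):
        best = max(remaining, key=lambda e: f(selected + [e]) - f(selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Example: coverage function f(S) = size of the union of the chosen sets.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 5, 6}}
coverage = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
print(greedy_max(coverage, list(sets), k=2))   # [0, 3]
```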