
    On the Power of Learning from k-Wise Queries

    Several well-studied models of access to data samples, including statistical queries, local differential privacy, and low-communication algorithms, rely on queries that provide information about a function of a single sample. (For example, a statistical query (SQ) gives an estimate of E_{x ~ D}[q(x)] for any choice of the query function q mapping X to the reals, where D is an unknown data distribution over X.) Yet some data analysis algorithms rely on properties of functions that depend on multiple samples. Such algorithms would be naturally implemented using k-wise queries, each of which is specified by a function q mapping X^k to the reals. Hence it is natural to ask whether algorithms using k-wise queries can solve learning problems more efficiently, and by how much. Blum, Kalai and Wasserman (2003) showed that for any weak PAC learning problem over a fixed distribution, the complexity of learning with k-wise SQs is smaller than the (unary) SQ complexity by a factor of at most 2^k. We show that for more general problems over distributions the picture is substantially richer. For every k, the complexity of distribution-independent PAC learning with k-wise queries can be exponentially larger than learning with (k+1)-wise queries. We then give two approaches for simulating a k-wise query using unary queries. The first approach exploits the structure of the problem that needs to be solved. It generalizes and strengthens (exponentially) the results of Blum et al. It allows us to derive strong lower bounds for learning DNF formulas and stochastic constraint satisfaction problems that hold against algorithms using k-wise queries. The second approach exploits the k-party communication complexity of the k-wise query function.
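
    As a toy illustration of the two query models compared above, the Python sketch below implements a unary SQ oracle and a k-wise SQ oracle over an empirical sample: each returns the corresponding expectation up to an additive tolerance tau. The oracle names, the use of a finite sample in place of the distribution D, and the uniform noise standing in for adversarial tolerance are simplifying assumptions for illustration, not the paper's formal model.

    import random
    from typing import Callable, Sequence

    def sq_oracle(sample: Sequence, q: Callable[[object], float], tau: float) -> float:
        # Unary statistical query: an estimate of E_{x~D}[q(x)] within tolerance tau,
        # approximated here by the empirical mean plus bounded noise.
        empirical = sum(q(x) for x in sample) / len(sample)
        return empirical + random.uniform(-tau, tau)

    def kwise_sq_oracle(sample: Sequence, q: Callable[..., float], k: int, tau: float,
                        n_tuples: int = 2000) -> float:
        # k-wise statistical query: an estimate of E_{(x_1,...,x_k)~D^k}[q(x_1,...,x_k)]
        # within tolerance tau, approximated by averaging q over random k-tuples.
        total = sum(q(*[random.choice(sample) for _ in range(k)]) for _ in range(n_tuples))
        return total / n_tuples + random.uniform(-tau, tau)

    # Example: a unary query for the mean of the first coordinate, and a pairwise
    # (k = 2) query for the probability that two independent samples share a label.
    data = [(random.gauss(0, 1), random.choice([-1, 1])) for _ in range(5000)]
    print(sq_oracle(data, lambda x: x[0], tau=0.01))
    print(kwise_sq_oracle(data, lambda x, y: float(x[1] == y[1]), k=2, tau=0.01))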

    SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

    We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure -- the leap -- which measures how "hierarchical" target functions are. For d-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function f with low-dimensional support is \tilde\Theta(d^{\max(\mathrm{Leap}(f),2)}). We prove a version of this conjecture for a class of functions on Gaussian isotropic data and 2-layer neural networks, under additional technical assumptions on how SGD is run. We show that the training sequentially learns the function support with a saddle-to-saddle dynamic. Our result departs from [Abbe et al. 2022] by going beyond leap 1 (merged-staircase functions), and by going beyond the mean-field and gradient flow approximations that prohibit the full complexity control obtained here. Finally, we note that this gives an SGD complexity for the full training trajectory that matches that of Correlational Statistical Query (CSQ) lower bounds.
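
    For intuition about the complexity measure, here is a small sketch that computes a leap-style quantity for a target given by its monomial support: the minimum, over orderings of the support sets, of the largest number of previously unseen coordinates any set introduces. This reproduces the leap definition from memory of Abbe et al., and the brute-force search over orderings is only meant for tiny supports; treat both as illustrative assumptions rather than the paper's exact formulation.

    from itertools import permutations

    def leap(support_sets):
        # Leap-style complexity of a function whose monomial support is given as a
        # collection of coordinate sets: minimize over orderings the largest number
        # of new coordinates introduced at any step (brute force, tiny inputs only).
        sets = [frozenset(s) for s in support_sets if s]
        if not sets:
            return 0
        best = len(max(sets, key=len))      # trivial upper bound on any ordering
        for order in permutations(sets):
            seen, worst = set(), 0
            for s in order:
                worst = max(worst, len(s - seen))
                seen |= s
            best = min(best, worst)
        return best

    # Example: the staircase x1 + x1*x2 + x1*x2*x3 has leap 1, so under the conjecture
    # SGD learns it in roughly d^2 steps, while the isolated parity x1*x2*x3 has leap 3.
    print(leap([{1}, {1, 2}, {1, 2, 3}]))   # -> 1
    print(leap([{1, 2, 3}]))                # -> 3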

    Learning with non-Standard Supervision

    Machine learning has enjoyed astounding practical success in a wide range of applications in recent years, a practical success that often hurries ahead of our theoretical understanding. The standard framework for machine learning theory assumes full supervision, that is, training data consists of correctly labeled iid examples from the same task that the learned classifier is supposed to be applied to. However, many practical applications successfully make use of the sheer abundance of data that is currently produced. Such data may not be labeled or may be collected from various sources. The focus of this thesis is to provide theoretical analysis of machine learning regimes where the learner is given possibly large amounts of such non-perfect training data. In particular, we investigate the benefits and limitations of learning with unlabeled data in semi-supervised learning and active learning, as well as the benefits and limitations of learning from data that has been generated by a task that is different from the target task (domain adaptation learning). For all three settings, we propose Probabilistic Lipschitzness to model the relatedness between the labels and the underlying domain space, and we discuss our suggested notion by comparing it to other common data assumptions.
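
    As one way to make the relatedness notion concrete, the sketch below estimates, on a labeled sample, the probability that a point has an oppositely labeled neighbour within radius lam. This is an assumed, simplified sample-based reading of a Probabilistic-Lipschitzness-style condition, not necessarily the definition used in the thesis; the function and variable names are hypothetical.

    import numpy as np

    def prob_lipschitz_violation(X: np.ndarray, y: np.ndarray, lam: float) -> float:
        # Estimate Pr_x[ there is x' in the sample with y(x') != y(x) and
        # ||x - x'|| <= lam ], a rough proxy for how far the labeled data is
        # from being probabilistically Lipschitz at scale lam.
        n = len(X)
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        violated = 0
        for i in range(n):
            close = (dists[i] <= lam) & (np.arange(n) != i)
            if np.any(y[close] != y[i]):
                violated += 1
        return violated / n

    # Example: two well-separated Gaussian blobs violate the condition rarely at
    # small radii and increasingly often as lam grows.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
    y = np.array([0] * 200 + [1] * 200)
    for lam in (0.5, 1.0, 2.0, 4.0):
        print(lam, prob_lipschitz_violation(X, y, lam))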

    Learning by Distances

    A model of learning by distances is presented. In this model a concept is a point in a metric space. At each step of the learning process the student guesses a hypothesis and receives from the teacher an approximation of its distance to the target. A notion of a distance measuring the proximity of a hypothesis to the correct answer is common to many models of learnability. By focusing on this fundamental aspect we discover some general and simple tools for the analysis of learnability tasks. As a corollary we present new learning algorithms for Valiant's PAC scenario with any given distribution. These algorithms can learn any PAC-learnable class and, in some cases, settle for significantly less information than the usual labeled examples. Insight gained by the new model is applied to show that every class of subsets C that has a finite VC-dimension is PAC-learnable with respect to any fixed distribution. Previously known results of this nature were subject to complicated measurability constraints.
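
    To make the interaction concrete, here is a toy sketch of the protocol: the student proposes hypotheses from a finite cover of the metric space (here the unit interval) and the teacher answers with a distance that is accurate up to an additive eps. The cover-based student and the noise model are illustrative assumptions, not the algorithms developed in the paper.

    import random

    def distance_teacher(target, eps):
        # Teacher: given a hypothesis, return its distance to the target up to an
        # additive approximation error eps (here, points on the real line).
        def answer(hypothesis):
            return abs(hypothesis - target) + random.uniform(-eps, eps)
        return answer

    def learn_by_distances(candidates, answer):
        # Student: query the approximate distance of every candidate hypothesis and
        # output the one reported closest to the target; if the candidates form an
        # eps-cover, the output is within O(eps) of the target.
        return min(candidates, key=answer)

    # Example: the target is a point in [0, 1]; the student only ever sees
    # approximate distances to its own guesses, never a labeled example.
    target = 0.3217
    answer = distance_teacher(target, eps=0.01)
    cover = [i / 100 for i in range(101)]        # a 0.01-cover of [0, 1]
    print(learn_by_distances(cover, answer))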