
    General Bounds on Statistical Query Learning and PAC Learning with Noise via Hypothesis Boosting

We derive general bounds on the complexity of learning in the statistical query (SQ) model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the SQ model. The SQ model was introduced by Kearns to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result: the first general upper bounds on the complexity of strong SQ learning. Since all SQ algorithms can be simulated in the PAC model with classification noise, we also obtain general upper bounds on learning in the presence of classification noise for classes which can be learned in the SQ model.
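    The simulation mentioned in the last sentence is mechanical enough to sketch. Below is a minimal Python illustration (our own sketch, not the paper's construction; it assumes a known noise rate η < 1/2 and a finite sample) of how a statistical query E[χ(x, y)] can be estimated from classification-noise examples by measuring both label orientations and inverting the noise mixture:

```python
def simulate_sq_from_noisy_examples(examples, chi, eta):
    """Estimate the clean statistic E[chi(x, y)] from examples whose
    labels were flipped independently with known rate eta < 1/2.

    examples: list of (x, noisy_label) pairs from the noisy oracle
    chi:      query predicate chi(x, label) -> {0, 1}
    """
    n = len(examples)
    # Empirical statistics under the noisy labels, as observed and with
    # the label deliberately flipped.
    p_obs = sum(chi(x, y) for x, y in examples) / n
    p_flip = sum(chi(x, 1 - y) for x, y in examples) / n
    # The noisy expectations mix the clean ones:
    #   p_obs  = (1 - eta) * a + eta * b
    #   p_flip = eta * a + (1 - eta) * b
    # where a = E[chi(x, y)] and b = E[chi(x, 1 - y)] on clean labels.
    # Solving this system for a isolates the clean statistic.
    return ((1 - eta) * p_obs - eta * p_flip) / (1 - 2 * eta)
```

    The additive tolerance of the query then dictates how large the sample must be, and the division by (1 − 2η) is why noise rates approaching 1/2 blow up the complexity in all of the bounds above.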

    Specification and Simulation of Statistical Query Algorithms for Efficiency and Noise Tolerance

A recent innovation in computational learning theory is the statistical query (SQ) model. The advantage of specifying learning algorithms in this model is that SQ algorithms can be simulated in the probably approximately correct (PAC) model, both in the absence and in the presence of noise. However, simulations of SQ algorithms in the PAC model have non-optimal time and sample complexities. In this paper, we introduce a new method for specifying statistical query algorithms based on a type of relative error and provide simulations in the noise-free and noise-tolerant PAC models which yield more efficient algorithms. Requests for estimates of statistics in this new model take the following form: "Return an estimate of the statistic within a 1 ± μ factor, or return ⊥, promising that the statistic is less than θ." In addition to showing that this is a very natural language for specifying learning algorithms, we also show that this new specification is polynomially equivalent to standard SQ, and thus, known learnability and hardness results for statistical query learning are preserved. We then give highly efficient PAC simulations of relative error SQ algorithms. We show that the learning algorithms obtained by simulating efficient relative error SQ algorithms both in the absence of noise and in the presence of malicious noise have roughly optimal sample complexity. We also show that the simulation of efficient relative error SQ algorithms in the presence of classification noise yields learning algorithms at least as efficient as those obtained through standard methods, and in some cases improved, roughly optimal results are achieved. The sample complexities for all of these simulations are based on the d_ν metric, which is a type of relative error metric useful for quantities which are small or even zero. We show that uniform convergence with respect to the d_ν metric yields "uniform convergence" with respect to (μ, θ) accuracy. Finally, while we show that many specific learning algorithms can be written as highly efficient relative error SQ algorithms, we also show, in fact, that all SQ algorithms can be written efficiently by proving general upper bounds on the complexity of (μ, θ) queries as a function of the accuracy parameter ε. As a consequence of this result, we give general upper bounds on the complexity of learning algorithms achieved through the use of relative error SQ algorithms and the simulations described above.
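    To make the quoted query semantics concrete, here is a minimal sketch (our own illustration with hypothetical names, not the paper's simulation) of a (μ, θ) relative-error query answered from a finite sample:

```python
BOTTOM = None  # plays the role of the symbol "⊥" in the query semantics

def relative_error_query(examples, chi, mu, theta):
    """Answer "return an estimate of P = E[chi] within a 1 ± mu factor,
    or return ⊥, promising that P < theta" from a finite sample.

    By a multiplicative Chernoff bound, a sample of size on the order of
    log(1/delta) / (mu**2 * theta) makes this answer valid with
    probability 1 - delta; constants, and the boundary case where P sits
    right at theta, are glossed over in this sketch.
    """
    p_hat = sum(chi(x, y) for x, y in examples) / len(examples)
    if p_hat < theta:
        return BOTTOM   # small statistic: decline, promising P < theta
    return p_hat        # otherwise multiplicatively accurate: (1 ± mu) * P
```

    The appeal of this form is visible in the sample bound: the cost scales with 1/θ rather than with the 1/τ² of an additive-tolerance query, which is what makes it efficient for statistics that are small or zero.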

    Molecular modelling of the GIR1 branching ribozyme gives new insight into evolution of structurally related ribozymes

Twin-ribozyme introns contain a branching ribozyme (GIR1) followed by a homing endonuclease (HE) encoding sequence embedded in a peripheral domain of a group I splicing ribozyme (GIR2). GIR1 catalyses the formation of a lariat with 3 nt in the loop, which caps the HE mRNA. GIR1 is structurally related to group I ribozymes, raising the question of how two closely related ribozymes can carry out very different reactions. Modelling of GIR1 based on new biochemical and mutational data shows an extended substrate domain containing a GoU pair distinct from the nucleophilic residue that docks onto a catalytic core showing a different topology from that of group I ribozymes. The differences include a core J8/7 region that has been reduced and is complemented by residues from the pre-lariat fold. These findings provide the basis for an evolutionary mechanism that accounts for the change from the group I splicing ribozyme to the branching GIR1 architecture. Such an evolutionary mechanism can be applied to other large RNAs, such as ribonuclease P.

    Protein Folding in the Generalized Hydrophobic-Polar Model on the Triangular Lattice

We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. The model we use is based on the Hydrophobic-Polar (HP) model [2] on cubic lattices, in which the goal is to find the fold with the maximum number of contacts between non-covalently linked hydrophobic amino acids.
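    The objective being maximized is easy to state in code. Here is a minimal sketch that scores a candidate fold on the triangular lattice of the title (the coordinate encoding is our own assumption; the paper does not fix one): contacts are H-H pairs that are lattice neighbours but not chain neighbours.

```python
# Six neighbour offsets of the triangular lattice in axial coordinates
# (a hypothetical encoding chosen for this illustration).
TRI_NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def hp_contacts(sequence, fold):
    """Count H-H contacts of a self-avoiding fold on the triangular lattice.

    sequence: string over {'H', 'P'}, e.g. "HPPHHPH"
    fold:     list of (q, r) lattice points; fold[i] holds residue i, and
              consecutive residues must occupy neighbouring points.
    Returns the number of contacts between hydrophobic residues that are
    lattice neighbours but not covalently linked (|i - j| > 1).
    """
    assert len(set(fold)) == len(fold), "fold must be self-avoiding"
    contacts = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):   # skip chain neighbours
            if sequence[i] == 'H' and sequence[j] == 'H':
                dq = fold[j][0] - fold[i][0]
                dr = fold[j][1] - fold[i][1]
                if (dq, dr) in TRI_NEIGHBOURS:
                    contacts += 1
    return contacts

# Example: the 4-residue chain "HHPH" folded into a bend has one
# non-covalent H-H contact (residues 0 and 3).
print(hp_contacts("HHPH", [(0, 0), (1, 0), (0, 1), (-1, 1)]))  # -> 1
```

    The optimization problem is then to search over self-avoiding folds for the one maximizing this count, which is what makes the folding question computationally hard.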

    Learning in Hybrid Noise Environments Using Statistical Queries

We consider formal models of learning from noisy data. Specifically, we focus on learning in the probably approximately correct (PAC) model as defined by Valiant. Two of the most widely studied models of noise in this setting have been classification noise and malicious errors. However, a more realistic model combining the two types of noise has not been formalized. We define a learning environment based on a natural combination of these two noise models. We first show that hypothesis testing is possible in this model. We next describe a simple technique for learning in this model, and then describe a more powerful technique based on statistical query learning. We show that the noise tolerance of this improved technique is roughly optimal with respect to the desired learning accuracy and that it provides a smooth tradeoff between the tolerable amounts of the two types of noise. Finally, we show that statistical query simulation yields learning algorithms for other combinations of noise models.
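    As one concrete reading of such a combination, here is a minimal sketch (our own formulation and parameter names, not the paper's definition) of an example oracle that mixes malicious noise at rate β with classification noise at rate η:

```python
import random

def hybrid_noise_example(draw_example, target, eta, beta, adversary):
    """Draw one labelled example under a combined noise model (a sketch
    under the stated assumptions; beta and eta are illustrative names).

    With probability beta the adversary supplies an arbitrary labelled
    pair (malicious noise); otherwise a genuine example is drawn and its
    correct label is flipped with probability eta (classification noise).
    """
    if random.random() < beta:
        return adversary()      # arbitrary, possibly adversarial (x, y)
    x = draw_example()          # x drawn from the target distribution
    y = target(x)               # correct label
    if random.random() < eta:
        y = 1 - y               # independent label flip
    return x, y
```

    The tradeoff result described above then concerns how large β and η can jointly be, relative to the desired accuracy, while learning remains possible.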

    On the Sample Complexity of Noise-Tolerant Learning

In this paper, we further characterize the complexity of noise-tolerant learning in the PAC model. Specifically, we show a general lower bound of $\Omega\!\left(\frac{\log(1/\delta)}{\varepsilon(1-2\eta)^2}\right)$ on the number of examples required for PAC learning in the presence of classification noise, where $\varepsilon$ is the accuracy parameter, $\delta$ the confidence parameter, and $\eta$ the classification noise rate. Combined with a result of Simon, we effectively show that the sample complexity of PAC learning in the presence of classification noise is $\Omega\!\left(\frac{\mathrm{VC}(F)}{\varepsilon(1-2\eta)^2} + \frac{\log(1/\delta)}{\varepsilon(1-2\eta)^2}\right)$. Furthermore, we demonstrate the optimality of the general lower bound by providing a noise-tolerant learning algorithm for the class of symmetric Boolean functions which uses a sample size within a constant factor of this bound. Finally, we note that our general lower bound compares favorably with various general upper bounds for PAC learning in the presence of classification noise. Keywords: Machine Learning, Computational Learning Theory, Computational Complexity, Fault Tolerance, The..
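    To give a feel for how the bound grows, the following snippet evaluates the expression above (an order-of-magnitude illustration only, since the Ω(·) hides constants):

```python
import math

def cn_sample_lower_bound(vc_dim, eps, delta, eta):
    """Evaluate VC(F)/(eps*(1-2*eta)**2) + log(1/delta)/(eps*(1-2*eta)**2),
    the asymptotic lower bound with its hidden constants omitted."""
    denom = eps * (1 - 2 * eta) ** 2
    return vc_dim / denom + math.log(1 / delta) / denom

# Example: VC dimension 10, accuracy 0.1, confidence 0.05, noise rate 0.2.
# The (1 - 2*eta)**2 = 0.36 factor inflates the noise-free bound ~2.8x.
print(cn_sample_lower_bound(10, 0.1, 0.05, 0.2))  # roughly 361 examples
```

    Reading off the formula, the noise-free bound is recovered at η = 0, and the sample cost diverges as η approaches 1/2, where the labels carry no information.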
