
    General Bounds on Statistical Query Learning and PAC Learning with Noise via Hypothesis Boosting

We derive general bounds on the complexity of learning in the statistical query (SQ) model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the SQ model. The SQ model was introduced by Kearns to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result: the first general upper bounds on the complexity of strong SQ learning. Since all SQ algorithms can be simulated in the PAC model with classification noise, we also obtain general upper bounds on learning in the presence of classification noise for classes which can be learned in the SQ model.
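    The simulation mentioned in the last sentence is mechanical enough to sketch. Below is a minimal Python illustration (our own sketch, not the paper's construction; it assumes a known noise rate η < 1/2 and a finite sample) of how a statistical query E[χ(x, y)] can be estimated from classification-noise examples by measuring both label orientations and inverting the noise mixture:

```python
def simulate_sq_from_noisy_examples(examples, chi, eta):
    """Estimate the clean statistic E[chi(x, y)] from examples whose
    labels were flipped independently with known rate eta < 1/2.

    examples: list of (x, noisy_label) pairs from the noisy oracle
    chi:      query predicate chi(x, label) -> {0, 1}
    """
    n = len(examples)
    # Empirical statistics under the noisy labels, as observed and with
    # the label deliberately flipped.
    p_obs = sum(chi(x, y) for x, y in examples) / n
    p_flip = sum(chi(x, 1 - y) for x, y in examples) / n
    # The noisy expectations mix the clean ones:
    #   p_obs  = (1 - eta) * a + eta * b
    #   p_flip = eta * a + (1 - eta) * b
    # where a = E[chi(x, y)] and b = E[chi(x, 1 - y)] on clean labels.
    # Solving this system for a isolates the clean statistic.
    return ((1 - eta) * p_obs - eta * p_flip) / (1 - 2 * eta)
```

    The additive tolerance of the query then dictates how large the sample must be, and the division by (1 − 2η) is why noise rates approaching 1/2 blow up the complexity in all of the bounds above.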

    Specification and Simulation of Statistical Query Algorithms for Efficiency and Noise Tolerance

A recent innovation in computational learning theory is the statistical query (SQ) model. The advantage of specifying learning algorithms in this model is that SQ algorithms can be simulated in the probably approximately correct (PAC) model, both in the absence and in the presence of noise. However, simulations of SQ algorithms in the PAC model have non-optimal time and sample complexities. In this paper, we introduce a new method for specifying statistical query algorithms based on a type of relative error and provide simulations in the noise-free and noise-tolerant PAC models which yield more efficient algorithms. Requests for estimates of statistics in this new model take the following form: "Return an estimate of the statistic within a 1 ± μ factor, or return ⊥, promising that the statistic is less than θ." In addition to showing that this is a very natural language for specifying learning algorithms, we also show that this new specification is polynomially equivalent to standard SQ, and thus, known learnability and hardness results for statistical query learning are preserved. We then give highly efficient PAC simulations of relative error SQ algorithms. We show that the learning algorithms obtained by simulating efficient relative error SQ algorithms both in the absence of noise and in the presence of malicious noise have roughly optimal sample complexity. We also show that the simulation of efficient relative error SQ algorithms in the presence of classification noise yields learning algorithms at least as efficient as those obtained through standard methods, and in some cases improved, roughly optimal results are achieved. The sample complexities for all of these simulations are based on the d_ν metric, which is a type of relative error metric useful for quantities which are small or even zero. We show that uniform convergence with respect to the d_ν metric yields "uniform convergence" with respect to (μ, θ) accuracy. Finally, while we show that many specific learning algorithms can be written as highly efficient relative error SQ algorithms, we also show, in fact, that all SQ algorithms can be written efficiently by proving general upper bounds on the complexity of (μ, θ) queries as a function of the accuracy parameter ε. As a consequence of this result, we give general upper bounds on the complexity of learning algorithms achieved through the use of relative error SQ algorithms and the simulations described above.
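    To make the quoted query semantics concrete, here is a minimal sketch (our own illustration with hypothetical names, not the paper's simulation) of a (μ, θ) relative-error query answered from a finite sample:

```python
BOTTOM = None  # plays the role of the symbol "⊥" in the query semantics

def relative_error_query(examples, chi, mu, theta):
    """Answer "return an estimate of P = E[chi] within a 1 ± mu factor,
    or return ⊥, promising that P < theta" from a finite sample.

    By a multiplicative Chernoff bound, a sample of size on the order of
    log(1/delta) / (mu**2 * theta) makes this answer valid with
    probability 1 - delta; constants, and the boundary case where P sits
    right at theta, are glossed over in this sketch.
    """
    p_hat = sum(chi(x, y) for x, y in examples) / len(examples)
    if p_hat < theta:
        return BOTTOM   # small statistic: decline, promising P < theta
    return p_hat        # otherwise multiplicatively accurate: (1 ± mu) * P
```

    The appeal of this form is visible in the sample bound: the cost scales with 1/θ rather than with the 1/τ² of an additive-tolerance query, which is what makes it efficient for statistics that are small or zero.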

    Molecular modelling of the GIR1 branching ribozyme gives new insight into evolution of structurally related ribozymes

Twin-ribozyme introns contain a branching ribozyme (GIR1) followed by a homing endonuclease (HE) encoding sequence embedded in a peripheral domain of a group I splicing ribozyme (GIR2). GIR1 catalyses the formation of a lariat with 3 nt in the loop, which caps the HE mRNA. GIR1 is structurally related to group I ribozymes, raising the question of how two closely related ribozymes can carry out very different reactions. Modelling of GIR1 based on new biochemical and mutational data shows an extended substrate domain containing a GoU pair distinct from the nucleophilic residue that docks onto a catalytic core showing a different topology from that of group I ribozymes. The differences include a core J8/7 region that has been reduced and is complemented by residues from the pre-lariat fold. These findings provide the basis for an evolutionary mechanism that accounts for the change from the group I splicing ribozyme to the branching GIR1 architecture. Such an evolutionary mechanism can be applied to other large RNAs, such as ribonuclease P.

    Protein Folding in the Generalized Hydrophobic-Polar Model on the Triangular Lattice

We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. The model we use is based on the Hydrophobic-Polar (HP) model [2] on cubic lattices, in which the goal is to find the fold with the maximum number of contacts between non-covalently linked hydrophobic amino acids.
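    The objective being maximized is easy to state in code. Here is a minimal sketch that scores a candidate fold on the triangular lattice of the title (the coordinate encoding is our own assumption; the paper does not fix one): contacts are H-H pairs that are lattice neighbours but not chain neighbours.

```python
# Six neighbour offsets of the triangular lattice in axial coordinates
# (a hypothetical encoding chosen for this illustration).
TRI_NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def hp_contacts(sequence, fold):
    """Count H-H contacts of a self-avoiding fold on the triangular lattice.

    sequence: string over {'H', 'P'}, e.g. "HPPHHPH"
    fold:     list of (q, r) lattice points; fold[i] holds residue i, and
              consecutive residues must occupy neighbouring points.
    Returns the number of contacts between hydrophobic residues that are
    lattice neighbours but not covalently linked (|i - j| > 1).
    """
    assert len(set(fold)) == len(fold), "fold must be self-avoiding"
    contacts = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):   # skip chain neighbours
            if sequence[i] == 'H' and sequence[j] == 'H':
                dq = fold[j][0] - fold[i][0]
                dr = fold[j][1] - fold[i][1]
                if (dq, dr) in TRI_NEIGHBOURS:
                    contacts += 1
    return contacts

# Example: the 4-residue chain "HHPH" folded into a bend has one
# non-covalent H-H contact (residues 0 and 3).
print(hp_contacts("HHPH", [(0, 0), (1, 0), (0, 1), (-1, 1)]))  # -> 1
```

    The optimization problem is then to search over self-avoiding folds for the one maximizing this count, which is what makes the folding question computationally hard.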

    Learning in Hybrid Noise Environments Using Statistical Queries

We consider formal models of learning from noisy data. Specifically, we focus on learning in the probably approximately correct (PAC) model as defined by Valiant. Two of the most widely studied models of noise in this setting have been classification noise and malicious errors. However, a more realistic model combining the two types of noise has not been formalized. We define a learning environment based on a natural combination of these two noise models. We first show that hypothesis testing is possible in this model. We next describe a simple technique for learning in this model, and then describe a more powerful technique based on statistical query learning. We show that the noise tolerance of this improved technique is roughly optimal with respect to the desired learning accuracy and that it provides a smooth tradeoff between the tolerable amounts of the two types of noise. Finally, we show that statistical query simulation yields learning algorithms for other combinations of noise models.
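    As one concrete reading of such a combination, here is a minimal sketch (our own formulation and parameter names, not the paper's definition) of an example oracle that mixes malicious noise at rate β with classification noise at rate η:

```python
import random

def hybrid_noise_example(draw_example, target, eta, beta, adversary):
    """Draw one labelled example under a combined noise model (a sketch
    under the stated assumptions; beta and eta are illustrative names).

    With probability beta the adversary supplies an arbitrary labelled
    pair (malicious noise); otherwise a genuine example is drawn and its
    correct label is flipped with probability eta (classification noise).
    """
    if random.random() < beta:
        return adversary()      # arbitrary, possibly adversarial (x, y)
    x = draw_example()          # x drawn from the target distribution
    y = target(x)               # correct label
    if random.random() < eta:
        y = 1 - y               # independent label flip
    return x, y
```

    The tradeoff result described above then concerns how large β and η can jointly be, relative to the desired accuracy, while learning remains possible.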

    On the Sample Complexity of Noise-Tolerant Learning

In this paper, we further characterize the complexity of noise-tolerant learning in the PAC model. Specifically, we show a general lower bound of $\Omega\!\left(\frac{\log(1/\delta)}{\varepsilon(1-2\eta)^2}\right)$ on the number of examples required for PAC learning in the presence of classification noise, where $\varepsilon$ is the accuracy parameter, $\delta$ the confidence parameter, and $\eta$ the classification noise rate. Combined with a result of Simon, we effectively show that the sample complexity of PAC learning in the presence of classification noise is $\Omega\!\left(\frac{\mathrm{VC}(F)}{\varepsilon(1-2\eta)^2} + \frac{\log(1/\delta)}{\varepsilon(1-2\eta)^2}\right)$. Furthermore, we demonstrate the optimality of the general lower bound by providing a noise-tolerant learning algorithm for the class of symmetric Boolean functions which uses a sample size within a constant factor of this bound. Finally, we note that our general lower bound compares favorably with various general upper bounds for PAC learning in the presence of classification noise. Keywords: Machine Learning, Computational Learning Theory, Computational Complexity, Fault Tolerance, The..
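    To give a feel for how the bound grows, the following snippet evaluates the expression above (an order-of-magnitude illustration only, since the Ω(·) hides constants):

```python
import math

def cn_sample_lower_bound(vc_dim, eps, delta, eta):
    """Evaluate VC(F)/(eps*(1-2*eta)**2) + log(1/delta)/(eps*(1-2*eta)**2),
    the asymptotic lower bound with its hidden constants omitted."""
    denom = eps * (1 - 2 * eta) ** 2
    return vc_dim / denom + math.log(1 / delta) / denom

# Example: VC dimension 10, accuracy 0.1, confidence 0.05, noise rate 0.2.
# The (1 - 2*eta)**2 = 0.36 factor inflates the noise-free bound ~2.8x.
print(cn_sample_lower_bound(10, 0.1, 0.05, 0.2))  # roughly 361 examples
```

    Reading off the formula, the noise-free bound is recovered at η = 0, and the sample cost diverges as η approaches 1/2, where the labels carry no information.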
