    Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

    The paper studies machine learning problems where each example is described using a set of Boolean features and where hypotheses are represented by linear threshold elements. One method of increasing the expressiveness of learned hypotheses in this context is to expand the feature set to include conjunctions of basic features. This can be done explicitly or, where possible, by using a kernel function. Focusing on the well-known Perceptron and Winnow algorithms, the paper demonstrates a tradeoff between the computational efficiency with which the algorithm can be run over the expanded feature space and the generalization ability of the corresponding learning algorithm. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Perceptron algorithm over a feature space of exponentially many conjunctions; however, we also show that using such kernels, the Perceptron algorithm can provably make an exponential number of mistakes even when learning simple functions. We then consider the question of whether kernel functions can analogously be used to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions. Known upper bounds imply that the Winnow algorithm can learn Disjunctive Normal Form (DNF) formulae with a polynomial mistake bound in this setting. However, we prove that it is computationally hard to simulate Winnow's behavior for learning DNF over such a feature set. This implies that the kernel functions which correspond to running Winnow for this problem are not efficiently computable, and that there is no general construction that can run Winnow with kernels.
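    A minimal sketch, in Python, of the kernel Perceptron idea the abstract builds on, using the kernel K(x, y) = 2^(number of shared 1-bits), which counts the monotone conjunctions satisfied by both examples; the function and variable names here are illustrative, not taken from the paper.

        # Dual-form Perceptron over the implicit feature space of all
        # monotone conjunctions of the basic Boolean features. The
        # hypothesis is stored as the list of mistake examples with their
        # signs, never as an explicit weight vector.

        def k_mono_conj(x, y):
            """Count monotone conjunctions true on both x and y: every
            subset of the positions where both vectors are 1 gives one
            such conjunction, hence 2 ** (number of shared 1-bits)."""
            shared = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
            return 2 ** shared

        def kernel_perceptron(stream, kernel=k_mono_conj):
            mistakes = []                          # (example, +1/-1) pairs
            for x, label in stream:                # label in {+1, -1}
                score = sum(s * kernel(xm, x) for xm, s in mistakes)
                pred = 1 if score > 0 else -1
                if pred != label:                  # update only on mistakes
                    mistakes.append((x, label))
                yield pred

    Each prediction costs time linear in the number of past mistakes and in the number of basic features, even though the implicit feature space has exponentially many coordinates; the abstract's negative result is that the number of mistakes itself can still be exponential.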

    Learning Unions of ω(1)-Dimensional Rectangles

    We consider the problem of learning unions of rectangles over the domain $[b]^n$, in the uniform-distribution membership-query learning setting, where both $b$ and $n$ are "large". We obtain $\mathrm{poly}(n, \log b)$-time algorithms for the following classes:
    - $\mathrm{poly}(n \log b)$-way Majority of $O\bigl(\frac{\log(n \log b)}{\log\log(n \log b)}\bigr)$-dimensional rectangles.
    - Union of $\mathrm{poly}(\log(n \log b))$ many $O\bigl(\frac{\log^2(n \log b)}{(\log\log(n \log b)\,\log\log\log(n \log b))^2}\bigr)$-dimensional rectangles.
    - $\mathrm{poly}(n \log b)$-way Majority of $\mathrm{poly}(n \log b)$-Or of disjoint $O\bigl(\frac{\log(n \log b)}{\log\log(n \log b)}\bigr)$-dimensional rectangles.
    Our main algorithmic tool is an extension of Jackson's boosting- and Fourier-based Harmonic Sieve algorithm [Jackson 1997] to the domain $[b]^n$, building on work of [Akavia, Goldwasser, Safra 2003]. Other ingredients used to obtain the results stated above are techniques from exact learning [Beimel, Kushilevitz 1998] and ideas from recent work on learning augmented $AC^0$ circuits [Jackson, Klivans, Servedio 2002] and on representing Boolean functions as thresholds of parities [Klivans, Servedio 2001].
    Comment: 25 pages. Some corrections. Recipient of the E. M. Gold award, ALT 2006. To appear in Journal of Theoretical Computer Science.
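    For concreteness, the building block of all three classes is the following standard notion (a textbook definition, not quoted from the paper): a d-dimensional rectangle over $[b]^n$ fixes d of the n coordinates and constrains each to an interval,

        \[
          R(x) \;=\; \bigwedge_{j=1}^{d} \bigl[\, a_j \le x_{i_j} \le c_j \,\bigr],
          \qquad x \in [b]^n , \quad 1 \le i_1 < \dots < i_d \le n .
        \]

    The classes above are then unions and majorities of such rectangles whose dimension d grows roughly logarithmically in $n \log b$, which is what keeps them within reach of the Fourier-based techniques.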

    The Perceptron algorithm versus Winnow: linear versus logarithmic mistake bounds when few input variables are relevant

    We give an adversary strategy that forces the Perceptron algorithm to make Ω(kN) mistakes in learning monotone disjunctions over N variables with at most k literals. In contrast, Littlestone's algorithm Winnow makes at most O(k log N) mistakes for the same problem. Both algorithms use thresholded linear functions as their hypotheses. However, Winnow does multiplicative updates to its weight vector instead of the additive updates of the Perceptron algorithm. In general, we call an algorithm additive if its weight vector is always a sum of a fixed initial weight vector and some linear combination of already seen instances. Thus, the Perceptron algorithm is an example of an additive algorithm. We show that an adversary can force any additive algorithm to make (N + k − 1)/2 mistakes in learning a monotone disjunction of at most k literals. Simple experiments show that for k ≪ N, Winnow clearly outperforms the Perceptron algorithm also on nonadversarial random data.
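    A minimal sketch of Winnow's multiplicative update in this setting, in Python (the variant with halving demotions; the parameter choices follow the usual textbook presentation rather than this paper's exact setup).

        # Winnow for learning a monotone k-literal disjunction over N
        # Boolean variables: one weight per variable, a fixed threshold,
        # and multiplicative updates applied only on mistakes.

        def winnow(stream, N, alpha=2.0):
            w = [1.0] * N
            theta = float(N)                       # standard threshold choice
            for x, label in stream:                # x in {0,1}^N, label in {0,1}
                pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
                if pred == 0 and label == 1:       # false negative: promote
                    w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
                elif pred == 1 and label == 0:     # false positive: demote
                    w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
                yield pred

    On noiseless data from a monotone disjunction, a relevant variable is never active in a false positive, so its weight is never demoted; each relevant weight therefore needs only O(log N) promotions to reach the threshold, which is the intuition behind the O(k log N) mistake bound.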

    Formal concept matching and reinforcement learning in adaptive information retrieval

    The superiority of the human brain in information retrieval (IR) tasks seems to come firstly from its ability to read and understand the concepts, ideas or meanings central to documents, in order to reason out the usefulness of documents to information needs, and secondly from its ability to learn from experience and be adaptive to the environment. In this work we attempt to incorporate these properties into the development of an IR model to improve document retrieval. We investigate the applicability of concept lattices, which are based on the theory of Formal Concept Analysis (FCA), to the representation of documents. This allows the use of more elegant representation units, as opposed to keywords, in order to better capture concepts/ideas expressed in natural language text. We also investigate the use of a reinforcement learning strategy to learn and improve document representations, based on the information present in query statements and user relevance feedback. Features or concepts of each document/query, formulated using FCA, are weighted separately with respect to the documents they are in, and organised into separate concept lattices according to a subsumption relation. Furthermore, each concept lattice is encoded in a two-layer neural network structure known as a Bidirectional Associative Memory (BAM), for efficient manipulation of the concepts in the lattice representation. This avoids implementation drawbacks faced by other FCA-based approaches. Retrieval of a document for an information need is based on concept matching between the concept lattice representations of a document and a query. The learning strategy works by making the similarity of relevant documents stronger and that of non-relevant documents weaker for each query, depending on the relevance judgements of the users on retrieved documents. Our approach is radically different from existing FCA-based approaches in the following respects: concept formulation; weight assignment to object-attribute pairs; the representation of each document in a separate concept lattice; and the encoding of concept lattices in BAM structures. Furthermore, in contrast to the traditional relevance feedback mechanism, our learning strategy makes use of relevance feedback information to enhance document representations, thus making the document representations dynamic and adaptive to user interactions. The results obtained on the CISI, CACM and ASLIB Cranfield collections are presented and compared with published results. In particular, the performance of the system is shown to improve significantly as the system learns from experience.
    The School of Computing, University of Plymouth, UK
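    A minimal sketch of the two FCA derivation operators on a toy document-term context, in Python (illustrative data and names, not the thesis's implementation): a formal concept is a pair (A, B) of a document set and a term set with A′ = B and B′ = A.

        context = {                                # document -> set of terms
            "doc1": {"learning", "retrieval"},
            "doc2": {"learning", "lattice"},
            "doc3": {"learning", "retrieval", "lattice"},
        }

        def intent(docs):                          # A': terms shared by all docs in A (A nonempty)
            return set.intersection(*(context[d] for d in docs))

        def extent(terms):                         # B': docs containing every term in B
            return {d for d, s in context.items() if terms <= s}

        A = {"doc1", "doc3"}
        B = intent(A)                              # {'learning', 'retrieval'}
        print(extent(B) == A)                      # True: (A, B) is a formal concept

    Ordering the formal concepts of a context by inclusion of their extents yields the concept lattice that the thesis weights and then encodes in BAM structures.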

    Universal semantic communication

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 325-334).
    Is meaningful communication possible between two intelligent parties who share no common language or background? We propose that this problem can be rigorously addressed by explicitly focusing on the goals of the communication. We propose a theoretical framework in which we can address when and to what extent such semantic communication is possible. Our starting point is a mathematical definition of a generic goal for communication that is pursued by agents of bounded computational complexity. We then model a "lack of common language or background" by considering a class of potential partners for communication; in general, this formalism is rich enough to handle varying degrees of common language and backgrounds, but the complete lack of knowledge is modeled by simply considering the class of all partners with which some agent of similar power could achieve our goal. In this formalism, we find that for many goals (but not all), communication without any common language or background is possible. We call the strategies for achieving goals without relying on such background universal protocols. The main intermediate notions introduced by our theory are formal notions of feedback that we call sensing. We show that sensing captures the essence of whether or not reliable universal protocols can be constructed in many natural settings of interest: we find that across settings, sensing is almost always sufficient, usually necessary, and generally a useful design principle for the construction of universal protocols. We support this last point by developing a number of examples of protocols for specific goals. Notably, we show that universal delegation of computation from a space-efficient client to a general-purpose server is possible, and we show how a variant of TCP can allow end-users on a packet network to automatically adapt to small changes in the packet format (e.g., changes in IP). The latter example alludes to our main motivation for considering such problems, which is to develop techniques for modeling and constructing computer systems that do not require that their components strictly adhere to protocols: said differently, we hope to be able to design components that function properly with a sufficiently wide range of other components to permit a rich space of "backwards-compatible" designs for those components. We expect that in the long run, this paradigm will lead to simpler systems because "backwards compatibility" is no longer such a severe constraint, and we expect it to lead to more robust systems, partially because the components should be simpler, and partially because such components are inherently robust to deviations from any fixed protocol. Unfortunately, we find that the techniques for communication under the complete absence of any common background suffer from overhead that is too severe for such practical purposes, so we consider two natural approaches for introducing some assumed common background between components while retaining some nontrivial amount of flexibility.
    The first approach supposes that the designer of a component has some "belief" about what protocols would be "natural" to use to interact with other components; we show that, given sensing and some sufficient "agreement" between the beliefs of the designers of two components, the components can be made universal with some relatively modest overhead. The second approach supposes that the protocols are taken from some restricted class of functions, and we see that for certain classes of functions and simple goals, efficient universal protocols can again be constructed from sensing. In fact, we show more: the special case of our model described in the second approach corresponds precisely to the well-known model of mistake-bounded on-line learning first studied by Barzdins and Freivalds, and later considered in more depth by Littlestone. This connection provides a reasonably complete picture of the conditions under which we can apply the second approach. Furthermore, the first approach seems closely related to the problem of designing good user interfaces in Human-Computer Interaction. We conclude by briefly sketching the connection, and suggest that further development of this connection may be a potentially fruitful direction for future work.
    by Brendan Juba. Ph.D.
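    A toy sketch, in Python, of the enumeration idea behind universal protocols, assuming a sensing function is available as feedback (illustrative only; the thesis's formal model is far more general).

        # Try candidate protocols in a fixed order; use sensing, i.e.
        # feedback on whether the goal is being achieved, to abandon
        # candidates that fail.

        def universal_protocol(candidates, run_round, sense, max_rounds=1000):
            for protocol in candidates:
                transcript = []
                for _ in range(max_rounds):
                    transcript.append(run_round(protocol, transcript))
                    if not sense(transcript):      # sensing rejects: switch
                        break
                else:
                    return protocol                # sensing never rejected it
            return None

    Each abandoned candidate plays the role of one mistake, which is the bridge to mistake-bounded on-line learning: a mistake bound for the class of candidate protocols bounds how many candidates a universal protocol has to try.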

    Efficient Learning with Virtual Threshold Gates

    We reduce learning simple geometric concept classes to learning disjunctions over exponentially many variables. We then apply an on-line algorithm called Winnow whose number of prediction mistakes grows only logarithmically with the number of variables. The hypotheses of Winnow are linear threshold functions with one weight per variable. We find ways to keep the exponentially many weights of Winnow implicitly, so that the time for the algorithm to compute a prediction and update its "virtual" weights is polynomial. Our method can be used to learn d-dimensional axis-parallel boxes when d is variable, and unions of d-dimensional axis-parallel boxes when d is constant. The worst-case number of mistakes of our algorithms for the above classes is optimal to within a constant factor, and our algorithms inherit the noise robustness of Winnow. We think that other on-line algorithms with multiplicative weight updates whose loss bounds grow logarithmically with the dimension are amenable to our methods.

    Efficient Learning with Virtual Threshold Gates

    The goal of this paper is to reduce learning particular concept classes to the case of learning disjunctions, or more generally linear threshold functions, over exponentially many variables. Then the algorithm Winnow [Lit88] is applied, which learns, for example, k-literal monotone disjunctions over v variables with a mistake bound of O(k + k log(v/k)). This bound is optimal to within a constant factor, since the Vapnik-Chervonenkis dimension [VC71, BEHW89] of the class of k-literal monotone disjunctions is Ω(k + k log(v/k)) [Lit88] and this dimension is always a lower bound for the optimal mistake bound.
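    A worked instance of why the logarithmic dependence on v tolerates exponentially many variables (our arithmetic, not quoted from the paper): over the domain {1, ..., n}^d there are at most v = n^(2d) axis-parallel boxes, so introducing one virtual variable per box gives

        \[
          O\bigl(k + k\log(v/k)\bigr) \;\le\; O\bigl(k + k\log v\bigr)
          \;=\; O\bigl(k + 2dk\log n\bigr),
        \]

    which is polynomial in d, k and log n even though v is not; the algorithmic problem solved by the "virtual" weights is to evaluate and update all v of them implicitly in polynomial time.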
