4 research outputs found

    Hash kernels and structured learning

    Vast amounts of data are being generated, and processing massive data remains a challenge for machine learning algorithms. We propose hash kernels to facilitate efficient kernels which can deal with massive multi-class problems. We show a principled way to compute the kernel matrix for data streams and sparse feature spaces. We further generalise it via sampling to graphs. We then exploit the connection between hash kernels and compressed sensing, and apply hashing to face recognition, which yields a significant speed-up over the state of the art with competitive accuracy. We also give a recovery rate on the sparse representation and a bounded recognition rate. As hash kernels can deal with data with structures in the input such as graphs and face images, the second part of the thesis moves on to an even more challenging task: dealing with data with structures in the output. Recent advances in machine learning exploit the dependency among data outputs, making it possible to deal with complex, structured data. We study the most popular structured learning algorithms and group them into two categories: probabilistic approaches and Max-Margin approaches. We show the connections between different algorithms, reformulate them in the empirical risk minimisation framework, and compare their advantages and disadvantages, which helps in choosing suitable algorithms according to the characteristics of the application. We have made practical and theoretical contributions in this thesis. We show some real-world applications using structured learning as follows: a) We propose a novel approach for automatic paragraph segmentation, namely training Semi-Markov models discriminatively using a Max-Margin method. This method allows us to model the sequential nature of the problem and to incorporate features of a whole paragraph, such as paragraph coherence, which cannot be used in previous models.
b) We jointly segment and recognise actions in video sequences with a discriminative semi-Markov model framework, which incorporates features that capture the characteristics of boundary frames, action segments and neighbouring action segments. A Viterbi-like algorithm is devised to efficiently solve the induced optimisation problem. c) We propose a novel hybrid loss of Conditional Random Fields (CRFs) and Support Vector Machines (SVMs). We apply the hybrid loss to various applications such as text chunking, Named Entity Recognition and Joint Image Categorisation. We have made the following theoretical contributions: a) We study recent advances in PAC-Bayes bounds, and apply them to structured learning. b) We propose a more refined notion of Fisher consistency, namely Conditional Fisher Consistency for Classification (CFCC), that conditions on the knowledge of the true distribution of class labels. c) We show that the hybrid loss has the advantages of both CRFs and SVMs: it is consistent and has a tight PAC-Bayes bound which shrinks as the margin increases. d) We also introduce probabilistic margins, which take the label distribution into account. We show that many existing algorithms can be viewed as special cases of the new margin concept, which may help in understanding existing algorithms as well as in designing new ones. Finally, we discuss some future directions such as tightening PAC-Bayes bounds, adaptive hybrid losses and graphical model inference via Compressed Sensing.
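The hash kernel idea mentioned at the start of this abstract can be illustrated with a minimal sketch. It assumes the common "hashing trick" formulation, in which each sparse feature is mapped to one of a fixed number of buckets and the kernel is the inner product of the hashed vectors. The function names, the use of MD5 as the hash, the salt strings, and the sign hash are all illustrative assumptions, not details taken from the thesis.

```python
import hashlib
from collections import defaultdict


def _hash(token: str, salt: str) -> int:
    """Deterministic integer hash of a token (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5((salt + token).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hashed_features(x: dict, n_buckets: int = 16) -> dict:
    """Map a sparse feature dict {token: value} into n_buckets hashed dimensions."""
    phi = defaultdict(float)
    for token, value in x.items():
        bucket = _hash(token, "bucket") % n_buckets
        # Assumed variant: a second hash picks a sign, which reduces collision bias.
        sign = 1.0 if _hash(token, "sign") % 2 == 0 else -1.0
        phi[bucket] += sign * value
    return dict(phi)


def hash_kernel(x: dict, y: dict, n_buckets: int = 16) -> float:
    """Inner product of the two hashed feature vectors."""
    px = hashed_features(x, n_buckets)
    py = hashed_features(y, n_buckets)
    return sum(v * py.get(b, 0.0) for b, v in px.items())
```

Because the same token always lands in the same bucket with the same sign, `hash_kernel({"a": 2.0}, {"a": 3.0})` equals 6.0 for any bucket count. Memory use depends only on `n_buckets`, not on the size of the original feature space, which is what makes this kind of kernel attractive for data streams.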

    The complexity of joint computation

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 253-266). Joint computation is the ubiquitous scenario in which a computer is presented with not one, but many computational tasks to perform. A fundamental question arises: when can we cleverly combine computations, to perform them with greater efficiency or reliability than by tackling them separately? This thesis investigates the power and, especially, the limits of efficient joint computation, in several computational models: query algorithms, circuits, and Turing machines. We significantly improve and extend past results on limits to efficient joint computation for multiple independent tasks; identify barriers to progress towards better circuit lower bounds for multiple-output operators; and begin an original line of inquiry into the complexity of joint computation. In more detail, we make contributions in the following areas: Improved direct product theorems for randomized query complexity: The "direct product problem" seeks to understand how the difficulty of computing a function on each of k independent inputs scales with k. We prove the following direct product theorem (DPT) for query complexity: if every T-query algorithm has success probability at most 1-ε in computing the Boolean function f on input distribution μ, then for α > 0, the worst-case success probability of any αR₂(f)k-query randomized algorithm for f^{⊗k} falls exponentially with k. The best previous statement of this type, due to Klauck, Špalek, and de Wolf, required a query bound of O(bs(f)k). Our proof technique involves defining and analyzing a collection of martingales associated with an algorithm attempting to solve f^{⊗k}.
Our method is quite general and yields a new XOR lemma and threshold DPT for the query model, as well as DPTs for the query complexity of learning tasks, search problems, and tasks involving interaction with dynamic entities. We also give a version of our DPT in which decision tree size is the resource of interest. Joint complexity in the Decision Tree Model: We study the diversity of possible behaviors of the joint computational complexity of a collection f_1, ..., f_k of Boolean functions over a shared input. We focus on the deterministic decision tree model, with depth as the complexity measure; in this model, we prove a result to the effect that the "obvious" constraints on joint computational complexity are essentially the only ones. The proof uses an intriguing new type of cryptographic data structure called a "mystery bin," which we construct using a polynomial separation between deterministic and unambiguous query complexity shown by Savický. We also pose a conjecture in the communication model which, if proved, would extend our result to that model. Limitations of Lower-Bound Methods for the Wire Complexity of Boolean Operators: We study the circuit complexity of Boolean operators, i.e., collections of Boolean functions defined over a common input. Our focus is the well-studied model in which arbitrary Boolean functions are allowed as gates, and in which a circuit's complexity is measured by its depth and number of wires. We show sharp limitations of several existing lower-bound methods for this model. First, we study an information-theoretic lower-bound method due to Cherukhin, which gave the first improvement over the lower bounds provided by the well-known superconcentrator technique for constant depths. (The lower bounds are still barely superlinear, however.) Cherukhin's method was formalized by Jukna as a general lower-bound criterion for Boolean operators, the "Strong Multiscale Entropy" (SME) property.
It seemed plausible that this property could imply significantly better lower bounds by an improved analysis. However, we show that this is not the case, by exhibiting an explicit operator with the SME property that is computable by constant-depth circuits whose wire complexity essentially matches the Cherukhin-Jukna lower bound (to within a constant multiplicative factor, for depths d = 2, 3 and for even depths d ≥ 6). Next, we show limitations of two simpler lower-bound criteria given by Jukna: the "entropy method" for general operators, and the "pairwise-distance method" for linear operators. We show that neither method gives super-linear lower bounds for depth 3. In the process, we obtain the first known polynomial separation between the depth-2 and depth-3 wire complexities for an explicit operator. We also continue the study (initiated by Jukna) of the complexity of "representing" a linear operator by bounded-depth circuits, a weaker notion than computing the operator. New limits to classical and quantum instance compression: Given an instance of a decision problem that is too difficult to solve outright, we may aim for the more limited goal of compressing that instance into a smaller, equivalent instance of the same or a different problem. As a representative problem, say we are given Boolean formulas ψ_1, ..., ψ_t, each of length n ≪ t, and we want to determine if at least one ψ_j is satisfiable. Can we efficiently reduce this "OR-SAT" question to an equivalent problem instance (of SAT or another problem) of size poly(n), independent of t? We call any such reduction a "strong compression" reduction for OR-SAT. This would amount to a major gain from compressing ψ_1, ..., ψ_t jointly, since we know of no way to reliably compress an individual SAT instance.
Harnik and Naor (FOCS '06/SICOMP '10) and Bodlaender, Downey, Fellows, and Hermelin (ICALP '08/JCSS '09) showed that the infeasibility of strong compression for OR-SAT would also imply limits to instance compression schemes for a large number of other, natural problems; this is significant because instance compression is a central technique in the design of so-called fixed-parameter tractable algorithms. Bodlaender et al. also showed that the infeasibility of strong compression for the analogous "AND-SAT" problem would establish limits to instance compression for another family of problems. Fortnow and Santhanam (STOC '08) showed that deterministic (or 1-sided error randomized) strong compression for OR-SAT is not possible unless NP ⊆ coNP/poly; the case of AND-SAT remained mysterious. We give new and improved evidence against strong compression schemes for both OR-SAT and AND-SAT; our method applies to probabilistic compression schemes with 2-sided error. We also give versions of these results for an analogous task of quantum instance compression, in which a polynomial-time quantum reduction must output a quantum state that, in an appropriate sense, "preserves the answer" to the input instance. We give quantitatively similar evidence against strong compression for AND- and OR-SAT in this setting, albeit under less well-studied hypotheses about the relationship between NP and quantum complexity classes. To prove all of these results, we exploit the information bottleneck of an instance compression scheme, using a new method to "disguise" information being fed into a compressive mapping. By Andrew Donald Drucker. Ph.D.

    Universal semantic communication

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 325-334). Is meaningful communication possible between two intelligent parties who share no common language or background? We propose that this problem can be rigorously addressed by explicitly focusing on the goals of the communication. We propose a theoretical framework in which we can address when and to what extent such semantic communication is possible. Our starting point is a mathematical definition of a generic goal for communication, which is pursued by agents of bounded computational complexity. We then model a "lack of common language or background" by considering a class of potential partners for communication; in general, this formalism is rich enough to handle varying degrees of common language and backgrounds, but the complete lack of knowledge is modeled by simply considering the class of all partners with which some agent of similar power could achieve our goal. In this formalism, we will find that for many goals (but not all), communication without any common language or background is possible. We call the strategies for achieving goals without relying on such background universal protocols. The main intermediate notions introduced by our theory are formal notions of feedback that we call sensing. We show that sensing captures the essence of whether or not reliable universal protocols can be constructed in many natural settings of interest: we find that across settings, sensing is almost always sufficient, usually necessary, and generally a useful design principle for the construction of universal protocols. We support this last point by developing a number of examples of protocols for specific goals.
Notably, we show that universal delegation of computation from a space-efficient client to a general-purpose server is possible, and we show how a variant of TCP can allow end-users on a packet network to automatically adapt to small changes in the packet format (e.g., changes in IP). The latter example above alludes to our main motivation for considering such problems, which is to develop techniques for modeling and constructing computer systems that do not require that their components strictly adhere to protocols: said differently, we hope to be able to design components that function properly with a sufficiently wide range of other components to permit a rich space of "backwards-compatible" designs for those components. We expect that in the long run, this paradigm will lead to simpler systems because "backwards compatibility" is no longer such a severe constraint, and we expect it to lead to more robust systems, partially because the components should be simpler, and partially because such components are inherently robust to deviations from any fixed protocol. Unfortunately, we find that the techniques for communication under the complete absence of any common background suffer from overhead that is too severe for such practical purposes, so we consider two natural approaches for introducing some assumed common background between components while retaining some nontrivial amount of flexibility. The first approach supposes that the designer of a component has some "belief" about what protocols would be "natural" to use to interact with other components; we show that, given sensing and some sufficient "agreement" between the beliefs of the designers of two components, the components can be made universal with some relatively modest overhead. 
The second approach supposes that the protocols are taken from some restricted class of functions, and we will see that for certain classes of functions and simple goals, efficient universal protocols can again be constructed from sensing. Actually, we show more: the special case of our model described in the second approach above corresponds precisely to the well-known model of mistake-bounded on-line learning first studied by Bārzdiņš and Freivalds, and later considered in more depth by Littlestone. This connection provides a reasonably complete picture of the conditions under which we can apply the second approach. Furthermore, it also seems that the first approach is closely related to the problem of designing good user interfaces in Human-Computer Interaction. We conclude by briefly sketching the connection, and suggest that further development of this connection may be a potentially fruitful direction for future work. By Brendan Juba. Ph.D.
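For readers unfamiliar with the mistake-bounded on-line learning model referred to above, the following is a minimal sketch of the textbook halving algorithm for a finite hypothesis class. It is an illustration of that model only, not a construction from the thesis; the names `halving_learner`, `hypotheses`, and `stream` are hypothetical. The learner predicts by majority vote over all hypotheses still consistent with the labels seen so far, so each mistake removes at least half of them, giving at most log2(N) mistakes for N hypotheses when some hypothesis is always correct.

```python
from typing import Callable, Iterable, List, Tuple

Hypothesis = Callable[[int], bool]


def halving_learner(
    hypotheses: Iterable[Hypothesis],
    stream: Iterable[Tuple[int, bool]],
) -> Tuple[int, List[Hypothesis]]:
    """Run the halving algorithm; return (mistake count, surviving hypotheses)."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, label in stream:
        # Predict with the majority of the hypotheses that are still consistent
        # (ties are broken towards True).
        votes = sum(1 for h in version_space if h(x))
        prediction = 2 * votes >= len(version_space)
        if prediction != label:
            mistakes += 1
        # Keep only the hypotheses that agree with the revealed label.
        version_space = [h for h in version_space if h(x) == label]
    return mistakes, version_space
```

As a usage example, one could take the eight threshold functions `lambda x, t=t: x >= t` for `t` in `range(8)` and feed the learner labelled points from the threshold `t = 5`; the mistake count then never exceeds 3, whatever the order of the points.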

    A novel application of Hoeffding's inequality to decision trees construction for data streams
