7 research outputs found

    Hash kernels and structured learning

    Get PDF
    Vast amounts of data being generated, how to process massive data remains a challenge for machine learning algorithms. We propose hash kernels to facilitate efficient kernels which can deal with massive multi-class problems. We show a principled way to compute the kernel matrix for data streams and sparse feature spaces. We further generalise it via sampling to graphs. Later we exploit the connection between hash kernels with compressed sensing, and apply hashing to face recognition which significantly speeds up over the state-of-the-art with competitive accuracy. And we give a recovery rate on the sparse representation and a bounded recognition rate. As hash kernels can deal with data with structures in the input such as graphs and face images, the second part of the thesis moves on to an even more challenging task - dealing with data with structures in the output. Recent advances in machine learning exploit the dependency among data output, hence dealing with complex, structured data becomes possible. We study the most popular structured learning algorithms and categorise them into two categories - probabilistic approaches and Max Margin approaches. We show the connections of different algorithms, reformulate them in the empirical risk minimisation framework, and compare their advantages and disadvantages, which help choose suitable algorithms according to the characteristics of the application. We have made practical and theoretical contributions in this thesis. We show some real-world applications using structured learning as follows: a) We propose a novel approach for automatic paragraph segmentation, namely training Semi-Markov models discriminatively using a Max-Margin method. This method allows us to model the sequential nature of the problem and to incorporate features of a whole paragraph, such as paragraph coherence which cannot be used in previous models. b) We jointly segment and recognise actions in video sequences with a discriminative semi-Markov model framework, which incorporates features that capture the characteristics on boundary frames, action segments and neighbouring action segments. A Viterbi-like algorithm is devised to help efficiently solve the induced optimisation problem. c) We propose a novel hybrid loss of Conditional Random Fields (CRFs) and Support Vector Machines (SVMs). We apply the hybrid loss to various applications such as Text chunking, Named Entity Recognition and Joint Image Categorisation. We have made the following theoretical contributions: a) We study the recent advance in PAC-Bayes bounds, and apply it to structured learning. b) We propose a more refined notion of Fisher consistency, namely Conditional Fisher Consistency for Classification (CFCC), that conditions on the knowledge of the true distribution of class labels. c) We show that the hybrid loss has the advantages of both CRFs and SVMs - it is consistent and has a tight PAC-Bayes bound which shrinks as the margin increases. d) We also introduce Probabilistic margins which take the label distribution into account. And we show that many existing algorithms can be viewed as special cases of the new margin concept which may help understand existing algorithms as well as design new algorithms. At last, we discuss some future directions such as tightening PAC-Bayes bounds, adaptive hybrid losses and graphical model inference via Compressed Sensing

    Understanding Language Evolution in Overlapping Generations of Reinforcement Learning Agents

    Get PDF
    corecore