133 research outputs found

    Kernel methods in machine learning

    We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of functions has the benefit of facilitating the construction and analysis of learning algorithms while at the same time allowing large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data. We cover a wide range of methods, ranging from binary classifiers to sophisticated methods for estimation with structured data. Comment: Published at http://dx.doi.org/10.1214/009053607000000677 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
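
    A minimal sketch of the RKHS formulation, assuming kernel ridge regression with a Gaussian (RBF) kernel as the estimator (neither choice is specific to the review): the learned function is a kernel expansion f(x) = sum_i alpha_i k(x, x_i), and fitting reduces to a linear solve.

        import numpy as np

        def rbf_kernel(X1, X2, gamma=1.0):
            # Positive definite Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2)
            sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * sq_dists)

        def kernel_ridge_fit(X, y, lam=1e-2, gamma=1.0):
            # Solve (K + lam * I) alpha = y; the learned RKHS function is
            # f(x) = sum_i alpha_i k(x, x_i).
            K = rbf_kernel(X, X, gamma)
            return np.linalg.solve(K + lam * np.eye(len(X)), y)

        def kernel_ridge_predict(X_train, alpha, X_new, gamma=1.0):
            return rbf_kernel(X_new, X_train, gamma) @ alpha

        # Toy usage: fit a nonlinear function of one variable.
        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, size=(50, 1))
        y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
        alpha = kernel_ridge_fit(X, y)
        print(kernel_ridge_predict(X, alpha, np.array([[0.5]])))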

    Decision Tree-based Syntactic Language Modeling

    Statistical Language Modeling is an integral part of many natural language processing applications, such as Automatic Speech Recognition (ASR) and Machine Translation. N-gram language models dominate the field, despite having an extremely shallow view of language: a Markov chain of words. In this thesis, we develop and evaluate a joint language model that incorporates syntactic and lexical information in an effort to "put language back into language modeling." Our main goal is to demonstrate that such a model is not only effective but can be made scalable and tractable. We utilize decision trees to tackle the problem of sparse parameter estimation, which is exacerbated by the use of syntactic information jointly with word context. While decision trees have previously been applied to language modeling, there has been little analysis of the factors affecting decision tree induction and probability estimation for language modeling. In this thesis, we analyze several aspects that affect decision tree-based language modeling, with an emphasis on syntactic language modeling. We then propose improvements to the decision tree induction algorithm based on our analysis, as well as methods for constructing forest models, i.e., models consisting of multiple decision trees. Finally, we evaluate the impact of our syntactic language model on large-scale Speech Recognition and Machine Translation tasks. We also address a number of engineering problems associated with the joint syntactic language model in order to make it tractable. In particular, we propose a novel decoding algorithm that exploits the decision tree structure to eliminate unnecessary computation. We also propose and evaluate an approximation of our syntactic model by word n-grams, an approximation that makes it possible to incorporate our model directly into the CDEC Machine Translation decoder rather than using the model only for rescoring hypotheses produced with an n-gram model.
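
    A minimal sketch of the underlying idea, decision-tree estimation of P(next word | history): histories are clustered by binary questions about their features, and each leaf holds a smoothed word distribution. The features, split criterion, and smoothing below are simplified assumptions for illustration, not the induction algorithm developed in the thesis.

        import math
        from collections import Counter

        # Toy corpus of (history features, next word) pairs. In the thesis the
        # history carries syntactic information; here we just use a previous
        # word and a previous part-of-speech tag as hypothetical features.
        DATA = [
            (("the", "DT"), "cat"), (("the", "DT"), "dog"),
            (("a", "DT"), "cat"), (("black", "JJ"), "cat"),
            (("cat", "NN"), "sat"), (("dog", "NN"), "ran"),
            (("sat", "VBD"), "on"), (("ran", "VBD"), "home"),
        ]
        VOCAB = sorted({w for _, w in DATA})

        def log_likelihood(events):
            # Add-one-smoothed log-likelihood of the next words at a node.
            counts = Counter(w for _, w in events)
            total = len(events) + len(VOCAB)
            return sum(c * math.log((c + 1) / total) for c in counts.values())

        def best_split(events):
            # Binary questions of the form "does history feature i equal v?",
            # scored by the gain in training log-likelihood.
            base, best = log_likelihood(events), None
            for i in range(len(events[0][0])):
                for v in {h[i] for h, _ in events}:
                    yes = [e for e in events if e[0][i] == v]
                    no = [e for e in events if e[0][i] != v]
                    if not yes or not no:
                        continue
                    gain = log_likelihood(yes) + log_likelihood(no) - base
                    if best is None or gain > best[0]:
                        best = (gain, i, v, yes, no)
            return best

        def grow(events, min_events=3):
            split = best_split(events) if len(events) >= min_events else None
            if split is None or split[0] <= 0:
                counts = Counter(w for _, w in events)
                total = len(events) + len(VOCAB)
                return {w: (counts[w] + 1) / total for w in VOCAB}  # leaf distribution
            _, i, v, yes, no = split
            return (i, v, grow(yes, min_events), grow(no, min_events))

        def prob(tree, history, word):
            # Walk the tree by answering its questions, then read the leaf.
            while isinstance(tree, tuple):
                i, v, yes, no = tree
                tree = yes if history[i] == v else no
            return tree[word]

        tree = grow(DATA)
        print(prob(tree, ("the", "DT"), "cat"))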

    Solution of partial differential equations on vector and parallel computers

    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed, and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial-boundary-value problems. The intent is to point out attractive methods, as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
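
    As a concrete illustration of why some iterative methods suit this class of machines, consider Jacobi iteration for the Poisson equation: every grid point is updated only from the previous iterate, so a sweep vectorizes and parallelizes directly. The sketch below uses a generic textbook five-point stencil on the unit square; it is not taken from the review.

        import numpy as np

        def jacobi_poisson(f, h, num_iters=500):
            # Jacobi iteration for -laplacian(u) = f on the unit square with
            # u = 0 on the boundary. Each interior point depends only on the
            # previous iterate, so all points can be updated in parallel.
            u = np.zeros_like(f)
            for _ in range(num_iters):
                u_new = u.copy()
                u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                            u[1:-1, :-2] + u[1:-1, 2:] +
                                            h * h * f[1:-1, 1:-1])
                u = u_new
            return u

        # Toy usage: constant right-hand side on a 33x33 grid.
        n = 33
        h = 1.0 / (n - 1)
        f = np.ones((n, n))
        u = jacobi_poisson(f, h)
        print(u[n // 2, n // 2])   # approximate solution value at the center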

    Traits at Work: the design of a new trait-based stream library

    Recent years have seen the development of a composition mechanism called Traits. Traits are pure units of behavior that can be composed to form classes or other traits. The trait composition mechanism is an alternative to multiple or mixin inheritance in which the composer has full control over the trait composition. To evaluate the expressiveness of traits, some hierarchies were refactored, showing code reuse. However, such large refactorings, while valuable, may not exhibit all possible composition problems, since the hierarchies were previously expressed using single inheritance and following certain patterns. This paper presents our work on designing and implementing a new trait-based stream library named Nile. It evaluates how far traits enable reuse, what problems can be encountered when building a library using traits from scratch, and compares the trait solution to alternative composition mechanisms. Nile's core allows the definition of compact collection and file streaming libraries as well as the implementation of a backward-compatible new stream library. Nile's method size shows a reduction of 40% compared to the Squeak equivalent. The possibility of reusing the same set of traits to implement two distinct libraries is a concrete illustration of trait reuse capability.
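
    Nile itself is a Squeak/Smalltalk library and Python has no trait mechanism, so the sketch below only approximates the composition idea with small stateless mixin classes: reusable units of behavior that require state from the composing class. Real traits additionally give the composer explicit conflict resolution rather than relying on inheritance linearization; the stream classes here are hypothetical, not Nile's.

        class TReadable:
            # Trait-like unit of behavior: sequential reading over self.collection.
            def next(self):
                item = self.collection[self.position]
                self.position += 1
                return item

            def at_end(self):
                return self.position >= len(self.collection)

        class TWritable:
            # Trait-like unit of behavior: appending to self.collection.
            def next_put(self, item):
                self.collection.append(item)

        class ReadWriteStream(TReadable, TWritable):
            # The composing class provides the state the behavior units require.
            def __init__(self, items=None):
                self.collection = list(items or [])
                self.position = 0

        s = ReadWriteStream("abc")
        s.next_put("d")
        print(s.next(), s.at_end())   # a False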

    Improving the Computational Efficiency of Training and Application of Neural Language Models for Automatic Speech Recognition

    A language model is a vital component of automatic speech recognition systems. In recent years, advancements in neural network technologies have brought vast improvements in various machine learning tasks, including language modeling. However, compared to conventional backoff n-gram models, neural networks require much greater computational power and cannot completely replace the conventional methods. In this work, we examine the pipeline of a typical hybrid speech recognition system, in which the acoustic and language models are trained separately and used in conjunction, and we propose ways to speed up the computation induced by the language model in its various components. In the context of neural-network language modeling, we propose a new loss function, which we call a linear loss, that modifies the standard cross-entropy loss so that the neural network learns to self-normalize. The linear loss significantly reduces inference-time computation and allows us to use an importance-sampling-based method to compute an unbiased estimator of the loss function during neural network training. We conduct extensive experiments comparing the linear loss with several commonly used self-normalizing loss functions and show the linear loss's superiority. We also show that a well-trained language model trained with the cross-entropy loss can be converted into a self-normalizing linear-loss system with minimal additional training; the converted system preserves its performance while gaining the self-normalizing capability. We further refine the sampling procedure for commonly used sampling-based approaches. We propose a sampling-without-replacement scheme, which improves model performance and allows a more efficient algorithm to be used to minimize the sampling overhead, and we propose a speed-up of that algorithm which significantly reduces the sampling run time without affecting performance. We demonstrate that the sampling-without-replacement scheme consistently outperforms traditional sampling-with-replacement methods across multiple training loss functions for language models. We also experiment with changing the sampling distribution for importance sampling by utilizing longer histories. For batched training, we propose a method to generate the sampling distribution by averaging the n-gram distributions of the whole batch. Experiments show that including longer histories for sampling can improve the rate of convergence and enhance the trained model's performance. To reduce the computational overhead of sampling from higher-order n-grams, we propose a two-stage sampling algorithm that adds only a small overhead compared to commonly used unigram-based sampling schemes. When applying a trained neural network to lattice rescoring for ASR, we propose a pruning algorithm that runs much faster than the standard algorithm and improves ASR performance. The methods proposed in this dissertation make the application of neural language models in speech recognition significantly more computationally efficient. This allows researchers to apply larger and more sophisticated networks in their research and enables companies to provide better speech-based services to customers. Some of the methods proposed in this dissertation are not limited to neural language modeling and may facilitate neural network research in other fields.
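
    The abstract does not give the exact form of the linear loss, so the sketch below only illustrates the general self-normalization idea it builds on: add a penalty that drives the log-partition term log Z(h) toward zero, so that at inference time the unnormalized target score can be used without summing over the vocabulary. The |log Z| penalty, its weight, and the toy data are assumptions for illustration, not the thesis's method.

        import numpy as np

        def self_normalizing_loss(logits, targets, alpha=0.1):
            # logits: (batch, vocab) unnormalized scores; targets: (batch,) word ids.
            # Cross-entropy plus a penalty pushing log Z(h) toward zero; the
            # penalty form is illustrative, not the thesis's exact linear loss.
            m = logits.max(axis=1, keepdims=True)
            log_z = (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))).ravel()
            cross_entropy = log_z - logits[np.arange(len(targets)), targets]
            return (cross_entropy + alpha * np.abs(log_z)).mean()

        def fast_score(logits_row, word_id):
            # With a well self-normalized model, log P(word | h) is approximated
            # by the raw logit, so no softmax sum over the vocabulary is needed.
            return logits_row[word_id]

        rng = np.random.default_rng(0)
        logits = rng.normal(size=(4, 10))
        targets = np.array([1, 3, 5, 7])
        print(self_normalizing_loss(logits, targets))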

    A New Method IBE Interfaced with Private Key Generation and Public Key Infrastructure to Achieve High Data Security
