449 research outputs found

    Transfer Learning for Speech and Language Processing

    Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks. For example, in speech recognition, an acoustic model trained for one language can be used to recognize speech in another language, with little or no re-training data. Transfer learning is closely related to multi-task learning (cross-lingual vs. multilingual), and is traditionally studied under the name of 'model adaptation'. Recent advances in deep learning show that transfer learning becomes much easier and more effective with high-level abstract features learned by deep models, and that the 'transfer' can be conducted not only between data distributions and data types, but also between model structures (e.g., shallow nets and deep nets) or even model types (e.g., Bayesian models and neural models). This review paper summarizes some recent prominent research in this direction, particularly for speech and language processing. We also report some results from our group and highlight the potential of this very interesting research field. Comment: 13 pages, APSIPA 201
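
    As a hedged illustration of the feature-transfer idea described above, the numpy sketch below reuses the hidden layers of a source-language acoustic model as a frozen feature extractor and retrains only a small softmax layer on limited target-language data. All shapes, data, and hyperparameters are invented for the example and are not from the paper.

```python
# Minimal sketch (not the paper's exact recipe): transfer the deep feature
# layers from a source-language model, retrain only the output layer.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights come from a network trained on the source language.
W1 = rng.normal(scale=0.1, size=(40, 128))   # acoustic features -> hidden
W2 = rng.normal(scale=0.1, size=(128, 64))   # hidden -> abstract features

def features(x):
    """Frozen deep feature extractor transferred from the source language."""
    h = np.maximum(x @ W1, 0.0)              # ReLU layer 1
    return np.maximum(h @ W2, 0.0)           # ReLU layer 2

# Tiny synthetic "target language" set: 200 frames, 10 phone classes.
X = rng.normal(size=(200, 40))
y = rng.integers(0, 10, size=200)

# Train only a new softmax layer on top of the transferred features.
V = np.zeros((64, 10))
F = features(X)
for _ in range(200):
    logits = F @ V
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0           # softmax cross-entropy gradient
    V -= 0.1 * F.T @ p / len(y)
```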

    Efficient Methods for the Design and Training of Neural Networks

    The field of artificial intelligence has seen significant advancements with the development of neural networks, which have numerous applications in computer vision, natural language processing, and speech processing. Despite these advancements, designing and training these networks still pose numerous challenges. This thesis aims to address two critical aspects of neural network development, design and training, within the context of computer vision tasks. The thesis focuses on three main challenges in the development of neural networks. The first challenge is finding an efficient way to perform architecture search in an extremely large or even unlimited search space. To address this challenge, the thesis proposes a Neural Search-space Evolution (NSE) scheme that enables efficient and effective architecture search in large-scale search spaces. The second challenge is to improve the efficiency of self-supervised learning for model pretraining. To address this challenge, the thesis proposes a combinatorial patches approach that significantly improves the efficiency of self-supervised learning. The third challenge is to develop an efficient and versatile multitask model that can leverage the benefits of large-scale multitask training. To address this challenge, the thesis proposes a Unified model for Human-Centric Perceptions (UniHCP) as a simple and scalable solution for a human-centric perception system that unifies multiple human-centric tasks into a neat, efficient, and scalable model. The results of this thesis demonstrate the effectiveness of the proposed methods in improving the practicality and performance of neural network design and training. The NSE scheme, combinatorial patches approach, and UniHCP have been tested on a broad range of datasets, tasks, and settings, yielding impressive results. These findings affirm the efficacy of the proposed methods in enhancing the efficiency of the design and training process of neural networks.
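
    For readers unfamiliar with evolutionary architecture search, the skeleton below shows the generic mutate-and-select loop that such schemes build on. It is emphatically not the thesis's NSE algorithm: the encoding (a list of layer widths) and the proxy fitness function are invented stand-ins for a trained-and-validated network.

```python
# Generic evolutionary architecture-search skeleton (illustration only).
import random

random.seed(0)

def random_arch():
    # An architecture is encoded as 3-6 layers with widths from a small menu.
    return [random.choice([32, 64, 128, 256]) for _ in range(random.randint(3, 6))]

def mutate(arch):
    # Change one layer's width to produce a child architecture.
    child = list(arch)
    i = random.randrange(len(child))
    child[i] = random.choice([32, 64, 128, 256])
    return child

def fitness(arch):
    # Stand-in for validation accuracy: reward capacity, penalize size.
    return sum(arch) / 1000.0 - 0.002 * len(arch) * max(arch) / 100.0

population = [random_arch() for _ in range(20)]
for _ in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                     # keep the best architectures
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

print("best architecture:", max(population, key=fitness))
```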

    Learning task-specific bilexical embeddings

    We present a method that learns bilexical operators over distributional representations of words and leverages supervised data for a linguistic relation. The learning algorithm exploits low-rank bilinear forms and induces low-dimensional embeddings of the lexical space tailored to the target linguistic relation. An advantage of imposing low-rank constraints is that prediction is expressed as the inner product between low-dimensional embeddings, which can have great computational benefits. In experiments with multiple linguistic bilexical relations, we show that our method learns effectively using embeddings of only a few dimensions.
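
    The computational benefit mentioned above comes from a standard identity: with a rank-k factorization W = UVᵀ, the bilinear score xᵀWy equals the inner product of the k-dimensional embeddings Uᵀx and Vᵀy. The numpy sketch below verifies this; the dimensions and random vectors are illustrative, not the paper's setup.

```python
# Low-rank bilinear scoring: x^T (U V^T) y == (U^T x) . (V^T y).
import numpy as np

rng = np.random.default_rng(0)
d, k = 300, 10                      # word-vector dimension, embedding rank

U = rng.normal(size=(d, k))         # learned low-rank factors
V = rng.normal(size=(d, k))

x = rng.normal(size=d)              # distributional vector of the head word
y = rng.normal(size=d)              # distributional vector of the modifier

full = x @ (U @ V.T) @ y            # naive d x d bilinear score
fast = (U.T @ x) @ (V.T @ y)        # same score via k-dim embeddings

assert np.isclose(full, fast)
```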

    Differentially Private and Fair Deep Learning: A Lagrangian Dual Approach

    A critical concern in data-driven decision making is to build models whose outcomes do not discriminate against demographic groups defined by attributes such as gender, ethnicity, or age. To ensure non-discrimination in learning tasks, knowledge of the sensitive attributes is essential, while, in practice, these attributes may not be available due to legal and ethical requirements. To address this challenge, this paper studies a model that protects the privacy of individuals' sensitive information while still learning non-discriminatory predictors. The method relies on the notion of differential privacy and the use of Lagrangian duality to design neural networks that can accommodate fairness constraints while guaranteeing the privacy of sensitive attributes. The paper analyses the tension between accuracy, privacy, and fairness, and the experimental evaluation illustrates the benefits of the proposed model on several prediction tasks.
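
    A hedged sketch of the primal-dual pattern such methods build on: train a predictor under a fairness constraint by alternating a primal descent step on the weights with a dual ascent step on the Lagrange multiplier. The demographic-parity-style gap, the Gaussian noise standing in for a differentially private mechanism (its scale is NOT calibrated to any privacy budget), and the synthetic data are all illustrative assumptions, not the paper's construction.

```python
# Lagrangian-dual fairness-constrained logistic regression (illustration).
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
a = rng.integers(0, 2, size=n)                 # sensitive attribute
y = (X[:, 0] + 0.5 * a + rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, lam = np.zeros(d), 0.0
for _ in range(500):
    p = sigmoid(X @ w)
    grad_loss = X.T @ (p - y) / n              # logistic-loss gradient
    # Constraint g(w): gap in mean predicted score between the two groups.
    gap = p[a == 1].mean() - p[a == 0].mean()
    s = p * (1 - p)                            # sigmoid derivative
    grad_gap = (X[a == 1] * s[a == 1, None]).mean(0) \
             - (X[a == 0] * s[a == 0, None]).mean(0)
    grad_gap += rng.normal(scale=0.01, size=d) # noisy gradient (DP stand-in)
    w -= 0.5 * (grad_loss + lam * grad_gap)    # primal descent on Lagrangian
    lam += 0.5 * gap                           # dual ascent on the multiplier
```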

    Joint Semantic and Latent Attribute Modelling for Cross-Class Transfer Learning

    This work is partially supported by grants from the National Natural Science Foundation of China under contract No. 61390515, No. U1611461, and No. 61425025, and the National Basic Research Program of China under Grant No. 2015CB351806.

    Regularized algorithms for ranking, and manifold learning for related tasks

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (leaves 119-127). This thesis describes an investigation of regularized algorithms for ranking problems in user preference and information retrieval settings. We utilize regularized manifold algorithms to appropriately incorporate data from related tasks. This investigation was inspired by personalization challenges in both user preference and information retrieval ranking problems. We formulate the ranking problem over related tasks as a special case of semi-supervised learning. We examine how to incorporate instances from related tasks, with an appropriate penalty in the loss function, to optimize performance on the hold-out sets. We present a regularized manifold approach that allows us to learn a distance metric for the different instances directly from the data. This approach allows the incorporation of information from related-task examples without prior estimation of cross-task coefficient covariances. We also present applications of ranking in two text analysis problems: a) supervised content-word learning, and b) company entity matching for record linkage problems. By Giorgos Zacharia, Ph.D.
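
    A minimal numpy sketch of Laplacian (manifold) regularization, the general device behind incorporating related-task instances: build a similarity graph over all instances, then penalize scoring functions that vary sharply across graph edges. The RBF graph, linear scorer, and synthetic data are illustrative assumptions; the thesis's exact formulation differs.

```python
# Manifold-regularized linear scoring with a graph Laplacian penalty.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))        # instances from the task and its relatives
y = X @ np.array([1.0, -0.5, 0.0, 2.0]) + 0.1 * rng.normal(size=60)

# Similarity graph and its Laplacian L = D - S.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
S = np.exp(-sq)                     # RBF similarities between instances
L = np.diag(S.sum(1)) - S

# Closed form for min_w ||Xw - y||^2 + gamma * (Xw)^T L (Xw).
gamma = 0.1
w = np.linalg.solve(X.T @ X + gamma * X.T @ L @ X, X.T @ y)
print("learned scorer weights:", w)
```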

    Formality Style Transfer Within and Across Languages with Limited Supervision

    While much natural language processing work focuses on analyzing language content, language style also conveys important information about the situational context and purpose of communication. When editing an article, professional editors take the target audience into account to select appropriate word choice and grammar. Similarly, professional translators translate documents for a specific audience and often ask about the expected tone of the content when taking a translation job. Computational models of natural language should therefore consider both meaning and style. Controlling style is an emerging research area in text rewriting and is under-investigated in machine translation. In this dissertation, we present a new perspective that closely connects formality transfer and machine translation: we aim to control style in language generation, with a focus on rewriting English or translating French to English with a desired formality. These are challenging tasks because annotated examples of style transfer are available only in limited quantities. We first address this problem by inducing a lexical formality model based on word embeddings and a small number of representative formal and informal words. This enables us to assign sentential formality scores and to rerank translation hypotheses whose formality scores are closer to a user-provided formality level. To capture broader formality changes, we then turn to neural sequence-to-sequence models. Joint modeling of formality transfer and machine translation enables formality control in machine translation without dedicated training examples. Along the way, we also improve low-resource neural machine translation.
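
    A hedged sketch of the seed-word formality model described above: score each word by how much closer its embedding sits to a centroid of formal seed words than to an informal centroid, average over the sentence, then rerank hypotheses toward a user-requested level. The tiny random "embeddings", seed lists, and hypotheses are placeholders for real pretrained vectors and translation outputs.

```python
# Lexical formality scoring and hypothesis reranking (illustration).
import numpy as np

rng = np.random.default_rng(0)
vocab = ["hi", "hello", "greetings", "yeah", "indeed", "gonna", "intend"]
emb = {w: rng.normal(size=50) for w in vocab}   # stand-in word embeddings

formal_seeds, informal_seeds = ["greetings", "indeed"], ["hi", "gonna"]
f_c = np.mean([emb[w] for w in formal_seeds], axis=0)
i_c = np.mean([emb[w] for w in informal_seeds], axis=0)

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def formality(sentence):
    """Sentence score = mean per-word (formal minus informal) similarity."""
    return np.mean([cos(emb[w], f_c) - cos(emb[w], i_c)
                    for w in sentence.split() if w in emb])

hypotheses = ["hello indeed", "yeah gonna", "greetings indeed"]
target = 0.5                                    # user-provided formality level
best = min(hypotheses, key=lambda h: abs(formality(h) - target))
print("reranked choice:", best)
```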