
    Towards zero-shot language modeling

    Can we construct a neural language model which is inductively biased towards learning human language? Motivated by this question, we aim to construct an informative prior for held-out languages on the task of character-level, open-vocabulary language modeling. We obtain this prior as the posterior over network weights conditioned on the data from a sample of training languages, approximated through Laplace’s method. On a large and diverse sample of languages, models using our prior outperform baselines with an uninformative prior in both zero-shot and few-shot settings, showing that the prior is imbued with universal linguistic knowledge. Moreover, we harness broad language-specific information available for most languages of the world, i.e., features from typological databases, as distant supervision for held-out languages. We explore several language modeling conditioning techniques, including concatenation and meta-networks for parameter generation. They appear beneficial in the few-shot setting, but ineffective in the zero-shot setting. Since the paucity of even plain digital text affects the majority of the world’s languages, we hope that these insights will broaden the scope of applications for language technology.
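
    As a concrete illustration of the core idea, here is a minimal PyTorch sketch (not the paper's actual implementation): approximate the posterior over weights at the multilingual MAP solution with a diagonal Fisher, then use it as a Gaussian prior penalty when adapting to a held-out language. The function names and the diagonal approximation are illustrative assumptions.

```python
import torch

def diag_fisher(model, data_loader, loss_fn):
    """Diagonal Fisher approximation of the posterior precision at the MAP."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

def laplace_prior_penalty(model, map_params, fisher, scale=1.0):
    """Log-density (up to a constant) of the Laplace posterior used as a prior:
    a Fisher-weighted quadratic penalty on deviation from the MAP weights."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - map_params[n]) ** 2).sum()
    return scale * penalty

# Few-shot adaptation on a new language: ordinary LM loss plus the prior term
# (zero-shot corresponds to simply decoding with the MAP weights).
#   loss = loss_fn(model(x), y) + laplace_prior_penalty(model, map_params, fisher)
```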

    On the relation between linguistic typology and (limitations of) multilingual language modeling

    A key challenge in cross-lingual NLP is developing general language-independent architectures that are equally applicable to any language. However, this ambition is largely hampered by the variation in the structural and semantic properties, i.e. the typological profiles, of the world's languages. In this work, we analyse the implications of this variation for the language modeling (LM) task. We present a large-scale study of state-of-the-art n-gram and neural language models on 50 typologically diverse languages covering a wide variety of morphological systems. Operating in the full-vocabulary LM setup focused on word-level prediction, we demonstrate that a coarse typology of morphological systems is predictive of absolute LM performance. Moreover, fine-grained typological features such as exponence, flexivity, fusion, and inflectional synthesis turn out to be responsible for the proliferation of low-frequency phenomena, which are inherently difficult for statistical architectures to model, or for the meaning ambiguity of character n-grams. Our study strongly suggests that these features have to be taken into consideration during the construction of next-level language-agnostic LM architectures, capable of handling morphologically complex languages such as Tamil or Korean.
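
    A toy sketch of the kind of analysis behind the first finding, assuming the scikit-learn API; the morphological categories and perplexities below are invented placeholders, not the paper's data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# One coarse feature per language: its morphological system, one-hot
# encoded by hand. Perplexities are invented for illustration only.
categories = ["isolating", "fusional", "agglutinative", "introflexive"]
morph = ["isolating", "isolating", "fusional", "fusional",
         "agglutinative", "agglutinative", "introflexive", "introflexive"]
ppl = np.array([130.0, 110.0, 320.0, 300.0, 560.0, 520.0, 470.0, 450.0])

X = np.array([[1.0 if m == c else 0.0 for c in categories] for m in morph])
reg = LinearRegression().fit(X, ppl)
print("R^2:", reg.score(X, ppl))  # how much variance coarse typology explains
```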

    Adversarial propagation and zero-shot cross-lingual transfer of word vector specialization

    Semantic specialization is a process of fine-tuning pre-trained distributional word vectors using external lexical knowledge (e.g., WordNet) to accentuate a particular semantic relation in the specialized vector space. While post-processing specialization methods are applicable to arbitrary distributional vectors, they are limited to updating only the vectors of words occurring in external lexicons (i.e., seen words), leaving the vectors of all other words unchanged. We propose a novel approach to specializing the full distributional vocabulary. Our adversarial post-specialization method propagates the external lexical knowledge to the full distributional space. We exploit words seen in the resources as training examples for learning a global specialization function. This function is learned by combining a standard L2-distance loss with an adversarial loss: the adversarial component produces more realistic output vectors. We show the effectiveness and robustness of the proposed method across three languages and on three tasks: word similarity, dialog state tracking, and lexical simplification. We report consistent improvements over distributional word vectors and vectors specialized by other state-of-the-art specialization frameworks. Finally, we also propose a cross-lingual transfer method for zero-shot specialization which successfully specializes a full target distributional space without any lexical knowledge in the target language and without any bilingual data.
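
    A compact sketch of the adversarial post-specialization objective as described: a generator maps distributional vectors of seen words towards their gold specialized vectors with an L2 term, while a discriminator encourages realistic outputs. Layer sizes, optimizer settings, and the training loop below are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

dim = 300  # illustrative vector dimensionality
G = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, dim))
D = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(x_dist, x_spec, lam=1.0):
    """One update on a batch of (distributional, specialized) vector pairs."""
    # Discriminator: real specialized vectors vs. generated ones.
    fake = G(x_dist).detach()
    d_loss = (bce(D(x_spec), torch.ones(len(x_spec), 1))
              + bce(D(fake), torch.zeros(len(fake), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: stay close to the gold specialized vector (L2 term)
    # while fooling the discriminator (adversarial term).
    out = G(x_dist)
    g_loss = (((out - x_spec) ** 2).sum(dim=1).mean()
              + lam * bce(D(out), torch.ones(len(out), 1)))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```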

    Cross-lingual semantic specialization via lexical relation induction

    Semantic specialization integrates structured linguistic knowledge from external resources (such as lexical relations in WordNet) into pretrained distributional vectors in the form of constraints. However, this technique cannot be leveraged in many languages, because their structured external resources are typically incomplete or non-existent. To bridge this gap, we propose a novel method that transfers specialization from a resource-rich source language (English) to virtually any target language. Our specialization transfer comprises two crucial steps: 1) inducing noisy constraints in the target language through automatic word translation; and 2) filtering the noisy constraints via a state-of-the-art relation prediction model trained on the source language constraints. This allows us to specialize any set of distributional vectors in the target language with the refined constraints. We demonstrate the effectiveness of our method through intrinsic word similarity evaluation in 8 languages, and with 3 downstream tasks in 5 languages: lexical simplification, dialog state tracking, and semantic textual similarity. The gains over the previous state-of-the-art specialization methods are substantial and consistent across languages. Our results also suggest that the transfer method is effective even for lexically distant source-target language pairs. Finally, as a by-product, our method produces lists of WordNet-style lexical relations in resource-poor languages.
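
    The two-step transfer lends itself to a short sketch; `translate`, `embed`, and `relation_model` below are stand-ins for whatever translation lexicon, embedding table, and source-trained relation classifier (assumed scikit-learn-style) one has available:

```python
import numpy as np

def induce_constraints(en_pairs, translate):
    """Step 1: project English constraints via word translation (noisy)."""
    noisy = []
    for w1, w2, rel in en_pairs:              # rel in {"synonym", "antonym"}
        t1, t2 = translate(w1), translate(w2)
        if t1 and t2:
            noisy.append((t1, t2, rel))
    return noisy

def filter_constraints(noisy, relation_model, embed, threshold=0.9):
    """Step 2: keep pairs the source-trained relation classifier confirms."""
    classes = list(relation_model.classes_)
    kept = []
    for t1, t2, rel in noisy:
        feats = np.concatenate([embed(t1), embed(t2)]).reshape(1, -1)
        if relation_model.predict_proba(feats)[0][classes.index(rel)] >= threshold:
            kept.append((t1, t2, rel))
    return kept
```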

    Decoding sentiment from distributed representations of sentences

    Distributed representations of sentences have been developed recently to represent their meaning as real-valued vectors. However, it is not clear how much information such representations retain about the polarity of sentences. To study this question, we decode sentiment from unsupervised sentence representations learned with different architectures (sensitive to the order of words, the order of sentences, or neither) in 9 typologically diverse languages. Sentiment results from the (recursive) composition of lexical items and grammatical strategies such as negation and concession. The results are manifold: we show that there is no 'one-size-fits-all' representation architecture outperforming the others across the board. Rather, the top-ranking architectures depend on the language and data at hand. Moreover, we find that in several cases the additive composition model based on skip-gram word vectors may surpass supervised state-of-the-art architectures such as bidirectional LSTMs. Finally, we provide a possible explanation of the observed variation based on the type of negative constructions in each language.
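
    The additive composition baseline mentioned above is simple enough to sketch end-to-end, assuming a pretrained `word_vecs` mapping and the scikit-learn API (all names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sentence_vector(tokens, word_vecs, dim=300):
    """Additive composition: the sentence vector is the sum of its
    skip-gram word vectors (zeros if no token is in the vocabulary)."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

def decode_sentiment(train_sents, train_labels, test_sents, word_vecs):
    """Train a linear probe that 'decodes' polarity from sentence vectors."""
    X_tr = np.stack([sentence_vector(s, word_vecs) for s in train_sents])
    X_te = np.stack([sentence_vector(s, word_vecs) for s in test_sents])
    clf = LogisticRegression(max_iter=1000).fit(X_tr, train_labels)
    return clf.predict(X_te)
```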

    Composition of the pericellular matrix modulates the deformation behaviour of chondrocytes in articular cartilage under static loading

    The aim was to assess the role of composition changes in the pericellular matrix (PCM) in chondrocyte deformation. To that end, a three-dimensional finite element model with depth-dependent collagen density, fluid fraction, fixed charge density and collagen architecture, including parallel planes representing the split-lines, was created to model the extracellular matrix (ECM). The PCM was constructed similarly to the ECM, but with the collagen fibrils oriented parallel to the chondrocyte surfaces. The chondrocytes were modelled as poroelastic with swelling properties. The deformation behaviour of the cells was studied under 15% static compression. Due to the depth-dependent structure and composition of cartilage, axial cell strains were highly depth-dependent. An increase in the collagen content and fluid fraction in the PCMs increased the lateral cell strains, while an increase in the fixed charge density induced the inverse behaviour. Axial cell strains were only slightly affected by the changes in PCM composition. We conclude that the PCM composition plays a significant role in the deformation behaviour of chondrocytes, possibly modulating cartilage development, adaptation and degeneration. The development of cartilage repair materials could benefit from this information.
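
    The depth dependence of the axial strains has a simple mechanical intuition: under a prescribed overall compression, layers of different stiffness stacked in series carry one stress, so softer layers take larger local strains. The toy calculation below (emphatically not the paper's poroelastic finite element model; the moduli are invented) illustrates this effect.

```python
import numpy as np

# Ten layers from superficial (soft) to deep (stiff); values are invented.
E = np.linspace(0.5, 2.0, 10)   # layer moduli, MPa
h = np.full(10, 0.1)            # layer thicknesses, mm
total_strain = 0.15             # 15% static compression of the whole stack

# Springs in series carry the same stress sigma = E_i * eps_i, and the layer
# compressions eps_i * h_i must sum to the prescribed total displacement.
sigma = total_strain * h.sum() / (h / E).sum()
eps = sigma / E
print(eps)  # largest local strain in the softest (most superficial) layer
```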

    Zircon ages in granulite facies rocks: decoupling from geochemistry above 850 °C?

    Granulite facies rocks frequently show a large spread in their zircon ages, the interpretation of which raises questions: Has the isotopic system been disturbed? By what process(es), and under what conditions, did the alteration occur? Can the dates be regarded as real ages, reflecting several growth episodes? Furthermore, under some circumstances of (ultra-)high-temperature metamorphism, decoupling of zircon U–Pb dates from their trace element geochemistry has been reported. Understanding these processes is crucial for interpreting such dates in the context of the P–T history. Our study presents evidence for decoupling in zircon from the highest-grade metapelites (> 850 °C) taken along a continuous high-temperature metamorphic field gradient in the Ivrea Zone (NW Italy). These rocks represent a well-characterised segment of Permian lower continental crust with a protracted high-temperature history. Cathodoluminescence images reveal that zircons in the mid-amphibolite facies preserve mainly detrital cores with narrow overgrowths. In the upper amphibolite and granulite facies, preserved detrital cores decrease and metamorphic zircon increases in quantity. Across all samples we document a sequence of four rim generations based on textures. U–Pb dates, Th/U ratios and Ti-in-zircon concentrations show an essentially continuous evolution with increasing metamorphic grade, except in the samples from the granulite facies, which display significant scatter in age and chemistry. We associate the observed decoupling of zircon systematics in high-grade, non-metamict zircon with disturbance processes related to differences in the behaviour of non-formula elements (i.e. Pb, Th, U, Ti) at high-temperature conditions, notably differences in compatibility within the crystal structure.
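
    For context, the U–Pb dates discussed here derive from the standard decay equation t = ln(1 + 206Pb*/238U) / λ238. A worked example with an illustrative (made-up) isotope ratio:

```python
import math

LAMBDA_238 = 1.55125e-10  # 238U decay constant, 1/yr (Jaffey et al., 1971)

def pb206_u238_age_ma(ratio):
    """Apparent age in Ma from a radiogenic 206Pb/238U ratio."""
    return math.log(1.0 + ratio) / LAMBDA_238 / 1e6

print(pb206_u238_age_ma(0.046))  # ~290 Ma, i.e. a Permian date as in the Ivrea Zone
```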

    Functional Roles of the N- and C-Terminal Regions of the Human Mitochondrial Single-Stranded DNA-Binding Protein

    Biochemical studies of the mitochondrial DNA (mtDNA) replisome demonstrate that the mtDNA polymerase and the mtDNA helicase are stimulated by the mitochondrial single-stranded DNA-binding protein (mtSSB). Unlike Escherichia coli SSB, bacteriophage T7 gp2.5 and bacteriophage T4 gp32, mtSSBs lack a long, negatively charged C-terminal tail. Furthermore, additional residues at the N-terminus (notwithstanding the mitochondrial presequence) are present in the sequence of species across the animal kingdom. We sought to analyze the functional importance of the N- and C-terminal regions of human mtSSB in the context of mtDNA replication. We produced the mature wild-type human mtSSB and three terminal deletion variants, and examined their physical and biochemical properties. We demonstrate that the recombinant proteins adopt a tetrameric form and bind single-stranded DNA with similar affinities. They also stimulate the DNA unwinding activity of the human mtDNA helicase to a similar extent (up to 8-fold). Notably, we find that unlike the high level of stimulation that we observed previously in the Drosophila system, stimulation of DNA synthesis catalyzed by the human mtDNA polymerase is only moderate, and occurs over a narrow range of salt concentrations. Interestingly, each of the deletion variants of human mtSSB stimulates DNA synthesis at a higher level than the wild-type protein, indicating that the termini negatively modulate functional interactions with the mitochondrial replicase. We discuss our findings in the context of species-specific components of the mtDNA replisome, and in comparison with various prokaryotic DNA replication machineries.