96 research outputs found
Towards zero-shot language modeling
Can we construct a neural language model which is inductively biased towards learning human language? Motivated by this question, we aim to construct an informative prior for held-out languages on the task of character-level, open-vocabulary language modeling. We obtain this prior as the posterior over network weights conditioned on the data from a sample of training languages, which is approximated through Laplace’s method. Based on a large and diverse sample of languages, the use of our prior outperforms baseline models with an uninformative prior in both zero-shot and few-shot settings, showing that the prior is imbued with universal linguistic knowledge. Moreover, we harness broad language-specific information available for most languages of the world, i.e., features from typological databases, as distant supervision for held-out languages. We explore several language modeling conditioning techniques, including concatenation and meta-networks for parameter generation. They appear beneficial in the few-shot setting, but ineffective in the zero-shot setting. Since the paucity of even plain digital text affects the majority of the world’s languages, we hope that these insights will broaden the scope of applications for language technology.
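As a rough illustration of the Laplace-approximated prior described above, the following sketch (all names and values illustrative, not taken from the paper) builds a Gaussian log-prior from a MAP weight estimate and a diagonal Hessian:

```python
import numpy as np

def laplace_prior(w_map, hessian_diag):
    """Gaussian log-prior N(w_map, H^-1) from Laplace's method: the
    posterior over weights given the training languages is approximated
    around the MAP estimate w_map with (diagonal) curvature H."""
    def log_prior(w):
        diff = w - w_map
        # Up to an additive constant: -1/2 (w - w_map)^T H (w - w_map)
        return -0.5 * np.sum(hessian_diag * diff ** 2)
    return log_prior

# Toy values: a MAP estimate fit on training languages and its diagonal
# curvature; the prior then regularizes adaptation to a held-out
# language towards weights plausible under the training languages.
w_map = np.array([0.5, -1.2, 0.3])
hessian_diag = np.array([4.0, 1.0, 9.0])
log_prior = laplace_prior(w_map, hessian_diag)
```

The log-prior peaks at the MAP weights and penalizes deviations most strongly along high-curvature directions.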
Specializing distributional vectors of all words for lexical entailment
Semantic specialization methods fine-tune distributional word vectors using lexical knowledge from external resources (e.g., WordNet) to accentuate a particular relation between words. However, such post-processing methods suffer from limited coverage as they affect only vectors of words seen in the external resources. We present the first post-processing method that specializes vectors of all vocabulary words – including those unseen in the resources – for the asymmetric relation of lexical entailment (LE) (i.e., the hyponymy-hypernymy relation). Leveraging a partially LE-specialized distributional space, our POSTLE (i.e., post-specialization for LE) model learns an explicit global specialization function, allowing for specialization of vectors of unseen words, as well as word vectors from other languages via cross-lingual transfer. We capture the function as a deep feedforward neural network: its objective re-scales vector norms to reflect the concept hierarchy while simultaneously attracting hyponymy-hypernymy pairs to better reflect semantic similarity. An extended model variant augments the basic architecture with an adversarial discriminator. We demonstrate the usefulness and versatility of POSTLE models with different input distributional spaces in different scenarios (monolingual LE and zero-shot cross-lingual LE transfer) and tasks (binary and graded LE). We report consistent gains over state-of-the-art LE-specialization methods, and successfully LE-specialize word vectors for languages without any external lexical knowledge.
On the relation between linguistic typology and (limitations of) multilingual language modeling
A key challenge in cross-lingual NLP is developing general language-independent architectures that are equally applicable to any language. However, this ambition is largely hampered by the variation in structural and semantic properties, i.e. the typological profiles of the world's languages. In this work, we analyse the implications of this variation on the language modeling (LM) task. We present a large-scale study of state-of-the-art n-gram based and neural language models on 50 typologically diverse languages covering a wide variety of morphological systems. Operating in the full vocabulary LM setup focused on word-level prediction, we demonstrate that a coarse typology of morphological systems is predictive of absolute LM performance. Moreover, fine-grained typological features such as exponence, flexivity, fusion, and inflectional synthesis turn out to be responsible for the proliferation of low-frequency phenomena which are organically difficult to model by statistical architectures, or for the meaning ambiguity of character n-grams. Our study strongly suggests that these features have to be taken into consideration during the construction of next-level language-agnostic LM architectures, capable of handling morphologically complex languages such as Tamil or Korean. (ERC grant Lexica)
Adversarial propagation and zero-shot cross-lingual transfer of word vector specialization
Semantic specialization is a process of fine-tuning pre-trained distributional word vectors using external lexical knowledge (e.g., WordNet) to accentuate a particular semantic relation in the specialized vector space. While post-processing specialization methods are applicable to arbitrary distributional vectors, they are limited to updating only the vectors of words occurring in external lexicons (i.e., seen words), leaving the vectors of all other words unchanged. We propose a novel approach to specializing the full distributional vocabulary. Our adversarial post-specialization method propagates the external lexical knowledge to the full distributional space. We exploit words seen in the resources as training examples for learning a global specialization function. This function is learned by combining a standard L2-distance loss with an adversarial loss: the adversarial component produces more realistic output vectors. We show the effectiveness and robustness of the proposed method across three languages and on three tasks: word similarity, dialog state tracking, and lexical simplification. We report consistent improvements over distributional word vectors and vectors specialized by other state-of-the-art specialization frameworks. Finally, we also propose a cross-lingual transfer method for zero-shot specialization which successfully specializes a full target distributional space without any lexical knowledge in the target language and without any bilingual data.
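A minimal sketch of the combined objective described above, i.e. an L2-distance term plus a generator-side adversarial term (the mixing weight, toy vectors, and discriminator logits are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def post_specialization_loss(predicted, target, disc_scores, lam=0.5):
    """Generator objective: an L2 term pulling predicted vectors towards
    their specialized targets, plus an adversarial term rewarding outputs
    that the discriminator scores as 'real' specialized vectors.
    lam is an illustrative mixing weight."""
    l2 = np.mean(np.sum((predicted - target) ** 2, axis=1))
    # Generator side of the GAN loss: minimize -log D(G(x)).
    adv = -np.mean(np.log(sigmoid(disc_scores) + 1e-12))
    return l2 + lam * adv

# Toy batch: vectors of 'seen' words and their specialized targets;
# disc_scores are the discriminator's logits for the predicted vectors.
predicted = np.array([[0.9, 0.1], [0.0, 1.1]])
target = np.array([[1.0, 0.0], [0.0, 1.0]])
disc_scores = np.array([2.0, 1.5])
loss = post_specialization_loss(predicted, target, disc_scores)
```

Training the specialization function on seen words against this loss, with the discriminator trained in alternation, lets it generalize to vectors of unseen words.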
Cross-lingual semantic specialization via lexical relation induction
Semantic specialization integrates structured linguistic knowledge from external resources (such as lexical relations in WordNet) into pretrained distributional vectors in the form of constraints. However, this technique cannot be leveraged in many languages, because their structured external resources are typically incomplete or non-existent. To bridge this gap, we propose a novel method that transfers specialization from a resource-rich source language (English) to virtually any target language. Our specialization transfer comprises two crucial steps: 1) inducing noisy constraints in the target language through automatic word translation; and 2) filtering the noisy constraints via a state-of-the-art relation prediction model trained on the source language constraints. This allows us to specialize any set of distributional vectors in the target language with the refined constraints. We prove the effectiveness of our method through intrinsic word similarity evaluation in 8 languages, and with 3 downstream tasks in 5 languages: lexical simplification, dialog state tracking, and semantic textual similarity. The gains over the previous state-of-the-art specialization methods are substantial and consistent across languages. Our results also suggest that the transfer method is effective even for lexically distant source-target language pairs. Finally, as a by-product, our method produces lists of WordNet-style lexical relations in resource-poor languages.
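The two-step induce-and-filter pipeline can be sketched as follows; the word translator and relation scorer here are simple stand-ins (a dictionary lookup and a score table) for the translation system and relation prediction model described above:

```python
def induce_constraints(source_pairs, translate, score, threshold=0.5):
    """Induce target-language constraints from source-language ones:
    1) translate each (w1, w2) source pair word by word;
    2) keep only pairs the relation scorer judges likely to hold."""
    noisy = [(translate(w1), translate(w2)) for w1, w2 in source_pairs]
    return [(t1, t2) for t1, t2 in noisy
            if t1 is not None and t2 is not None and score(t1, t2) >= threshold]

# Toy run with a dictionary translator and a lookup-based scorer.
lexicon = {"car": "auto", "vehicle": "fahrzeug", "cat": "katze"}
scores = {("auto", "fahrzeug"): 0.9, ("katze", "fahrzeug"): 0.1}
pairs = induce_constraints(
    [("car", "vehicle"), ("cat", "vehicle")],
    translate=lexicon.get,
    score=lambda a, b: scores.get((a, b), 0.0),
)
# pairs == [('auto', 'fahrzeug')]: the mistranslation-induced pair is filtered out.
```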
Decoding sentiment from distributed representations of sentences
Distributed representations of sentences have been developed recently to represent their meaning as real-valued vectors. However, it is not clear how much information such representations retain about the polarity of sentences. To study this question, we decode sentiment from unsupervised sentence representations learned with different architectures (sensitive to the order of words, the order of sentences, or none) in 9 typologically diverse languages. Sentiment results from the (recursive) composition of lexical items and grammatical strategies such as negation and concession. The results are manifold: we show that there is no 'one-size-fits-all' representation architecture outperforming the others across the board. Rather, the top-ranking architectures depend on the language and data at hand. Moreover, we find that in several cases the additive composition model based on skip-gram word vectors may surpass supervised state-of-the-art architectures such as bidirectional LSTMs. Finally, we provide a possible explanation of the observed variation based on the type of negative constructions in each language.
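The additive composition model mentioned above amounts to averaging word vectors; a minimal sketch (the toy 3-d vectors stand in for skip-gram embeddings, and a linear classifier over the result would decode polarity):

```python
import numpy as np

def sentence_vector(tokens, word_vectors, dim):
    """Additive composition: represent a sentence as the average of the
    vectors of its in-vocabulary words (zero vector if none are known)."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy embeddings standing in for trained skip-gram vectors.
word_vectors = {
    "great": np.array([1.0, 0.2, 0.0]),
    "movie": np.array([0.1, 0.1, 0.1]),
    "not":   np.array([-0.8, 0.0, 0.3]),
}
v = sentence_vector(["great", "movie"], word_vectors, dim=3)
# v is the element-wise mean of the 'great' and 'movie' vectors.
```

Order-insensitivity is the model's main weakness, which is one reason negation strategies matter for how well sentiment can be decoded.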
Study of the excess Fe XXV line emission in the central degrees of the Galactic centre using XMM-Newton data
The diffuse Fe XXV (6.7 keV) line emission observed in the Galactic ridge is widely accepted to be produced by a superposition of a large number of unresolved X-ray point sources. In the very central degrees of our Galaxy, however, the existence of an extremely hot (~7 keV) diffuse plasma is still under debate. In this work we measure the Fe XXV line emission using all available XMM-Newton observations of the Galactic centre (GC) and inner disc (-10° < l < 10°, -2° < b < 2°). We use recent stellar mass distribution models to estimate the amount of X-ray emission originating from unresolved point sources, and find that within a region of l = ±1° and b = ±0.25° the 6.7 keV emission is 1.3-1.5 times in excess of what is expected from unresolved point sources. The excess emission is enhanced towards regions where known supernova remnants are located, suggesting that at least a part of this emission is due to genuine diffuse very hot plasma. If the entire excess is due to very hot plasma, an energy injection rate of at least ~6 × 10^40 erg s^-1 is required, which cannot be provided by the measured supernova explosion rate or past Sgr A* activity alone. However, we find that almost the entire excess we observe can be explained by assuming GC stellar populations with iron abundances ~1.9 times higher than those in the bar/bulge, a value that can be reproduced by fitting diffuse X-ray spectra from the corresponding regions. Even in this case, a leftover X-ray excess is concentrated within l = ±0.3° and b = ±0.15°, corresponding to a thermal energy of ~2 × 10^52 erg, which can be reproduced by the estimated supernova explosion rate in the GC. Finally we discuss a possible connection to the observed GC Fermi-LAT excess.
Hypomelanosis of Ito with a trisomy 2 mosaicism: a case report
Introduction: Hypomelanosis of Ito is a rare neurocutaneous disorder, characterized by streaks and swirls of hypopigmentation following the lines of Blaschko that may be associated with systemic abnormalities involving the central nervous system and musculoskeletal system. Despite the preponderance of reported sporadic hypomelanosis of Ito, a few reports of familial hypomelanosis of Ito have been described. Case presentation: A 6-month-old Caucasian girl presented with unilateral areas of hypomelanosis distributed on the left half of her body, and her father presented with similar mosaic hypopigmented lesions on his upper chest. Whereas both blood karyotypes obtained from peripheral lymphocyte cultures were normal, a 16% trisomy 2 mosaicism was found in cultured skin fibroblasts derived from a hypopigmented skin area of her father. Conclusions: Familial cases of hypomelanosis of Ito are very rare and can occur in patients without systemic involvement. Hypomelanosis of Ito constitutes a non-specific diagnostic definition encompassing different clinical entities with a wide phenotypic variability, either sporadic or familial. Unfortunately, a large number of cases remain misdiagnosed due to both diagnostic challenges and controversial issues concerning cutaneous biopsies in the pediatric population.
Prostate Cancer Cell Lines under Hypoxia Exhibit Greater Stem-Like Properties
Hypoxia is an important environmental change in many cancers. Hypoxic niches can be occupied by cancer stem/progenitor-like cells that are associated with tumor progression and resistance to radiotherapy and chemotherapy. However, it has not yet been fully elucidated how hypoxia influences the stem-like properties of prostate cancer cells. In this report, we investigated the effects of hypoxia on the human prostate cancer cell lines PC-3 and DU145. In comparison to normoxia (20% O2), 7% O2 induced higher expression of HIF-1α and HIF-2α, which was associated with upregulation of Oct3/4 and Nanog; 1% O2 induced even greater levels of these factors. The upregulated NANOG mRNA expression in hypoxia was confirmed to derive predominantly from the retrogene NANOGP8. Similar growth rates were observed for cells cultivated under hypoxic and normoxic conditions for 48 hours; however, the colony formation assay revealed that 48 hours of hypoxic pretreatment resulted in the formation of more colonies. Treatment with 1% O2 also extended the G0/G1 stage, resulting in more side population cells, and induced CD44 and ABCG2 expression. Hypoxia also increased the number of cells positive for ABCG2 expression, which were predominantly found to be CD44bright cells. Correspondingly, the sorted CD44bright cells expressed higher levels of ABCG2, Oct3/4, and Nanog than CD44dim cells, and hypoxic pretreatment significantly increased the expression of these factors. CD44bright cells under normoxia formed significantly more colonies and spheres compared with the CD44dim cells, and hypoxic pretreatment further increased this effect. Our data indicate that prostate cancer cells under hypoxia possess greater stem-like properties.