370 research outputs found

    Contrastive Learning with Prompt-derived Virtual Semantic Prototypes for Unsupervised Sentence Embedding

    Contrastive learning has become a new paradigm for unsupervised sentence embeddings. Previous studies focus on instance-wise contrastive learning, attempting to construct positive pairs with textual data augmentation. In this paper, we propose a novel Contrastive learning method with Prompt-derived Virtual semantic Prototypes (ConPVP). Specifically, with the help of prompts, we construct virtual semantic prototypes for each instance, and derive negative prototypes by using the negative form of the prompts. Using a prototypical contrastive loss, we enforce the anchor sentence embedding to be close to its corresponding semantic prototypes, and far apart from the negative prototypes as well as the prototypes of other sentences. Extensive experimental results on semantic textual similarity, transfer, and clustering tasks demonstrate the effectiveness of our proposed model compared to strong baselines. Code is available at https://github.com/lemon0830/promptCSE. Comment: Findings of EMNLP 202
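The abstract above does not state the loss formula, so the following is only a minimal sketch of what a prototypical contrastive loss of this kind might look like. It assumes a generic InfoNCE-style form over cosine similarities. The function names, the temperature of 0.05, and the toy two-dimensional vectors are all hypothetical and are not taken from the ConPVP paper or its repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototypical_contrastive_loss(anchor, positive, negatives, temperature=0.05):
    """InfoNCE-style loss (an assumed form, not the paper's exact loss).

    anchor    : sentence embedding
    positive  : the anchor's own semantic prototype
    negatives : negative prototypes plus prototypes of other sentences
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in logits)
    )
    return log_denominator - pos_logit


# Toy vectors (hypothetical): the loss is small when the anchor sits near its
# prototype and large when it sits near a negative prototype instead.
anchor = [1.0, 0.0]
aligned = prototypical_contrastive_loss(anchor, [0.9, 0.1], [[-1.0, 0.0], [0.0, 1.0]])
misaligned = prototypical_contrastive_loss(anchor, [-1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
```

Minimizing such a loss pulls the anchor toward its positive prototype and pushes it away from every vector in `negatives`, which matches the behavior the abstract describes in words.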

    Worldwide cohort study of 46, XY differences/disorders of sex development genetic diagnoses: geographic and ethnic differences in variants

    Differences/disorders of sex development (DSDs) in individuals with a 46, XY karyotype are a group of congenital disorders that manifest as male gonadal hypoplasia or abnormalities of the external genitalia. Approximately 50% of patients with 46, XY DSDs cannot obtain a molecular diagnosis. The aims of this paper were to review the most common causative genes and rare genes in patients with 46, XY DSDs, analyze global molecular diagnostic cohorts for the prevalence and geographic distribution of causative genes, and identify the factors affecting cohort detection results. Although the spectrum of genetic variants varies across regions and the severity of the clinical phenotype varies across patients, next-generation sequencing (NGS), the most commonly used detection method, can still reveal genetic variants and aid in diagnosis. A comparison of the detection rates of various sequencing modalities revealed that whole-exome sequencing (WES) yields a higher rate of molecular diagnosis than panel sequencing. Whole-genome sequencing (WGS), third-generation sequencing, and algorithm advancements will contribute to the improvement of detection efficiency. The most commonly mutated genes associated with androgen synthesis and action are AR, SRD5A2, and HSD17B3, and the most commonly mutated genes involved in gonadal formation are NR5A1 and MAP3K1. Detection results are affected by differences in enrollment criteria and sequencing technologies.

    Project Overview of the Beijing-Arizona Sky Survey

    The Beijing-Arizona Sky Survey (BASS) is a wide-field two-band photometric survey of the Northern Galactic Cap using the 90Prime imager on the 2.3 m Bok telescope at Kitt Peak. It is a four-year collaboration between the National Astronomical Observatory of China and Steward Observatory, the University of Arizona, serving as one of the three imaging surveys to provide photometric input catalogs for target selection of the Dark Energy Spectroscopic Instrument (DESI) project. BASS will take up to 240 dark/grey nights to cover an area of about 5400 deg² in the g and r bands. The 5σ limiting AB magnitudes for point sources in the two bands, corrected for the Galactic extinction, are 24.0 and 23.4 mag, respectively. BASS, together with other DESI imaging surveys, will provide unique science opportunities that cover a wide range of topics in both Galactic and extragalactic astronomy. Comment: 10 pages, submitted to PAS

    Discrimination and classification of tobacco wastes by identification and quantification of polyphenols with LC–MS/MS

    The chemical composition of polyphenols in tobacco waste was identified by HPLC-PDA–ESI/MS/MS and the contents of chlorogenic acids and rutin in 10 varieties of tobacco wastes were determined by HPLC–UV. The relationships between the contents of active polyphenols and the varieties of tobacco wastes were interpreted by hierarchical cluster analysis (HCA) and principal component analysis (PCA). The results showed that 15 polyphenols were identified in a methanolic extract of dried tobacco waste. The tobacco wastes were characterized by high levels of chlorogenic acids (3-CQA, 5-CQA, and 4-CQA) and rutin; their ranges in the 10 tobacco varieties were 0.116–0.196%, 0.686–1.781%, 0.094–0.192%, and 0.413–0.998%, respectively. According to multivariate statistics models, two active compound variables can be considered important for the discrimination of the varieties of tobacco wastes: chlorogenic acids and rutin. Consequently, samples of 10 tobacco varieties were classified into three groups by HCA based on the PCA pattern. In conclusion, tobacco waste could be used as a new pharmaceutical material for the production of natural chlorogenic acids and rutin in the ethnopharmacological industry.

    Soft Language Clustering for Multilingual Model Pre-training

    Multilingual pre-trained language models have demonstrated impressive (zero-shot) cross-lingual transfer abilities; however, their performance is hindered when the target language is typologically distant from the source languages or when pre-training data is limited in size. In this paper, we propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally. Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods. On the tasks of XTREME including text classification, sequence labeling, question answering, and sentence retrieval, both base- and large-size language models pre-trained with our proposed method exhibit consistent performance improvement. Furthermore, it provides substantial advantages for low-resource languages in unsupervised sentence retrieval and for target languages that differ greatly from the source language in cross-lingual transfer.