370 research outputs found

Contrastive Learning with Prompt-derived Virtual Semantic Prototypes for Unsupervised Sentence Embedding

Author: Cao Yunbo
Jiang Yufan
Wu Shuangzhi
Yin Yongjing
Zeng Jiali
Publication date: 07/11/2022

Contrastive learning has become a new paradigm for unsupervised sentence embeddings. Previous studies focus on instance-wise contrastive learning, attempting to construct positive pairs with textual data augmentation. In this paper, we propose a novel Contrastive learning method with Prompt-derived Virtual semantic Prototypes (ConPVP). Specifically, with the help of prompts, we construct virtual semantic prototypes for each instance, and derive negative prototypes by using the negative form of the prompts. Using a prototypical contrastive loss, we enforce the anchor sentence embedding to be close to its corresponding semantic prototypes, and far apart from the negative prototypes as well as the prototypes of other sentences. Extensive experimental results on semantic textual similarity, transfer, and clustering tasks demonstrate the effectiveness of our proposed model compared to strong baselines. Code is available at https://github.com/lemon0830/promptCSE. Comment: Findings of EMNLP 202
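The loss described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that): it assumes an InfoNCE-style objective over cosine similarities, one positive and one negative prototype per sentence, and a temperature of 0.05. The function names and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def prototypical_contrastive_loss(anchors, pos_protos, neg_protos, temperature=0.05):
    """Mean InfoNCE-style loss over a batch (illustrative assumption).

    For anchor i, the positive is pos_protos[i]. The negatives are
    neg_protos[i] and the positive prototypes of every other sentence.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        pos = math.exp(cosine(anchor, pos_protos[i]) / temperature)
        neg = math.exp(cosine(anchor, neg_protos[i]) / temperature)
        others = sum(
            math.exp(cosine(anchor, proto) / temperature)
            for j, proto in enumerate(pos_protos)
            if j != i
        )
        losses.append(-math.log(pos / (pos + neg + others)))
    return sum(losses) / len(losses)


# Toy batch: two sentence embeddings with made-up prototypes.
anchors = [[1.0, 0.0], [0.0, 1.0]]
pos_protos = [[1.0, 0.0], [0.0, 1.0]]
neg_protos = [[-1.0, 0.0], [0.0, -1.0]]
aligned_loss = prototypical_contrastive_loss(anchors, pos_protos, neg_protos)
swapped_loss = prototypical_contrastive_loss(anchors, neg_protos, pos_protos)
```

In this toy batch the loss is small when each anchor matches its own prototype and large when the positive and negative prototypes are swapped, which is the behaviour the abstract describes.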

arXiv.org e-Print Archive

Worldwide cohort study of 46, XY differences/disorders of sex development genetic diagnoses: geographic and ethnic differences in variants

Author: Chen Jiali
Jiang Hongwei
Jiang Yuqing
Peng Huifang
Zeng Xiantao
Publication venue: Frontiers Media S.A.
Publication date: 01/06/2024

Differences/disorders of sex development (DSDs) in individuals with a 46, XY karyotype are a group of congenital disorders that manifest as male gonadal hypoplasia or abnormalities of the external genitalia. Approximately 50% of patients with 46, XY DSDs cannot obtain a molecular diagnosis. The aims of this paper were to review the most common causative genes and rare genes in patients with 46, XY DSDs, analyze global molecular diagnostic cohorts for the prevalence and geographic distribution of causative genes, and identify the factors affecting cohort detection results. Although the spectrum of genetic variants varies across regions and the severity of the clinical phenotype varies across patients, next-generation sequencing (NGS), the most commonly used detection method, can still reveal genetic variants and aid in diagnosis. A comparison of the detection rates of various sequencing modalities revealed that whole-exome sequencing (WES) facilitates a greater rate of molecular diagnosis of the disease than panel sequencing. Whole-genome sequencing (WGS), third-generation sequencing, and algorithm advancements will contribute to the improvement of detection efficiency. The most commonly mutated genes associated with androgen synthesis and action are AR, SRD5A2, and HSD17B3, and the most commonly mutated genes involved in gonadal formation are NR5A1 and MAP3K1. Detection results are affected by differences in enrollment criteria and sequencing technologies.

Directory of Open Access Journals

Project Overview of the Beijing-Arizona Sky Survey

Author: Dey Arjun
Fan Dongwei
Fan Xiaohui
He Boliang
Jiang Linhua
Jiang Zhaoji
Lang Dustin
Lesser Michael
Ma Jun
Mao Shude
McGreer Ian
Nie Jundan
Peng Xiyan
Schlegel David
Wang Jiali
Zhang Tianmeng
Zhou Xu
Zhou Zhimin
Zou Hu
Publication venue: IOP Publishing
Publication date: 14/02/2017

The Beijing-Arizona Sky Survey (BASS) is a wide-field two-band photometric survey of the Northern Galactic Cap using the 90Prime imager on the 2.3 m Bok telescope at Kitt Peak. It is a four-year collaboration between the National Astronomical Observatory of China and Steward Observatory, the University of Arizona, serving as one of the three imaging surveys to provide photometric input catalogs for target selection of the Dark Energy Spectroscopic Instrument (DESI) project. BASS will take up to 240 dark/grey nights to cover an area of about 5400 deg$^2$ in the $g$ and $r$ bands. The 5$\sigma$ limiting AB magnitudes for point sources in the two bands, corrected for the Galactic extinction, are 24.0 and 23.4 mag, respectively. BASS, together with other DESI imaging surveys, will provide unique science opportunities that cover a wide range of topics in both Galactic and extragalactic astronomy. Comment: 10 pages, submitted to PAS
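For readers who want the quoted limiting magnitudes as flux densities, the short sketch below applies the standard AB definition, in which a 0 mag source has a flux density of 3631 Jy. It is only an illustration: the function name is hypothetical, and the magnitudes are used as given (already extinction-corrected) without modelling anything survey-specific.

```python
AB_ZERO_POINT_JY = 3631.0  # flux density of a 0 mag source in the AB system


def ab_mag_to_microjansky(mag):
    """Convert an AB magnitude to a flux density in microjansky."""
    return AB_ZERO_POINT_JY * 10 ** (-0.4 * mag) * 1e6


# Limiting magnitudes quoted in the abstract for the two bands.
limits = {"g": 24.0, "r": 23.4}
for band, mag in limits.items():
    flux = ab_mag_to_microjansky(mag)
    # A 5-sigma limit means the 1-sigma noise is one fifth of this flux.
    print(f"{band}: 5-sigma flux {flux:.3f} uJy, 1-sigma noise {flux / 5:.3f} uJy")
```

Because the magnitude scale is logarithmic and inverted, the fainter limit of 24.0 mag corresponds to the lower flux density.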

arXiv.org e-Print Archive

Crossref

The University of Manchester - Institutional Repository

Discrimination and classification of tobacco wastes by identification and quantification of polyphenols with LC–MS/MS

Author: Ben Jiang
Dingqiang Lu
Hong Chai
Hui Zhao
Jiali Wang
Jun Wang
Pingkai Ouyang
Xiuquan Ling
Publication venue: Serbian Chemical Society
Publication date: 01/01/2010

The chemical composition of polyphenols in tobacco waste was identified by HPLC-PDA–ESI/MS/MS and the contents of chlorogenic acids and rutin in 10 varieties of tobacco wastes were determined by HPLC–UV. The relationships between the contents of active polyphenols and the varieties of tobacco wastes were interpreted by hierarchical cluster analysis (HCA) and principal component analysis (PCA). The results showed that 15 polyphenols were identified in a methanolic extract of dried tobacco waste. The tobacco wastes were characterized by high levels of chlorogenic acids (3-CQA, 5-CQA, and 4-CQA) and rutin; their ranges in the 10 tobacco varieties were 0.116–0.196, 0.686–1.781, 0.094–0.192, and 0.413–0.998 %, respectively. According to multivariate statistics models, two active compound variables can be considered important for the discrimination of the varieties of tobacco wastes: chlorogenic acids and rutin. Consequently, samples of 10 tobacco varieties were classified into three groups by HCA based on the PCA pattern. In conclusion, tobacco waste could be used as a new pharmaceutical material for the production of natural chlorogenic acids and rutin in the ethnopharmacological industry.

Crossref

Directory of Open Access Journals

Soft Language Clustering for Multilingual Model Pre-training

Author: Cao Yunbo
Jiang Yufan
Jing Yi
Lin Binghuai
Meng Fandong
Yin Yongjing
Zeng Jiali
Zhou Jie
Publication date: 13/06/2023

Multilingual pre-trained language models have demonstrated impressive (zero-shot) cross-lingual transfer abilities; however, their performance is hindered when the target language has distant typology from source languages or when pre-training data is limited in size. In this paper, we propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally. Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods. On the tasks of XTREME including text classification, sequence labeling, question answering, and sentence retrieval, both base- and large-size language models pre-trained with our proposed method exhibit consistent performance improvement. Furthermore, it provides substantial advantages for low-resource languages in unsupervised sentence retrieval and for target languages that differ greatly from the source language in cross-lingual transfer.

arXiv.org e-Print Archive