159 research outputs found

Evolving Knowledge Distillation with Large Language Models and Active Learning

Author: Jiang Zhuoren
Kang Yangyang
Kuang Kun
Liu Chengyuan
Sun Changlong
Wu Fei
Zhao Fubang
Publication date: 10/03/2024

Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks. However, their computational costs are prohibitively high. To address this issue, previous research has attempted to distill the knowledge of LLMs into smaller models by generating annotated data. Nonetheless, these works have mainly focused on the direct use of LLMs for text generation and labeling, without fully exploring their potential to comprehend the target task and acquire valuable knowledge. In this paper, we propose EvoKD: Evolving Knowledge Distillation, which leverages the concept of active learning to interactively enhance the process of data generation using large language models, simultaneously improving the task capabilities of a small domain model (the student model). Unlike previous work, we actively analyze the student model's weaknesses and then synthesize labeled samples based on the analysis. In addition, we provide iterative feedback to the LLMs regarding the student model's performance to continuously construct diversified and challenging samples. Experiments and analysis on different NLP tasks, namely text classification and named entity recognition, show the effectiveness of EvoKD. Comment: Accepted by COLING 202
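The loop this abstract describes (find the student's weaknesses, synthesize labeled samples that target them, retrain, repeat) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the word-vote "student", the `mock_llm_generate` stand-in for the LLM call, and all function names are hypothetical and are not taken from the EvoKD paper.

```python
from collections import Counter


def train_student(samples):
    """Toy student: map each word to the label it most often appears with."""
    votes = {}
    for text, label in samples:
        for word in text.lower().split():
            votes.setdefault(word, Counter())[label] += 1
    return {word: counts.most_common(1)[0][0] for word, counts in votes.items()}


def predict(student, text, default="unknown"):
    """Predict by majority vote over the known words in the text."""
    labels = Counter(student[w] for w in text.lower().split() if w in student)
    return labels.most_common(1)[0][0] if labels else default


def find_weaknesses(student, probe):
    """Return the probe samples the current student gets wrong."""
    return [(text, label) for text, label in probe if predict(student, text) != label]


def mock_llm_generate(weak_samples):
    """Hypothetical stand-in for the LLM: emit one labeled variant per weak sample."""
    return [(f"{text} indeed", label) for text, label in weak_samples]


def evolve(seed, probe, rounds=3):
    """Iteratively grow the training set from the student's observed mistakes."""
    train = list(seed)
    student = train_student(train)
    for _ in range(rounds):
        weak = find_weaknesses(student, probe)
        if not weak:
            break  # nothing left to target on this probe set
        train.extend(mock_llm_generate(weak))
        student = train_student(train)
    return student
```

In a real setting the mock generator would be replaced by a prompt that includes the student's errors, and the student would be a trained classifier or tagger; the sketch only shows the control flow of the feedback loop.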

arXiv.org e-Print Archive

Comprehensive Information Integration Modeling Framework for Video Titling

Author: Jiang Tan
Kuang Kun
Tan Ziqi
Wu Fei
Yang Hongxia
Yu Jin
Zhang Shengyu
Zhao Zhou
Zhou Jingren
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/06/2020

In e-commerce, consumer-generated videos, which in general deliver consumers' individual preferences for the different aspects of certain products, are massive in volume. To recommend these videos to potential consumers more effectively, diverse and catchy video titles are critical. However, consumer-generated videos are seldom accompanied by appropriate titles. To bridge this gap, we integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework. Although automatic video titling is very useful and in demand, it is much less addressed than video captioning. The latter focuses on generating sentences that describe videos as a whole, while our task requires product-aware multi-grained video analysis. To tackle this issue, the proposed method consists of two processes, i.e., granular-level interaction modeling and abstraction-level story-line summarization. Specifically, the granular-level interaction modeling first utilizes temporal-spatial landmark cues, descriptive words, and abstractive attributes to build three individual graphs and recognizes the intra-actions in each graph through Graph Neural Networks (GNN). Then the global-local aggregation module is proposed to model inter-actions across graphs and aggregate heterogeneous graphs into a holistic graph representation. The abstraction-level story-line summarization further considers both frame-level video features and the holistic graph to utilize the interactions between products and backgrounds, and generate the story-line topic of the video. We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform, and will make the desensitized version publicly available to nourish further development of the research community... Comment: 11 pages, 6 figures, to appear in KDD 2020 proceeding

arXiv.org e-Print Archive

Crossref

Single-cell immune profiling reveals immune responses in oral lichen planus

Author: Dunfang Zhang
En Luo
Fei Wang
Hongxia Dan
Liang Zhong
Lu Jiang
Na Liu
Qianming Chen
Qionghua Li
Shumin Duan
Taiwen Li
Wenjing Kuang
Xiaobo Luo
Xin Zeng
Yu Zhou
Yujie Shi
Publication venue: 'Frontiers Media SA'
Publication date: 01/04/2023

Introduction: Oral lichen planus (OLP) is a common chronic inflammatory disorder of the oral mucosa with an unclear etiology. Several types of immune cells are involved in the pathogenesis of OLP. Methods: We used single-cell RNA sequencing and immune repertoire sequencing to characterize the mucosal immune microenvironment of OLP. The presence of tissue-resident memory CD8+ T cells was validated by multiplex immunofluorescence. Results: We generated a transcriptome atlas from four OLP biopsy samples and their paired peripheral blood mononuclear cells (PBMCs), and compared them with two healthy tissues and three healthy PBMC samples. Our analysis revealed activated tissue-resident memory CD8+ T cells in OLP tissues. T cell receptor repertoires displayed apparent clonal expansion and preferential gene pairing in OLP patients. Additionally, obvious BCR clonal expansion was observed in OLP lesions. Plasmacytoid dendritic cells, a subtype that can promote dendritic cell maturation and enhance lymphocyte cytotoxicity, were identified in OLP. Conventional dendritic cells and macrophages were also found to exhibit pro-inflammatory activity in OLP. Cell-cell communication analysis revealed that fibroblasts might promote the recruitment and extravasation of immune cells into connective tissue. Discussion: Our study provides insights into the immune ecosystem of OLP, serving as a valuable resource for precision diagnosis and therapy of OLP

Directory of Open Access Journals

Lithium, an anti-psychotic drug, greatly enhances the generation of induced pluripotent stem cells

Author: A You
AJ Harwood
AK Silva
Cong Jiang
D Huangfu
D Kim
D Wu
Duanqing Pei
F Gonzalez
FR Garcia-Gonzalo
G McColl
G Pan
Haifeng Gu
I Chambers
J Chen
J Chen
J Silva
J Yu
JB Kim
Jian Fei
Jiekai Chen
Jing Liu
Jun Li
K Okita
L Pasquali
LA Boyer
MA Esteban
MA Hakimi
N Gurvich
N Sato
Ping Wang
Quan Wang
RS Jope
Ru Zhang
RW Chen
S Zhu
Sheng Ding
T Kawamura
W Li
W Young
Xin Xie
Xinxiu Xu
XY Zhao
Y Li
Y Shi
Y Shi
Y Takao
Yin Kuang
Publication venue: Nature Publishing Group
Publication date: 01/10/2011

Somatic cells can be reprogrammed into induced pluripotent stem cells (iPSCs) by defined factors. The low efficiency of reprogramming and genomic integration of oncogenes and viral vectors limited the potential application of iPSCs. Here we report that Lithium (Li), a drug used to treat mood disorders, greatly enhances iPSC generation from both mouse embryonic fibroblasts and human umbilical vein endothelial cells. Li facilitates iPSC generation with one (Oct4) or two factors (OS or OK). The effect of Li on promoting reprogramming only partially depends on its major target GSK3β. Unlike other GSK3β inhibitors, Li not only increases the expression of Nanog, but also enhances the transcriptional activity of Nanog. We also found that Li exerts its effect by promoting epigenetic modifications via downregulation of LSD1, an H3K4-specific histone demethylase. Knocking down LSD1 partially mimics Li's effect in enhancing reprogramming. Our results not only provide a straightforward method to improve the iPSC generation efficiency, but also identify a histone demethylase as a critical modulator for somatic cell reprogramming

Crossref

PubMed Central

eScholarship - University of California

Histone deacetylase HD2 interacts with ERF1 and is involved in longan fruit senescence

Author: Aharoni
Alonso
Aravind
Berger
Boutilier
Bowyer
Büttner
Cao
Chakravarthy
Chen
Chen
Chung
Dangl
Demetriou
Depège-Fargeix
Duan
Earley
El-Sharkawy
Elliott
Fujimoto
Gilmour
Guo
Hao
Hollender
Hu
Jang
Jian-fei Kuang
Jian-ye Chen
Jiang
Jofuku
Ke-qiang Wu
Kim
Lagacé M, Chantha SC, Major G, Matton DP
Lee
Leshem
Leshem
Li
Lin
Liu
Lusser
Ming Luo
Nakano
Nath
Ohem-Takagi
Okamuro
Pandey
Peng
Rashotte
Sakuma
Sharma
Silverstein
Song
Song
Sozzi
Sridha
Stockinger
Strahl
Tian
Tian
Tournier
Turner
Ueno
Vlachonasios
Walter
Wan
Wang
Wang-jin Lu
Wei Sun
Wills
Wu
Wu
Wu
Wu
Xiao
Xu
Yaish
Yang
Yin
Yoo
Yu
Yue-ming Jiang
Zhang
Zhong
Zhou
Zhu
Publication venue: Oxford University Press

Histone deacetylation plays an important role in epigenetic control of gene expression. HD2 is a plant-specific histone deacetylase that is able to mediate transcriptional repression in many biological processes. To investigate the epigenetic and transcriptional mechanisms of longan fruit senescence, one histone deacetylase 2-like gene, DlHD2, and two ethylene-responsive factor-like genes, DlERF1 and DlERF2, were cloned and characterized from longan fruit. Expression of these genes was examined during fruit senescence under different storage conditions. The accumulation of DlHD2 reached a peak at 2 d and 30 d in the fruit stored at 25 °C (room temperature) and 4 °C (low temperature), respectively, or 6 h after the fruit was transferred from 4 °C to 25 °C, when fruit senescence was initiated. However, the DlERF1 transcript accumulated mostly at the later stage of fruit senescence, reaching a peak at 5 d and 35 d in the fruit stored at 25 °C and 4 °C, respectively, or 36 h after the fruit was transferred from low temperature to room temperature. Moreover, application of nitric oxide (NO) delayed fruit senescence, enhanced the expression of DlHD2, but suppressed the expression of DlERF1 and DlERF2. These results indicated a possible interaction between DlHD2 and DlERFs in regulating longan fruit senescence, and the direct interaction between DlHD2 and DlERF1 was confirmed by yeast two-hybrid and bimolecular fluorescence complementation (BiFC) assays. Taken together, the results suggested that DlHD2 may act with DlERF1 to regulate gene expression involved in longan fruit senescence

Crossref

PubMed Central

Gene ontology based transfer learning for protein subcellular localization

Author: A Bateman
A Dijk
A Hoglund
A Hoglund
A Pierleoni
C Chen
C Leslie
C Leslie
DH Haft
E Marcotte
EM Zdobnov
F Corpet
FM Li
G Lanckriet
G Schneider
H Ding
H Lin
H Lin
H Liu
H Rangwala
H Shen
HB Shen
HB Shen
HB Shen
HB Shen
HB Shen
J Cedano
J Schultz
J Shen
JD Qiu
JD Qiu
K Chou
K Chou
K Chou
K Hofmann
K Lee
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
L Nanni
M Ashburner
M Esmaeili
M Mak
M Wang
Q Gu
Q Yang
R Apweiler
R Kuang
R Kuang
S Mei
S Pan
Shuigeng Zhou
Suyu Mei
T Blum
T Tung
TK Attwood
W Dai
W Dai
W Huang
W Huang
Wang Fei
X Jiang
X Xiao
XB Zhou
YH Zeng
YS Ding
YS Ding
Z Lei
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of data information may not tell the true story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources for exploiting multi-aspect protein feature information. Gene ontology, hereinafter referred to as GO, uses a controlled vocabulary to depict biological molecules or gene products in terms of biological process, molecular function and cellular component. With the rapid expansion of annotated protein sequences, gene ontology has become a general protein feature that can be used to construct predictive models in computational biology. Existing models generally either concatenated the GO terms into a flat binary vector or applied majority-vote based ensemble learning for protein subcellular localization, neither of which can estimate the individual discriminative abilities of the three aspects of gene ontology. Results: In this paper, we propose a Gene Ontology Based Transfer Learning Model (GO-TLM) for large-scale protein subcellular localization. The model transfers the signature-based homologous GO terms to the target proteins, and further constructs a reliable learning system to reduce the adverse effect of the potential false GO terms that result from evolutionary divergence. We derive three GO kernels from the three aspects of gene ontology to measure the GO similarity of two proteins, and derive two other spectrum kernels to measure the similarity of two protein sequences. We use simple non-parametric cross validation to explicitly weigh the discriminative abilities of the five kernels, such that the time and space computational complexities are greatly reduced when compared to the complicated semi-definite programming and semi-indefinite linear programming.
The five kernels are then linearly merged into one single kernel for protein subcellular localization. We evaluate GO-TLM performance against three baseline models, MultiLoc, MultiLoc-GO and Euk-mPLoc, on the benchmark datasets the baseline models adopted. 5-fold cross validation experiments show that GO-TLM achieves substantial accuracy improvement against the baseline models: 80.38% against model Euk-mPLoc 67.40%, a 12.98% increase; 96.65% and 96.27% against model MultiLoc-GO 89.60% and 89.60%, with 7.05% and 6.67% accuracy increase on dataset MultiLoc plant and dataset MultiLoc animal, respectively; 97.14%, 95.90% and 96.85% against model MultiLoc-GO 83.70%, 90.10% and 85.70%, with accuracy increase 13.44%, 5.8% and 11.15% on dataset BaCelLoc plant, dataset BaCelLoc fungi and dataset BaCelLoc animal, respectively. For BaCelLoc independent sets, GO-TLM achieves 81.25%, 80.45% and 79.46% on dataset BaCelLoc plant holdout, dataset BaCelLoc plant holdout and dataset BaCelLoc animal holdout, respectively, as compared against baseline model MultiLoc-GO 76%, 60.00% and 73.00%, with accuracy increase 5.25%, 20.45% and 6.46%, respectively. Conclusions: Since direct homology-based GO term transfer may be prone to introducing noise and outliers to the target protein, we design an explicitly weighted kernel learning system (called Gene Ontology Based Transfer Learning Model, GO-TLM) to transfer to the target protein the known knowledge about related homologous proteins, which can reduce the risk of outliers and share knowledge between homologous proteins, and thus achieve better predictive performance for protein subcellular localization.
Cross validation and independent test experimental results show that the homology-based GO term transfer and explicitly weighing the GO kernels substantially improve the prediction performance.
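The abstract says the kernels are explicitly weighted and then linearly merged into a single kernel. A minimal sketch of that merging step is given below, assuming each kernel is a precomputed square Gram matrix and that a non-negative weight per kernel (for example, a cross-validation accuracy) is already available. The function names and the normalization of weights to sum to one are illustrative assumptions, not details taken from the paper.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def merge_kernels(kernels, weights):
    """Return the weighted sum of equally sized square Gram matrices.

    kernels: list of n x n matrices (lists of lists of floats)
    weights: one non-negative score per kernel (hypothetical, e.g. CV accuracy)
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    w = normalize(weights)
    n = len(kernels[0])
    return [
        [sum(wk * K[i][j] for wk, K in zip(w, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Two toy 2 x 2 kernels over the same pair of proteins (made-up values).
k_a = [[1.0, 0.5], [0.5, 1.0]]
k_b = [[1.0, 0.0], [0.0, 1.0]]
merged = merge_kernels([k_a, k_b], weights=[3.0, 1.0])
```

Because a non-negative weighted sum of positive semi-definite matrices is itself positive semi-definite, the merged matrix can be passed to any kernel classifier that accepts a precomputed kernel.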

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Molecular Characterization of a Strawberry FaASR Gene in Relation to Fruit Ripening

Author: A Aharoni
A Aharoni
A Fait
A Saumonneau
A Schneider
B Çakir
C Canel
C Chervin
C Davies
CA Bustamante
CS Wang
CY Wan
CY Yang
D Shkolnik
D Silhavy
DD Archbold
DD Archbold
DJ Liu
Du-juan Liu
F Csukasi
F Riccardi
FJ Zhang
FQ Fu
G Symons
GA Gambetta
GA Martinez
GA Shen
H Abe
HJ Wang
HY Liu
JC Huang
Ji Hoon Ahn
Jian-fei Kuang
Jian-ye Chen
JJ Giovannoni
JJ Giovannoni
JJ Zhang
JT Chernys
K Kojima
K Manning
K Manning
KondoS
L Alexander
L Maskin
L Mukkun
L Tian
L Trainotti
M Giribaldi
M Griesser
M Houde
M Jeanneau
M Mahdieh
M Zhang
M Zhang
MH Harpster
Ming-lei Zhao
MP González-García
N Frankel
N Frankel
N Urtasun
ND Iusem
NK Given
P Perkins-Veazie
P Pimentel
PM Civello
PPM Iannetta
R Vaidyanathan
RR Finkelstein
S Abel
S Kondo
S Lacampagne
S Lee
S Wheeler
SH Hong
SH Zhu
SR Saravanan
ST Jeong
T Ban
V Padmanabhan
V Tisza
Wang-jin Lu
Wei Shan
X Li
Y Kalifa
Y Kalifa
Y Kano
Y Yoneda
YM Jiang
Yue-ming Jiang
YW Nam
Z Fei
Z Konrad
ZF Lin
Publication venue: Public Library of Science
Publication date: 01/01/2011

BACKGROUND: ABA-, stress- and ripening-induced (ASR) proteins have been reported to act as a downstream component involved in ABA signal transduction. Although much attention has been paid to the roles of ASR in plant development and stress responses, the mechanisms by which ABA regulates fruit ripening at the molecular level are not fully understood. In the present work, a strawberry ASR gene (FaASR) was isolated and characterized, and a polyclonal antibody against FaASR protein was prepared. Furthermore, the effects of ABA, applied at two different developmental stages of strawberry, on fruit ripening and the expression of FaASR at transcriptional and translational levels were investigated. METHODOLOGY/PRINCIPAL FINDINGS: FaASR, localized in the cytoplasm and nucleus, contained 193 amino acids and shared common features with other plant ASRs. It also functioned as a transcriptional activator in yeast with trans-activation activity in the N-terminus. During strawberry fruit development, endogenous ABA content and levels of FaASR mRNA and protein increased significantly at the initiation of ripening at a white (W) fruit developmental stage. More importantly, application of exogenous ABA to large green (LG) fruit and W fruit markedly increased endogenous ABA content, accelerated fruit ripening, and greatly enhanced the expression of FaASR transcripts and the accumulation of FaASR protein simultaneously. CONCLUSIONS: These results indicate that FaASR may be involved in strawberry fruit ripening. The observed increase in endogenous ABA content, and enhanced FaASR expression at transcriptional and translational levels in response to ABA treatment, might partially contribute to the acceleration of strawberry fruit ripening

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

South China Botanical Garden, Chinese Academy of Sciences

Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays.

Author: Chen Ao
Chen Bichao
Chen Xi
Cheng Mengnan
Cornall Richard J
Dai Xi
Esteban Miguel A
Fei Ji-Feng
Feng Weimin
Gu Ying
Guo Pengcheng
Han Lei
Hao Shijie
Hong Yan
Huang Xin
Hui Junhou
Jiang Yujia
Kuang Haoyan
Lai Guangyao
Lai Yiwei
Li Mei
Li Quanshui
Li Yuxiang
Li Zhao
Liao Sha
Lisby Michael
Liu Chuanyu
Liu Longqi
Liu Shiping
Liu Shuai
Liu Xing
Liu Yang
Lu Haorong
Lu Huifang
Ma Kailong
Maxwell Patrick H
Mu Feng
Mulder Jan
Muñoz-Cánoves Pura
Ni Ming
Peng Jian
Qin Baoming
Qiu Xiaojie
Shen Mengzhe
Thiery Jean Paul
Uhlén Mathias
Wang Bo
Wang Jian
Wang Liqun
Wang Ou
Wang Shuai
Wang Xin
Wang Zhaohui
Wang Zhifeng
Wei Xiaofeng
Wei Xiaoyu
Wu Liang
Wu Qing-Feng
Xiang Haitao
Xu Jiangshan
Xu Xun
Yang Huanming
Yang Jin
Yang Yunzhi
Yin Ye
Yuan Yue
Zhang Wenwei
Zhao Fuxiang
Zheng Huiwen
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022

Spatially resolved transcriptomic technologies are promising tools to study complex biological processes such as mammalian embryogenesis. However, the imbalance between resolution, gene capture, and field of view of current methodologies precludes their systematic application to analyze relatively large and three-dimensional mid- and late-gestation embryos. Here, we combined DNA nanoball (DNB)-patterned arrays and in situ RNA capture to create spatial enhanced resolution omics-sequencing (Stereo-seq). We applied Stereo-seq to generate the mouse organogenesis spatiotemporal transcriptomic atlas (MOSTA), which maps with single-cell resolution and high sensitivity the kinetics and directionality of transcriptional variation during mouse organogenesis. We used this information to gain insight into the molecular basis of spatial cell heterogeneity and cell fate specification in developing tissues such as the dorsal midbrain. Our panoramic atlas will facilitate in-depth investigation of longstanding questions concerning normal and abnormal mammalian development. This work is part of the "SpatioTemporal Omics Consortium" (STOC) paper package. A list of STOC members is available at: http://sto-consortium.org. We would like to thank the MOTIC China Group, Rongqin Ke (Huaqiao University, Xiamen, China), Jiazuan Ni (Shenzhen University, Shenzhen, China), Wei Huang (Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China), and Jonathan S. Weissman (Whitehead Institute, Boston, USA) for their help. This work was supported by the grant of Top Ten Fundamental Research Institutes of Shenzhen, the Shenzhen Key Laboratory of Single-Cell Omics (ZDSYS20190902093613831), and the Guangdong Provincial Key Laboratory of Genome Read and Write (2017B030301011); Longqi Liu was supported by the National Natural Science Foundation of China (31900466) and Miguel A. Esteban's laboratory at the Guangzhou Institutes of Biomedicine and Health by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA16030502), National Natural Science Foundation of China (92068106), and the Guangdong Basic and Applied Basic Research Foundation (2021B1515120075).

Copenhagen University Research Information System

REPISALUD

The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions

Author: A. Bombarely
A. Levi
B. Ham
B. Huang
B. Wang
D. Liang
E. Pang
F. Murat
G. Gong
H. Hongju
H. Kuang
H. Qun
H. Schoof
H. Sun
H. Xuesong
H. Zhang
H. Zhao
J. Jiang
J. Liu
J. Min
J. Salse
J. Wang
J. Wang
J. Wang
J. Zhang
J.J. Giovannoni
K. Klee
K. Lin
L. Mao
L. Ruiqiang
L. Tian
L. Yunfu
L.A. Mueller
M. Huang
M. Xing
N. Peixiang
Q. Kou
S. Dong
S. Gao
S. Guo
S. Huang
S. Zhong
T. Tan
W. Hou
W. Kui
W. Mingzhu
W.J. Lucas
X. Guo
X. Liang
X. Yimin
X. Yong
X. Zhang
X. Zhao
X. Zou
Y. Hongping
Y. Huang
Y. Ren
Y. Xia
Y. Yin
Y. Zhang
Y. Zheng
Z. Fei
Z. Wang
Z. Zhang
Z. Zhang
Z. Zheng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Watermelon, Citrullus lanatus, is an important cucurbit crop grown throughout the world. Here we report a high-quality draft genome sequence of the east Asia watermelon cultivar 97103 (2n = 2× = 22) containing 23,440 predicted protein-coding genes. Comparative genomics analysis provided an evolutionary scenario for the origin of the 11 watermelon chromosomes derived from a 7-chromosome paleohexaploid eudicot ancestor. Resequencing of 20 watermelon accessions representing three different C. lanatus subspecies produced numerous haplotypes and identified the extent of genetic diversity and population structure of watermelon germplasm. Genomic regions that were preferentially selected during domestication were identified. Many disease-resistance genes were also found to be lost during domestication. In addition, integrative genomic and transcriptomic analyses yielded important insights into aspects of phloem-based vascular signaling in common between watermelon and cucumber and identified genes crucial to valuable fruit-quality traits, including sugar accumulation and citrulline metabolism

AIR Universita degli studi di Milano