90 research outputs found

Named Entity Recognition via Machine Reading Comprehension: A Multi-Task Learning Approach

Author: Deng Zhongfen
Wan Yao
Wang Yibo
Yu Philip S.
Zhao Wenting
Publication date: 19/09/2023

Named Entity Recognition (NER) aims to extract and classify entity mentions in text into pre-defined types (e.g., organization or person name). Recently, many works have framed NER as a machine reading comprehension problem (termed MRC-based NER), in which entities are recognized by answering questions formulated for the pre-defined entity types, based on the context. However, these works ignore the label dependencies among entity types, which are critical for precisely recognizing named entities. In this paper, we propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER. We decompose MRC-based NER into multiple tasks and use a self-attention module to capture label dependencies. Comprehensive experiments on both nested NER and flat NER datasets are conducted to validate the effectiveness of the proposed Multi-NER. Experimental results show that Multi-NER can achieve better performance on all datasets.

arXiv.org e-Print Archive

Stock Volatility Prediction Based on Transformer Model Using Mixed-Frequency Data

Author: Gui Zhaozhong
Jiang Guilin
Leng Wan
Liu Wenting
Liu Yujiang
Tang Lihua
Zhang Xulong
Zhou Lichun
Publication date: 28/09/2023

With the increasing volume of high-frequency data in the information age, both challenges and opportunities arise in the prediction of stock volatility. On one hand, predictions from traditional methods that combine stock technical and macroeconomic indicators still leave room for improvement; on the other hand, macroeconomic indicators and people's search-engine records for topics that interest them intuitively have an impact on stock volatility. To make the influence of these indicators easier to assess, macroeconomic indicators and stock technical indicators are grouped into objective factors, while Baidu search indices, which reflect the topics people are interested in, are defined as subjective factors. To align data of different frequencies, we introduce the GARCH-MIDAS model. After mixing all the above data, we feed them into a Transformer model as part of the training data. Our experiments show that this model outperforms the baselines in terms of mean square error. Using both types of data with the Transformer model significantly reduces the mean square error from 1.00 to 0.86. Comment: Accepted by the 7th APWeb-WAIM International Joint Conference on Web and Big Data (APWeb 2023).

arXiv.org e-Print Archive

DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

Author: Joty Shafiq
Liu Ye
Niu Tong
Wan Yao
Yavuz Semih
Yu Philip S.
Zhao Wenting
Zhou Yingbo
Publication date: 31/10/2023

Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when solely relying on their internal knowledge, especially when answering questions that require less commonly known information. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge. Nonetheless, recent approaches have primarily emphasized retrieval from unstructured text corpora, owing to its seamless integration into prompts. When using structured data such as knowledge graphs, most methods simplify it into natural text, neglecting the underlying structures. Moreover, a significant gap in the current landscape is the absence of a realistic benchmark for evaluating the effectiveness of grounding LLMs on heterogeneous knowledge sources (e.g., knowledge base and text). To fill this gap, we have curated a comprehensive dataset that poses two unique challenges: (1) Two-hop multi-source questions that require retrieving information from both open-domain structured and unstructured knowledge sources; retrieving information from structured knowledge sources is a critical component in correctly answering the questions. (2) The generation of symbolic queries (e.g., SPARQL for Wikidata) is a key requirement, which adds another layer of challenge. Our dataset is created using a combination of automatic generation through predefined reasoning chains and human annotation. We also introduce a novel approach that leverages multiple retrieval tools, including text passage retrieval and symbolic language-assisted retrieval. Our model outperforms previous approaches by a significant margin, demonstrating its effectiveness in addressing the above-mentioned reasoning challenges

arXiv.org e-Print Archive

Effective TME-related signature to predict prognosis of patients with head and neck squamous cell carcinoma

Author: Chao Yang
Chen Li
Lingfei Wan
Ruihong Li
Wen Yue
Wenting Pan
Xingxing Zhao
Xinlong Yan
Yuanshuai Li
Yuting Yong
Publication venue: Frontiers Media S.A.
Publication date: 01/08/2023

Introduction: The tumor microenvironment (TME) is crucial for the development of head and neck squamous cell carcinoma (HNSCC). However, the correlation of the characteristics of the TME and the prognosis of patients with HNSCC remains less known. Methods: In this study, we calculated the immune and stromal cell scores using the “estimate” R package. Kaplan-Meier survival and CIBERSORT algorithm analyses were applied in this study. Results: We identified seven new markers: FCGR3B, IGHV3-64, AC023449.2, IGKV1D-8, FCGR2A, WDFY4, and HBQ1. Subsequently, a risk model was constructed and all HNSCC samples were grouped into low- and high-risk groups. The results of both the Kaplan-Meier survival and receiver operating characteristic curve (ROC) analyses showed that the prognosis indicated by the model was accurate (0.758, 0.756, and 0.666 for 1-, 3- and 5-year survival rates). In addition, we applied the CIBERSORT algorithm to reveal the significant differences in the infiltration levels of immune cells between the two risk groups. Discussion: Our study elucidated the roles of the TME and identified new prognostic biomarkers for patients with HNSCC.

Directory of Open Access Journals

The interaction between the soluble programmed death ligand-1 (sPD-L1) and PD-1+ regulator B cells mediates immunosuppression in triple-negative breast cancer

Author: Fang Xie
Huan Du
Jing Lan
Longxiang PuYang
Qin Wang
Qiuxia Qu
Shenghua Zhan
Sining Wang
Wenting Liu
Xuejiao Li
Yang Yang
Yuqiu Wan
Zhangyu Wang
Publication venue: Frontiers Media SA
Publication date: 01/07/2022

Accumulating evidence suggests that regulatory B cells (Bregs) play important roles in inhibiting the immune response in tumors. Programmed death 1 (PD-1) and programmed death ligand 1 (PD-L1) are important molecules that maintain the balance of the immune response and immune tolerance. This study aims to evaluate the soluble form of PD-L1 and its function in inducing the differentiation of B lymphocytes, investigate the relationship between soluble PD-L1 (sPD-L1) and B-cell subsets, and explore the antitumor activity of T lymphocytes after PD-L1 blockade in coculture systems. In an effort to explore the role of sPD-L1 in human breast cancer etiology, we examined the levels of sPD-L1 and interleukin-10 (IL-10) in the serum of breast tumor patients and the proportions of B cells, PD-1+ B cells, Bregs, and PD-1+ Bregs in the peripheral blood of patients with breast tumors and assessed their relationship among sPD-L1, IL-10, and B-cell subsets. The levels of sPD-L1 and IL-10 in serum were found to be significantly higher in invasive breast cancer (IBCa) patients than in breast fibroadenoma (FIBma) patients. Meanwhile, the proportions and absolute numbers of Bregs and PD-1+ Bregs in the peripheral blood of IBCa patients were significantly higher than those of FIBma patients. Notably, they were the highest in triple-negative breast cancer (TNBC) among other subtypes of IBCa. Positive correlations of sPD-L1 and IL-10, IL-10 and PD-1+ Bregs, and also sPD-L1 and PD-1+ Bregs were observed in IBCa. We further demonstrated that sPD-L1 could induce Breg differentiation, IL-10 secretion, and IL-10 mRNA expression in a dose-dependent manner in vitro. Finally, the induction of regulatory T cells (Tregs) by Bregs was further shown to suppress the antitumor response and that PD-L1 blockade therapies could promote the apoptosis of tumor cells. 
Together, these results indicated that sPD-L1 could mediate the differentiation of Bregs, expand CD4+ Tregs and weaken the antitumor activity of CD4+ T cells. PD-L1/PD-1 blockade therapies might be a powerful therapeutic strategy for IBCa patients, particularly for TNBC patients with high level of PD-1+ Bregs

Directory of Open Access Journals

Microbial contamination status of student meal in Wenzhou from 2016 to 2020

Author: CAI Yuanyuan
GAO Sihai
HONG Chengji
LI Yi
LIN Dan
SHAN Yujuan
WAN Wenting
WANG Lili
Publication venue: The Editorial Office of Chinese Journal of Food Hygiene
Publication date: 01/02/2023

Objective: To provide a basis for further ensuring the safety of student meals, the microbial contamination status in kindergarten, primary and secondary school canteens in Wenzhou over the past five years was investigated. Methods: Food samples collected in Wenzhou from 2016 to 2020 were tested for hygienic indicator bacteria (aerobic plate count and Escherichia coli) and foodborne pathogens (Staphylococcus aureus, Bacillus cereus, Salmonella and Listeria monocytogenes), and data were analyzed with SPSS 18.0. Results: The microbial contamination of student meals in 2016 was serious, with an unqualified rate as high as 23.58%. The unqualified rates of student meals then decreased by 6.27%, 6.80%, 9.06%, and 3.82% from 2017 to 2020 (χ² = 60.852, P<0.001). Escherichia coli contamination was among the most serious, and its unqualified rates over the five years showed a downward trend (9.43%, 5.64%, 6.47%, 6.41%, and 1.91%, χ² = 5.225, P = 0.022). Except for the higher detection rates of Staphylococcus aureus and Salmonella in 2016 (7.35% and 9.91%), the unqualified rates of foodborne pathogens in other years were at a low level. By school type, the unqualified rates of meal samples for kindergarten, primary and secondary school students in Wenzhou from 2016 to 2020 were 10.34%, 12.81%, and 6.90%, respectively, a significant difference (χ² = 8.341, P = 0.015). Across sampling quarters and monitoring points, no significant difference was observed in the overall status of microbial contamination of student meals. Compared with 2016, the risk of microbial contamination of student meals was significantly reduced from 2017 to 2020 after adjusting for influencing factors such as school type, sampling season and location (P<0.01). Conclusion: Microbial contamination of student meals in kindergartens, primary and middle schools in Wenzhou was the most serious in 2016, while the hygiene conditions of student meals improved from 2017 to 2020. Foodborne microbial contamination in Wenzhou could potentially threaten student health and should be monitored to prevent the occurrence of foodborne illness in schools.

Directory of Open Access Journals

Low alpha-defensin gene copy number increases the risk for IgA nephropathy and renal dysfunction

Author: Ai Zhen
Armour John A.L.
Barratt Jonathan
Chen Jian
Dong Xiuqing
Fan Jinjin
Feng Shaozhen
Foo Jia-Nee
Gale Daniel
Li Ming
Liu Jianjun
Liu Wenting
Lou Tanqi
Mansouri Omniah
Mao Haiping
Tang Xueqing
Wan Jianxin
Xu Ricong
Yin Peiran
Yu Jianwen
Yu Xueqing
Zhong Zhong
Zhou Qian
Zhou Qin
Publication venue: American Association for the Advancement of Science (AAAS)
Publication date: 29/06/2016

IgA nephropathy (IgAN) is the most common primary glomerulonephritis worldwide. Although a major source of genetic variation, copy number variations (CNVs) and their involvement in disease development have not been well studied. Here, we performed association analysis of the DEFA1A3 CNV locus in two independent IgAN cohorts of Southern Chinese Han (total 1189 cases and 1187 controls). We discovered three independent copy number associations within the locus: DEFA1A3 (P=3.99×10^-9, OR=0.88), DEFA3 (P=6.55×10^-5, OR=0.82) and a noncoding deletion variant (211bp) (P=3.50×10^-16, OR=0.75) (OR per copy, fixed-effects meta-analysis). While showing strong association with increased risk for IgAN (P=9.56×10^-20), low total copy numbers of the three variants also showed significant association with renal dysfunction in patients with IgAN (P=0.03, HR=3.69, after controlling for the effects of known prognostic factors) as well as high serum IgA1 (P=0.02) and a high proportion of galactose-deficient IgA1 (P=0.03). For replication, we confirmed the associations of DEFA1A3 (P=4.42×10^-4, OR=0.82) and DEFA3 copy numbers (P=4.30×10^-3, OR=0.74) with IgAN in a Caucasian cohort (531 cases and 198 controls) and found the 211bp variant to be much rarer in Caucasians. Interestingly, we also observed an association of the 211bp copy number with membranous nephropathy (P=1.11×10^-7, OR=0.74 in 493 Chinese cases and 500 matched controls), but not with diabetic kidney disease (in 806 Chinese cases and 786 matched controls). By explaining 4.96% of disease risk and influencing the renal dysfunction in IgAN, the DEFA1A3 CNV locus is a potential candidate for therapeutic target and prognostic marker development.

Nottingham ePrints

Nottingham eTheses

Crossref

Repository@Nottingham

UCL Discovery

Leicester Research Archive

Derivedness index for estimating degree of phenotypic evolution of embryos: a study of comparative transcriptomic analyses of chordates and echinoderms

Author: Bradham Cynthia
Chen Luonan
Dong Yang
Hao Meng
Irie Naoki
Leong Jason Cheok Kuan
Li Yongxin
Livingston Brian T.
Omori Akihito
Ren Yandong
Uchida Yui
Uesaka Masahiro
Wan Wenting
Wang Fayou
Wang Wen
Wessel Gary
Zeng Tao
Zhang Si
Publication venue: Frontiers Media SA
Publication date: 26/11/2021

Species retaining ancestral features, such as species called living fossils, are often regarded as less derived than their sister groups, but such discussions are usually based on qualitative enumeration of conserved traits. This approach creates a major barrier, especially when quantifying the degree of phenotypic evolution or degree of derivedness, since it focuses only on commonly shared traits, and newly acquired or lost traits are often overlooked. To provide a potential solution to this problem, especially for inter-species comparison of gene expression profiles, we propose a new method named "derivedness index" to quantify the degree of derivedness. In contrast to the conservation-based approach, which deals with expressions of commonly shared genes among species being compared, the derivedness index also considers those that were potentially lost or duplicated during evolution. By applying our method, we found that the gene expression profiles of penta-radial phases in echinoderm tended to be more highly derived than those of the bilateral phase. However, our results suggest that echinoderms may not have experienced much larger modifications to their developmental systems than chordates, at least at the transcriptomic level. In vertebrates, we found that the mid-embryonic and organogenesis stages were generally less derived than the earlier or later stages, indicating that the conserved phylotypic period is also less derived. We also found genes that potentially explain less derivedness, such as Hox genes. Finally, we highlight technical concerns that may influence the measured transcriptomic derivedness, such as read depth and library preparation protocols, for further improvement of our method through future studies. We anticipate that this index will serve as a quantitative guide in the search for constrained developmental phases or processes.

Boston University Institutional Repository (OpenBU)

Genomic heterogeneity of multiple synchronous lung cancer

Author: Behrens Carmen
Cai Wenjun
Chen Huang
Chen Longyun
Chen Wenting
Cheng Shujun
Cheung Hannah
Chow Chi-Wan
Correa Arlene
Fujimoto Junya
Futreal P. Andrew
Gao Yanning
Han Naijun
Heymach John V.
Hong Waun Ki
Lee J. Jack
Li Lin
Li Lin
Lin Dongmei
Liu Kan
Liu Xiangyang
Liu Yu
Lu Ning
Mao Xizeng
Seth Sahil
Shen Miaozhong
Song Xingzhi
Swisher Stephen
Wang Jun
William William N.
Wistuba Ignacio I.
Wu Ning
Xie Yongqiang
Xu Ningzhi
Yang Huanming
Yang Longhai
Yin Guangliang
Zhang Jianhua
Zhang Jianjun
Zhang Jiexin
Zhang Li
Zhang Susu
Zhao Chuanduo
Zheng Shan
Zhou Lina
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Multiple synchronous lung cancers (MSLCs) present a clinical dilemma as to whether individual tumours represent intrapulmonary metastases or independent tumours. In this study we analyse genomic profiles of 15 lung adenocarcinomas and one regional lymph node metastasis from 6 patients with MSLC. All 15 lung tumours demonstrate distinct genomic profiles, suggesting all are independent primary tumours, which are consistent with comprehensive histopathological assessment in 5 of the 6 patients. Lung tumours of the same individuals are no more similar to each other than are lung adenocarcinomas of different patients from TCGA cohort matched for tumour size and smoking status. Several known cancer-associated genes have different mutations in different tumours from the same patients. These findings suggest that in the context of identical constitutional genetic background and environmental exposure, different lung cancers in the same individual may have distinct genomic profiles and can be driven by distinct molecular events

Copenhagen University Research Information System

PubMed Central