Enabling Large Language Models to Learn from Rules

Authors: Lin Yankai
Wen Jirong
Yang Wenkai
Zhou Jie
Publication date: 15/11/2023

Large language models (LLMs) have shown incredible performance in completing various real-world tasks. The current knowledge learning paradigm of LLMs is mainly based on learning from examples, in which LLMs learn the internal rule implicitly from a certain number of supervised examples. However, this learning paradigm may fail to learn complicated rules well, especially when the training examples are limited. We are inspired by the fact that humans can learn new tasks or knowledge in another way: by learning from rules. That is, humans can grasp new tasks or knowledge quickly and generalize well given only a detailed rule and a few optional examples. Therefore, in this paper, we aim to explore the feasibility of this new learning paradigm, which encodes rule-based knowledge into LLMs. We propose rule distillation, which first uses the strong in-context abilities of LLMs to extract the knowledge from the textual rules and then explicitly encodes the knowledge into the LLMs' parameters by learning from the above in-context signals produced inside the model. Our experiments show that making LLMs learn from rules by our method is much more efficient than example-based learning in both sample size and generalization ability.

Comment: In progress

arXiv.org e-Print Archive

Using Dempster-Shafer’s evidence theory for query expansion based on freebase knowledge

Authors: De Roeck Anne
Hou Yuexian
Hu Bin
Jia Yuan
Li Jingfei
Pan Dazhao
Song Dawei
Wen Jirong
Zhang Peng
Publication date: 01/01/2013

Query expansion is generally a useful technique for improving search performance. However, some expanded query terms obtained by traditional statistical methods (e.g., pseudo-relevance feedback) may not be relevant to the user's information need, while some relevant terms may not be contained in the feedback documents at all. Recent studies utilize external resources to detect terms that are related to the query, and then adopt these terms in query expansion. In this paper, we present a study of the use of Freebase, an open-source general-purpose ontology, as a source for deriving expansion terms. Freebase provides a graph-based model of human knowledge, from which a rich, multi-step structure of instances related to the query concept can be extracted, as a complement to the traditional statistical approaches to query expansion. We propose a novel method, based on the well-principled Dempster-Shafer (D-S) evidence theory, to measure the certainty of expansion terms from the Freebase structure. The expanded query model is then combined with a state-of-the-art statistical query expansion model, the Relevance Model (RM3). Experiments show that the proposed method achieves significant improvements over RM3.
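
For readers unfamiliar with D-S evidence theory, the following is a minimal sketch of Dempster's rule of combination, the standard way two independent bodies of evidence are fused in that theory. It is not the paper's method: the abstract does not describe how masses are derived from Freebase, so the two-element frame (`relevant`, `irrelevant`), the two evidence sources, and all mass values below are hypothetical and chosen only for illustration.

```python
from itertools import product

def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset (a focal element, i.e. a subset
    of the frame of discernment) to a mass; masses sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Contradictory evidence: accumulate the conflict mass.
            conflict += wa * wb
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

def belief(m, hypothesis):
    """Belief in a hypothesis: total mass of focal elements inside it."""
    return sum(w for focal, w in m.items() if focal <= hypothesis)

# Hypothetical frame: is a candidate expansion term relevant or not?
R = frozenset({"relevant"})
I = frozenset({"irrelevant"})
THETA = R | I  # the whole frame, i.e. "don't know"

# Hypothetical masses from two independent evidence sources.
m1 = {R: 0.6, THETA: 0.4}
m2 = {R: 0.5, I: 0.2, THETA: 0.3}

fused = combine(m1, m2)
print(belief(fused, R))
```

Mass assigned to the whole frame represents ignorance rather than disbelief, which is what distinguishes this scheme from assigning a single probability to each term.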

Crossref

Open Research Online (The Open University)

Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk

Authors: Ahn Yoon-Ok
Cai Qiuyin
Casey Graham
Chan Andrew T
Chang-Claude Jenny
Cho Sang-Hee
Gao Yu-Tang
Gruber Stephen B.
Guo Yan
Hosono Satoyo
Jee Sun Ha
Jeong Jin-Young
Ji Bu-Tian
Jia Wei-Hua
Kim Dong-Hyun
Kim Hyeong-Rok
Kim Soriul
Kubo Michiaki
Kweon Sun-Seog
Li Bingshan
Li Chun
Li Hong-Lan
Long Jirong
Matsuda Fumihiko
Matsuda Koichi
Matsuo Keitaro
Oh Jae Hwan
Pan Zhi-Zhong
Park Ji Won
Ren Zefang
Schumacher Fredrick R.
Shi Jiajun
Shin Aesun
Shin Min-Ho
Shu Xiao-Ou
Slattery Martha L.
Stenzel Stephanie L.
Takahashi Atsushi
Wen Wanqing
Xiang Yong-Bing
Yang Gong
Zeng Yi-Xin
Zhang Ben
Zhang Yanfeng
Zheng Wei
Publication venue: Springer Science and Business Media LLC
Publication date: 05/01/2015

Known genetic loci explain only a small proportion of the familial relative risk of colorectal cancer (CRC). We conducted the largest genome-wide association study in East Asians with 14,963 CRC cases and 31,945 controls and identified six new loci associated with CRC risk (P = 3.42 × 10^−8 to 9.22 × 10^−21) at 10q22.3, 10q25.2, 11q12.2, 12p13.31, 17p13.3 and 19q13.2. Two of these loci map to genes (TCF7L2 and TGFB1) with established roles in colorectal tumorigenesis. Four other loci are located in or near genes involved in transcription regulation (ZMIZ1), genome maintenance (FEN1), fatty acid metabolism (FADS1 and FADS2), cancer cell motility and metastasis (CD9) and cell growth and differentiation (NXN). We also found suggestive evidence for three additional loci associated with CRC risk near genome-wide significance at 8q24.11, 10q21.1 and 10q24.2. Furthermore, we replicated 22 previously reported CRC loci. Our study provides insights into the genetic basis of CRC and suggests new biological pathways.

Harvard University - DASH

Identification of New Genetic Risk Variants for Type 2 Diabetes

Authors: A Ray
Bok-Ghee Han
Chun Li
Daniel P. K. Ng
DE Moller
E Zeggini
E. Shyong Tai
F Jaggi
FJ Tsai
Frank B. Hu
GL King
GR Guy
H Unoki
Huaixing Li
Hyung-Lae Kim
I Yanai
J Dupuis
J Rung
JC Chan
Jiajun Shi
Jirong Long
Jong-Young Lee
K Warton
Kai Yu
L Qi
Liegang Liu
LJ Scott
Lu Qi
MA Cabrita
Marilyn C. Cornelis
Mark Seielstad
Min Jin Go
OC Leeksma
P
Peter M. Visscher
PN Surampudi
Qibin Qi
Qiuyin Cai
R Sladek
RB Goldberg
S Purcell
S Verploegen
TM Frayling
W Ding
W Zheng
Wanqing Wen
Wei Bao
Wei Zheng
Wong-Ho Chow
Xiangyang Li
Xiao Ou Shu
Xu Lin
Xue Ling Sim
Yong-Bing Xiang
Yoon Shin Cho
Young Jin Kim
Yu-Tang Gao
Publication venue: Public Library of Science
Publication date: 01/01/2010

Although more than 20 genetic susceptibility loci have been reported for type 2 diabetes (T2D), most reported variants have small to moderate effects and account for only a small proportion of the heritability of T2D, suggesting that the majority of inter-person genetic variation in this disease remains to be determined. We conducted a multistage, genome-wide association study (GWAS) within the Asian Consortium of Diabetes to search for T2D susceptibility markers. From 590,887 SNPs genotyped in 1,019 T2D cases and 1,710 controls selected from Chinese women in Shanghai, we selected the top 2,100 SNPs that were not in linkage disequilibrium (r^2 < 0.2) with known T2D loci for in silico replication in three T2D GWAS conducted among European Americans, Koreans, and Singapore Chinese. The 5 most promising SNPs were genotyped in an independent set of 1,645 cases and 1,649 controls from Shanghai, and 4 of them were further genotyped in 1,487 cases and 3,316 controls from 2 additional Chinese studies. Consistent associations across all studies were found for rs1359790 (13q31.1), rs10906115 (10p13), and rs1436955 (15q22.2) with P-values (per allele OR, 95% CI) of 6.49×10^−9 (1.15, 1.10–1.20), 1.45×10^−8 (1.13, 1.08–1.18), and 7.14×10^−7 (1.13, 1.08–1.19), respectively, in combined analyses of 9,794 cases and 14,615 controls. Our study provides strong evidence for a novel T2D susceptibility locus at 13q31.1 and the presence of new independent risk variants near regions (10p13 and 15q22.2) reported by previous GWAS.

Public Library of Science (PLOS)

CiteSeerX

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Identification of a Functional Genetic Variant at 16q12.1 for Breast Cancer Risk: Results from the Asia Breast Cancer Consortium

Genetic factors play an important role in the etiology of breast cancer. We carried out a multi-stage genome-wide association (GWA) study in over 28,000 cases and controls recruited from 12 studies conducted in Asian and European American women to identify genetic susceptibility loci for breast cancer. After analyzing 684,457 SNPs in 2,073 cases and 2,084 controls in Chinese women, we evaluated 53 SNPs for fast-track replication in an independent set of 4,425 cases and 1,915 controls of Chinese origin. Four replicated SNPs were further investigated in an independent set of 6,173 cases and 6,340 controls from seven other studies conducted in Asian women. SNP rs4784227 was consistently associated with breast cancer risk across all studies, with an adjusted odds ratio (95% confidence interval) of 1.25 (1.20–1.31) per allele (P = 3.2×10^−25) in the pooled analysis of all Asian samples. This SNP was also associated with breast cancer risk among European Americans (per allele OR = 1.19, 95% CI = 1.09–1.31, P = 1.3×10^−4, 2,797 cases and 2,662 controls). SNP rs4784227 is located at 16q12.1, a region identified previously for breast cancer risk among Europeans. The association of this SNP with breast cancer risk remained highly statistically significant in Asians after adjusting for previously reported SNPs in this region. In vitro experiments using both luciferase reporter and electrophoretic mobility shift assays demonstrated functional significance of this SNP. These results provide strong evidence implicating rs4784227 as a functional causal variant for breast cancer in the locus 16q12.1 and demonstrate the utility of conducting genetic association studies in populations with different genetic architectures.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

HKU Scholars Hub