Enabling Large Language Models to Learn from Rules
Large language models (LLMs) have shown impressive performance in completing various real-world tasks. The current knowledge-learning paradigm of LLMs is mainly based on learning from examples, in which LLMs learn the underlying rule implicitly from a certain number of supervised examples. However, this paradigm may fail to learn complicated rules, especially when training examples are limited. We are inspired by the observation that humans can also learn new tasks or knowledge in another way: by learning from rules. That is, humans can grasp new tasks or knowledge quickly and generalize well given only a detailed rule and a few optional examples. Therefore, in this paper, we aim to explore the feasibility of this new learning paradigm, which encodes rule-based knowledge into LLMs. We propose rule distillation, which first uses the strong in-context abilities of LLMs to extract knowledge from textual rules and then explicitly encodes that knowledge into the LLMs' parameters by learning from the in-context signals produced inside the model. Our experiments show that making LLMs learn from rules with our method is much more efficient than example-based learning in terms of both sample size and generalization ability.
Comment: In progress
Using Dempster-Shafer’s evidence theory for query expansion based on freebase knowledge
Query expansion is generally a useful technique for improving search performance. However, some expansion terms obtained by traditional statistical methods (e.g., pseudo-relevance feedback) may not be relevant to the user's information need, while some relevant terms may not appear in the feedback documents at all. Recent studies use external resources to detect terms related to the query and then adopt these terms in query expansion. In this paper, we present a study of the use of Freebase, an open-source general-purpose ontology, as a source for deriving expansion terms. Freebase provides a graph-based model of human knowledge, from which a rich, multi-step structure of instances related to the query concept can be extracted as a complement to traditional statistical approaches to query expansion. We propose a novel method, based on the well-principled Dempster-Shafer (D-S) evidence theory, to measure the certainty of expansion terms derived from the Freebase structure. The expanded query model is then combined with a state-of-the-art statistical query expansion model, the Relevance Model (RM3). Experiments show that the proposed method achieves significant improvements over RM3.
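The abstract does not say how masses are assigned from the Freebase graph, so the following is only a minimal sketch of the standard Dempster's rule of combination that D-S evidence theory rests on. It assumes a hypothetical two-element frame {relevant, irrelevant} for a single expansion term and made-up mass values from two hypothetical evidence sources (for example, two different graph paths); it is not the paper's method.

```python
from itertools import product

# A mass function maps focal sets (frozensets over the frame) to masses summing to 1.

def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0  # K: total mass assigned to empty intersections
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    # Normalise by 1 - K so the combined masses again sum to 1.
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

def belief(m, a):
    """Bel(A): total mass of focal sets contained in A."""
    return sum(v for b, v in m.items() if b <= a)

def plausibility(m, a):
    """Pl(A): total mass of focal sets that intersect A."""
    return sum(v for b, v in m.items() if b & a)

# Hypothetical frame and masses for one candidate expansion term.
REL = frozenset({"relevant"})
IRR = frozenset({"irrelevant"})
THETA = REL | IRR  # mass on THETA expresses ignorance

m_path1 = {REL: 0.6, THETA: 0.4}
m_path2 = {REL: 0.5, IRR: 0.2, THETA: 0.3}

m = combine(m_path1, m_path2)
print(belief(m, REL), plausibility(m, REL))  # certainty interval [Bel, Pl] for "relevant"
```

Here the conflicting mass is K = 0.6 × 0.2 = 0.12, so the combined belief in "relevant" is 0.68 / 0.88 ≈ 0.773 and its plausibility is 0.80 / 0.88 ≈ 0.909. The gap between belief and plausibility is one natural way to express how certain a term's relevance is, which is the kind of quantity the abstract describes measuring.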
Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk
Known genetic loci explain only a small proportion of the familial relative risk of colorectal cancer (CRC). We conducted the largest genome-wide association study in East Asians, with 14,963 CRC cases and 31,945 controls, and identified six new loci associated with CRC risk (P = 3.42 × 10⁻⁸ to 9.22 × 10⁻²¹) at 10q22.3, 10q25.2, 11q12.2, 12p13.31, 17p13.3 and 19q13.2. Two of these loci map to genes (TCF7L2 and TGFB1) with established roles in colorectal tumorigenesis. The four other loci are located in or near genes involved in transcription regulation (ZMIZ1), genome maintenance (FEN1), fatty acid metabolism (FADS1 and FADS2), cancer cell motility and metastasis (CD9), and cell growth and differentiation (NXN). We also found suggestive evidence, approaching genome-wide significance, for three additional loci associated with CRC risk at 8q24.11, 10q21.1 and 10q24.2. Furthermore, we replicated 22 previously reported CRC loci. Our study provides insights into the genetic basis of CRC and suggests new biological pathways.
Identification of New Genetic Risk Variants for Type 2 Diabetes
Although more than 20 genetic susceptibility loci have been reported for type 2 diabetes (T2D), most reported variants have small to moderate effects and account for only a small proportion of the heritability of T2D, suggesting that most of the inter-person genetic variation in this disease remains to be identified. We conducted a multistage genome-wide association study (GWAS) within the Asian Consortium of Diabetes to search for T2D susceptibility markers. From 590,887 SNPs genotyped in 1,019 T2D cases and 1,710 controls selected from Chinese women in Shanghai, we selected the top 2,100 SNPs that were not in linkage disequilibrium (r² < 0.2) with known T2D loci for in silico replication in three T2D GWAS conducted among European Americans, Koreans, and Singapore Chinese. The 5 most promising SNPs were genotyped in an independent set of 1,645 cases and 1,649 controls from Shanghai, and 4 of them were further genotyped in 1,487 cases and 3,316 controls from 2 additional Chinese studies. Consistent associations across all studies were found for rs1359790 (13q31.1), rs10906115 (10p13), and rs1436955 (15q22.2), with P-values (per-allele OR, 95% CI) of 6.49 × 10⁻⁹ (1.15, 1.10–1.20), 1.45 × 10⁻⁸ (1.13, 1.08–1.18), and 7.14 × 10⁻⁷ (1.13, 1.08–1.19), respectively, in combined analyses of 9,794 cases and 14,615 controls. Our study provides strong evidence for a novel T2D susceptibility locus at 13q31.1 and for the presence of new independent risk variants near regions (10p13 and 15q22.2) reported by previous GWAS.
Identification of a Functional Genetic Variant at 16q12.1 for Breast Cancer Risk: Results from the Asia Breast Cancer Consortium
Genetic factors play an important role in the etiology of breast cancer. To identify genetic susceptibility loci for breast cancer, we carried out a multi-stage genome-wide association (GWA) study in over 28,000 cases and controls recruited from 12 studies conducted in Asian and European American women. After analyzing 684,457 SNPs in 2,073 cases and 2,084 controls in Chinese women, we evaluated 53 SNPs for fast-track replication in an independent set of 4,425 cases and 1,915 controls of Chinese origin. Four replicated SNPs were further investigated in an independent set of 6,173 cases and 6,340 controls from seven other studies conducted in Asian women. SNP rs4784227 was consistently associated with breast cancer risk across all studies, with an adjusted per-allele odds ratio (95% confidence interval) of 1.25 (1.20–1.31) (P = 3.2 × 10⁻²⁵) in the pooled analysis of all Asian samples. This SNP was also associated with breast cancer risk among European Americans (per-allele OR = 1.19, 95% CI = 1.09–1.31, P = 1.3 × 10⁻⁴; 2,797 cases and 2,662 controls). SNP rs4784227 is located at 16q12.1, a region previously identified for breast cancer risk among Europeans. The association of this SNP with breast cancer risk remained highly statistically significant in Asians after adjusting for previously reported SNPs in this region. In vitro experiments using both luciferase reporter and electrophoretic mobility shift assays demonstrated the functional significance of this SNP. These results provide strong evidence implicating rs4784227 as a functional causal variant for breast cancer at the 16q12.1 locus and demonstrate the utility of conducting genetic association studies in populations with different genetic architectures.