
    Enabling Large Language Models to Learn from Rules

    Large language models (LLMs) have shown remarkable performance on a wide range of real-world tasks. The current knowledge-learning paradigm of LLMs is mainly based on learning from examples, in which LLMs implicitly learn the underlying rule from a number of supervised examples. However, this paradigm may fail to capture complicated rules, especially when training examples are limited. We are inspired by the observation that humans can also acquire new tasks or knowledge in another way: by learning from rules. That is, given only a detailed rule and, optionally, a few examples, humans can grasp new tasks or knowledge quickly and generalize well. In this paper, we therefore explore the feasibility of this new learning paradigm, which encodes rule-based knowledge into LLMs. We propose rule distillation, which first uses the strong in-context abilities of LLMs to extract knowledge from textual rules and then explicitly encodes that knowledge into the LLMs' parameters by learning from the in-context signals produced inside the model. Our experiments show that having LLMs learn from rules with our method is much more efficient than example-based learning in terms of both sample size and generalization ability. Comment: In progress
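    The abstract does not spell out the training objective. As a minimal sketch, assuming the "in-context signals" are the model's own next-token distributions, one could compare the distribution the model produces with the rule in its prompt (teacher) against the one it produces without the rule (student) using a KL divergence. The function names and the choice of an output-level KL (rather than, say, hidden-state matching) are hypothetical assumptions, not the paper's confirmed method.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: List[float], q: List[float]) -> float:
    # KL(p || q) over a discrete vocabulary; softmax outputs are strictly positive.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def rule_distillation_loss(teacher_logits: List[List[float]],
                           student_logits: List[List[float]]) -> float:
    # Hypothetical objective: teacher_logits come from the model WITH the rule in
    # its prompt, student_logits from the same model WITHOUT it.
    # The loss is the KL divergence averaged over token positions; in real training
    # the teacher side would be held fixed as the target.
    per_token = [kl_divergence(softmax(t), softmax(s))
                 for t, s in zip(teacher_logits, student_logits)]
    return sum(per_token) / len(per_token)

# Toy logits over a 3-token vocabulary at two positions (invented numbers).
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student = [[1.0, 1.0, 0.0], [0.1, 1.5, 0.3]]
print(rule_distillation_loss(teacher, student))
```

    The loss is zero only where the rule-free prediction already matches the rule-conditioned one, so minimizing it pushes the rule's effect into the parameters rather than the prompt.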

    Using Dempster-Shafer’s evidence theory for query expansion based on Freebase knowledge

    Query expansion is generally a useful technique for improving search performance. However, some expanded query terms obtained by traditional statistical methods (e.g., pseudo-relevance feedback) may not be relevant to the user's information need, while some relevant terms may not appear in the feedback documents at all. Recent studies use external resources to detect terms related to the query and then add these terms to the expanded query. In this paper, we study the use of Freebase, an open-source, general-purpose ontology, as a source of expansion terms. Freebase provides a graph-based model of human knowledge, from which a rich, multi-step structure of instances related to the query concept can be extracted to complement traditional statistical approaches to query expansion. We propose a novel method, based on the well-founded Dempster-Shafer (D-S) evidence theory, to measure the certainty of expansion terms derived from the Freebase structure. The expanded query model is then combined with a state-of-the-art statistical query expansion model, the Relevance Model (RM3). Experiments show that the proposed method achieves significant improvements over RM3.
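    The abstract does not give the paper's formulas. As a sketch of the underlying tool only, Dempster's rule of combination fuses two mass functions defined over a frame of candidate terms. Here each mass function is assumed to come from one hypothetical piece of Freebase evidence (for example, one path from the query concept); the term names and mass values are invented for illustration and are not the authors' certainty measure.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    # Dempster's rule: multiply the masses of every pair of focal elements, assign
    # the product to their intersection, and renormalize by 1 - K, where K is the
    # total mass falling on empty intersections (the conflict between sources).
    combined: Mass = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; combination is undefined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

# Hypothetical frame of discernment: two candidate expansion terms for one query.
theta = frozenset({"term_a", "term_b"})
m_path1 = {frozenset({"term_a"}): 0.6, theta: 0.4}
m_path2 = {frozenset({"term_a"}): 0.5, frozenset({"term_b"}): 0.3, theta: 0.2}
m = dempster_combine(m_path1, m_path2)
print({tuple(sorted(k)): round(v, 3) for k, v in m.items()})
```

    In this toy case the conflict is K = 0.6 × 0.3 = 0.18, and term_a ends up with 0.62 / 0.82 ≈ 0.756 of the combined mass, so two agreeing sources reinforce each other while the mass left on the whole frame (uncertainty) shrinks.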

    Identification of New Genetic Risk Variants for Type 2 Diabetes

    Although more than 20 genetic susceptibility loci have been reported for type 2 diabetes (T2D), most reported variants have small to moderate effects and account for only a small proportion of the heritability of T2D, suggesting that most inter-individual genetic variation in this disease remains to be determined. We conducted a multistage genome-wide association study (GWAS) within the Asian Consortium of Diabetes to search for T2D susceptibility markers. From 590,887 SNPs genotyped in 1,019 T2D cases and 1,710 controls selected from Chinese women in Shanghai, we selected the top 2,100 SNPs that were not in linkage disequilibrium (r² < 0.2) with known T2D loci for in silico replication in three T2D GWAS conducted among European Americans, Koreans, and Singapore Chinese. The 5 most promising SNPs were genotyped in an independent set of 1,645 cases and 1,649 controls from Shanghai, and 4 of them were further genotyped in 1,487 cases and 3,316 controls from 2 additional Chinese studies. Consistent associations across all studies were found for rs1359790 (13q31.1), rs10906115 (10p13), and rs1436955 (15q22.2), with P-values (per-allele OR, 95% CI) of 6.49×10⁻⁹ (1.15, 1.10–1.20), 1.45×10⁻⁸ (1.13, 1.08–1.18), and 7.14×10⁻⁷ (1.13, 1.08–1.19), respectively, in combined analyses of 9,794 cases and 14,615 controls. Our study provides strong evidence for a novel T2D susceptibility locus at 13q31.1 and for new independent risk variants near regions (10p13 and 15q22.2) reported by previous GWAS.
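    The per-allele ORs and 95% CIs above come from the study's own analyses, which the abstract does not detail (such estimates are commonly obtained from covariate-adjusted logistic regression and then pooled across stages). As a simpler, unadjusted illustration of what a per-allele OR and its interval mean, the sketch below computes an allelic OR from a 2×2 table of allele counts with Woolf's log-scale confidence interval. The function name and counts are hypothetical and not taken from the study.

```python
import math
from typing import Tuple

def allelic_odds_ratio(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    # Per-allele OR from allele counts: odds of carrying the risk allele in
    # cases divided by the same odds in controls.
    or_ = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Woolf's method: standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Hypothetical allele counts (invented; not from the study).
print(allelic_odds_ratio(case_risk=1200, case_other=800, ctrl_risk=1000, ctrl_other=1000))
```

    Because the interval is built on the log scale, it is asymmetric around the OR itself, which matches the shape of the reported intervals (e.g. 1.15 with 1.10–1.20).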

    Identification of a Functional Genetic Variant at 16q12.1 for Breast Cancer Risk: Results from the Asia Breast Cancer Consortium

    Genetic factors play an important role in the etiology of breast cancer. We carried out a multi-stage genome-wide association (GWA) study in over 28,000 cases and controls recruited from 12 studies conducted in Asian and European American women to identify genetic susceptibility loci for breast cancer. After analyzing 684,457 SNPs in 2,073 cases and 2,084 controls in Chinese women, we evaluated 53 SNPs for fast-track replication in an independent set of 4,425 cases and 1,915 controls of Chinese origin. Four replicated SNPs were further investigated in an independent set of 6,173 cases and 6,340 controls from seven other studies conducted in Asian women. SNP rs4784227 was consistently associated with breast cancer risk across all studies, with an adjusted odds ratio (95% confidence interval) of 1.25 (1.20–1.31) per allele (P = 3.2×10⁻²⁵) in the pooled analysis of all Asian samples. This SNP was also associated with breast cancer risk among European Americans (per-allele OR = 1.19, 95% CI = 1.09–1.31, P = 1.3×10⁻⁴; 2,797 cases and 2,662 controls). SNP rs4784227 is located at 16q12.1, a region previously identified for breast cancer risk among Europeans. The association of this SNP with breast cancer risk remained highly statistically significant in Asians after adjusting for previously reported SNPs in this region. In vitro experiments using both luciferase reporter and electrophoretic mobility shift assays demonstrated the functional significance of this SNP. These results provide strong evidence implicating rs4784227 as a functional causal variant for breast cancer at the 16q12.1 locus and demonstrate the utility of conducting genetic association studies in populations with different genetic architectures.
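    As a rough consistency check on figures like those reported for European Americans, the standard error of log(OR) can be backed out of a 95% CI, giving an approximate Wald z statistic and two-sided p-value. This assumes the CI was computed as log(OR) ± 1.96·SE, which the abstract does not state, and the function name is hypothetical; because the published values are rounded, the result should only be expected to agree in order of magnitude.

```python
import math
from typing import Tuple

def wald_from_or_ci(odds_ratio: float, lower: float, upper: float,
                    z_crit: float = 1.96) -> Tuple[float, float]:
    # Assumes the CI is symmetric on the log scale: log(OR) ± z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided p-value under a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Reported European American estimate: OR 1.19, 95% CI 1.09-1.31.
z, p = wald_from_or_ci(1.19, 1.09, 1.31)
print(f"z = {z:.2f}, p = {p:.1e}")
```

    For OR = 1.19 (1.09–1.31) this gives z ≈ 3.7 and a p-value on the order of 10⁻⁴, the same magnitude as the reported P = 1.3×10⁻⁴.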