78 research outputs found

    BatchEval: Towards Human-like Text Evaluation

    Significant progress has been made in automatic text evaluation with the introduction of large language models (LLMs) as evaluators. However, the current sample-wise evaluation paradigm suffers from the following issues: (1) sensitivity to prompt design; (2) poor resistance to noise; (3) inferior ensemble performance with a static reference. Inspired by the fact that humans treat both criterion definition and inter-sample comparison as references for evaluation, we propose BatchEval, a paradigm that conducts batch-wise evaluation iteratively to alleviate the above problems. We explore variants under this paradigm and confirm that the optimal setting is a two-stage procedure with a heterogeneous batch composition strategy and a decimal scoring format. Comprehensive experiments across 3 LLMs on 4 text evaluation tasks demonstrate that BatchEval outperforms state-of-the-art methods by 10.5% on Pearson correlations with only 64% of the API cost on average. Further analyses have been conducted to verify the robustness, generalization, and working mechanism of BatchEval. Comment: 19 pages, 9 figure

    Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data

    Large Language Models (LLMs) have performed well on various reasoning tasks, but their inaccessibility and large parameter counts hinder wide application in practice. One promising approach is to distill the reasoning ability of LLMs into small models via generated chain-of-thought reasoning paths. In some cases, however, LLMs may produce incorrect reasoning chains, especially when facing complex mathematical problems. Previous studies transfer knowledge only from positive samples and drop the synthesized data with wrong answers. In this work, we illustrate the merit of negative data and propose a model specialization framework to distill LLMs with negative samples in addition to positive ones. The framework consists of three progressive steps, spanning the training and inference stages, to absorb knowledge from negative data. We conduct extensive experiments across arithmetic reasoning tasks to demonstrate the role of negative data in distillation from LLMs. Comment: AAAI 202

    Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning

    Self-consistency (SC) has been a widely used decoding strategy for chain-of-thought reasoning. Despite bringing significant performance improvements across a variety of multi-step reasoning tasks, it is a high-cost method that requires sampling a preset number of times. In this paper, we propose a simple and scalable sampling process, Early-Stopping Self-Consistency (ESC), to greatly reduce the cost of SC without sacrificing performance. On this basis, a control scheme for ESC is further derived to dynamically choose the performance-cost balance for different tasks and models. To demonstrate ESC's effectiveness, we conducted extensive experiments on three popular categories of reasoning tasks (arithmetic, commonsense and symbolic reasoning) over language models of varying scales. The empirical results show that ESC reduces the average number of chain-of-thought samples by a significant margin on six benchmarks, including MATH (-33.8%), GSM8K (-80.1%), StrategyQA (-76.8%), CommonsenseQA (-78.5%), Coin Flip (-84.2%) and Last Letters (-67.4%), while attaining comparable performance. Comment: ICLR 202
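The abstract above does not spell out the stopping rule, so the following is only a minimal sketch of the general idea of cutting self-consistency sampling short. It assumes, purely for illustration, that answers are drawn in fixed-size windows and that sampling stops once a full window is unanimous; the function name `early_stopping_vote`, the `sample_answer` callback, and both default parameters are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Callable, List, Tuple


def early_stopping_vote(
    sample_answer: Callable[[], str],  # hypothetical: draws one final answer from the model
    window_size: int = 5,              # assumed window length
    max_samples: int = 40,             # assumed preset sampling budget
) -> Tuple[str, int]:
    """Return (majority answer, number of samples actually drawn)."""
    if window_size < 1 or max_samples < 1:
        raise ValueError("window_size and max_samples must be positive")
    answers: List[str] = []
    while len(answers) < max_samples:
        # Never draw more than the remaining budget.
        take = min(window_size, max_samples - len(answers))
        window = [sample_answer() for _ in range(take)]
        answers.extend(window)
        # Assumed stopping rule: a full window whose answers all agree.
        if take == window_size and len(set(window)) == 1:
            break
    # Majority vote over everything sampled so far, as in plain self-consistency.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner, len(answers)
```

In use, `sample_answer` would wrap one stochastic chain-of-thought generation followed by answer extraction. The second return value shows how much of the budget was actually spent.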

    Generative Dense Retrieval: Memory Can Be a Burden

    Generative Retrieval (GR), which autoregressively decodes relevant document identifiers given a query, has been shown to perform well on small-scale corpora. By memorizing the document corpus in its model parameters, GR implicitly achieves deep interaction between query and document. However, such a memorizing mechanism faces three drawbacks: (1) poor memory accuracy for fine-grained features of documents; (2) memory confusion that worsens as the corpus size increases; (3) huge memory update costs for new documents. To alleviate these problems, we propose the Generative Dense Retrieval (GDR) paradigm. Specifically, GDR first uses the limited memory volume to achieve inter-cluster matching from the query to relevant document clusters. A memorizing-free matching mechanism from Dense Retrieval (DR) is then introduced to conduct fine-grained intra-cluster matching from clusters to relevant documents. The coarse-to-fine process maximizes the advantages of GR's deep interaction and DR's scalability. In addition, we design a cluster identifier construction strategy to facilitate corpus memory and a cluster-adaptive negative sampling strategy to enhance the intra-cluster mapping ability. Empirical results show that GDR obtains an average improvement of 3.0 R@100 on the NQ dataset under multiple settings and has better scalability. Comment: EACL 2024 mai
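The coarse-to-fine flow described above can be illustrated with a small sketch. It is not the GDR model: the generative decoding of cluster identifiers is swapped for a plain dot product against cluster centroids, and all names (`coarse_to_fine_retrieve`, `cluster_centroids`, `cluster_docs`, `top_clusters`, `top_docs`) are hypothetical. Only the two-stage shape, clusters first and then documents inside the selected clusters, follows the abstract.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def coarse_to_fine_retrieve(
    query: Vector,
    cluster_centroids: Dict[str, Vector],      # assumed: one embedding per cluster
    cluster_docs: Dict[str, Dict[str, Vector]],  # assumed: cluster id -> {doc id: embedding}
    top_clusters: int = 1,
    top_docs: int = 2,
) -> List[Tuple[str, float]]:
    # Stage 1 (coarse): pick the clusters most similar to the query.
    # Stand-in for GDR's generative inter-cluster matching.
    ranked_clusters = sorted(
        cluster_centroids,
        key=lambda c: dot(query, cluster_centroids[c]),
        reverse=True,
    )[:top_clusters]
    # Stage 2 (fine): dense scoring of documents inside the chosen clusters only.
    candidates = [
        (doc_id, dot(query, vec))
        for c in ranked_clusters
        for doc_id, vec in cluster_docs[c].items()
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_docs]
```

Because the second stage scores only documents in the selected clusters, adding a new document means adding one embedding to its cluster rather than updating a memorized corpus. That is the scalability argument the abstract makes for the dense stage.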

    The Most Recently Discovered Carbonic Anhydrase, CA XV, Is Expressed in the Thick Ascending Limb of Henle and in the Collecting Ducts of Mouse Kidney

    BACKGROUND: Carbonic anhydrases (CAs) are key enzymes for physiological pH regulation, including the process of urine acidification. Previous studies have identified seven cytosolic or membrane-bound CA isozymes in the kidney. Recently, we showed by in situ hybridization that the mRNA for the most recently discovered CA isozyme, CA XV, is present in the renal cortex. CA XV is unique among mammalian CAs because it has become a pseudogene in primates even though it is expressed in several other species. METHODOLOGY/PRINCIPAL FINDINGS: In the present study, we raised a polyclonal antibody against recombinant mouse CA XV produced in a baculovirus/insect cell expression system, and used the antibody for immunohistochemical analysis in different mouse tissues. Positive immunoreactions were found only in the kidney, where the enzyme showed a very limited distribution pattern. Parallel immunostaining experiments with several other anti-CA sera indicated that CA XV is mainly expressed in the thick ascending limb of Henle and the collecting ducts, and the reactions were most prominent in the cortex and outer medulla. CONCLUSION/SIGNIFICANCE: Although other studies have proposed a role for CA XV in cell proliferation, its tightly limited distribution may point to a specialized function in the regulation of acid-base homeostasis.

    31st Annual Meeting and Associated Programs of the Society for Immunotherapy of Cancer (SITC 2016) : part two

    Background: The immunological escape of tumors represents one of the main obstacles to the treatment of malignancies. The blockade of PD-1 or CTLA-4 receptors represented a milestone in the history of immunotherapy. However, immune checkpoint inhibitors seem to be effective in specific cohorts of patients. It has been proposed that their efficacy relies on the presence of an immunological response. Thus, we hypothesized that disruption of the PD-L1/PD-1 axis would synergize with our oncolytic vaccine platform PeptiCRAd. Methods: We used murine B16OVA in vivo tumor models and flow cytometry analysis to investigate the immunological background. Results: First, we found that high-burden B16OVA tumors were refractory to combination immunotherapy. However, with a more aggressive schedule, tumors with a lower burden were more susceptible to the combination of PeptiCRAd and PD-L1 blockade. The therapy significantly increased the median survival of mice (Fig. 7). Interestingly, the reduced growth of contralaterally injected B16F10 cells suggested the presence of a long-lasting immunological memory also against non-targeted antigens. Concerning the functional state of tumor infiltrating lymphocytes (TILs), we found that all the immune therapies would enhance the percentage of activated (PD-1pos TIM-3neg) T lymphocytes and reduce the amount of exhausted (PD-1pos TIM-3pos) cells compared to placebo. As expected, we found that PeptiCRAd monotherapy could increase the number of antigen-specific CD8+ T cells compared to other treatments. However, only the combination with PD-L1 blockade could significantly increase the ratio between activated and exhausted pentamer positive cells (p = 0.0058), suggesting that by disrupting the PD-1/PD-L1 axis we could decrease the amount of dysfunctional antigen-specific T cells. We observed that the anatomical location deeply influenced the state of CD4+ and CD8+ T lymphocytes. In fact, TIM-3 expression was increased by 2 fold on TILs compared to splenic and lymphoid T cells. In the CD8+ compartment, the expression of PD-1 on the surface seemed to be restricted to the tumor micro-environment, while CD4+ T cells had a high expression of PD-1 also in lymphoid organs. Interestingly, we found that the levels of PD-1 were significantly higher on CD8+ T cells than on CD4+ T cells in the tumor micro-environment (p < 0.0001). Conclusions: In conclusion, we demonstrated that the efficacy of immune checkpoint inhibitors might be strongly enhanced by their combination with cancer vaccines. PeptiCRAd was able to increase the number of antigen-specific T cells and PD-L1 blockade prevented their exhaustion, resulting in long-lasting immunological memory and increased median survival.

    Methylation in the promoter regions of WT1, NKX6-1 and DBC1 genes in cervical cancer tissues of Uygur women in Xinjiang

    Abstract: This study aimed to explore: 1) DNA methylation in the promoter regions of the Wilms tumor gene 1 (WT1), NK6 transcription factor related locus 1 (NKX6-1) and Deleted in bladder cancer 1 (DBC1) genes in cervical cancer tissues of Uygur women in Xinjiang, and 2) the correlation of gene methylation with HPV16/18 infection. We detected HPV16/18 infection in 43 normal cervical tissues, 30 cervical intraepithelial neoplasia (CIN) lesions and 48 cervical cancer tissues by polymerase chain reaction (PCR). Methylation in the promoter regions of the WT1, NKX6-1 and DBC1 genes in these tissues was measured by methylation-specific PCR (MSP) and cloning sequencing. The expression level of the three genes was measured by real-time PCR (qPCR) in 10 methylation-positive cervical cancer tissues and 10 methylation-negative normal cervical tissues. The HPV16 infection rate in normal cervical tissues, CIN and cervical cancer tissues was 14.0, 36.7 and 66.7%, respectively. The HPV18 infection rate was 0, 6.7 and 10.4%, respectively. The methylation rates of the WT1, NKX6-1 and DBC1 genes were 7.0, 11.6 and 23.3% in normal cervical tissues, 36.7, 46.7 and 30.0% in CIN tissues, and 89.6, 77.1 and 85.4% in cervical cancer tissues. Furthermore, the WT1, NKX6-1 and DBC1 genes were hypermethylated in high-grade squamous intraepithelial lesions (CIN2, CIN3) and in cervical cancer tissues with HPV16/18 infection (both P < 0.05). The expression of WT1, NKX6-1 and DBC1 was significantly lower in the methylation-positive cervical cancer tissues than in the methylation-negative normal cervical tissues. Our findings indicate that methylation in the promoter regions of WT1, NKX6-1 and DBC1 is correlated with cervical cancer tumorigenesis in Uygur women. HPV16/18 infection might be correlated with methylation in these genes. Gene inactivation caused by methylation might be related to the incidence and development of cervical cancer.