    Pseudointelligence: A Unifying Framework for Language Model Evaluation

    With large language models surpassing human performance on an increasing number of benchmarks, we must take a principled approach to targeted evaluation of model capabilities. Inspired by pseudorandomness, we propose pseudointelligence, which captures the maxim that "(perceived) intelligence lies in the eye of the beholder": claims of intelligence are meaningful only when their evaluator is taken into account. Concretely, we propose a complexity-theoretic framework that casts model evaluation as a dynamic interaction between a model and a learned evaluator. We demonstrate that this framework can be used to reason about two case studies in language model evaluation, as well as to analyze existing evaluation methods.
    Comment: EMNLP 2023 Findings
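
    By analogy with pseudorandomness, the framework's core definition can be sketched as an indistinguishability condition. The following is a minimal illustration in our own notation, not necessarily the paper's exact formalization:

        \documentclass{article}
        \usepackage{amsmath}
        \begin{document}
        % Illustrative sketch: a model $M$ is $\epsilon$-pseudointelligent with
        % respect to a capability $C$ and a class $\mathcal{E}$ of learned
        % evaluators if no evaluator in the class can distinguish interacting
        % with $M$ from interacting with an agent that truly has capability $C$:
        \[
          \forall E \in \mathcal{E}: \quad
          \bigl|\Pr[E(M) = 1] - \Pr[E(C) = 1]\bigr| \le \epsilon .
        \]
        \end{document}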

    Grokking of Hierarchical Structure in Vanilla Transformers

    For humans, language production and comprehension are sensitive to the hierarchical structure of sentences. In natural language processing, past work has questioned how effectively neural sequence models like transformers capture this hierarchical structure when generalizing to structurally novel inputs. We show that transformer language models can learn to generalize hierarchically after training for extremely long periods -- far beyond the point at which in-domain accuracy has saturated. We call this phenomenon "structural grokking". On multiple datasets, structural grokking exhibits inverted U-shaped scaling in model depth: intermediate-depth models generalize better than both very deep and very shallow transformers. When analyzing the relationship between model-internal properties and grokking, we find that the optimal depth for grokking can be identified using the tree-structuredness metric of Murty et al. (2023). Overall, our work provides strong evidence that, with extended training, vanilla transformers discover and use hierarchical structure.
    Comment: ACL 2023
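
    The experimental protocol this implies can be sketched roughly as follows (a hedged illustration with stand-in functions and dummy accuracy curves; none of this is the authors' code): keep training long past in-domain saturation and track accuracy on structurally novel inputs separately.

        def train_step(model, step):
            """Hypothetical stand-in for one optimizer update."""
            pass

        def accuracy(model, split, step):
            """Hypothetical evaluation; the dummy curves mimic the phenomenon:
            in-domain accuracy saturates early, while accuracy on structurally
            novel inputs jumps only much later in training."""
            if split == "in_domain":
                return min(1.0, step / 1_000)
            return 0.0 if step < 50_000 else min(1.0, (step - 50_000) / 10_000)

        model = None
        for step in range(0, 100_001, 10_000):
            train_step(model, step)
            ind = accuracy(model, "in_domain", step)
            ood = accuracy(model, "structural_generalization", step)
            print(f"step {step:6d}  in-domain {ind:.2f}  structural {ood:.2f}")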

    Characterizing Intrinsic Compositionality in Transformers with Tree Projections

    When trained on language data, do transformers learn some arbitrary computation that utilizes the full capacity of the architecture, or do they learn a simpler, tree-like computation, hypothesized to underlie compositional meaning systems like human languages? There is an apparent tension between compositional accounts of human language understanding, which are based on a restricted bottom-up computational process, and the enormous success of neural models like transformers, which can route information arbitrarily between different parts of their input. One possibility is that these models, while extremely flexible in principle, in practice learn to interpret language hierarchically, ultimately building sentence representations close to those predictable by a bottom-up, tree-structured model. To evaluate this possibility, we describe an unsupervised and parameter-free method to "functionally project" the behavior of any transformer into the space of tree-structured networks. Given an input sentence, we produce a binary tree that approximates the transformer's representation-building process and a score that captures how "tree-like" the transformer's behavior is on that input. While calculating this score does not require training any additional models, it provably upper-bounds the fit between a transformer and any tree-structured approximation. Using this method, we show that transformers for three different tasks become more tree-like over the course of training, in some cases recovering, without supervision, the same trees as supervised parsers. These trees, in turn, are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
    Comment: Fixed title and metadata
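
    A minimal sketch of the underlying search problem (illustrative only; span_cost is a hypothetical stand-in for comparing a transformer's contextual representation of a span against a context-free re-encoding of the same span):

        import functools

        def span_cost(i, j):
            """Hypothetical stand-in: distance between the transformer's
            contextual representation of span [i, j) and a context-free
            re-encoding of it. A real implementation would compare hidden
            states from two encoder passes."""
            return (abs(hash((i, j))) % 100) / 100.0  # deterministic dummy

        @functools.lru_cache(maxsize=None)
        def best_tree(i, j):
            """CKY-style search for the binary tree over [i, j) whose spans
            are cheapest to approximate; a lower total cost means the
            transformer behaves more tree-like on this sentence."""
            if j - i == 1:
                return 0.0, i
            candidates = []
            for k in range(i + 1, j):
                left_cost, left = best_tree(i, k)
                right_cost, right = best_tree(k, j)
                cost = left_cost + right_cost + span_cost(i, j)
                candidates.append((cost, (left, right)))
            return min(candidates, key=lambda c: c[0])

        words = ["the", "cat", "sat", "down"]
        score, tree = best_tree(0, len(words))
        print("tree projection:", tree, "  score:", round(score, 3))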

    Pushdown Layers: Encoding Recursive Structure in Transformer Language Models

    Recursion is a prominent feature of human language and fundamentally challenging for self-attention due to the lack of an explicit recursive-state tracking mechanism. Consequently, Transformer language models poorly capture long-tail recursive structure and exhibit sample-inefficient syntactic generalization. This work introduces Pushdown Layers, a new self-attention layer that models recursive state via a stack tape that tracks estimated depths of every token in an incremental parse of the observed prefix. Transformer LMs with Pushdown Layers are syntactic language models that autoregressively and synchronously update this stack tape as they predict new tokens, in turn using the stack tape to softly modulate attention over tokens -- for instance, learning to "skip" over closed constituents. When trained on a corpus of strings annotated with silver constituency parses, Transformers equipped with Pushdown Layers achieve dramatically better and 3-5x more sample-efficient syntactic generalization while maintaining similar perplexities. Pushdown Layers are a drop-in replacement for standard self-attention. We illustrate this by finetuning GPT2-medium with Pushdown Layers on an automatically parsed WikiText-103, leading to improvements on several GLUE text classification tasks.
    Comment: Accepted at EMNLP 2023 (Long Papers)
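
    A schematic of the attention modulation (a hedged sketch, not the paper's implementation: the precomputed depths tensor, the scalar depth_weight, and the causal mask are illustrative assumptions, whereas the actual model predicts and updates the stack tape autoregressively):

        import torch
        import torch.nn.functional as F

        def pushdown_attention(q, k, v, depths, depth_weight=1.0):
            """q, k, v: [seq, dim]; depths: [seq] estimated parse depth per token.
            Attention logits are softly biased by depth differences, e.g. letting
            a head learn to 'skip' over tokens inside closed constituents."""
            logits = (q @ k.T) / (k.shape[-1] ** 0.5)
            depth_bias = -depth_weight * (depths[:, None] - depths[None, :]).abs()
            causal = torch.tril(torch.ones_like(logits)).bool()
            logits = (logits + depth_bias).masked_fill(~causal, float("-inf"))
            return F.softmax(logits, dim=-1) @ v

        seq, dim = 5, 8
        q = k = v = torch.randn(seq, dim)
        depths = torch.tensor([0.0, 1.0, 2.0, 1.0, 0.0])   # toy stack-tape depths
        print(pushdown_attention(q, k, v, depths).shape)   # torch.Size([5, 8])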

    Deciphering the antagonistic effect of Streptomyces spp. and host-plant resistance induction against charcoal rot of sorghum

    Two strains each of Streptomyces albus (CAI-17 and KAI-27) and Streptomyces griseus (KAI-26 and MMA-32) and one strain of Streptomyces cavourensis (SAI-13), previously reported to have plant growth-promotion activity in chickpea, rice and sorghum, were evaluated for their antagonistic potential against Macrophomina phaseolina, which causes charcoal rot in sorghum. The antagonistic potential of these strains against M. phaseolina was assessed through dual culture assay, metabolite production assay, blotter paper assay in the greenhouse, and field disease screens. In both dual culture and metabolite production assays, the selected strains significantly inhibited the growth of M. phaseolina (63–74%). In the blotter paper assay, all five strains of Streptomyces spp. inhibited the pathogen (80–90%). When these five strains were tested for their antagonistic potential under greenhouse (two trials) and field (two seasons) conditions using the toothpick method of inoculation, significant differences were observed in charcoal rot severity. Principal component analysis, which captured 91.3% of the phenotypic variation, revealed that shoot samples treated with both Streptomyces and the pathogen exhibited significantly enhanced antioxidant parameters, including superoxide dismutase, catalase, ascorbate peroxidase, guaiacol peroxidase, glutathione reductase, phenylalanine ammonia-lyase, polyphenol oxidase, and total phenolic content, compared to shoot samples treated with only M. phaseolina. Scanning electron microscopy revealed that the phloem and xylem tissues of the Streptomyces-treated stem samples were intact compared to those of pathogen-inoculated plants. This study indicates that the selected strains of Streptomyces spp. have potential for biological control of charcoal rot disease in sorghum.

    Identification and Characterization of a Streptomyces albus Strain and Its Secondary Metabolite Organophosphate against Charcoal Rot of Sorghum

    Streptomyces albus strain CAI-21 has previously been reported to have plant growth-promotion abilities in chickpea, pigeonpea, rice, and sorghum. The strain CAI-21 and its secondary metabolite were evaluated for their biocontrol potential against charcoal rot disease in sorghum, caused by Macrophomina phaseolina. Results showed that CAI-21 significantly inhibited the growth of the pathogen, M. phaseolina, in dual-culture (15 mm zone of inhibition), metabolite production (74% inhibition), and blotter paper (90% inhibition) assays. When CAI-21 was tested for its biocontrol potential under greenhouse and field conditions following inoculation of M. phaseolina by the toothpick method, it significantly reduced the number of infected internodes (75% and 45% less, respectively) and the length of infection (75% and 51% less, respectively) compared with positive control plants (inoculated with M. phaseolina only). Under greenhouse conditions, scanning electron microscopic analysis showed that the phloem and xylem tissues of the CAI-21-treated shoot samples were intact compared to those of the diseased stem samples. The culture filtrate of CAI-21 was purified by various chromatographic techniques, and the active compound was identified as an organophosphate by NMR and MS. The organophosphate was also found to inhibit the growth of M. phaseolina in the poisoned food technique. This study indicates that S. albus CAI-21 and its active metabolite organophosphate have the potential to control charcoal rot in sorghum.

    Discovering the Language of Actions

    This thesis looks at discovering language-like discrete infinities for actions. How can a stream of continuous data be parsed into skills or concepts, and can we tie the decision of what the right set of skills may be to the problem of generating plans over a continuous action space, as in the original stream of data? Can we utilize supervision from aligned parallel language instructions to scaffold the discovery of these named primitives of action from interactions? Here, we present a framework for learning hierarchical policies from demonstrations, using sparse natural language annotations to guide the discovery of reusable skills for autonomous decision-making. It is formulated as a generative model of action sequences in which goals generate sequences of high-level subtask descriptions, and these descriptions generate sequences of low-level actions. The thesis describes how to train this model using primarily unannotated demonstrations by parsing demonstrations into sequences of named high-level subtasks, using only a small number of seed annotations to ground language in action. In trained models, the space of natural language commands indexes a combinatorial library of skills; agents can use these skills to plan by generating high-level instruction sequences tailored to novel goals. The approach is evaluated in the ALFRED household simulation environment, with natural language annotations provided for only 10% of demonstrations. It completes more than twice as many tasks as a standard approach to learning from demonstrations, matching the performance of instruction-following models with access to ground-truth plans during both training and evaluation. Code, data, and additional visualizations are available at https://sites.google.com/view/skill-induction-latent-lang/.
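
    A toy sketch of the two-level generative process described above (the skill library, names, and deterministic "distributions" here are illustrative stand-ins, not the thesis's learned model):

        # Hypothetical skill library: language descriptions indexing action
        # subpolicies (the thesis learns these from demonstrations).
        SKILLS = {
            "pick up the mug": ["move_to(mug)", "grasp(mug)"],
            "put it in the sink": ["move_to(sink)", "release(mug)"],
        }

        def sample_subtasks(goal):
            """Goal -> sequence of high-level subtask descriptions
            (stand-in for a learned distribution over descriptions)."""
            return list(SKILLS)

        def sample_actions(subtask):
            """Subtask description -> sequence of low-level actions
            (stand-in for a learned language-conditioned subpolicy)."""
            return SKILLS[subtask]

        for subtask in sample_subtasks("wash the mug"):
            print(subtask, "->", sample_actions(subtask))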

    Pneumatosis intestinalis: Cost paid for rheumatoid arthritis treatment

    A 75-year-old woman with rheumatoid arthritis on rituximab presented with a 1-week history of constipation and abdominal distension. Initial workup showed air in the bowel wall without perforation. Due to positive blood cultures, worsening leucocytosis and high suspicion for perforation, an exploratory laparotomy was performed, revealing necrotic bowel, a walled-off perforation and an abscess. The patient underwent right hemicolectomy with a diverting loop ileostomy. Clinicians must recognise that monoclonal antibodies like rituximab can mask signs of inflammation, and should therefore maintain a high index of suspicion for intestinal perforation when evaluating patients with minimal symptoms and pneumatosis intestinalis.

    Short-term effects of renal transplantation on coronary artery calcification: A prospective study

    Cardiovascular disease is a leading cause of mortality in renal transplant recipients. Coronary artery calcification (CAC) has been found to correlate well with atherosclerosis and cardiovascular morbidity. The objective of our study was to assess the prevalence of CAC and the short-term effects of renal transplantation on CAC and carotid intima-media thickness (CIMT) in Indian renal transplant recipients. Twenty-eight renal transplant recipients were included in this prospective study. Dual-source computed tomography with calcium scoring by Agatston's method, together with CIMT measurement, was performed at the time of transplant and repeated at six and 12 months after transplantation. The prevalence of CAC in our study patients was low (32%), probably because they were young, had been on dialysis for a short duration and had undergone live-related renal transplantation. An overall improvement in biochemical parameters was observed after transplantation. Patients with a baseline calcium score of zero did not show progression. Patients with a baseline calcium score greater than zero showed initial progression at six months and no further progression thereafter. There was good correlation between CIMT and CAC score. Our study suggests that renal transplantation does not reverse calcification but appears to decrease its rate of progression.