
    Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models

    Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Large Language Models (LLMs) struggle to perform such reasoning consistently. Here we propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LLM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single- and multi-hop prompts. We then propose a mechanism that allows users to inject pertinent prompt-specific information, which we refer to as "memories," at critical LLM locations during inference. By enabling the LLM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We show empirically that a simple, efficient, and targeted memory injection into a key attention layer can increase the probability of the desired next token in multi-hop tasks by up to 424%.
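
    The abstract describes injecting prompt-specific "memory" vectors into attention-head outputs at inference time. Below is a minimal sketch of that idea in PyTorch, using a forward hook on a single GPT-2 attention layer; the layer index, injection scale, memory token, and prompt are illustrative assumptions, not the paper's exact procedure.

        import torch
        from transformers import GPT2LMHeadModel, GPT2Tokenizer

        tok = GPT2Tokenizer.from_pretrained("gpt2")
        model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

        LAYER = 6    # assumed injection site
        SCALE = 4.0  # assumed injection strength
        # Memory = embedding of a prompt-relevant token (assumed: " France")
        memory = model.transformer.wte(torch.tensor(tok.encode(" France"))).mean(0)

        def inject(module, inputs, output):
            # GPT2Attention returns a tuple; output[0] is the attention output
            hidden = output[0]
            hidden[:, -1, :] = hidden[:, -1, :] + SCALE * memory
            return (hidden,) + output[1:]

        handle = model.transformer.h[LAYER].attn.register_forward_hook(inject)
        prompt = "The capital of the country where the Eiffel Tower stands is"
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        handle.remove()
        print(tok.decode(logits[0, -1].argmax()))  # ideally " Paris"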

    Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism

    Transformer-based Large Language Models (LLMs) are the state of the art for natural language tasks. Recent work has attempted to decode the internal mechanisms by which LLMs arrive at their final predictions for text completion tasks by reverse engineering the role of linear layers. Yet little is known about the specific role of attention heads in producing the final token prediction. We propose Attention Lens, a tool that enables researchers to translate the outputs of attention heads into vocabulary tokens via learned attention-head-specific transformations called lenses. Preliminary findings from our trained lenses indicate that attention heads play highly specialized roles in language models. The code for Attention Lens is available at github.com/msakarvadia/AttentionLens.
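
    At its core, a lens as described here is a learned map from one attention head's output to vocabulary logits. Below is a minimal sketch of such a lens with a KL objective that matches the lens's prediction to the model's final next-token distribution; the dimensions and training objective are illustrative assumptions, and the repository above holds the authors' actual implementation.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        D_HEAD, VOCAB = 64, 50257  # GPT-2 head dimension and vocabulary size

        class Lens(nn.Module):
            """One learned lens for one attention head."""
            def __init__(self):
                super().__init__()
                self.proj = nn.Linear(D_HEAD, VOCAB)

            def forward(self, head_out):    # head_out: [batch, D_HEAD]
                return self.proj(head_out)  # vocabulary logits

        lens = Lens()
        opt = torch.optim.Adam(lens.parameters(), lr=1e-4)

        def lens_loss(head_out, final_logits):
            # Train the lens to match the model's final prediction (KL divergence)
            logp = F.log_softmax(lens(head_out), dim=-1)
            p = F.softmax(final_logits, dim=-1)
            return F.kl_div(logp, p, reduction="batchmean")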

    Investigation Into the Antidiabetic Effects of a Developed Polyherbal Nanosuspension and Its Assessment

    This study focuses on the development and evaluation of a nanosuspension containing ethanolic extracts of Tinospora cordifolia and Syzygium cumini for managing Diabetes mellitus. The main objective is to create an effective polyherbal nanosuspension by combining Tinospora cordifolia and Syzygium cumini with an optimal concentration of chitosan polymer to address Diabetes mellitus. Furthermore, both in vitro and in vivo assessments of the synthesized nanosuspensions were conducted to determine the best formulation. Methods and Findings: The ethanolic extracts of the mentioned plants were obtained using a maceration technique, followed by preliminary phytochemical screening, HPTLC analysis, and FTIR-based incompatibility assessments. The nanosuspension was prepared using the ionic gelation method by varying the chitosan polymer concentration. Comprehensive in vitro assessments were carried out, including measurements of pH, viscosity, drug content, entrapment efficiency, loading capacity, and in vitro release profiles for different formulations. The formulation with the highest drug content and optimal release characteristics was selected for further analysis of particle size, zeta potential, and surface morphology. Subsequently, the antidiabetic efficacy of the polyherbal nanosuspension was evaluated using Wistar albino rats. Discussion: FTIR analysis indicated no significant interaction between the drug and the polymer. The in vitro drug release and kinetic analyses suggested that the F5 formulation exhibited superior drug release and an improved release mechanism. The particle size was determined to be approximately 420 nm, and SEM imaging revealed particles that were nearly spherical in shape. Stability assessments of formulation F5 demonstrated consistent physical and chemical parameters over time.
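
    The abstract reports entrapment efficiency and loading capacity without stating its formulas; for reference, the standard definitions used in formulation studies of this kind are:

        \[
        \mathrm{EE}\,(\%) = \frac{W_{\text{total drug}} - W_{\text{free drug}}}{W_{\text{total drug}}} \times 100,
        \qquad
        \mathrm{LC}\,(\%) = \frac{W_{\text{total drug}} - W_{\text{free drug}}}{W_{\text{nanoparticles}}} \times 100
        \]

    where the free drug is the unentrapped fraction measured in the supernatant after the nanoparticles are separated.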

    Automated Detection and Classification of Meningioma Tumor from MR Images Using Sea Lion Optimization and Deep Learning Models

    Meningiomas are the most prevalent benign intracranial brain tumors, yet they can be life-threatening, with a life expectancy of a few months in the later stages, so this type of tumor should be recognized and detected efficiently in brain images. The cause of meningiomas is unknown; radiation exposure, particularly during childhood, is the only recognized environmental risk factor. Magnetic resonance imaging (MRI) is commonly used to detect most tumor forms, as it is a non-invasive and painless method. This study introduces a CNN-HHO integrated automated identification model that uses Sea Lion optimization methods to improve overall network optimization. In addition, various CNN models such as ResNet, VGG, and DenseNet were utilized to gauge the overall influence of combining each CNN with Sea Lion optimization. Each model is tested on our benchmark dataset for accuracy, specificity, Dice coefficient, MCC, and sensitivity, with DenseNet outperforming the other models with a precision of 98%. According to the experimental findings, the proposed methods outperform existing alternatives in the detection of brain tumors.
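
    The evaluation metrics named above (accuracy, sensitivity, specificity, Dice coefficient, MCC) all derive from the binary confusion matrix. Below is a minimal sketch of their standard definitions in Python; the counts in the example call are illustrative, not the paper's results.

        import math

        def metrics(tp, tn, fp, fn):
            accuracy    = (tp + tn) / (tp + tn + fp + fn)
            sensitivity = tp / (tp + fn)               # recall / true positive rate
            specificity = tn / (tn + fp)               # true negative rate
            dice        = 2 * tp / (2 * tp + fp + fn)  # equals F1 for binary labels
            mcc = ((tp * tn - fp * fn) /
                   math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
            return dict(accuracy=accuracy, sensitivity=sensitivity,
                        specificity=specificity, dice=dice, mcc=mcc)

        print(metrics(tp=95, tn=90, fp=5, fn=10))  # illustrative counts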

    Deep Learning-Based BoVW–CRNN Model for Lung Tumor Detection in Nano-Segmented CT Images

    Lung malignancy is one of the most common cancers diagnosed worldwide. Early detection helps identify a suitable treatment and save lives. Due to their high resolution, greater transparency, and low noise and distortion, Computed Tomography (CT) images are most commonly used for processing. In this context, this research work focuses on the multifaceted nature of lung cancer diagnosis, a quintessential, fascinating, and risky subject of oncology. The input here is a nano-image, enhanced with a Gabor filter and modified color-based histogram equalization. The lung cancer image was then segmented using the Guaranteed Convergence Particle Swarm Optimization (GCPSO) algorithm. A graphical user interface nano-measuring tool was designed to classify the tumor region. The Bag of Visual Words (BoVW) model and a Convolutional Recurrent Neural Network (CRNN) were employed for feature extraction and image classification. We achieved an average precision of 96.5%, accuracy of 99.35%, sensitivity of 97%, specificity of 99%, and F1 score of 95.5%. With the proposed solution, the overall time required for image segmentation was much smaller than with existing solutions. Notably, a biocompatible nanotechnology was developed to distinguish the malignancy region on a nanometer scale and evaluate it automatically. This novel method succeeds in producing a proficient, robust, and precise segmentation of lesions in nano-CT images.
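
    The preprocessing pipeline described above starts with Gabor-filter enhancement followed by histogram equalization. Below is a minimal sketch of that step with OpenCV; the kernel size, filter parameters, and input filename are illustrative assumptions, and plain equalizeHist stands in for the paper's modified color-based variant.

        import cv2
        import numpy as np

        img = cv2.imread("lung_ct_slice.png", cv2.IMREAD_GRAYSCALE)  # assumed input

        # Bank of Gabor filters at several orientations; keep the strongest response
        responses = []
        for theta in np.arange(0, np.pi, np.pi / 4):
            kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                        lambd=10.0, gamma=0.5, psi=0)
            responses.append(cv2.filter2D(img, cv2.CV_32F, kernel))
        enhanced = np.max(responses, axis=0)

        # Rescale to 8-bit and equalize the histogram
        enhanced = cv2.normalize(enhanced, None, 0, 255, cv2.NORM_MINMAX)
        enhanced = cv2.equalizeHist(enhanced.astype(np.uint8))
        cv2.imwrite("enhanced.png", enhanced)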

    ScholarBERT: Bigger is Not Always Better

    Transformer-based masked language models trained on general corpora, such as BERT and RoBERTa, have shown impressive performance on various downstream tasks. Increasingly, researchers are "finetuning" these models to improve performance on domain-specific tasks. Here, we report a broad study in which we applied 14 transformer-based models to 11 scientific tasks in order to evaluate how downstream performance is affected by changes along various dimensions (e.g., training data, model size, pretraining time, finetuning length). In this process, we created the largest and most diverse scientific language model to date, ScholarBERT, by training a 770M-parameter BERT model on a 221B-token scientific literature dataset spanning many disciplines. Counterintuitively, our evaluation of the 14 BERT-based models (seven versions of ScholarBERT, five science-specific large language models from the literature, BERT-Base, and BERT-Large) reveals little difference in performance across the 11 science-focused tasks, despite major differences in model size and training data. We argue that our results establish an upper bound for the performance achievable with BERT-based architectures on tasks from the scientific domain.
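
    The "finetuning" evaluated in the study is the standard procedure of adapting a pretrained BERT-style checkpoint to a downstream classification task. Below is a minimal sketch with Hugging Face Transformers; the checkpoint, dataset, and hyperparameters are illustrative stand-ins (ScholarBERT itself and the 11 science tasks are not reproduced here).

        from datasets import load_dataset
        from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                                  Trainer, TrainingArguments)

        ckpt = "bert-base-uncased"  # stand-in for a ScholarBERT checkpoint
        tok = AutoTokenizer.from_pretrained(ckpt)
        model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=2)

        ds = load_dataset("imdb")   # stand-in for a scientific classification task
        ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=128,
                                  padding="max_length"), batched=True)

        trainer = Trainer(
            model=model,
            args=TrainingArguments(output_dir="out", num_train_epochs=1,
                                   per_device_train_batch_size=16),
            train_dataset=ds["train"].shuffle(seed=0).select(range(2000)),
            eval_dataset=ds["test"].select(range(500)),
        )
        trainer.train()
        print(trainer.evaluate())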

    Use of Statins Among Patients Taking Levothyroxine: an Observational Drug Utilization Study Across Sites

    Context: Treatment with levothyroxine (LT4) that normalizes serum thyrotropin (TSH) is expected to restore lipid metabolism. Objective: To assess statin utilization in LT4-treated patients through an observational drug utilization study. Methods: Three sites were involved: (1) 10,723 outpatients placed on LT4 during 2006-2019, identified from the Clinical Research Data Warehouse of the University of Chicago; (2) approximately 1.4 million LT4 prescriptions prepared by primary care physicians during January-December 2018, identified from the IQVIA database of medical prescriptions in Brazil; (3) approximately 5.4 million patient interviews during 2009-2019, including approximately 0.32 million patients on LT4, identified from the Fleury Group database in Brazil. Results: At site 1, initiation of therapy with LT4 increased the frequency of statin utilization (19.1% vs 24.6%), which occurred approximately 1.5 years later (median 76 weeks) and, among those patients who were on statins, increased the intensity of treatment by 33%, despite normalization of serum TSH levels. At site 2, after matching for sex and age, the frequency of statin prescription was higher for patients using LT4: females, 2.1% vs 3.4% (odds ratio [OR] 1.656 [1.639-1.673]); males, 3.1% vs 4.4% (OR 1.435 [1.409-1.462]). At site 3, after matching for sex and age, the frequency of statin utilization was higher in patients using LT4: females, 10% vs 18% (OR 2.02 [2.00-2.04]); males, 15% vs 25% (OR 1.92 [1.88-1.96]). All P values were <.0001. Conclusion: Prescription and utilization of statins were higher in patients taking LT4. The reasons for this association should be addressed in future studies.
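
    The odds ratios reported above come from 2x2 tables of statin use by LT4 exposure. Below is a minimal sketch of the computation with a Wald 95% confidence interval; the counts are hypothetical, chosen only so the result lands near the site-3 female estimate (10% vs 18%), and are not the study's data.

        import math

        def odds_ratio_ci(a, b, c, d):
            # a/b: exposed with/without outcome; c/d: unexposed with/without outcome
            or_ = (a * d) / (b * c)
            se = math.sqrt(1/a + 1/b + 1/c + 1/d)  # standard error of log(OR)
            lo = math.exp(math.log(or_) - 1.96 * se)
            hi = math.exp(math.log(or_) + 1.96 * se)
            return or_, lo, hi

        # Hypothetical: 1,800/10,000 LT4 users vs 1,000/10,000 non-users on statins
        print(odds_ratio_ci(a=1800, b=8200, c=1000, d=9000))  # OR ~ 1.98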