    Merging by Matching Models in Task Parameter Subspaces

    Model merging aims to cheaply combine individual task-specific models into a single multitask model. In this work, we view past merging methods as leveraging different notions of a "task parameter subspace" in which models are matched before being merged. We connect the task parameter subspace of a given model to its loss landscape and formalize how this approach to model merging can be seen as solving a linear system of equations. While past work has generally been limited to linear systems that have a closed-form solution, we consider using the conjugate gradient method to find a solution. We show that using the conjugate gradient method can outperform closed-form solutions, enables merging via linear systems that are otherwise intractable to solve, and flexibly allows choosing from a wide variety of initializations and estimates for the "task parameter subspace". We ultimately demonstrate that our merging framework, "Matching Models in their Task Parameter Subspace" (MaTS), achieves state-of-the-art results in multitask and intermediate-task model merging. We release all of the code and checkpoints used in our work at https://github.com/r-three/mats. Comment: TML
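
The abstract frames merging as solving a linear system with the conjugate gradient method. Below is a minimal sketch in Python with NumPy, assuming each model's task parameter subspace is represented by a symmetric positive-definite matrix C_i (a hypothetical stand-in for whatever estimate is used) so that the merged parameters theta solve (sum_i C_i) theta = sum_i C_i theta_i. The names `conjugate_gradient` and `merge_by_matching` are illustrative and are not taken from the MaTS code release.

```python
import numpy as np


def conjugate_gradient(A, b, x0=None, rtol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite A with plain conjugate gradient."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x                    # current residual
    p = r.copy()                     # current search direction
    rs_old = r @ r
    threshold = rtol * np.linalg.norm(b)
    max_iter = 10 * n if max_iter is None else max_iter
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= threshold:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)    # exact line search along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # new direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x


def merge_by_matching(thetas, covs, x0=None):
    """Hypothetical merge: solve (sum_i C_i) theta = sum_i C_i theta_i by conjugate gradient."""
    A = sum(covs)
    b = sum(C @ th for C, th in zip(covs, thetas))
    return conjugate_gradient(A, b, x0=x0)


# Usage with random stand-ins for three task-specific models.
rng = np.random.default_rng(0)
d = 5
thetas = [rng.normal(size=d) for _ in range(3)]
covs = []
for _ in range(3):
    M = rng.normal(size=(d, d))
    covs.append(M @ M.T + 0.1 * np.eye(d))  # SPD stand-in for a subspace estimate
x0 = np.mean(thetas, axis=0)                 # initialize from simple averaging
merged = merge_by_matching(thetas, covs, x0=x0)
```

Starting from the parameter average illustrates the abstract's point that the iterative solver can take different initializations; with every C_i equal to the identity, the system reduces to plain averaging. A real model would apply each C_i through matrix-vector products rather than forming dense matrices.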

    Evaluating the Factual Consistency of Large Language Models Through News Summarization

    While large language models (LLMs) have proven to be effective on a large variety of tasks, they are also known to hallucinate information. To measure whether an LLM prefers factually consistent continuations of its input, we propose a new benchmark called FIB (Factual Inconsistency Benchmark) that focuses on the task of summarization. Specifically, our benchmark involves comparing the scores an LLM assigns to a factually consistent versus a factually inconsistent summary for an input news article. For factually consistent summaries, we use human-written reference summaries that we manually verify as factually consistent. For factually inconsistent summaries, we use summaries generated by a suite of summarization models, each of which we manually annotated as factually inconsistent. A model's factual consistency is then measured according to its accuracy, i.e., the proportion of documents where it assigns a higher score to the factually consistent summary. To validate the usefulness of FIB, we evaluate 23 large language models ranging from 1B to 176B parameters from six different model families, including BLOOM and OPT. We find that existing LLMs generally assign a higher score to factually consistent summaries than to factually inconsistent summaries. However, if the factually inconsistent summaries occur verbatim in the document, then LLMs assign a higher score to these factually inconsistent summaries than to factually consistent summaries. We validate design choices in our benchmark, including the scoring method and the source of distractor summaries. Our code and benchmark data can be found at https://github.com/r-three/fib
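
The accuracy metric described above reduces to counting pairwise wins per document. A minimal sketch, assuming exactly one factually consistent and one factually inconsistent summary per document and that ties count as failures; the scores stand in for whatever scoring method is applied to the LLM, and the function name `pairwise_accuracy` is hypothetical.

```python
from collections.abc import Sequence


def pairwise_accuracy(consistent_scores: Sequence[float],
                      inconsistent_scores: Sequence[float]) -> float:
    """Share of documents whose consistent summary gets a strictly higher score."""
    if len(consistent_scores) != len(inconsistent_scores):
        raise ValueError("need exactly one score pair per document")
    if len(consistent_scores) == 0:
        raise ValueError("no documents to evaluate")
    wins = sum(c > i for c, i in zip(consistent_scores, inconsistent_scores))
    return wins / len(consistent_scores)


# Hypothetical per-document scores, e.g. length-normalized log-likelihoods.
consistent = [-1.2, -0.8, -2.0, -1.5]
inconsistent = [-1.9, -0.5, -2.4, -1.5]
accuracy = pairwise_accuracy(consistent, inconsistent)  # 2 wins out of 4 -> 0.5
```

The fourth document is a tie and, under this assumption, does not count as preferring the consistent summary.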

    Rapid PCR detection of group A streptococcus from flocked throat swabs: A retrospective clinical study

    Background. Rapid diagnosis of group A streptococcus (GAS) pharyngitis may improve patient care by ensuring that patients with GAS pharyngitis are treated quickly while avoiding unnecessary use of antibiotics in those without GAS infection. Very few molecular methods for detection of GAS in clinical throat swab specimens have been described. Methods. We performed a study of a laboratory-developed, internally controlled rapid GAS PCR assay using flocked swab throat specimens. We compared the GAS PCR assay to GAS culture results using a collection of archived throat swab samples obtained during a study comparing the performance of conventional and flocked throat swabs. Results. Compared with the reference standard, the GAS PCR assay had a sensitivity of 96.0% (95% CI 90.1% to 98.4%), specificity of 98.6% (95% CI 95.8% to 99.5%), positive predictive value (PPV) of 96.9% (95% CI 91.4% to 99.0%), and negative predictive value (NPV) of 98.1% (95% CI 95.2% to 99.2%). For conventional swab cultures, sensitivity was 96.0% (95% CI 90.1% to 98.4%), specificity 100% (95% CI 98.2% to 100%), PPV 100% (95% CI 96.1% to 100%), and NPV 98.1% (95% CI 95.2% to 99.3%). Conclusions. In this retrospective study, the GAS PCR assay appeared to perform as well as conventional throat swab culture, the current standard of practice. Since the GAS PCR assay, including DNA extraction, can be performed in approximately 1 hour, prospective studies of this assay are warranted to evaluate its clinical impact on the management of patients with pharyngitis.
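
The four reported metrics all come from the same 2x2 table of assay results against the reference standard. The sketch below shows how each is computed. The counts are hypothetical because the abstract does not report them, and the Wilson score interval is an assumed choice since the abstract does not say which confidence interval method was used.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def diagnostic_summary(tp: int, fp: int, fn: int, tn: int) -> dict[str, tuple[float, float, float]]:
    """(point estimate, lower, upper) for sensitivity, specificity, PPV and NPV."""
    def estimate(k: int, n: int) -> tuple[float, float, float]:
        lower, upper = wilson_interval(k, n)
        return k / n, lower, upper

    return {
        "sensitivity": estimate(tp, tp + fn),  # test-positive among reference-positive
        "specificity": estimate(tn, tn + fp),  # test-negative among reference-negative
        "ppv": estimate(tp, tp + fp),          # reference-positive among test-positive
        "npv": estimate(tn, tn + fn),          # reference-negative among test-negative
    }


# Hypothetical counts, not taken from the study.
summary = diagnostic_summary(tp=48, fp=2, fn=2, tn=148)
for name, (point, lower, upper) in summary.items():
    print(f"{name}: {point:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Sensitivity and specificity depend only on the assay and the reference standard, while PPV and NPV also depend on how common GAS is in the sampled population.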

    In vitro flow experiments for determination of optimal geometry of total cavopulmonary connection for surgical repair of children with functional single ventricle

    Objectives. This study sought to evaluate the effect of offsetting cavopulmonary connections at varying pulmonary flow ratios to determine the optimal geometry of the connection. Background. Previous investigators have demonstrated energy conservation within the streamlined contours of the total cavopulmonary connection compared with that of the atriopulmonary connection. However, their surgical design of connecting the two cavae directly opposite each other may result in high energy losses. Others have introduced a unidirectional connection with some advantages, but with concerns about the formation of arteriovenous malformations in the lung excluded from hepatic venous return. Thus, an optimal surgical design has not been determined. Methods. In the present models, the caval connections were offset from 0.0 to 2.0 diameters in increments of 0.5 superior caval diameter. Flow ratios were fixed for the superior and inferior cavae and varied for the right and left pulmonary arteries as 70:30, 60:40, 50:50, 40:60 and 30:70 to simulate varying lung resistance. Pressure measurements and flow visualization were done at steady flows of 2, 4 and 6 liters/min to simulate rest and exercise. Results. Our data show that the energy losses at the 0.0-diameter offset were double those at the 1.0- and 1.5-diameter offsets, which had minimal energy losses. This result was attributable to the chaotic flow patterns seen on flow visualization at the 0.0-diameter offset. Energy savings were more evident at the 50:50 right/left pulmonary artery flow ratio. Energy losses increased with increasing total flow rate. Conclusions. The results strongly suggest incorporating caval offsets in future total cavopulmonary connections.
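
The abstract compares energy losses derived from pressure measurements but does not give the formula used. One common control-volume approach takes the loss rate as the total energy flux, (p + ρv²/2)Q, entering through the cavae minus that leaving through the pulmonary arteries. The sketch assumes that formulation, a uniform velocity across each port, and a water-like test fluid; the `Port` class, all pressures, the vessel diameter and the caval flow split are hypothetical.

```python
import math
from dataclasses import dataclass

RHO = 1000.0  # kg/m^3, assumed density of a water-like test fluid


@dataclass
class Port:
    pressure_pa: float  # static pressure measured at the port
    flow_m3s: float     # volumetric flow rate through the port
    diameter_m: float   # port diameter, used for the mean velocity

    def energy_flux_w(self, rho: float = RHO) -> float:
        """Total mechanical energy flux (p + rho*v^2/2) * Q, in watts."""
        area = math.pi * (self.diameter_m / 2) ** 2
        velocity = self.flow_m3s / area  # assumes uniform velocity across the port
        return (self.pressure_pa + 0.5 * rho * velocity ** 2) * self.flow_m3s


def energy_loss_w(inlets: list[Port], outlets: list[Port], rho: float = RHO) -> float:
    """Energy flux entering the connection minus energy flux leaving it."""
    return (sum(p.energy_flux_w(rho) for p in inlets)
            - sum(p.energy_flux_w(rho) for p in outlets))


def lpm_to_m3s(liters_per_min: float) -> float:
    return liters_per_min / 1000.0 / 60.0


# Hypothetical example at 4 L/min with a 50:50 right/left pulmonary split.
q = lpm_to_m3s(4.0)
d = 0.015  # hypothetical vessel diameter, m
inlets = [Port(1200.0, 0.4 * q, d), Port(1150.0, 0.6 * q, d)]   # SVC, IVC
outlets = [Port(1050.0, 0.5 * q, d), Port(1040.0, 0.5 * q, d)]  # RPA, LPA
loss = energy_loss_w(inlets, outlets)
print(f"energy loss: {loss * 1000:.2f} mW")
```

Repeating the calculation for each offset, pulmonary flow ratio and total flow would give a grid of losses comparable across geometries, which is the kind of comparison the Results section reports.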

    Repair of injured plasma membrane by rapid Ca2+-dependent endocytosis

    Ca2+ influx through plasma membrane lesions triggers a rapid repair process that was previously shown to require the exocytosis of lysosomal organelles (Reddy, A., E. Caler, and N. Andrews. 2001. Cell. 106:157–169). However, how exocytosis leads to membrane resealing has remained obscure, particularly for stable lesions caused by pore-forming proteins. In this study, we show that Ca2+-dependent resealing after permeabilization with the bacterial toxin streptolysin O (SLO) requires endocytosis via a novel pathway that removes SLO-containing pores from the plasma membrane. We also find that endocytosis is similarly required to repair lesions formed in mechanically wounded cells. Inhibition of lesion endocytosis (by sterol depletion) inhibits repair, whereas enhancement of endocytosis through disruption of the actin cytoskeleton facilitates resealing. Thus, endocytosis promotes wound resealing by removing lesions from the plasma membrane. These findings provide an important new insight into how cells protect themselves not only from mechanical injury but also from microbial toxins and pore-forming proteins produced by the immune system.

    A Comparative Study of Factors Shaping E-Business Strategies in U.S. and Asian Financial Services Companies

    This abstract presents the preliminary findings of a study comparing factors that affect the adoption of e-business strategies by financial services companies in the United States and some East Asian countries (Taiwan, the People's Republic of China, Hong Kong, South Korea and Japan). The goal of our research was to explore the factors shaping e-business strategies intended to provide integrated and international financial services. We focused on three main aspects: (1) the technology and infrastructure used to support e-business; (2) the political and economic issues shaping the regulatory environment for financial services in each economy; and (3) the cultural issues related to e-business in each country. Following a review of the literature covering each of these aspects, primary data were collected through structured interviews with managers involved in e-business strategy decisions at each company we visited. In each interview, the intent was to gain an overall perspective on what it takes for a financial institution to implement an effective e-business strategy in the context within which it must operate. Three types of companies were included in the study. Financial institutions (Citibank, Sanwa Bank and Hanvit Bank) were studied to gain insight into the motivations behind current e-business strategies as well as future directions. Technology/solution providers (BizCurrency.com and NTT Docomo) were studied to learn about the issues financial services companies face in selecting the appropriate technology and the challenges in integrating the chosen solution. Consulting companies (Bain and Co. and Computer Sciences Corporation) were studied to understand the direction in which the financial services industry in each country was headed with respect to e-business strategies.