22 research outputs found

    Data-Efficiency with a Single GPU: An Exploration of Transfer Methods for Small Language Models

    Multi-task learning (MTL), instruction tuning, and prompting have recently been shown to improve the generalizability of large language models to new tasks. However, the benefits of such methods are less well-documented in smaller language models, with some studies finding contradictory results. In this work, we explore and isolate the effects of (i) model size, (ii) general purpose MTL, (iii) in-domain MTL, (iv) instruction tuning, and (v) few-shot fine-tuning for models with fewer than 500 million parameters. Our experiments in the zero-shot setting demonstrate that models gain a 31% relative improvement, on average, from general purpose MTL, with an additional 37.6% relative gain from in-domain MTL. Contrary to prior work on large models, we find that instruction tuning provides a modest 2% performance improvement for small models.
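
The gains quoted above are relative, meaning they are measured against the baseline score rather than in absolute points. The sketch below assumes that standard convention. The scores are hypothetical placeholders chosen only to illustrate the arithmetic, not results from the paper, and the abstract does not state whether the two reported gains compound.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return (improved - baseline) / baseline as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Hypothetical scores (assumption): not taken from the paper.
baseline_score = 40.0
mtl_score = 52.4

gain = relative_improvement(baseline_score, mtl_score)
print(f"relative gain: {gain:.1%}")
```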

    STOP: A dataset for Spoken Task Oriented Semantic Parsing

    End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assistant systems on-device. However, the limited number of public audio datasets with semantic parse labels hinders research progress in this area. In this paper, we release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex SLU dataset to be publicly available. Additionally, we define low-resource splits to establish a benchmark for improving SLU when limited labeled data is available. Furthermore, in addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance of low-resource domain adaptation of end-to-end SLU systems. Initial experiments show end-to-end SLU models performing slightly worse than their cascaded counterparts, which we hope encourages future work in this direction.

    DEVELOPMENT AND APPLICATION OF A REDUCED ORDER MATHEMATICAL FRAMEWORK TO UNRAVEL THE COMPLEXITY OF TRAUMA INDUCED COAGULOPATHY

    Trauma is the leading cause of death and disability in the United States for both children and adults. In response to trauma, the body unleashes a set of coupled programs that affect the functioning of the vascular, immune and autonomic nervous systems. In pathological cases, the integrated output of these programs can result in coagulopathy, systemic inflammatory response syndrome (SIRS), multiple organ dysfunction syndrome (MODS) and potentially even death. Nearly 35%-40% of trauma deaths occur due to uncontrolled hemorrhage resulting from trauma-induced coagulopathy (TIC). TIC also plays an important role in modulating inflammation, organ dysfunction and increased susceptibility to sepsis. Clinical trials for treatment strategies targeting TIC have met with limited success. The interlinked nature of coagulant and inflammatory responses, along with patient-specific physiological variability, makes the treatment of TIC challenging. Understanding TIC requires an integrated multi-scale modeling framework which describes the relevant biochemical networks within the context of the whole body. Given their complexity and size, embedding large, non-linear models of biochemical networks into a whole-body model creates a significant computational challenge. Thus an objective of this work is to develop a framework that reduces the complexity of high-dimensional mathematical models. We apply this framework to model biochemical networks that are important in TIC. We first investigate the dynamics of coagulation and the impact of the protein C pathway on thrombin generation. Thereafter we use this reduced order modeling technique to model complement and fibrinolysis. We identify targets of therapeutic importance in complement and mechanisms that control clot degradation in fibrinolysis. We show that we can capture the dynamics of these complex but varied systems using the reduced order modeling framework.
    In addition, we address the problem of training high-dimensional, non-linear models of biological systems. Traditional gradient-based methods often fail because they converge to a local optimum or because gradient information is unavailable. We present a novel optimization method that is based on evolutionary algorithms to obtain near-optimal parameters within a limited number of function evaluations. We demonstrate that this method obtains optimal solutions on a wide array of non-linear models, faster than existing metaheuristic methods. Taken together, this work provides a methodology to rapidly investigate complex biochemical systems by simplifying the model design and experimentation processes.

    Dynamic Modeling of the Human Coagulation Cascade Using Reduced Order Effective Kinetic Models

    In this study, we present a novel modeling approach which combines ordinary differential equation (ODE) modeling with logical rules to simulate an archetypal biochemical network, the human coagulation cascade. The model consisted of five differential equations augmented with several logical rules describing regulatory connections between model components, and unmodeled interactions in the network. This formulation was more than an order of magnitude smaller than current coagulation models, because many of the mechanistic details of coagulation were encoded as logical rules. We estimated an ensemble of likely model parameters (N = 20) from in vitro extrinsic coagulation data sets, with and without inhibitors, by minimizing the residual between model simulations and experimental measurements using particle swarm optimization (PSO). Each parameter set in our ensemble corresponded to a unique particle in the PSO. We then validated the model ensemble using thrombin data sets that were not used during training. The ensemble predicted thrombin trajectories for conditions not used for model training, including thrombin generation for normal and hemophilic coagulation in the presence of platelets (a significant unmodeled component). We then used flux analysis to understand how the network operated in a variety of conditions, and global sensitivity analysis to identify which parameters controlled the performance of the network. Taken together, the hybrid approach produced a surprisingly predictive model given its small size, suggesting the proposed framework could also be used to dynamically model other biochemical networks, including intracellular metabolic networks, gene expression programs or potentially even cell-free metabolic systems.
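
The estimation strategy described above (an ODE gated by a logical rule, fitted by PSO, with each particle kept as one member of a parameter ensemble) can be sketched on a toy problem. This is an illustration under stated assumptions, not the authors' model: the one-state ODE, the time-lag rule, the synthetic data, the parameter bounds, and the PSO coefficients (0.7, 1.5, 1.5) are all hypothetical choices.

```python
import numpy as np


def simulate(params, t, t_delay=2.0):
    """Explicit-Euler solution of dx/dt = gate(t) * k_on - k_off * x (toy model)."""
    k_on, k_off = params
    x = np.zeros_like(t)
    for i in range(1, len(t)):
        dt = t[i] - t[i - 1]
        # Logical rule (assumption): production is switched off before the lag time.
        gate = 1.0 if t[i - 1] >= t_delay else 0.0
        x[i] = x[i - 1] + dt * (gate * k_on - k_off * x[i - 1])
    return x


def residual(params, t, data):
    """Sum of squared differences between simulation and data."""
    return float(np.sum((simulate(params, t) - data) ** 2))


def pso(objective, lower, upper, n_particles=20, n_iter=100, seed=0):
    """Basic global-best PSO; returns each particle's best position and cost."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pos = rng.uniform(lower, upper, size=(n_particles, lower.size))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()
    best_cost = np.array([objective(p) for p in pos])
    g = best_pos[np.argmin(best_cost)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (best_pos - pos) + 1.5 * r2 * (g - pos)
        pos = np.clip(pos + vel, lower, upper)  # keep particles inside the bounds
        cost = np.array([objective(p) for p in pos])
        improved = cost < best_cost
        best_pos[improved] = pos[improved]
        best_cost[improved] = cost[improved]
        g = best_pos[np.argmin(best_cost)].copy()
    return best_pos, best_cost


# Synthetic "measurements" generated from known parameters (hypothetical values).
t = np.linspace(0.0, 10.0, 101)
true_params = np.array([2.0, 0.5])
data = simulate(true_params, t)

# Twenty particles, so the returned personal bests form a 20-member ensemble.
ensemble, costs = pso(lambda p: residual(p, t, data),
                      lower=[0.1, 0.05], upper=[5.0, 2.0])
best = ensemble[np.argmin(costs)]
print("best parameter set:", best)
```

Keeping every particle's personal best, rather than only the global best, is what yields an ensemble here; a real application would also hold out data sets for validation, as the abstract describes.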

    Dynamic Modeling of Cell-Free Biochemical Networks Using Effective Kinetic Models

    Cell-free systems offer many advantages for the study, manipulation and modeling of metabolism compared to in vivo processes. Many of the challenges confronting genome-scale kinetic modeling can potentially be overcome in a cell-free system. For example, there is no complex transcriptional regulation to consider, transient metabolic measurements are easier to obtain, and we no longer have to consider cell growth. Thus, cell-free operation holds several significant advantages for model development, identification and validation. Theoretically, genome-scale cell-free kinetic models may be possible for industrially important organisms, such as E. coli, if a simple, tractable framework for integrating allosteric regulation with enzyme kinetics can be formulated. Toward this unmet need, we present an effective biochemical network modeling framework for building dynamic cell-free metabolic models. The key innovation of our approach is the integration of simple effective rules encoding complex allosteric regulation with traditional kinetic pathway modeling. We tested our approach by modeling the time evolution of several hypothetical cell-free metabolic networks. First, we found that simple effective rules, when integrated with traditional enzyme kinetic expressions, captured complex allosteric patterns such as ultrasensitivity or non-competitive inhibition in the absence of mechanistic information. Second, when integrated into network models, these rules captured classic regulatory patterns such as product-induced feedback inhibition. Lastly, we showed, at least for the network architectures considered here, that we could simultaneously estimate kinetic parameters and allosteric connectivity from synthetic data starting from an unbiased collection of possible allosteric structures using particle swarm optimization. However, when starting with an initial population that was heavily enriched with incorrect structures, our particle swarm approach could converge to an incorrect structure. While only an initial proof-of-concept, the framework presented here could be an important first step toward genome-scale cell-free kinetic modeling of the biosynthetic capacity of industrially important organisms.
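
One way to picture an effective rule layered on a traditional kinetic expression is a Michaelis-Menten rate scaled by a control term between 0 and 1. The abstract does not give the functional forms used, so the Hill-type control function, the function names, and all parameter values below are assumptions made for illustration only.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Traditional enzyme kinetic rate for substrate concentration s."""
    return vmax * s / (km + s)


def hill(x: float, k: float, n: float) -> float:
    """Hill-type saturation function in [0, 1); steeper for larger n."""
    return x**n / (k**n + x**n)


def regulated_rate(s: float, p: float, vmax: float = 1.0, km: float = 0.5,
                   k_inh: float = 1.0, n: float = 4.0) -> float:
    """Kinetic rate multiplied by an effective inhibition rule driven by product p."""
    control = 1.0 - hill(p, k_inh, n)  # 1 with no product, near 0 at high product
    return control * michaelis_menten(s, vmax, km)


# Product-induced feedback inhibition: the rate falls as product accumulates.
for product in (0.0, 0.5, 1.0, 2.0):
    print(product, regulated_rate(s=1.0, p=product))
```

Because the control term only multiplies the rate, the same kinetic expression can be reused with different rules (activation, inhibition, or none) without encoding the binding mechanism itself.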

    Reduced order modeling and analysis of the human complement system

    Complement is an important pathway in innate immunity, inflammation, and many disease processes. However, despite its importance, there are few validated mathematical models of complement activation. In this study, we developed an ensemble of experimentally validated reduced order complement models. We combined ordinary differential equations with logical rules to produce a compact yet predictive model of complement activation. The model, which described the lectin and alternative pathways, was an order of magnitude smaller than comparable models in the literature. We estimated an ensemble of model parameters from in vitro dynamic measurements of the C3a and C5a complement proteins. Subsequently, we validated the model on unseen C3a and C5a measurements not used for model training. Despite its small size, the model was surprisingly predictive. Global sensitivity and robustness analysis suggested complement was robust to any single therapeutic intervention. Only the simultaneous knockdown of both C3 and C5 consistently reduced C3a and C5a formation from all pathways. Taken together, we developed a validated mathematical model of complement activation that was computationally inexpensive, and could easily be incorporated into pre-existing or new pharmacokinetic models of immune system function. The model described experimental data, and predicted the need for multiple points of therapeutic intervention to fully disrupt complement activation.