267 research outputs found

    Serum lactate dehydrogenase activities as systems biomarkers for 48 types of human diseases

    Most human diseases are systems diseases, and systems biomarkers are better suited for diagnostic, prognostic, and treatment-monitoring purposes. To search for systems biomarker candidates, lactate dehydrogenase (LDH), a housekeeping protein expressed in all living cells, was investigated. To this end, we analyzed the serum LDH activities of 172,933 patients with 48 clinically defined diseases and of 9,528 healthy individuals. Based on the median values, we found that 46 out of 48 diseases, led by acute myocardial infarction, had significantly increased (p < […]) serum LDH activities […] (AUC > 0.8) for hepatic encephalopathy and lung fibrosis
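
    A minimal sketch of the kind of group comparison described above: given serum LDH activities for one disease group and for healthy controls, a non-parametric test and an ROC AUC quantify how well LDH separates the two groups. The test choice and the values below are illustrative placeholders, not the study's data or exact statistics.

        # Illustrative comparison of serum LDH between a disease group and healthy controls.
        import numpy as np
        from scipy.stats import mannwhitneyu
        from sklearn.metrics import roc_auc_score

        ldh_disease = np.array([310.0, 455.0, 290.0, 520.0, 380.0])   # U/L, placeholder values
        ldh_healthy = np.array([150.0, 180.0, 165.0, 200.0, 175.0])   # U/L, placeholder values

        # Non-parametric test of whether disease LDH is shifted upward.
        stat, p_value = mannwhitneyu(ldh_disease, ldh_healthy, alternative="greater")

        # ROC AUC: label disease samples 1 and healthy samples 0, use LDH activity as the score.
        labels = np.concatenate([np.ones_like(ldh_disease), np.zeros_like(ldh_healthy)])
        scores = np.concatenate([ldh_disease, ldh_healthy])
        auc = roc_auc_score(labels, scores)

        print(f"p = {p_value:.4g}, AUC = {auc:.2f}")  # AUC > 0.8 would indicate strong separation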

    Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer

    Large language models (LLMs) such as T0, FLAN, and OPT-IML excel at multi-tasking under a unified instruction-following paradigm, where they also exhibit remarkable generalization to unseen tasks. Despite their impressive performance, these LLMs, with sizes ranging from several billion to hundreds of billions of parameters, demand substantial computational resources, making their training and inference expensive and inefficient. Furthermore, adapting these models to downstream applications, particularly complex tasks, is often unfeasible due to the extensive hardware requirements for finetuning, even when parameter-efficient approaches such as prompt tuning are used. Additionally, the most powerful multi-task LLMs, such as OPT-IML-175B and FLAN-PaLM-540B, are not publicly accessible, severely limiting their customization potential. To address these challenges, we introduce a pretrained small scorer, Cappy, designed to enhance the performance and efficiency of multi-task LLMs. With merely 360 million parameters, Cappy functions either independently on classification tasks or serves as an auxiliary component for LLMs, boosting their performance. Moreover, Cappy enables efficient integration of downstream supervision without requiring LLM finetuning or access to LLM parameters. Our experiments demonstrate that, when working independently on 11 language understanding tasks from PromptSource, Cappy outperforms LLMs that are several orders of magnitude larger. In addition, on 45 complex tasks from BIG-Bench, Cappy boosts the performance of the advanced multi-task LLM FLAN-T5 by a large margin. Furthermore, Cappy can flexibly cooperate with other LLM adaptations, including finetuning and in-context learning, offering additional performance enhancement. Comment: In proceedings of NeurIPS 2023; code and model available at https://github.com/tanyuqian/cappy and https://huggingface.co/btan2/cappy-large, respectively.
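
    As a rough illustration of how a small scorer like Cappy can boost a generator: the LLM proposes several candidate responses, the scorer rates each (instruction, response) pair, and the highest-scoring candidate is returned. This is a hedged sketch; it assumes the btan2/cappy-large checkpoint loads as a single-logit sequence-classification model via Hugging Face transformers, so consult the linked repository for the exact interface.

        # Sketch: rerank LLM candidate responses with a small scorer (Cappy-style).
        # Assumes the checkpoint exposes a single-logit sequence-classification head.
        import torch
        from transformers import AutoTokenizer, AutoModelForSequenceClassification

        tokenizer = AutoTokenizer.from_pretrained("btan2/cappy-large")
        scorer = AutoModelForSequenceClassification.from_pretrained("btan2/cappy-large")

        def best_response(instruction: str, candidates: list[str]) -> str:
            """Return the candidate the scorer rates highest for this instruction."""
            inputs = tokenizer([instruction] * len(candidates), candidates,
                               truncation=True, padding=True, return_tensors="pt")
            with torch.no_grad():
                scores = scorer(**inputs).logits.squeeze(-1)   # one score per candidate
            return candidates[int(scores.argmax())]

        # Candidates would normally come from a multi-task LLM such as FLAN-T5.
        print(best_response("Is the review positive or negative? 'Great plot, loved it.'",
                            ["positive", "negative"]))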

    Edge-aware Hard Clustering Graph Pooling for Brain Imaging Data

    Graph Convolutional Networks (GCNs) can capture non-Euclidean spatial dependence between different brain regions, and the graph pooling operator in GCNs is key to enhancing representation learning capability and acquiring abnormal brain maps. However, most existing research designs graph pooling operators only from the perspective of nodes while disregarding the original edge features, which not only confines the application scenarios of graph pooling but also diminishes its ability to capture critical substructures. In this study, we develop the first clustering graph pooling method that supports multidimensional edge features, called Edge-aware Hard Clustering graph Pooling (EHCPool). EHCPool proposes the first 'Edge-to-node' score evaluation criterion, based on edge features, to assess node feature significance. To capture critical subgraphs more effectively, a novel Iteration n-top strategy is further designed to adaptively learn sparse hard clustering assignments for graphs. Subsequently, an innovative N-E Aggregation strategy is presented to aggregate node and edge feature information within each independent subgraph. The proposed model was evaluated on multi-site public brain imaging datasets and yielded state-of-the-art performance. We believe this method is the first deep learning tool with the potential to probe different types of abnormal functional brain networks from a data-driven perspective. Core code is available at: https://github.com/swfen/EHCPool
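
    The core idea of an 'Edge-to-node' score can be pictured as rating each node by the features of the edges incident to it. The sketch below is only a simplified reading of that idea (mean of incoming edge features passed through a learned projection), not the paper's exact criterion; the real implementation is in the linked repository.

        # Simplified sketch of an edge-to-node score: each node is rated from the
        # multidimensional features of its incoming edges. Not the paper's exact criterion.
        import torch
        import torch.nn as nn

        class EdgeToNodeScore(nn.Module):
            def __init__(self, edge_dim: int):
                super().__init__()
                self.proj = nn.Linear(edge_dim, 1)   # learned projection to a scalar score

            def forward(self, edge_index: torch.Tensor, edge_attr: torch.Tensor, num_nodes: int):
                # edge_index: [2, E] source/target node ids; edge_attr: [E, edge_dim]
                agg = torch.zeros(num_nodes, edge_attr.size(1))
                deg = torch.zeros(num_nodes, 1)
                agg.index_add_(0, edge_index[1], edge_attr)                    # sum incoming edge features
                deg.index_add_(0, edge_index[1], torch.ones(edge_attr.size(0), 1))
                mean_edge_feat = agg / deg.clamp(min=1)                        # mean over incoming edges
                return self.proj(mean_edge_feat).squeeze(-1)                   # one score per node

        scores = EdgeToNodeScore(edge_dim=4)(
            torch.tensor([[0, 1, 2], [1, 2, 0]]), torch.randn(3, 4), num_nodes=3)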

    Redco: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs

    The recent progress of AI can be largely attributed to large language models (LLMs). However, their escalating memory requirements introduce challenges for machine learning (ML) researchers and engineers. Addressing this requires developers to partition a large model in order to distribute it across multiple GPUs or TPUs. This necessitates considerable coding and intricate configuration effort with existing model-parallel tools such as Megatron-LM, DeepSpeed, and Alpa. These tools require expertise in machine learning systems (MLSys), creating a bottleneck in LLM development, particularly for developers without an MLSys background. In this work, we present Redco, a lightweight and user-friendly tool crafted to automate distributed training and inference for LLMs, as well as to simplify ML pipeline development. The design of Redco emphasizes two key aspects. First, to automate model parallelism, our study identifies two straightforward rules to generate tensor-parallel strategies for any given LLM. Integrating these rules into Redco facilitates effortless distributed LLM training and inference, eliminating the need for additional coding or complex configuration. We demonstrate the effectiveness by applying Redco to a set of LLM architectures, such as GPT-J, LLaMA, T5, and OPT, up to 66B parameters. Second, we propose a mechanism that allows diverse ML pipelines to be customized through the definition of merely three functions, eliminating redundant and formulaic code such as multi-host processing. This mechanism proves adaptable across a spectrum of ML algorithms, from foundational language modeling to complex algorithms like meta-learning and reinforcement learning. Consequently, Redco implementations contain far fewer lines of code than their official counterparts. Comment: Released under Apache License 2.0 at https://github.com/tanyuqian/redco
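
    The "three functions" idea is that a pipeline is fully specified by how to batch raw examples, how to compute the loss, and how to produce predictions, while the framework handles device placement, sharding, and multi-host details. The snippet below only illustrates that shape with hypothetical function names, a toy numpy model, and a placeholder trainer; it is not Redco's actual API, which is documented in the linked repository.

        # Illustration of the three-function pipeline pattern (hypothetical names and
        # trainer, not Redco's actual API): batching, loss, and prediction define a pipeline.
        import numpy as np

        def collate_fn(examples):                 # raw examples -> model-ready batch
            return {"x": np.stack([e["x"] for e in examples]),
                    "y": np.array([e["y"] for e in examples])}

        def loss_fn(params, batch):               # batch -> scalar training loss
            logits = batch["x"] @ params["w"]                                 # toy linear model
            logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
            return -logp[np.arange(len(batch["y"])), batch["y"]].mean()

        def pred_fn(params, batch):               # batch -> predictions for evaluation
            return (batch["x"] @ params["w"]).argmax(-1)

        # A framework in this style would accept just these three callables and handle
        # model/data sharding across GPUs or TPUs internally, e.g.:
        #   trainer = SomeTrainer(collate_fn=collate_fn, loss_fn=loss_fn, pred_fn=pred_fn)
        #   trainer.fit(train_examples, eval_examples)
        params = {"w": np.zeros((8, 3))}
        batch = collate_fn([{"x": np.ones(8), "y": 1}, {"x": np.zeros(8), "y": 2}])
        print(loss_fn(params, batch), pred_fn(params, batch))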

    Flavour by design: food-grade lactic acid bacteria improve the volatile aroma spectrum of oat milk, sunflower seed milk, pea milk, and faba milk towards improved flavour and sensory perception

    Background: The global market for plant-based milk alternatives is continually growing. Flavour and taste have a key impact on consumers' selection of plant-based beverages. Unfortunately, natural plant milks have only limited acceptance. Their typically bean-like and grassy notes are perceived as "off-flavours" by consumers, while preferred fruity, buttery, and cheesy notes are missing. In this regard, fermentation of plant milk by lactic acid bacteria (LAB) appears to be an appealing option to improve aroma and taste. Results: In this work, we systematically studied LAB fermentation of plant milk. For this purpose, we evaluated 15 food-approved LAB strains to ferment 4 different plant milks: oat milk (representing cereal-based milk), sunflower seed milk (representing seed-based milk), and pea and faba milk (representing legume-based milk). Using GC‒MS analysis, flavour changes during anaerobic fermentations were studied in detail. These revealed species-related and plant milk-related differences and highlighted several well-performing strains that delivered a range of beneficial flavour changes. A developed data model estimated the impact of individual flavour compounds using sensory scores and predicted the overall flavour note of fermented and non-fermented samples. Selected sensory perception tests validated the model and allowed us to bridge compositional changes in the flavour profile with consumer response. Conclusion: Specific strain-milk combinations provided quite different flavour notes. This opens further developments towards plant-based products with improved flavour, including cheesy and buttery notes, as well as other innovative products in the future. S. thermophilus emerged as a well-performing strain that delivered preferred buttery notes in all tested plant milks. The GC‒MS-based data model was found to be helpful in predicting sensory perception, and its further refinement and application promise enhanced potential to upgrade fermentation approaches to flavour-by-design strategies
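
    The data model described above maps GC‒MS compound measurements to a predicted sensory impression. One simple way to picture such a model (not necessarily the one used in the study) is a weighted score in which each volatile compound contributes in proportion to its measured abundance and a panel-derived sensory weight, as sketched below with made-up compounds, weights, and peak areas.

        # Toy sketch of a flavour data model: each volatile compound contributes to the
        # predicted overall flavour note via a sensory-derived weight. Compounds, weights,
        # and abundances below are illustrative, not values from the study.
        sensory_weight = {            # positive = preferred (buttery/fruity), negative = off-flavour
            "diacetyl": +0.8,         # buttery
            "hexanal": -0.6,          # green/grassy
            "ethyl acetate": +0.3,    # fruity
        }

        def flavour_score(peak_areas: dict) -> float:
            """Weighted sum of normalized GC-MS peak areas over known compounds."""
            total = sum(peak_areas.values()) or 1.0
            return sum(sensory_weight.get(c, 0.0) * (a / total) for c, a in peak_areas.items())

        unfermented = {"hexanal": 900.0, "diacetyl": 50.0, "ethyl acetate": 50.0}
        fermented   = {"hexanal": 200.0, "diacetyl": 600.0, "ethyl acetate": 200.0}
        print(flavour_score(unfermented), flavour_score(fermented))  # fermentation should raise the score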

    Equivariant Similarity for Vision-Language Foundation Models

    This study explores the concept of equivariance in vision-language foundation models (VLMs), focusing specifically on the multimodal similarity function, which is not only the major training objective but also the core delivery supporting downstream tasks. Unlike the existing image-text similarity objective, which only categorizes matched pairs as similar and unmatched pairs as dissimilar, equivariance also requires similarity to vary faithfully according to semantic changes. This allows VLMs to generalize better to nuanced and unseen multimodal compositions. However, modeling equivariance is challenging because ground truth for semantic change is difficult to collect. For example, given an image-text pair about a dog, it is unclear to what extent the similarity should change when the pixels are changed from a dog to a cat. To this end, we propose EqSim, a regularization loss that can be efficiently calculated from any two matched training pairs and easily plugged into existing image-text retrieval fine-tuning. Meanwhile, to further diagnose the equivariance of VLMs, we present a new challenging benchmark, EqBen. Compared to existing evaluation sets, EqBen is the first to focus on "visual-minimal change". Extensive experiments show the lack of equivariance in current VLMs and validate the effectiveness of EqSim. Code is available at https://github.com/Wangt-CN/EqBen. Comment: Accepted by ICCV'23 (Oral); Add evaluation on MLL
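
    EqSim's premise is that, for any two matched pairs (image1, text1) and (image2, text2), the change in similarity caused by swapping the text for image1 should mirror the change caused by swapping the text for image2. The snippet below is one plausible instantiation of that symmetry as a squared-difference penalty on similarity margins; it is a hedged sketch rather than the paper's exact loss, which is in the linked repository.

        # Hedged sketch of an equivariance-style regularizer for two matched image-text
        # pairs: the similarity margin of pair 1 over the swapped text should mirror the
        # margin of pair 2. This is an illustrative formulation, not the paper's exact loss.
        import torch
        import torch.nn.functional as F

        def eqsim_like_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
            """img_emb, txt_emb: [2, d] embeddings of two matched image-text pairs."""
            img_emb = F.normalize(img_emb, dim=-1)
            txt_emb = F.normalize(txt_emb, dim=-1)
            sim = img_emb @ txt_emb.t()                      # [2, 2] cosine similarities
            margin_1 = sim[0, 0] - sim[0, 1]                 # how much image 1 prefers its own text
            margin_2 = sim[1, 1] - sim[1, 0]                 # how much image 2 prefers its own text
            return (margin_1 - margin_2) ** 2                # penalize asymmetric similarity changes

        loss = eqsim_like_loss(torch.randn(2, 512), torch.randn(2, 512))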