Serum lactate dehydrogenase activities as systems biomarkers for 48 types of human diseases
Most human diseases are systems diseases, and systems biomarkers are better suited for diagnostic, prognostic, and treatment-monitoring purposes. To search for systems biomarker candidates, lactate dehydrogenase (LDH), a housekeeping protein expressed in all living cells, was investigated. To this end, we analyzed the serum LDH activities from 172,933 patients with 48 clinically defined diseases and 9528 healthy individuals. Based on the median values, we found that 46 out of 48 diseases, led by acute myocardial infarction, had significantly increased serum LDH activities; discriminative performance was strongest (> 0.8) for hepatic encephalopathy and lung fibrosis
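The core analysis here is a per-disease comparison against a healthy reference group. Below is a minimal sketch of such an analysis in Python (pandas/SciPy); the file name, column names, and the choice of the Mann-Whitney U test are illustrative assumptions, not details from the paper:

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Hypothetical layout: one row per individual, with columns "group"
# (a disease name or "healthy") and "ldh" (serum LDH activity, U/L).
df = pd.read_csv("serum_ldh.csv")

healthy = df.loc[df["group"] == "healthy", "ldh"]

rows = []
for disease, sub in df[df["group"] != "healthy"].groupby("group"):
    # Nonparametric test of each disease group against healthy controls.
    _, p = mannwhitneyu(sub["ldh"], healthy, alternative="two-sided")
    rows.append({
        "disease": disease,
        "median_ldh": sub["ldh"].median(),
        "median_healthy": healthy.median(),
        "p_value": p,
    })

# Rank diseases by median LDH activity, as in the abstract's comparison.
print(pd.DataFrame(rows).sort_values("median_ldh", ascending=False).head(10))
```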
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
Large language models (LLMs) such as T0, FLAN, and OPT-IML, excel in
multi-tasking under a unified instruction-following paradigm, where they also
exhibit remarkable generalization abilities to unseen tasks. Despite their
impressive performance, these LLMs, with sizes ranging from several billion to
hundreds of billions of parameters, demand substantial computational resources,
making their training and inference expensive and inefficient. Furthermore,
adapting these models to downstream applications, particularly complex tasks,
is often unfeasible due to the extensive hardware requirements for finetuning,
even when utilizing parameter-efficient approaches such as prompt tuning.
Additionally, the most powerful multi-task LLMs, such as OPT-IML-175B and
FLAN-PaLM-540B, are not publicly accessible, severely limiting their
customization potential. To address these challenges, we introduce a pretrained
small scorer, Cappy, designed to enhance the performance and efficiency of
multi-task LLMs. With merely 360 million parameters, Cappy functions either
independently on classification tasks or serves as an auxiliary component for LLMs, boosting their performance. Moreover, Cappy enables efficient integration of downstream supervision without requiring LLM finetuning or access to their parameters. Our experiments demonstrate that, when working
independently on 11 language understanding tasks from PromptSource, Cappy
outperforms LLMs that are several orders of magnitude larger. In addition, on 45 complex tasks from BIG-Bench, Cappy boosts the performance of the advanced multi-task LLM FLAN-T5 by a large margin. Furthermore, Cappy can flexibly cooperate with other LLM adaptations, including finetuning and in-context learning, offering additional performance gains. Comment: In proceedings of NeurIPS 2023; code and model available at https://github.com/tanyuqian/cappy and https://huggingface.co/btan2/cappy-large, respectively
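As a rough illustration of how a small scorer like Cappy is used as a reranker, the sketch below loads the released checkpoint as a Hugging Face sequence-classification model and scores candidate responses for an instruction; the single-logit regression interface is an assumption based on the model card, not the abstract:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Cappy takes an (instruction, response) pair and emits a scalar score;
# the checkpoint name comes from the links above.
tokenizer = AutoTokenizer.from_pretrained("btan2/cappy-large")
cappy = AutoModelForSequenceClassification.from_pretrained("btan2/cappy-large")

instruction = "Classify the sentiment of this review: 'The movie was fantastic.'"
candidates = ["positive", "negative"]

with torch.no_grad():
    scores = [
        cappy(**tokenizer(instruction, c, return_tensors="pt")).logits[0, 0].item()
        for c in candidates
    ]

# Use the scores to pick (or rerank) candidate outputs from a larger LLM.
print(max(zip(scores, candidates)))
```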
Edge-aware Hard Clustering Graph Pooling for Brain Imaging Data
Graph Convolutional Networks (GCNs) can capture non-Euclidean spatial
dependence between different brain regions, and the graph pooling operator in
GCNs is key to enhancing the representation learning capability and acquiring
abnormal brain maps. However, the majority of existing research designs graph
pooling operators only from the perspective of nodes while disregarding the
original edge features, in a way that not only confines graph pooling
application scenarios, but also diminishes its ability to capture critical
substructures. In this study, we develop Edge-aware hard clustering graph pooling (EHCPool), the first clustering graph pooling method to support multidimensional edge features. EHCPool proposes the first
'Edge-to-node' score evaluation criterion based on edge features to assess node
feature significance. To more effectively capture the critical subgraphs, a
novel Iteration n-top strategy is further designed to adaptively learn sparse
hard clustering assignments for graphs. Subsequently, an innovative N-E
Aggregation strategy is presented to aggregate node and edge feature
information in each independent subgraph. The proposed model was evaluated on
multi-site brain imaging public datasets and yielded state-of-the-art
performance. We believe this method is the first deep learning tool with the
potential to probe different types of abnormal functional brain networks from a data-driven perspective. Core code is at: https://github.com/swfen/EHCPool
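The 'Edge-to-node' criterion, scoring each node from the features of its incident edges, can be sketched generically; the aggregation below (summing incident edge features, then a learned scalar projection) is an illustrative stand-in for, not a reproduction of, EHCPool's exact criterion:

```python
import torch

def edge_to_node_scores(edge_index, edge_attr, num_nodes, proj):
    """Aggregate multidimensional edge features onto endpoint nodes, then
    project to one importance score per node.

    edge_index: LongTensor [2, E]; edge_attr: FloatTensor [E, D];
    proj: torch.nn.Linear(D, 1)."""
    acc = torch.zeros(num_nodes, edge_attr.size(1))
    acc.index_add_(0, edge_index[0], edge_attr)  # add into source nodes
    acc.index_add_(0, edge_index[1], edge_attr)  # add into target nodes
    return proj(acc).squeeze(-1)  # [num_nodes] node scores

# Example: 4 brain regions, 3 edges with 5-dimensional edge features.
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
edge_attr = torch.randn(3, 5)
scores = edge_to_node_scores(edge_index, edge_attr, 4, torch.nn.Linear(5, 1))
# A hard clustering pooling step could then keep top-scoring nodes as seeds.
print(scores.topk(2).indices)
```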
Redco: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs
The recent progress of AI can be largely attributed to large language models
(LLMs). However, their escalating memory requirements introduce challenges for
machine learning (ML) researchers and engineers. Addressing this requires
developers to partition a large model to distribute it across multiple GPUs or
TPUs. This necessitates considerable coding and intricate configuration efforts
with existing model parallel tools, such as Megatron-LM, DeepSpeed, and Alpa.
These tools require users' expertise in machine learning systems (MLSys),
creating a bottleneck in LLM development, particularly for developers without
MLSys background. In this work, we present Redco, a lightweight and
user-friendly tool crafted to automate distributed training and inference for
LLMs, as well as to simplify ML pipeline development. The design of Redco
emphasizes two key aspects. Firstly, to automate model parallelism, our study
identifies two straightforward rules to generate tensor parallel strategies for
any given LLM. Integrating these rules into Redco facilitates effortless
distributed LLM training and inference, eliminating the need for additional coding or complex configurations. We demonstrate its effectiveness by applying
Redco on a set of LLM architectures, such as GPT-J, LLaMA, T5, and OPT, up to
the size of 66B. Secondly, we propose a mechanism that allows for the
customization of diverse ML pipelines through the definition of merely three
functions, eliminating redundant and formulaic code like multi-host related
processing. This mechanism proves adaptable across a spectrum of ML algorithms,
from foundational language modeling to complex algorithms like meta-learning
and reinforcement learning. Consequently, Redco implementations require far fewer lines of code than their official counterparts. Comment: Released under Apache License 2.0 at https://github.com/tanyuqian/redc
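For a sense of what "defining merely three functions" looks like in practice, the sketch below writes a collate function, a loss function, and a prediction function for a toy linear model in JAX; the function names and signatures are illustrative assumptions, not Redco's documented API:

```python
import jax
import jax.numpy as jnp

def collate_fn(examples):
    """Raw examples -> a batch of arrays; a tool like Redco would call this
    per host and handle sharding across devices."""
    return {
        "x": jnp.asarray([ex["x"] for ex in examples]),
        "y": jnp.asarray([ex["y"] for ex in examples]),
    }

def loss_fn(params, batch):
    """Per-batch training loss (here: linear model, squared error)."""
    pred = batch["x"] @ params["w"] + params["b"]
    return jnp.mean((pred - batch["y"]) ** 2)

def pred_fn(params, batch):
    """Per-batch inference."""
    return batch["x"] @ params["w"] + params["b"]

# The framework would own the training loop; the gradient step it automates
# reduces to something like:
params = {"w": jnp.zeros(3), "b": jnp.zeros(())}
batch = collate_fn([{"x": [1.0, 2.0, 3.0], "y": 1.0}])
grads = jax.grad(loss_fn)(params, batch)
```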
Flavour by design: food-grade lactic acid bacteria improve the volatile aroma spectrum of oat milk, sunflower seed milk, pea milk, and faba milk towards improved flavour and sensory perception
Background: The global market for plant-based milk alternatives is continually growing. Flavour and taste have a key impact on consumers' selection of plant-based beverages. Unfortunately, natural plant milks have only limited acceptance. Their typically bean-like and grassy notes are perceived as 'off-flavours' by consumers, while preferred fruity, buttery, and cheesy notes are missing. In this regard, fermentation of plant milk by lactic acid bacteria (LAB) appears to be an appealing option to improve aroma and taste.
Results: In this work, we systematically studied LAB fermentation of plant milk. For this purpose, we evaluated 15 food-approved LAB strains to ferment 4 different plant milks: oat milk (representing cereal-based milk), sunflower seed milk (representing seed-based milk), and pea and faba milk (representing legume-based milk). Using GC–MS analysis, flavour changes during anaerobic fermentations were studied in detail. These revealed species-related and plant milk-related differences and highlighted several well-performing strains that delivered a range of beneficial flavour changes. A data model was developed that estimated the impact of individual flavour compounds using sensory scores and predicted the overall flavour note of fermented and non-fermented samples. Selected sensory perception tests validated the model and allowed us to bridge compositional changes in the flavour profile with consumer response.
Conclusion: Specific strain-milk combinations provided quite different flavour notes. This opens further developments towards plant-based products with improved flavour, including cheesy and buttery notes, as well as other innovative products in the future. S. thermophilus emerged as a well-performing strain that delivered preferred buttery notes in all tested plant milks. The GC–MS-based data model was found to be helpful in predicting sensory perception, and its further refinement and application promise enhanced potential to upgrade fermentation approaches to flavour-by-design strategies
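One common way to build a compound-to-flavour model of the kind described is via odour activity values (concentration divided by odour threshold), weighted by per-compound note descriptors. The sketch below illustrates that idea with made-up compounds, thresholds, and weights; it is not the paper's actual model:

```python
# Hypothetical odour-activity-value (OAV) model: each compound contributes
# to flavour notes in proportion to concentration / odour threshold.
COMPOUNDS = {
    # name: (odour threshold in mg/L, {note: weight}) -- all values assumed
    "diacetyl":    (0.005, {"buttery": 1.0}),
    "hexanal":     (0.020, {"grassy": 1.0}),
    "2-heptanone": (0.140, {"cheesy": 0.8, "fruity": 0.2}),
}

def predict_flavour(conc_mg_per_l):
    """Aggregate OAV-weighted contributions into per-note scores."""
    notes = {}
    for compound, conc in conc_mg_per_l.items():
        threshold, weights = COMPOUNDS[compound]
        oav = conc / threshold
        for note, w in weights.items():
            notes[note] = notes.get(note, 0.0) + w * oav
    return notes

# A fermented sample rich in diacetyl scores high on the "buttery" note.
print(predict_flavour({"diacetyl": 0.05, "hexanal": 0.004, "2-heptanone": 0.07}))
```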
Equivariant Similarity for Vision-Language Foundation Models
This study explores the concept of equivariance in vision-language foundation
models (VLMs), focusing specifically on the multimodal similarity function that
is not only the major training objective but also the core delivery to support
downstream tasks. Unlike the existing image-text similarity objective which
only categorizes matched pairs as similar and unmatched pairs as dissimilar,
equivariance also requires similarity to vary faithfully according to the
semantic changes. This allows VLMs to generalize better to nuanced and unseen
multimodal compositions. However, modeling equivariance is challenging as the
ground truth of semantic change is difficult to collect. For example, given an
image-text pair about a dog, it is unclear to what extent the similarity should change when the pixels are edited from dog to cat. To this end, we propose
EqSim, a regularization loss that can be efficiently calculated from any two
matched training pairs and is easily pluggable into existing image-text retrieval
fine-tuning. Meanwhile, to further diagnose the equivariance of VLMs, we
present a new challenging benchmark EqBen. Compared to the existing evaluation
sets, EqBen is the first to focus on "visual-minimal change". Extensive
experiments show the lack of equivariance in current VLMs and validate the
effectiveness of EqSim. Code is available at https://github.com/Wangt-CN/EqBen. Comment: Accepted by ICCV'23 (Oral); add evaluation on MLL
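The equivariance idea, that swapping in semantically changed content should lower similarity consistently, can be written as a regularizer over any two matched pairs. The margin formulation below is one plausible instantiation for illustration, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def equivariance_regularizer(img1, txt1, img2, txt2, margin=0.2):
    """Given two matched image-text embedding pairs (img1, txt1) and
    (img2, txt2), push each matched similarity above both cross-pair
    similarities by a margin, so semantic edits reduce the score."""
    sim = lambda a, b: F.cosine_similarity(a, b, dim=-1)
    s11, s22 = sim(img1, txt1), sim(img2, txt2)
    s12, s21 = sim(img1, txt2), sim(img2, txt1)
    return (F.relu(margin + s12 - s11) + F.relu(margin + s21 - s22)).mean()

# Example with random embeddings for a batch of two matched pairs:
i1, t1, i2, t2 = (torch.randn(8, 512) for _ in range(4))
print(equivariance_regularizer(i1, t1, i2, t2))
```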
Genome-wide comparison of DNA hydroxymethylation in mouse embryonic stem cells and neural progenitor cells by a new comparative hMeDIP-seq method
The genome-wide distribution patterns of the '6th base' 5-hydroxymethylcytosine (5hmC) in many tissues and cells have recently been revealed by hydroxymethylated DNA immunoprecipitation (hMeDIP) followed by high-throughput sequencing or tiling arrays. However, it has been challenging to directly compare different data sets and samples using data generated by this method. Here, we report a new comparative hMeDIP-seq method, which involves barcoding different input DNA samples at the start and then performing hMeDIP-seq for multiple samples in one hMeDIP reaction. This approach extends barcode technology beyond simply multiplexing the deep-sequencing output and provides significant advantages for quantitative control of all experimental steps, from unbiased hMeDIP to deep-sequencing data analysis. Using this improved method, we profiled and compared the DNA hydroxymethylomes of mouse ES cells (ESCs) and mouse ESC-derived neural progenitor cells (NPCs). We identified differentially hydroxymethylated regions (DHMRs) between ESCs and NPCs and uncovered an intricate relationship between the alteration of DNA hydroxymethylation and changes in gene expression during neural lineage commitment of ESCs. The DHMRs uncovered by this approach may provide new insight into the function of 5hmC in gene regulation and neural differentiation. Thus, this newly developed comparative hMeDIP-seq method provides a cost-effective and user-friendly strategy for direct genome-wide comparison of DNA hydroxymethylation across multiple samples, with significant biological, physiological, and clinical implications
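On the analysis side, the barcoding scheme implies a demultiplexing step before alignment and peak comparison. Below is a minimal sketch of that step; the barcode sequences, barcode length, and sample labels are hypothetical:

```python
# Hypothetical demultiplexing of pooled hMeDIP-seq reads by 5' barcode.
BARCODES = {"ACGT": "ESC", "TGCA": "NPC"}  # barcode -> sample (assumed)

def demultiplex(reads, barcode_len=4):
    """Bin reads by their leading barcode and trim it, so each sample's
    reads can be aligned and peak-called separately before comparison."""
    bins = {sample: [] for sample in BARCODES.values()}
    for read in reads:
        sample = BARCODES.get(read[:barcode_len])
        if sample is not None:
            bins[sample].append(read[barcode_len:])
    return bins

reads = ["ACGTGGCTAATCGA", "TGCACCGTTAGGCT", "NNNNACGTACGTAC"]
print({k: len(v) for k, v in demultiplex(reads).items()})
```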