Serum lactate dehydrogenase activities as systems biomarkers for 48 types of human diseases
Most human diseases are systems diseases, and systems biomarkers are better suited for diagnostic, prognostic, and treatment-monitoring purposes. To search for systems biomarker candidates, lactate dehydrogenase (LDH), a housekeeping protein expressed in all living cells, was investigated. To this end, we analyzed the serum LDH activities from 172,933 patients with 48 clinically defined diseases and 9528 healthy individuals. Based on the median values, we found that 46 out of 48 diseases, led by acute myocardial infarction, had significantly increased serum LDH activities; discriminative performance was strongest (> 0.8) for hepatic encephalopathy and lung fibrosis
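The core analysis here is a per-disease comparison against a healthy reference group. Below is a minimal sketch of such an analysis in Python (pandas/SciPy); the file name, column names, and the choice of the Mann-Whitney U test are illustrative assumptions, not details from the paper:

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Hypothetical layout: one row per individual, with columns "group"
# (a disease name or "healthy") and "ldh" (serum LDH activity, U/L).
df = pd.read_csv("serum_ldh.csv")

healthy = df.loc[df["group"] == "healthy", "ldh"]

rows = []
for disease, sub in df[df["group"] != "healthy"].groupby("group"):
    # Nonparametric test of each disease group against healthy controls.
    _, p = mannwhitneyu(sub["ldh"], healthy, alternative="two-sided")
    rows.append({
        "disease": disease,
        "median_ldh": sub["ldh"].median(),
        "median_healthy": healthy.median(),
        "p_value": p,
    })

# Rank diseases by median LDH activity, as in the abstract's comparison.
print(pd.DataFrame(rows).sort_values("median_ldh", ascending=False).head(10))
```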
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
Large language models (LLMs) such as T0, FLAN, and OPT-IML, excel in
multi-tasking under a unified instruction-following paradigm, where they also
exhibit remarkable generalization abilities to unseen tasks. Despite their
impressive performance, these LLMs, with sizes ranging from several billion to
hundreds of billions of parameters, demand substantial computational resources,
making their training and inference expensive and inefficient. Furthermore,
adapting these models to downstream applications, particularly complex tasks,
is often unfeasible due to the extensive hardware requirements for finetuning,
even when utilizing parameter-efficient approaches such as prompt tuning.
Additionally, the most powerful multi-task LLMs, such as OPT-IML-175B and
FLAN-PaLM-540B, are not publicly accessible, severely limiting their
customization potential. To address these challenges, we introduce a pretrained
small scorer, Cappy, designed to enhance the performance and efficiency of
multi-task LLMs. With merely 360 million parameters, Cappy functions either
independently on classification tasks or serves as an auxiliary component for LLMs, boosting their performance. Moreover, Cappy enables efficient integration of downstream supervision without requiring LLM finetuning or access to their parameters. Our experiments demonstrate that, when working
independently on 11 language understanding tasks from PromptSource, Cappy
outperforms LLMs that are several orders of magnitude larger. In addition, on 45 complex tasks from BIG-Bench, Cappy boosts the performance of the advanced multi-task LLM FLAN-T5 by a large margin. Furthermore, Cappy can flexibly cooperate with other LLM adaptations, including finetuning and in-context learning, offering additional performance gains. Comment: In proceedings of NeurIPS 2023; code and model available at https://github.com/tanyuqian/cappy and https://huggingface.co/btan2/cappy-large, respectively
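As a rough illustration of how a small scorer like Cappy is used as a reranker, the sketch below loads the released checkpoint as a Hugging Face sequence-classification model and scores candidate responses for an instruction; the single-logit regression interface is an assumption based on the model card, not the abstract:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Cappy takes an (instruction, response) pair and emits a scalar score;
# the checkpoint name comes from the links above.
tokenizer = AutoTokenizer.from_pretrained("btan2/cappy-large")
cappy = AutoModelForSequenceClassification.from_pretrained("btan2/cappy-large")

instruction = "Classify the sentiment of this review: 'The movie was fantastic.'"
candidates = ["positive", "negative"]

with torch.no_grad():
    scores = [
        cappy(**tokenizer(instruction, c, return_tensors="pt")).logits[0, 0].item()
        for c in candidates
    ]

# Use the scores to pick (or rerank) candidate outputs from a larger LLM.
print(max(zip(scores, candidates)))
```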
Edge-aware Hard Clustering Graph Pooling for Brain Imaging Data
Graph Convolutional Networks (GCNs) can capture non-Euclidean spatial
dependence between different brain regions, and the graph pooling operator in
GCNs is key to enhancing the representation learning capability and acquiring
abnormal brain maps. However, the majority of existing research designs graph
pooling operators only from the perspective of nodes while disregarding the
original edge features, in a way that not only confines graph pooling
application scenarios, but also diminishes its ability to capture critical
substructures. In this study, we develop Edge-aware hard clustering graph pooling (EHCPool), the first clustering graph pooling method to support multidimensional edge features. EHCPool proposes the first
'Edge-to-node' score evaluation criterion based on edge features to assess node
feature significance. To more effectively capture the critical subgraphs, a
novel Iteration n-top strategy is further designed to adaptively learn sparse
hard clustering assignments for graphs. Subsequently, an innovative N-E
Aggregation strategy is presented to aggregate node and edge feature
information in each independent subgraph. The proposed model was evaluated on
multi-site brain imaging public datasets and yielded state-of-the-art
performance. We believe this method is the first deep learning tool with the
potential to probe different types of abnormal functional brain networks from a data-driven perspective. Core code is at: https://github.com/swfen/EHCPool
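The 'Edge-to-node' criterion, scoring each node from the features of its incident edges, can be sketched generically; the aggregation below (summing incident edge features, then a learned scalar projection) is an illustrative stand-in for, not a reproduction of, EHCPool's exact criterion:

```python
import torch

def edge_to_node_scores(edge_index, edge_attr, num_nodes, proj):
    """Aggregate multidimensional edge features onto endpoint nodes, then
    project to one importance score per node.

    edge_index: LongTensor [2, E]; edge_attr: FloatTensor [E, D];
    proj: torch.nn.Linear(D, 1)."""
    acc = torch.zeros(num_nodes, edge_attr.size(1))
    acc.index_add_(0, edge_index[0], edge_attr)  # add into source nodes
    acc.index_add_(0, edge_index[1], edge_attr)  # add into target nodes
    return proj(acc).squeeze(-1)  # [num_nodes] node scores

# Example: 4 brain regions, 3 edges with 5-dimensional edge features.
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
edge_attr = torch.randn(3, 5)
scores = edge_to_node_scores(edge_index, edge_attr, 4, torch.nn.Linear(5, 1))
# A hard clustering pooling step could then keep top-scoring nodes as seeds.
print(scores.topk(2).indices)
```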
Redco: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs
The recent progress of AI can be largely attributed to large language models
(LLMs). However, their escalating memory requirements introduce challenges for
machine learning (ML) researchers and engineers. Addressing this requires
developers to partition a large model to distribute it across multiple GPUs or
TPUs. This necessitates considerable coding and intricate configuration efforts
with existing model parallel tools, such as Megatron-LM, DeepSpeed, and Alpa.
These tools require users' expertise in machine learning systems (MLSys),
creating a bottleneck in LLM development, particularly for developers without
MLSys background. In this work, we present Redco, a lightweight and
user-friendly tool crafted to automate distributed training and inference for
LLMs, as well as to simplify ML pipeline development. The design of Redco
emphasizes two key aspects. Firstly, to automate model parallelism, our study
identifies two straightforward rules to generate tensor parallel strategies for
any given LLM. Integrating these rules into Redco facilitates effortless
distributed LLM training and inference, eliminating the need for additional coding or complex configurations. We demonstrate its effectiveness by applying
Redco on a set of LLM architectures, such as GPT-J, LLaMA, T5, and OPT, up to
the size of 66B. Secondly, we propose a mechanism that allows for the
customization of diverse ML pipelines through the definition of merely three
functions, eliminating redundant and formulaic code like multi-host related
processing. This mechanism proves adaptable across a spectrum of ML algorithms,
from foundational language modeling to complex algorithms like meta-learning
and reinforcement learning. Consequently, Redco implementations require far fewer lines of code than their official counterparts. Comment: Released under Apache License 2.0 at https://github.com/tanyuqian/redc
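For a sense of what "defining merely three functions" looks like in practice, the sketch below writes a collate function, a loss function, and a prediction function for a toy linear model in JAX; the function names and signatures are illustrative assumptions, not Redco's documented API:

```python
import jax
import jax.numpy as jnp

def collate_fn(examples):
    """Raw examples -> a batch of arrays; a tool like Redco would call this
    per host and handle sharding across devices."""
    return {
        "x": jnp.asarray([ex["x"] for ex in examples]),
        "y": jnp.asarray([ex["y"] for ex in examples]),
    }

def loss_fn(params, batch):
    """Per-batch training loss (here: linear model, squared error)."""
    pred = batch["x"] @ params["w"] + params["b"]
    return jnp.mean((pred - batch["y"]) ** 2)

def pred_fn(params, batch):
    """Per-batch inference."""
    return batch["x"] @ params["w"] + params["b"]

# The framework would own the training loop; the gradient step it automates
# reduces to something like:
params = {"w": jnp.zeros(3), "b": jnp.zeros(())}
batch = collate_fn([{"x": [1.0, 2.0, 3.0], "y": 1.0}])
grads = jax.grad(loss_fn)(params, batch)
```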
Flavour by design: food-grade lactic acid bacteria improve the volatile aroma spectrum of oat milk, sunflower seed milk, pea milk, and faba milk towards improved flavour and sensory perception
Background: The global market for plant-based milk alternatives is continually growing. Flavour and taste have a key impact on consumers' selection of plant-based beverages. Unfortunately, natural plant milks have only limited acceptance. Their typically bean-like and grassy notes are perceived as 'off-flavours' by consumers, while preferred fruity, buttery, and cheesy notes are missing. In this regard, fermentation of plant milk by lactic acid bacteria (LAB) appears to be an appealing option to improve aroma and taste.
Results: In this work, we systematically studied LAB fermentation of plant milk. For this purpose, we evaluated 15 food-approved LAB strains to ferment 4 different plant milks: oat milk (representing cereal-based milk), sunflower seed milk (representing seed-based milk), and pea and faba milk (representing legume-based milk). Using GC–MS analysis, flavour changes during anaerobic fermentations were studied in detail. These revealed species-related and plant milk-related differences and highlighted several well-performing strains that delivered a range of beneficial flavour changes. A data model was developed that estimated the impact of individual flavour compounds using sensory scores and predicted the overall flavour note of fermented and non-fermented samples. Selected sensory perception tests validated the model and allowed us to bridge compositional changes in the flavour profile with consumer response.
Conclusion: Specific strain-milk combinations provided quite different flavour notes. This opens further developments towards plant-based products with improved flavour, including cheesy and buttery notes, as well as other innovative products in the future. S. thermophilus emerged as a well-performing strain that delivered preferred buttery notes in all tested plant milks. The GC–MS-based data model was found to be helpful in predicting sensory perception, and its further refinement and application promise enhanced potential to upgrade fermentation approaches to flavour-by-design strategies
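One common way to build a compound-to-flavour model of the kind described is via odour activity values (concentration divided by odour threshold), weighted by per-compound note descriptors. The sketch below illustrates that idea with made-up compounds, thresholds, and weights; it is not the paper's actual model:

```python
# Hypothetical odour-activity-value (OAV) model: each compound contributes
# to flavour notes in proportion to concentration / odour threshold.
COMPOUNDS = {
    # name: (odour threshold in mg/L, {note: weight}) -- all values assumed
    "diacetyl":    (0.005, {"buttery": 1.0}),
    "hexanal":     (0.020, {"grassy": 1.0}),
    "2-heptanone": (0.140, {"cheesy": 0.8, "fruity": 0.2}),
}

def predict_flavour(conc_mg_per_l):
    """Aggregate OAV-weighted contributions into per-note scores."""
    notes = {}
    for compound, conc in conc_mg_per_l.items():
        threshold, weights = COMPOUNDS[compound]
        oav = conc / threshold
        for note, w in weights.items():
            notes[note] = notes.get(note, 0.0) + w * oav
    return notes

# A fermented sample rich in diacetyl scores high on the "buttery" note.
print(predict_flavour({"diacetyl": 0.05, "hexanal": 0.004, "2-heptanone": 0.07}))
```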
Equivariant Similarity for Vision-Language Foundation Models
This study explores the concept of equivariance in vision-language foundation
models (VLMs), focusing specifically on the multimodal similarity function that
is not only the major training objective but also the core delivery to support
downstream tasks. Unlike the existing image-text similarity objective which
only categorizes matched pairs as similar and unmatched pairs as dissimilar,
equivariance also requires similarity to vary faithfully according to the
semantic changes. This allows VLMs to generalize better to nuanced and unseen
multimodal compositions. However, modeling equivariance is challenging as the
ground truth of semantic change is difficult to collect. For example, given an
image-text pair about a dog, it is unclear to what extent the similarity should change when the pixels are edited from dog to cat. To this end, we propose
EqSim, a regularization loss that can be efficiently calculated from any two
matched training pairs and is easily pluggable into existing image-text retrieval
fine-tuning. Meanwhile, to further diagnose the equivariance of VLMs, we
present a new challenging benchmark EqBen. Compared to the existing evaluation
sets, EqBen is the first to focus on "visual-minimal change". Extensive
experiments show the lack of equivariance in current VLMs and validate the
effectiveness of EqSim. Code is available at https://github.com/Wangt-CN/EqBen. Comment: Accepted by ICCV'23 (Oral); add evaluation on MLL
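The equivariance idea, that swapping in semantically changed content should lower similarity consistently, can be written as a regularizer over any two matched pairs. The margin formulation below is one plausible instantiation for illustration, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def equivariance_regularizer(img1, txt1, img2, txt2, margin=0.2):
    """Given two matched image-text embedding pairs (img1, txt1) and
    (img2, txt2), push each matched similarity above both cross-pair
    similarities by a margin, so semantic edits reduce the score."""
    sim = lambda a, b: F.cosine_similarity(a, b, dim=-1)
    s11, s22 = sim(img1, txt1), sim(img2, txt2)
    s12, s21 = sim(img1, txt2), sim(img2, txt1)
    return (F.relu(margin + s12 - s11) + F.relu(margin + s21 - s22)).mean()

# Example with random embeddings for a batch of two matched pairs:
i1, t1, i2, t2 = (torch.randn(8, 512) for _ in range(4))
print(equivariance_regularizer(i1, t1, i2, t2))
```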
Genome-wide comparison of DNA hydroxymethylation in mouse embryonic stem cells and neural progenitor cells by a new comparative hMeDIP-seq method
The genome-wide distribution patterns of the '6th base' 5-hydroxymethylcytosine (5hmC) in many tissues and cells have recently been revealed by hydroxymethylated DNA immunoprecipitation (hMeDIP) followed by high-throughput sequencing or tiling arrays. However, it has been challenging to directly compare different data sets and samples using data generated by this method. Here, we report a new comparative hMeDIP-seq method, which involves barcoding different input DNA samples at the start and then performing hMeDIP-seq for multiple samples in one hMeDIP reaction. This approach extends barcode technology beyond simply multiplexing the deep-sequencing output and provides significant advantages for quantitative control of all experimental steps, from unbiased hMeDIP to deep-sequencing data analysis. Using this improved method, we profiled and compared the DNA hydroxymethylomes of mouse ES cells (ESCs) and mouse ESC-derived neural progenitor cells (NPCs). We identified differentially hydroxymethylated regions (DHMRs) between ESCs and NPCs and uncovered an intricate relationship between the alteration of DNA hydroxymethylation and changes in gene expression during neural lineage commitment of ESCs. The DHMRs uncovered by this approach may provide new insight into the function of 5hmC in gene regulation and neural differentiation. Thus, this newly developed comparative hMeDIP-seq method provides a cost-effective and user-friendly strategy for direct genome-wide comparison of DNA hydroxymethylation across multiple samples, with significant biological, physiological, and clinical implications
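On the analysis side, the barcoding scheme implies a demultiplexing step before alignment and peak comparison. Below is a minimal sketch of that step; the barcode sequences, barcode length, and sample labels are hypothetical:

```python
# Hypothetical demultiplexing of pooled hMeDIP-seq reads by 5' barcode.
BARCODES = {"ACGT": "ESC", "TGCA": "NPC"}  # barcode -> sample (assumed)

def demultiplex(reads, barcode_len=4):
    """Bin reads by their leading barcode and trim it, so each sample's
    reads can be aligned and peak-called separately before comparison."""
    bins = {sample: [] for sample in BARCODES.values()}
    for read in reads:
        sample = BARCODES.get(read[:barcode_len])
        if sample is not None:
            bins[sample].append(read[barcode_len:])
    return bins

reads = ["ACGTGGCTAATCGA", "TGCACCGTTAGGCT", "NNNNACGTACGTAC"]
print({k: len(v) for k, v in demultiplex(reads).items()})
```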