Studying microbial community diversities by developing high-throughput experimental techniques and computational tools

Author: Qin Yujia
Publication date: 10/12/2018

Since the advent of high-throughput technologies, our understanding of microbial biodiversity has been rapidly transformed. Amplicon sequencing of phylogenetic markers, especially 16S rRNA genes, has become a well-established tool for discovering microbial taxonomic diversity in virtually all habitats, from aquatic to terrestrial and from local to global ecosystems. Although high-throughput sequencing, such as Illumina-based technologies (e.g., MiSeq), has revolutionized microbial ecology, applying amplicon sequencing to environmental microbial community analysis is challenging because of the low base diversity of the target region. In this study, a new phasing amplicon sequencing (PAS) approach was developed that shifts the sequencing phase among community samples in both read directions by adding different numbers of bases (0–7) as spacers to both the forward and reverse primers. First, our results indicated that the PAS method substantially ameliorated the problem of unbalanced base composition. Second, the PAS method substantially improved read base quality (on average, 10% more bases above Q30). Third, the PAS method effectively increased raw sequence throughput (~15% more raw reads). In addition, the PAS method significantly increased the number of effective reads (by 9–47%) and the effective read length (by 16–96 bases) after quality trimming at Q30 with a window size of 5. The PAS method also reduced sequencing errors by about half (0.54–1.1% lower). Finally, two-step PCR amplification in the PAS method effectively ameliorated the amplification biases introduced by long barcoded PCR primers. The developed strategy is robust for 16S rRNA gene amplicon sequencing, and a similar strategy could be used to sequence other genes important to ecosystem functional processes.
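The low-base-diversity problem and the phasing fix can be illustrated with a toy calculation: when every sample starts with the same conserved primer region, the sequencer sees a single base per cycle, whereas staggering samples by 0–7 spacer bases mixes bases at each cycle. This is a minimal Python sketch, not the study's protocol; the amplicon fragment, the spacer bases, and the helper names `phased_reads` and `distinct_bases_per_cycle` are hypothetical, and only one read direction is modeled.

```python
from collections import Counter

# Hypothetical conserved amplicon start shared by every sample (low diversity).
AMPLICON = "GTGCCAGCAGCCGCGGTAATACGTAGGG"
# Hypothetical spacer bases; the study added 0-7 spacer bases per primer.
SPACER_POOL = "ACGTACG"


def phased_reads(amplicon: str, n_samples: int = 8) -> list[str]:
    """Prefix sample k with k spacer bases (k = 0..7), shifting its phase."""
    return [SPACER_POOL[:k] + amplicon for k in range(n_samples)]


def distinct_bases_per_cycle(reads: list[str], cycles: int) -> list[int]:
    """Count how many different bases appear across reads at each cycle."""
    counts = []
    for i in range(cycles):
        column = Counter(read[i] for read in reads if i < len(read))
        counts.append(len(column))
    return counts


if __name__ == "__main__":
    cycles = 12
    unphased = [AMPLICON] * 8  # eight samples, no spacers
    print("unphased:", distinct_bases_per_cycle(unphased, cycles))
    print("phased:  ", distinct_bases_per_cycle(phased_reads(AMPLICON), cycles))
```

Without spacers every early cycle shows one base; with staggered spacers each cycle shows two or more, which is the balanced base composition the abstract credits for better quality scores.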
To facilitate the analysis of data produced by amplicon sequencing, a data analysis pipeline was developed; it currently serves more than 200 users with data processing and preliminary analysis of amplicon sequences. Publicly available pipelines, such as QIIME (Caporaso, Kuczynski et al. 2010; Caporaso, Lauber et al. 2012) and MOTHUR (Schloss, Westcott et al. 2009), are mostly standalone tools that require at least basic programming skills. Our pipeline provides a more user-friendly web interface: users click buttons rather than type command lines to perform basic data analysis. Beyond convenient operation, the Galaxy platform provides an organized way to upload, store, track, and share data histories from different projects. The pipeline is also flexible enough to incorporate new programs developed by others, and its input is not limited to 16S rRNA genes but also includes functional gene amplicon sequences. The pipeline has served the research community for several years, and more than a dozen papers have been published using it.
A practical application of amplicon sequencing followed, examining the biodiversity of fungal communities in the soils of six North American forests. Fungal biodiversity has been studied across many habitats, but the spatial patterns of fungal diversity and their underlying mechanisms still need exploration. In this study, soil fungal samples were collected from six forest sites spanning a wide range of latitudes in North America, using a nested design within each site, to uncover the diversity pattern of soil fungal communities in forest systems. Fungal richness follows a clear latitudinal gradient, and temperature, precipitation, pH, and nitrogen concentration also contribute to predicting the richness of soil fungal communities.
Fungal community composition is distinct among the six forest sites. The main drivers of fungal alpha diversity in forest soil are latitude, along with mean annual temperature, precipitation, soil pH, soil total carbon, and soil total nitrogen. These seven variables can be used to predict the α-diversity of soil fungal communities, and they alone explain more than 70% of its variance. As for β-diversity, dissimilarity among fungal communities increases significantly as the distance between sampling sites becomes larger. The distance-decay curve captures this pattern and indicates that fungal species turnover rates differ between local and continental scales. We further showed that the key drivers of differences in fungal community composition depend strongly on spatial scale, and that geographic distance is the major contributor to these differences. In summary, this study of fungal communities in North American forest soils revealed several patterns, along with their possible drivers, that offer insights into the nature of soil fungal communities.
Advanced high-throughput technologies have given researchers unprecedented insight into the diversity of microbial communities without the need to culture and identify individual organisms, but merely knowing the answer to “who is there” is no longer enough; the question now is how to link measurable community structure to ecosystem functioning. If this connection can be established, it becomes possible to understand how disturbances from human activities and global climate change will alter the ecosystem functions carried out by microbial communities. Functional diversity, which measures the range of things organisms do in their ecosystem, has shown its power in linking microbial communities to ecosystem dynamics.
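A distance-decay relationship is commonly summarized by regressing log community similarity on log geographic distance, with the slope read as a turnover rate. The Python sketch below assumes Bray-Curtis similarity and an ordinary least-squares fit in log-log space; these are common choices, not necessarily the thesis's, and the function names and data are hypothetical.

```python
import math


def bray_curtis_similarity(a: list[float], b: list[float]) -> float:
    """Return 1 - Bray-Curtis dissimilarity for two abundance vectors."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 2.0 * shared / (sum(a) + sum(b))


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def distance_decay_slope(distances_km: list[float], similarities: list[float]) -> float:
    """Slope of log(similarity) vs log(distance); a steeper (more negative)
    slope indicates faster species turnover with distance."""
    xs = [math.log(d) for d in distances_km]
    ys = [math.log(s) for s in similarities]
    return ols_slope(xs, ys)


if __name__ == "__main__":
    # Invented pairwise site distances (km) and community similarities.
    distances = [1.0, 4.0, 16.0, 64.0]
    sims = [1.0, 0.5, 0.25, 0.125]
    print(distance_decay_slope(distances, sims))
```

Fitting site pairs within forests and across forests separately, then comparing the two slopes, is one way to express the scale-dependent turnover described above.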
In the final part of this study, we provide a framework that uses Rao’s quadratic entropy to quantify microbial functional diversity from GeoChip (a high-throughput functional gene array) data, taking the phylogenetic distances between probes into account. This index falls into the category of trait-based functional diversity and benefits from the key functional traits related to ecosystem functioning that were pre-selected in the GeoChip design. The index can be partitioned into α- and β-diversity, which extends the analysis of functional diversity patterns to different temporal or spatial scales. Functional redundancy can also be defined from this functional diversity; it is closer to a measure of gene similarity or convergence than to the traditionally defined ‘functional redundancy’ across multiple functions in an ecosystem. Under the hypothesis that sequence similarity leads to functional similarity, this new definition of functional redundancy can reveal the level of redundancy of functional traits within the same gene. We applied this framework to study changes, over a 9-month period, in the microbial communities of a contaminated groundwater system (containing U(VI), SO₄²⁻, NO₃⁻, etc.) after a one-time amendment with emulsified vegetable oil (EVO), which has been shown to effectively reduce U(VI) for a considerable period (around one year). Using acetate production as a measure of the EVO degradation process, the functional diversity of the key gene responsible for EVO degradation correlated significantly with the function itself (R² = 0.685, p = 0.021), whereas other functional indices, such as gene richness, did not show such a strong relationship.
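Rao’s quadratic entropy has the compact form Q = Σᵢ Σⱼ dᵢⱼ pᵢ pⱼ, where pᵢ is the relative abundance (here, relative probe signal) of item i and dᵢⱼ is the distance between items i and j. The minimal Python sketch below computes Q and one common additive split, γ = ᾱ + β. It assumes equal sample weights, takes γ from the averaged profile, and uses the full double sum (some authors halve it); it is not the thesis's exact implementation, and the function names are hypothetical.

```python
def relative(values: list[float]) -> list[float]:
    """Convert raw signal intensities into relative abundances."""
    total = sum(values)
    return [v / total for v in values]


def rao_q(p: list[float], d: list[list[float]]) -> float:
    """Rao's quadratic entropy: sum over i, j of d[i][j] * p[i] * p[j]."""
    n = len(p)
    return sum(d[i][j] * p[i] * p[j] for i in range(n) for j in range(n))


def rao_partition(samples: list[list[float]], d: list[list[float]]) -> tuple[float, float, float]:
    """Return (alpha, beta, gamma) with gamma = alpha + beta.

    alpha: mean within-sample Q (equal weights, an assumption here)
    gamma: Q of the pooled, averaged profile
    beta:  gamma - alpha
    """
    profiles = [relative(s) for s in samples]
    alpha = sum(rao_q(p, d) for p in profiles) / len(profiles)
    pooled = [sum(col) / len(profiles) for col in zip(*profiles)]
    gamma = rao_q(pooled, d)
    return alpha, gamma - alpha, gamma


if __name__ == "__main__":
    # Two hypothetical probes, maximally distant; two samples each dominated by one.
    dist = [[0.0, 1.0], [1.0, 0.0]]
    print(rao_partition([[1.0, 0.0], [0.0, 1.0]], dist))
```

β is guaranteed non-negative only when Q is concave in p, for example when dᵢⱼ = 1 for all i ≠ j, which reduces Q to the Gini–Simpson index; with arbitrary phylogenetic distances that property should be checked.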
When functional diversity was used to profile the functional structure of the whole community, statistical tests also showed that changes in environmental variables shift community functional structure, whereas this connection is less clear when other indices are used to represent community functional structure. In summary, the new functional diversity framework integrates both functional traits and their phylogenetic signals, and it proved to be a more sensitive indicator of ecological functions than the traditionally used gene richness.

SHAREOK repository

Research on Online Reviews Reliability

Author: Guan Siqi
Guo Yujia
Qi Wei
Qin Juan
Wang Chunyu
Publication venue: AIS Electronic Library (AISeL)
Publication date: 26/05/2017

This study examines the factors that affect the reliability of online reviews. A theoretical framework was built and empirically tested with a sample of 200 interviewees. Structural equation modeling results show that online review quality and perceived risk have a positive impact on online review reliability. Online review value and number have a positive impact on online review quality, while customer involvement and reviewer acceptance have a positive impact on perceived risk. The results also suggest that the characteristics of online reviews and reviewers affect review reliability indirectly through these intermediate variables.

AIS Electronic Library (AISeL)

Transcriptional response of Desulfatibacillum alkenivorans AK-01 to growth on alkanes: insights from RT-qPCR and microarray analyses.

Author: Callaghan Amy V
Herath Anjumala
Qin Yujia
Wawrik Boris
Zhou Jizhong
Publication venue: eScholarship, University of California
Publication date: 23/03/2016

Microbial transformation of n-alkanes in anaerobic ecosystems plays a pivotal role in biogeochemical carbon cycling and bioremediation, but the requisite genetic machinery is not well elucidated. Desulfatibacillum alkenivorans AK-01 utilizes n-alkanes (C13 to C18) and contains two genomic loci encoding alkylsuccinate synthase (ASS) gene clusters. ASS catalyzes the addition of alkanes to fumarate to form methylalkylsuccinic acids. We hypothesized that the genes in the two clusters would be differentially expressed depending on the alkane substrate used for growth. RT-qPCR was used to investigate ass gene expression across AK-01's known substrate range, and microarray-based transcriptomic analysis was used to investigate whole-cell responses to growth on n-hexadecane versus hexadecanoate. RT-qPCR revealed induction of ass gene cluster 1 during growth on all tested alkane substrates, and the transcriptional start sites in cluster 1 were determined via 5′ RACE. Induction of ass gene cluster 2 was not observed under the tested conditions. Transcriptomic analysis indicated upregulation of genes potentially involved in methylalkylsuccinate metabolism, including methylmalonyl-CoA mutase and a putative carboxyl transferase. These findings provide new directions for studying the transcriptional regulation of genes involved in alkane addition to fumarate, fumarate recycling, and the processing of methylalkylsuccinates in isolates, enrichment cultures, and ecological datasets.
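Relative expression from RT-qPCR is often summarized with the 2^(−ΔΔCt) method of Livak and Schmittgen. The sketch below is that generic calculation, not necessarily the normalization used for AK-01; it assumes roughly 100% amplification efficiency, and the function name, reference gene, and Ct values are hypothetical.

```python
def fold_change(ct_target_test: float, ct_ref_test: float,
                ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression by 2^(-ddCt), assuming ~100% PCR efficiency.

    dCt normalizes the target gene to a reference gene in each condition;
    ddCt compares the test condition with the control condition.
    """
    d_ct_test = ct_target_test - ct_ref_test
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_test - d_ct_ctrl
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: an ass target gene vs. a reference gene,
    # alkane-grown (test) vs. fatty-acid-grown (control) cultures.
    print(fold_change(20.0, 15.0, 23.0, 15.0))
```

A target that crosses threshold three cycles earlier relative to the reference corresponds to about an eightfold induction under this model.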

Crossref

eScholarship - University of California

CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models

Author: Fung Yi R.
Han Chi
Ji Heng
Liu Zhiyuan
Qian Cheng
Qin Yujia
Publication date: 08/10/2023

Large Language Models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability and the instability of implicit reasoning, particularly when both planning and execution are involved. To overcome these limitations, we propose CREATOR, a novel framework that enables LLMs to create their own tools through documentation and code realization. CREATOR disentangles abstract tool creation from concrete decision execution, resulting in improved performance. We evaluate CREATOR on the MATH and TabMWP benchmarks, which consist of challenging math competition problems and diverse tabular contents, respectively. Remarkably, CREATOR outperforms existing chain-of-thought, program-of-thought, and tool-using baselines. Additionally, we introduce the Creation Challenge dataset, featuring 2K diverse questions, to emphasize the necessity and benefits of LLMs' tool creation ability. Further research demonstrates that leveraging LLMs as tool creators facilitates knowledge transfer, and that LLMs exhibit varying levels of tool creation ability, enabling them to adapt to diverse situations. The tool creation ability revolutionizes the LLM's problem-solving paradigm, driving us closer to the next frontier of artificial intelligence. All code and data are released.

arXiv.org e-Print Archive

Large Language Model-based Human-Agent Collaboration for Complex Task Solving

Author: Chen Xu
Chen Zhi-Yuan
Feng Xueyang
Lin Yankai
Liu Zhiyuan
Qin Yujia
Wen Ji-Rong
Publication date: 20/02/2024

The integration of Large Language Models (LLMs) into fully autonomous agents has recently garnered significant interest in the research community. Despite this, LLM-based agents frequently show notable shortcomings in adapting to dynamic environments and fully grasping human needs. In this work, we introduce the problem of LLM-based human-agent collaboration for complex task solving, exploring its synergistic potential. In addition, we propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC. This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process. We construct a human-agent collaboration dataset to train this policy model in an offline reinforcement learning setting. Our validation tests confirm the model's effectiveness. The results demonstrate that the synergistic efforts of humans and LLM-based agents significantly improve performance on complex tasks, primarily through well-planned, limited human intervention. Datasets and code are available at: https://github.com/XueyangFeng/ReHAC

arXiv.org e-Print Archive

Similarity and Potential Relation Between Periimplantitis and Rheumatoid Arthritis on Transcriptomic Level: Results of a Bioinformatics Study

Author: Li Lijiao
Li Shiyi
Pelekos George
Qin Zeman
Schmalz Gerhard
Wang Yujia
Xu Yongqian
Zhou Changqing
Ziebolz Dirk
Publication venue: Frontiers Research Foundation
Publication date: 24/03/2023

Background: This bioinformatics study aimed to reveal potential cross-talk genes, related pathways, and transcription factors between periimplantitis and rheumatoid arthritis (RA). Methods: The datasets GSE33774 (seven periimplantitis and eight control samples) and GSE106090 (six periimplantitis and six control samples) were obtained from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO). A differential expression analysis (p < 0.05 and |logFC (fold change)| ≥ 1) and a functional enrichment analysis (p < 0.05) were performed. Based on these results, a protein–protein interaction (PPI) network was constructed with Cytoscape. RA-related genes were extracted from the DisGeNET database, and the overlap between periimplantitis-related genes and these RA-related genes was examined to identify potential cross-talk genes. Gene expression data from the two datasets were merged, and feature selection was performed with the Recursive Feature Elimination (RFE) algorithm. Support vector machine (SVM) models were constructed for the cross-talk genes selected as features. The expression of these feature genes in RA was determined from GSE93272. Finally, a network including cross-talk genes, related pathways, and transcription factors was constructed. Results: The periimplantitis datasets shared 138 common differentially expressed genes (DEGs), 101 upregulated and 37 downregulated. The PPI network of periimplantitis comprised 1,818 nodes and 2,517 edges. The RFE method selected six features, i.e., MERTK, CD14, MAPT, CCR1, C3AR1, and FCGR2B, which had the highest predictive performance. Of these feature genes, CD14 and FCGR2B were most highly expressed in both periimplantitis and RA. The final activated pathway–gene network contained 181 nodes and 360 edges. The nuclear factor (NF) kappa B signaling pathway and osteoclast differentiation were identified as potentially relevant pathways.
Conclusions: This study revealed FCGR2B and CD14 as the most relevant potential cross-talk genes between RA and periimplantitis, which suggests a similarity between RA and periimplantitis and can serve as a theoretical basis for future research.
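The threshold filtering and overlap steps in the Methods can be sketched in plain Python. This is an illustration under stated assumptions: per-gene statistics arrive as a dict of gene → (logFC, p-value) for each dataset, "common DEGs" are taken to be genes passing the thresholds in both datasets in the same direction, and the function names, gene names, and numbers are made up. The RFE/SVM feature-selection step is not shown.

```python
Stats = dict[str, tuple[float, float]]  # gene -> (logFC, p-value)


def select_degs(stats: Stats, p_cut: float = 0.05,
                lfc_cut: float = 1.0) -> tuple[set[str], set[str]]:
    """Split genes passing p < p_cut and |logFC| >= lfc_cut into up/down sets."""
    up = {g for g, (lfc, p) in stats.items() if p < p_cut and lfc >= lfc_cut}
    down = {g for g, (lfc, p) in stats.items() if p < p_cut and lfc <= -lfc_cut}
    return up, down


def cross_talk_candidates(dataset_a: Stats, dataset_b: Stats,
                          disease_genes: set[str]) -> set[str]:
    """Common same-direction DEGs of two datasets that are also disease-related."""
    up_a, down_a = select_degs(dataset_a)
    up_b, down_b = select_degs(dataset_b)
    common = (up_a & up_b) | (down_a & down_b)
    return common & disease_genes


if __name__ == "__main__":
    # Invented statistics for hypothetical genes.
    a = {"GENE1": (2.0, 0.01), "GENE2": (1.5, 0.02),
         "GENE3": (0.5, 0.01), "GENE4": (-1.2, 0.03)}
    b = {"GENE1": (1.1, 0.04), "GENE2": (1.2, 0.20), "GENE4": (-2.0, 0.001)}
    ra_genes = {"GENE1", "GENE4", "GENE9"}
    print(sorted(cross_talk_candidates(a, b, ra_genes)))
```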

Qucosa - Publikationsserver der Universität Leipzig

Exploring Format Consistency for Instruction Tuning

Author: Cong Xin
Liang Shihao
Liu Xiaojiang
Liu Zhiyuan
Qin Yujia
Sun Maosong
Tian Runchu
Wang Huadong
Zhu Kunlun
Publication date: 28/07/2023

Instruction tuning has emerged as a promising approach to enhancing the ability of large language models to follow human instructions. Increasing the diversity and number of instructions in the training data has been shown to consistently improve generalization performance, which has motivated recent efforts to collect diverse instructions and integrate existing instruction tuning datasets into larger collections. However, different users express instructions in their own ways, and instruction styles and formats often vary across datasets, i.e., format inconsistency. In this work, we study how format inconsistency may affect the performance of instruction tuning. We propose a framework called "Unified Instruction Tuning" (UIT), which calls OpenAI APIs for automatic format transfer among different instruction tuning datasets. We show that UIT successfully improves generalization performance on unseen instructions, which highlights the importance of format consistency for instruction tuning. To make the UIT framework more practical, we further propose a novel perplexity-based denoising method to reduce the noise of automatic format transfer. We also train a smaller offline model whose format transfer capability is comparable to that of OpenAI APIs, reducing costs in practice.
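The abstract does not detail the perplexity-based denoising procedure. As a hedged sketch, perplexity can be computed from per-token log-probabilities as exp(−mean log p), and one hypothetical rule is to keep, among several automatically transferred versions of an instruction, the version a language model finds least perplexing. The token log-probabilities are assumed to come from some scoring model; `perplexity` and `pick_least_noisy` are illustrative names, not UIT's API.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood per token (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_least_noisy(candidates: list[tuple[str, list[float]]]) -> str:
    """Hypothetical denoising rule: among (text, token_logprobs) pairs for
    the same instruction, return the text with the lowest perplexity."""
    return min(candidates, key=lambda c: perplexity(c[1]))[0]


if __name__ == "__main__":
    # Invented log-probabilities for two transferred versions of one instruction.
    cands = [("version A", [math.log(0.2)] * 5),
             ("version B", [math.log(0.6)] * 5)]
    print(pick_least_noisy(cands))
```

Lower perplexity means the scoring model assigns the text higher average probability, which is why it is used here as a rough proxy for a cleaner transfer.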

arXiv.org e-Print Archive

ProQA: Structural Prompt-based Pre-training for Unified Question Answering

Author: Ding Ning
Duan Nan
Gao Yifan
Liu Zhiyuan
Qin Yujia
Wang Jiahai
Yin Jian
Zhong Wanjun
Zhou Ming
Publication date: 09/12/2022

Question Answering (QA) is a longstanding challenge in natural language processing. Existing QA works mostly focus on specific question types, knowledge domains, or reasoning skills. This specialization in QA research hinders systems from modeling commonalities between tasks and generalizing to wider applications. To address this issue, we present ProQA, a unified QA paradigm that solves various tasks through a single model. ProQA takes a unified structural prompt as the bridge and improves QA-centric ability through structural prompt-based pre-training. Through a structurally designed prompt-based input schema, ProQA concurrently models the knowledge generalization shared by all QA tasks while retaining knowledge customization for each specific QA task. Furthermore, ProQA is pre-trained on a large-scale synthesized corpus formatted with structural prompts, which equips the model with commonly required QA ability. Experimental results on 11 QA benchmarks demonstrate that ProQA consistently boosts performance in full-data fine-tuning, few-shot learning, and zero-shot testing scenarios. Furthermore, ProQA exhibits strong continual learning and transfer learning ability by taking advantage of the structural prompt. Comment: NAACL 202

arXiv.org e-Print Archive