233 research outputs found
Studying microbial community diversities by developing high-throughput experimental techniques and computational tools
Since the advent of high-throughput technologies, the understanding of microbial biodiversity has been rapidly transformed. Amplicon sequencing of phylogenetic markers, especially 16S rRNA genes, has become a widely adopted tool for discovering microbial taxonomic diversity in virtually all habitats, whether aquatic or terrestrial, in local or global ecosystems. Although high-throughput sequencing, such as Illumina-based technologies (e.g. MiSeq), has revolutionized microbial ecology, the adoption of amplicon sequencing for environmental microbial community analysis is challenging because of the low base diversity of the target region. In this study, a new phasing amplicon sequencing (PAS) approach was developed that shifts the sequencing phases among different community samples from both directions by adding various numbers of bases (0–7) as spacers to both forward and reverse primers. First, our results indicated that the PAS method substantially ameliorated the problem of unbalanced base composition. Second, the PAS method substantially improved sequence read base quality (an average of 10 % more bases above Q30). Third, the PAS method effectively increased raw sequence throughput (~15 % more raw reads). In addition, the PAS method significantly increased the number of effective reads (9–47 %) and the effective read length (16–96 more bases) after quality trimming at Q30 with a window of 5. The PAS method also reduced sequencing errors by half (0.54–1.1 % less). Finally, the two-step PCR amplification of the PAS method effectively ameliorated the amplification biases introduced by the long-barcoded PCR primers. The developed strategy is robust for 16S rRNA gene amplicon sequencing, and a similar strategy could also be used for sequencing other genes important to ecosystem functional processes.
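The phasing idea described above can be illustrated with a minimal sketch. It assumes that every sample's read starts with the same primer, and that prepending 0–7 spacer bases staggers the reads so that each sequencing cycle sees a mix of bases. The primer and spacer sequences below are placeholders chosen for illustration, not the ones used in the study, and per-cycle Shannon entropy is used here only as a convenient stand-in for "base diversity".

```python
import math
from collections import Counter

# Placeholder sequences for illustration only (not taken from the study).
PRIMER = "GTGCCAGCAGCCGCGGTAA"
SPACER_POOL = "TCGATCA"  # a spacer of length k is the first k bases of this string


def phased_read(shift):
    """Return the start of a read whose primer is preceded by `shift` spacer bases."""
    return SPACER_POOL[:shift] + PRIMER


def column_entropy(reads, pos):
    """Shannon entropy (bits) of the base composition at one sequencing cycle."""
    counts = Counter(read[pos] for read in reads)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_entropy(reads, n_cycles):
    """Average per-cycle entropy over the first `n_cycles` cycles."""
    return sum(column_entropy(reads, pos) for pos in range(n_cycles)) / n_cycles


N_CYCLES = len(PRIMER)
unphased = [phased_read(0) for _ in range(8)]   # eight samples, no spacers
phased = [phased_read(k) for k in range(8)]     # eight samples, spacers of 0-7 bases

print("unphased mean entropy:", mean_entropy(unphased, N_CYCLES))
print("phased mean entropy:  ", mean_entropy(phased, N_CYCLES))
```

Without spacers every cycle sees a single base across all samples, so the entropy is zero; with staggered spacers the per-cycle composition becomes mixed, which is the effect the PAS design aims for.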
To facilitate the analysis of data produced by amplicon sequencing technologies, a data analysis pipeline was developed; it currently serves more than 200 users with data processing and preliminary analysis of amplicon sequences. Publicly available pipelines, such as QIIME (Caporaso, Kuczynski et al. 2010, Caporaso, Lauber et al. 2012) and MOTHUR (Schloss, Westcott et al. 2009), are mostly standalone tools and require at least minimal programming skills to perform the analysis. Our pipeline provides a more user-friendly web interface, so users only need to click buttons rather than type command lines to perform basic data analysis. Besides these convenient operations, the Galaxy platform provides an organized way to upload, store, track and share data histories from different projects. The pipeline is also flexible enough to incorporate new programs developed by others, and the data source is not limited to 16S rRNA genes but also includes functional gene amplicon sequences. The pipeline has served the research community for several years, and more than a dozen papers have been published using it.
A practical application of amplicon sequencing followed, aimed at discovering the biodiversity of soil fungal communities in six North American forests. The biodiversity of fungi has been studied across many habitats, but the spatial patterns of fungal diversity and the possible mechanisms behind them still need exploration. In this study, soil fungal samples were collected from six forest sites across a wide range of latitudes in North America, with a nested design at each site, to uncover the diversity patterns of soil fungal communities in forest systems. Fungal richness follows a clear latitudinal gradient, and temperature, precipitation, pH and nitrogen concentration also contribute to the prediction of the richness of the soil fungal communities. The compositions of the fungal communities are distinct across the six forest sites. The main drivers of the α-diversity of fungi in forest soil are latitude, along with mean annual temperature, precipitation, soil pH, soil total carbon, and soil total nitrogen. These seven variables can be used to predict the α-diversity of the soil fungal communities, and these variables alone explain more than 70% of the variance. As for β-diversity, the dissimilarity among the fungal communities increases significantly as the distance between the sampling sites becomes larger. The distance-decay curve describes this pattern and indicates that the turnover rates of fungal species differ between the local and continental scales. We further showed that the key drivers of differences in fungal community composition depend strongly on the spatial scale, and that geographic distance is the major contributor to these differences. In summary, this study of fungal communities in North American forest soils has revealed several patterns along with the possible drivers behind them, offering insights into the nature of soil fungal communities.
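A distance-decay relationship of the kind mentioned above is often summarized by the slope of a log-log regression of community similarity on geographic distance. The sketch below assumes that power-law form; the abstract does not state which model the study fitted. The function name and the input values are hypothetical, and the values are synthetic numbers generated from a known power law purely to show the fitting step.

```python
import math


def fit_distance_decay(distances, similarities):
    """Least-squares fit of ln(similarity) = intercept + slope * ln(distance).

    Returns (slope, intercept). A more negative slope means faster turnover.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(s) for s in similarities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic example (not study data): similarity = 0.8 * distance ** -0.1
toy_distances = [1.0, 10.0, 100.0, 1000.0]
toy_similarities = [0.8 * d ** -0.1 for d in toy_distances]

slope, intercept = fit_distance_decay(toy_distances, toy_similarities)
print("slope:", slope, "intercept:", intercept)
```

Fitting the same regression separately to within-site and between-site pairs is one simple way to compare turnover rates across spatial scales.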
Now that advanced high-throughput technologies have enabled researchers to gain unprecedented insight into the diversity of microbial communities without culturing and identifying individuals, merely knowing the answer to “who is there” is no longer enough; the question now is how to link the ‘measurable’ community structures to ecosystem functioning. If this connection can be established, it becomes possible to understand how the disturbances brought by human activities and global climate change will alter the ecosystem functioning carried out by microbial communities. Functional diversity, which measures the range of things that organisms do in the surrounding ecosystem, has shown its power in linking microbial communities to the dynamics of ecosystems. In the final part of this study, we provide a framework that uses Rao’s entropy to quantify microbial functional diversity based on GeoChip (a high-throughput functional gene array), with the phylogenetic distances between probes taken into account in the calculation. This index falls into the category of trait-based functional diversity, with the advantage that key functional traits related to ecosystem functioning are pre-selected in the GeoChip design. The functional diversity index can be partitioned into α- and β-diversity, which extends the understanding of functional diversity patterns to different temporal or spatial scales. Functional redundancy can also be defined following the definition of functional diversity; it is more a measure of gene similarity or convergence than the traditionally defined ‘functional redundancy’ for multiple functionalities in an ecosystem. Given the hypothesis that sequence similarity leads to functional similarity, the new definition of functional redundancy can reveal the level of redundancy of functional traits within the same gene.
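In its standard form, Rao's quadratic entropy is Q = Σᵢ Σⱼ dᵢⱼ pᵢ pⱼ, where pᵢ is the relative abundance of item i and dᵢⱼ is the distance between items i and j. The sketch below applies that textbook formula to probe signal intensities and a probe-by-probe distance matrix. How the study normalizes GeoChip signals or derives the phylogenetic distances is not specified in the abstract, so the function name, inputs, and toy values here are assumptions.

```python
def rao_quadratic_entropy(abundances, distances):
    """Rao's quadratic entropy: sum over i, j of d[i][j] * p[i] * p[j].

    abundances: non-negative signal per probe (normalized to proportions here).
    distances:  symmetric matrix of pairwise distances with a zero diagonal.
    """
    total = sum(abundances)
    p = [a / total for a in abundances]
    n = len(p)
    return sum(distances[i][j] * p[i] * p[j] for i in range(n) for j in range(n))


# Toy inputs (hypothetical): two equally abundant probes at distance 1.
two_probe_q = rao_quadratic_entropy([1.0, 1.0], [[0.0, 1.0], [1.0, 0.0]])

# If all probes are identical in sequence (distance 0), diversity is zero.
redundant_q = rao_quadratic_entropy([3.0, 2.0, 5.0], [[0.0] * 3 for _ in range(3)])

print(two_probe_q, redundant_q)
```

The second case shows why the index can double as a redundancy measure: many probes at near-zero distance contribute richness but almost no functional diversity.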
We applied this functional diversity framework to study the dynamic changes, over a 9-month period, of microbial communities in a contaminated groundwater system (with U(VI), SO₄²⁻, NO₃⁻, etc.) after a one-time EVO (emulsified vegetable oil) amendment, which has been shown to effectively reduce U(VI) for a considerable period (around one year). Using acetate production as the measure of the EVO degradation process, the functional diversity of the key gene responsible for EVO degradation correlates significantly with the function itself (R² = 0.685, p = 0.021), whereas other functional indices such as gene richness did not show such a strong relationship. When functional diversity was used to profile the functional structure of the whole community, statistical tests also showed that changes in environmental variables shift the community functional structure, while this connection is not as clear when other indices are used to represent the community functional structure. In summary, the new framework of functional diversity integrates both functional traits and their phylogenetic signals, and has proven to be a more sensitive indicator of ecological functions than the traditionally used gene richness.
Research on Online Reviews Reliability
This study examines the factors that affect the reliability of online reviews. A theoretical framework was built and empirically tested with a sample of 200 interviewees. Results of the structural equation model show that online review quality and perceived risk have a positive impact on online review reliability. Furthermore, online review value and number have a positive impact on online review quality, while customer involvement and reviewer acceptance have a positive impact on perceived risk. The results of this study also suggest that the characteristics of the online review and the reviewer affect review reliability indirectly, through these intermediate variables.
Transcriptional response of Desulfatibacillum alkenivorans AK-01 to growth on alkanes: insights from RT-qPCR and microarray analyses.
Microbial transformation of n-alkanes in anaerobic ecosystems plays a pivotal role in biogeochemical carbon cycling and bioremediation, but the requisite genetic machinery is not well elucidated. Desulfatibacillum alkenivorans AK-01 utilizes n-alkanes (C13 to C18) and contains two genomic loci encoding alkylsuccinate synthase (ASS) gene clusters. ASS catalyzes alkane addition to fumarate to form methylalkylsuccinic acids. We hypothesized that the genes in the two clusters would be differentially expressed depending on the alkane substrate utilized for growth. RT-qPCR was used to investigate ass-gene expression across AK-01's known substrate range, and microarray-based transcriptomic analysis served to investigate whole-cell responses to growth on n-hexadecane versus hexadecanoate. RT-qPCR revealed induction of ass gene cluster 1 during growth on all tested alkane substrates, and the transcriptional start sites in cluster 1 were determined via 5'RACE. Induction of ass gene cluster 2 was not observed under the tested conditions. Transcriptomic analysis indicated the upregulation of genes potentially involved in methylalkylsuccinate metabolism, including methylmalonyl-CoA mutase and a putative carboxyl transferase. These findings provide new directions for studying the transcriptional regulation of genes involved in alkane addition to fumarate, fumarate recycling and the processing of methylalkylsuccinates with regard to isolates, enrichment cultures and ecological datasets.
CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models
Large Language Models (LLMs) have made significant progress in utilizing
tools, but their ability is limited by API availability and the instability of
implicit reasoning, particularly when both planning and execution are involved.
To overcome these limitations, we propose CREATOR, a novel framework that
enables LLMs to create their own tools using documentation and code
realization. CREATOR disentangles abstract tool creation and concrete decision
execution, resulting in improved performance. We evaluate CREATOR on MATH and
TabMWP benchmarks, respectively consisting of challenging math competition
problems and diverse tabular contents. Remarkably, CREATOR outperforms existing
chain-of-thought, program-of-thought, and tool-using baselines. Additionally,
we introduce the Creation Challenge dataset, featuring 2K diverse questions, to
emphasize the necessity and benefits of LLMs' tool creation ability. Further
research demonstrates that leveraging LLMs as tool creators facilitates
knowledge transfer, and LLMs exhibit varying levels of tool creation abilities,
enabling them to adapt to diverse situations. The tool creation ability
revolutionizes the LLM's problem-solving paradigm, driving us closer to the
next frontier of artificial intelligence. All the code and data are released.
Large Language Model-based Human-Agent Collaboration for Complex Task Solving
In recent developments within the research community, the integration of
Large Language Models (LLMs) in creating fully autonomous agents has garnered
significant interest. Despite this, LLM-based agents frequently demonstrate
notable shortcomings in adjusting to dynamic environments and fully grasping
human needs. In this work, we introduce the problem of LLM-based human-agent
collaboration for complex task-solving, exploring their synergistic potential.
In addition, we propose a Reinforcement Learning-based Human-Agent
Collaboration method, ReHAC. This approach includes a policy model designed to
determine the most opportune stages for human intervention within the
task-solving process. We construct a human-agent collaboration dataset to train
this policy model in an offline reinforcement learning environment. Our
validation tests confirm the model's effectiveness. The results demonstrate
that the synergistic efforts of humans and LLM-based agents significantly
improve performance in complex tasks, primarily through well-planned, limited
human intervention. Datasets and code are available at:
https://github.com/XueyangFeng/ReHAC
Similarity and Potential Relation Between Periimplantitis and Rheumatoid Arthritis on Transcriptomic Level: Results of a Bioinformatics Study
Background: This bioinformatics study aimed to reveal potential cross-talk genes,
related pathways, and transcription factors between periimplantitis and rheumatoid
arthritis (RA).
Methods: The datasets GSE33774 (seven periimplantitis and eight control samples) and
GSE106090 (six periimplantitis and six control samples) were included from the National
Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO). A
differential expression analysis (p < 0.05 and |logFC (fold change)| ≥ 1) and a functional
enrichment analysis (p < 0.05) were performed. Based on this, a protein–protein
interaction (PPI) network was constructed by Cytoscape. RA-related genes were
extracted from DisGeNET database, and an overlap between periimplantitis-related
genes and these RA-related genes was examined to identify potential cross-talk genes.
Gene expression data from the two datasets were merged, and feature selection was
performed with the Recursive Feature Elimination (RFE) algorithm. For the cross-talk
genes retained by feature selection, support vector machine (SVM) models were constructed. The
expression of these feature genes was determined from GSE93272 for RA. Finally, a
network including cross-talk genes, related pathways, and transcription factors
was constructed.
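The thresholding and overlap steps in the Methods can be sketched as follows. The sketch assumes each dataset has already been reduced to per-gene records with a p-value and a log fold change, takes the DEGs common to both datasets, and intersects them with a disease-associated gene list. The function names, record layout, and gene identifiers are hypothetical placeholders, not values from GEO or DisGeNET.

```python
def select_degs(rows, p_cutoff=0.05, lfc_cutoff=1.0):
    """Return genes with p < p_cutoff and |logFC| >= lfc_cutoff."""
    return {
        row["gene"]
        for row in rows
        if row["p"] < p_cutoff and abs(row["logFC"]) >= lfc_cutoff
    }


def cross_talk_genes(deg_sets, disease_genes):
    """Genes that are DEGs in every dataset and also on the disease gene list."""
    common = set.intersection(*deg_sets)
    return common & set(disease_genes)


# Toy records with placeholder gene names (not real results).
dataset_1 = [
    {"gene": "GENE_A", "p": 0.01, "logFC": 1.5},
    {"gene": "GENE_B", "p": 0.20, "logFC": 2.0},   # fails the p-value cutoff
    {"gene": "GENE_C", "p": 0.03, "logFC": -1.2},
    {"gene": "GENE_D", "p": 0.04, "logFC": 0.4},   # fails the fold-change cutoff
]
dataset_2 = [
    {"gene": "GENE_A", "p": 0.02, "logFC": 1.1},
    {"gene": "GENE_C", "p": 0.01, "logFC": -2.0},
    {"gene": "GENE_E", "p": 0.001, "logFC": 3.0},  # a DEG in this dataset only
]
ra_related = {"GENE_C", "GENE_E", "GENE_F"}

degs = [select_degs(dataset_1), select_degs(dataset_2)]
candidates = cross_talk_genes(degs, ra_related)
print(sorted(candidates))
```

Downstream steps such as RFE and SVM modelling would then operate on the expression values of the candidate genes only.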
Results: Periimplantitis datasets included 138 common differentially expressed genes
(DEGs), comprising 101 up- and 37 downregulated DEGs. The PPI network of
periimplantitis comprised 1,818 nodes and 2,517 edges. The RFE method selected six
features, i.e., MERTK, CD14, MAPT, CCR1, C3AR1, and FCGR2B, which had the highest
predictive performance. Out of these feature genes, CD14 and FCGR2B were most highly expressed in
periimplantitis and RA. The final activated pathway–gene network contained 181 nodes
and 360 edges. Nuclear factor (NF) kappa B signaling pathway and osteoclast
differentiation were identified as potentially relevant pathways.
Conclusions: This study revealed FCGR2B and CD14 as the most relevant
potential cross-talk genes between RA and periimplantitis, which suggests a similarity
between RA and periimplantitis and can serve as a theoretical basis for future research.
Exploring Format Consistency for Instruction Tuning
Instruction tuning has emerged as a promising approach to enhancing large
language models in following human instructions. It has been shown that increasing
the diversity and number of instructions in the training data can consistently
enhance generalization performance, which facilitates a recent endeavor to
collect various instructions and integrate existing instruction tuning datasets
into larger collections. However, different users have their unique ways of
expressing instructions, and there often exist variations across different
datasets in the instruction styles and formats, i.e., format inconsistency. In
this work, we study how format inconsistency may impact the performance of
instruction tuning. We propose a framework called "Unified Instruction Tuning"
(UIT), which calls OpenAI APIs for automatic format transfer among different
instruction tuning datasets. We show that UIT successfully improves the
generalization performance on unseen instructions, which highlights the
importance of format consistency for instruction tuning. To make the UIT
framework more practical, we further propose a novel perplexity-based denoising
method to reduce the noise of automatic format transfer. We also train a
smaller offline model that achieves format transfer capability comparable to
that of OpenAI APIs, reducing costs in practice.
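One plausible reading of perplexity-based denoising is to score several candidate format transfers with a language model and keep the one with the lowest perplexity. The abstract does not describe the actual procedure, so this is only a sketch under that assumption. The per-token log-probabilities are assumed to come from some scoring model and are made-up numbers here, and the function names are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_least_noisy(candidates):
    """Return the candidate text with the lowest perplexity.

    candidates: list of (text, token_logprobs) pairs.
    """
    return min(candidates, key=lambda item: perplexity(item[1]))[0]


# Hypothetical candidates with made-up log-probabilities from a scoring model.
candidates = [
    ("Translate the sentence into French.", [-0.2, -0.3, -0.1, -0.4]),
    ("Sentence French into translate the.", [-2.1, -1.8, -2.5, -1.9]),
]

print(pick_least_noisy(candidates))
```

A threshold on perplexity could be used instead of taking the minimum when only a single candidate per instruction is available.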
ProQA: Structural Prompt-based Pre-training for Unified Question Answering
Question Answering (QA) is a longstanding challenge in natural language
processing. Existing QA works mostly focus on specific question types,
knowledge domains, or reasoning skills. This specialization in QA research hinders
systems from modeling commonalities between tasks and generalizing to wider
applications. To address this issue, we present ProQA, a unified QA paradigm
that solves various tasks through a single model. ProQA takes a unified
structural prompt as the bridge and improves the QA-centric ability by
structural prompt-based pre-training. Through a structurally designed
prompt-based input schema, ProQA concurrently models the knowledge
generalization for all QA tasks while keeping the knowledge customization for
every specific QA task. Furthermore, ProQA is pre-trained with structural
prompt-formatted large-scale synthesized corpus, which empowers the model with
the commonly-required QA ability. Experimental results on 11 QA benchmarks
demonstrate that ProQA consistently boosts performance in full data
fine-tuning, few-shot learning, and zero-shot testing scenarios. Furthermore,
ProQA exhibits strong ability in both continual learning and transfer learning
by taking advantage of the structural prompt. Comment: NAACL 202