Preparation of biomass derived porous carbon: Application for methane energy storage
A porous carbon with a high specific surface area of 3001 m2/g and a pore volume of 1.59 cm3/g was prepared from corncob for methane storage. Low- and high-pressure adsorption equilibria were measured on the prepared carbon. The methane storage capacity was evaluated using the Tóth model, with pressure corrected to fugacity via the Benedict-Webb-Rubin equation of state in the high-pressure range. The theoretical maximum adsorption amount was approximately 3.3 mmol/g. The adsorption behaviour indicates that physisorption dominates, with heterogeneous adsorption sites on the adsorbent surface at low pressure. The resulting carbon is a promising adsorbent for methane-based energy storage.
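As a rough illustration of the isotherm analysis described above, the following is a minimal sketch of fitting the Tóth model to methane uptake data, assuming hypothetical equilibrium points rather than values from the study; the study additionally replaces pressure with fugacity from the Benedict-Webb-Rubin equation of state at high pressure, which this sketch omits for simplicity.

```python
# Minimal sketch: fitting the Toth isotherm to methane adsorption data.
# The data points below are illustrative placeholders, not values from the study.
import numpy as np
from scipy.optimize import curve_fit

def toth(p, q_max, b, t):
    """Toth isotherm: q = q_max * b * p / (1 + (b * p)**t)**(1 / t)."""
    return q_max * b * p / (1.0 + (b * p) ** t) ** (1.0 / t)

# Pressure in MPa and uptake in mmol/g (hypothetical equilibrium points).
p_data = np.array([0.1, 0.5, 1.0, 2.0, 3.5, 5.0, 7.0])
q_data = np.array([0.4, 1.2, 1.8, 2.4, 2.8, 3.0, 3.2])

# Fit q_max (mmol/g), b (1/MPa) and the heterogeneity exponent t,
# constrained to physically meaningful non-negative values.
(q_max, b, t), _ = curve_fit(toth, p_data, q_data, p0=[3.3, 1.0, 0.7],
                             bounds=(0.0, np.inf))
print(f"q_max = {q_max:.2f} mmol/g, b = {b:.2f} 1/MPa, t = {t:.2f}")
```

The fitted q_max plays the role of the theoretical maximum adsorption amount reported in the abstract, while t < 1 reflects the surface heterogeneity noted there.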
Wetlands for the Treatment of Agricultural Drainage Water
Agricultural drainage, such as runoff from farmlands and wineries, is contaminated water. Its management is monitored by environmental protection authorities, who set targets for volume or pollutant reductions. Due to large quantities and seasonal variations, the targets are often not met, and effective management remains a problem in many parts of the world. Natural wetlands are known as the ‘kidneys’ of the earth; their unique water purification functions have long been recognized. Imitating the functions of natural wetlands, constructed wetlands are engineered systems purposely built to treat contaminated waters. They may therefore be called the ‘artificial kidneys’ of the earth. Rural areas often have only low-value land available for constructed wetlands. Where large quantities of drainage are produced, farmlands are often adjacent to degraded natural wetlands with reduced ecosystem functions. Controlled discharge and treatment in the wetlands can potentially be part of an integrated solution to multiple environmental problems. This book includes recent studies on the fate of pollutants removed from agricultural drainage in wetlands, modelling of wetland performance, innovative systems, and the use of non-hazardous agricultural waste in constructed wetlands for wastewater treatment. These studies enhance our understanding of wetland systems and will help develop wetland technology towards solving the problems associated with agricultural drainage.
Immobilization of magnetite nanoparticles for the removal of arsenic and antimony from contaminated water
Magnetite (Fe3O4) nanoparticles were synthesized and immobilized in a synthetic resin, poly(methyl methacrylate) (PMMA). The Fe3O4 nanoparticle-PMMA composites were studied for their efficiency in removing dissolved arsenic (As) and antimony (Sb). The effects of major environmental and operating parameters on the removal of As and Sb were investigated in batch experiments. Singular and competitive adsorption of As and Sb onto the composites was studied. The results demonstrated the capability of the Fe3O4-PMMA composites to remove dissolved metalloids.
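For illustration, here is a minimal sketch of the standard batch-experiment bookkeeping implied by the abstract: removal efficiency and equilibrium uptake computed from initial and final concentrations. All numbers are hypothetical placeholders, not results from the study.

```python
# Sketch of standard batch-adsorption calculations (illustrative values only).

def removal_efficiency(c0, ce):
    """Percentage of dissolved metalloid removed: (C0 - Ce) / C0 * 100."""
    return (c0 - ce) / c0 * 100.0

def adsorption_capacity(c0, ce, volume_l, mass_g):
    """Equilibrium uptake q_e in mg/g: (C0 - Ce) * V / m."""
    return (c0 - ce) * volume_l / mass_g

c0, ce = 1.0, 0.12   # initial and equilibrium As concentration, mg/L (hypothetical)
v, m = 0.1, 0.05     # solution volume (L) and composite dose (g) (hypothetical)
print(f"removal = {removal_efficiency(c0, ce):.1f} %")
print(f"q_e = {adsorption_capacity(c0, ce, v, m):.2f} mg/g")
```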
Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator
The incorporation of biasing words obtained through contextual knowledge is
of paramount importance in automatic speech recognition (ASR) applications.
This paper proposes an innovative method for achieving end-to-end contextual
ASR using graph neural network (GNN) encodings based on the tree-constrained
pointer generator method. GNN node encodings facilitate lookahead for future
word pieces during ASR decoding at each tree node by incorporating
information about all word pieces on the tree branches rooted at it. This
results in a more precise prediction of the generation probability of the
biasing words. The study explores three GNN encoding techniques, namely tree
recursive neural networks, graph convolutional networks (GCN), and GraphSAGE,
along with different combinations of the complementary GCN and GraphSAGE
structures. The performance of the systems was evaluated using the LibriSpeech
and AMI corpora, following the visually grounded contextual ASR pipeline. The
findings indicate that using GNN encodings achieved consistent and significant
reductions in word error rate (WER), particularly for words that are rare or
have not been seen during the training process. Notably, the most effective
combination of GNN encodings obtained more than 60% WER reduction for rare and
unseen words compared to standard end-to-end systems.
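To make the lookahead idea concrete, the following is a minimal sketch of a GraphSAGE-style bottom-up pass over a wordpiece prefix tree, in which each node's encoding aggregates the branch rooted at it. The tree, embeddings, and weight matrices are illustrative stand-ins, not the paper's implementation.

```python
# Minimal sketch of GraphSAGE-style lookahead encodings on a wordpiece
# prefix tree: each node's encoding mixes its own embedding with an
# aggregate of its children, so it summarises the branch rooted at it.
# Illustrative only; not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

class TrieNode:
    def __init__(self, piece):
        self.piece = piece
        self.emb = rng.normal(size=DIM)   # wordpiece embedding (random stand-in)
        self.children = {}
        self.enc = None

def encode(node, w_self, w_neigh):
    """Bottom-up pass: enc = relu(W_s @ emb + W_n @ mean(child encodings))."""
    if node.children:
        child_encs = [encode(c, w_self, w_neigh) for c in node.children.values()]
        neigh = np.mean(child_encs, axis=0)
    else:
        neigh = np.zeros(DIM)
    node.enc = np.maximum(0.0, w_self @ node.emb + w_neigh @ neigh)
    return node.enc

# Build a tiny biasing tree for the words "cat" and "car".
root = TrieNode("<root>")
for word in ["cat", "car"]:
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode(ch))

w_self, w_neigh = rng.normal(size=(DIM, DIM)), rng.normal(size=(DIM, DIM))
encode(root, w_self, w_neigh)
print(root.children["c"].enc)   # encoding of prefix "c" summarises both words
```

Because the encoding of each prefix node already reflects every word piece further down its branch, the pointer generator can anticipate upcoming biasing words before they are fully decoded.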
A review of the enhancement of bio-hydrogen generation by chemicals addition
Bio-hydrogen production (BHP) from renewable bio-resources is an attractive route to green energy, owing to its relatively high efficiency, cost-effectiveness, and low ecological impact. This study reviewed different BHP pathways, and the most important enzymes involved in them, to identify technological gaps and effective approaches for process intensification in industrial applications. Among the approaches reviewed, particular focus was placed on the latest methods of chemical/metal addition for improving hydrogen generation during dark fermentation (DF) processes; the up-to-date findings of different chemical/metal addition methods are quantitatively evaluated and thoroughly compared in this paper. A new efficiency evaluation criterion is also proposed, allowing different BHP processes to be compared with greater simplicity and validity.
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
Wav2Prompt is proposed to allow straightforward integration between spoken
input and a text-based large language model (LLM). Wav2Prompt uses a
simple training process with only the same data used to train an automatic
speech recognition (ASR) model. After training, Wav2Prompt learns continuous
representations from speech and uses them as LLM prompts. To avoid task
over-fitting issues found in prior work and preserve the emergent abilities of
LLMs, Wav2Prompt takes LLM token embeddings as the training targets and
utilises a continuous integrate-and-fire mechanism for explicit speech-text
alignment. Therefore, a Wav2Prompt-LLM combination can be applied to zero-shot
spoken language tasks such as speech translation (ST), spoken language
understanding (SLU), speech question answering (SQA) and spoken-query-based QA (SQQA). It is
shown that for these zero-shot tasks, Wav2Prompt performs similarly to an
ASR-LLM cascade and better than recent prior work. If relatively small amounts
of task-specific paired data are available in few-shot scenarios, the
Wav2Prompt-LLM combination can be end-to-end (E2E) fine-tuned. The
Wav2Prompt-LLM combination then yields greatly improved results relative to an
ASR-LLM cascade for the above tasks. For instance, for English-French ST with
the BLOOMZ-7B1 LLM, a Wav2Prompt-LLM combination gave an 8.5 BLEU point increase
over an ASR-LLM cascade.
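The continuous integrate-and-fire mechanism used for explicit speech-text alignment can be sketched as follows. The frame weights here are fixed stand-ins for quantities a trained model would predict, so this illustrates the general CIF idea rather than Wav2Prompt itself.

```python
# Minimal sketch of continuous integrate-and-fire (CIF): frame-level encoder
# states h_t with scalar weights a_t are accumulated until the weights reach
# a threshold of 1.0, at which point a token-level vector "fires".
import numpy as np

def cif(frames, alphas, threshold=1.0):
    fired, acc_w, acc_v = [], 0.0, np.zeros(frames.shape[1])
    for h, a in zip(frames, alphas):
        if acc_w + a < threshold:
            acc_w += a
            acc_v += a * h
        else:
            # Split the frame weight: close the current token exactly at the
            # threshold, carry the remainder over to start the next token.
            rest = acc_w + a - threshold
            acc_v += (a - rest) * h
            fired.append(acc_v)
            acc_w, acc_v = rest, rest * h
    return np.stack(fired) if fired else np.empty((0, frames.shape[1]))

T, D = 10, 4
frames = np.random.default_rng(1).normal(size=(T, D))
alphas = np.full(T, 0.4)            # stand-in weights: ~1 token per 2.5 frames
tokens = cif(frames, alphas)
print(tokens.shape)                 # (4, 4): four token-level vectors fired
```

Each fired vector is a weighted sum of the frames it covers, giving token-rate representations that can be aligned with LLM token embeddings as training targets.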
Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation
Foundation models have shown superior performance for speech emotion
recognition (SER). However, given the limited data in emotion corpora,
finetuning all parameters of large pre-trained models for SER can be both
resource-intensive and susceptible to overfitting. This paper investigates
parameter-efficient finetuning (PEFT) for SER. Various PEFT adaptors are
systematically studied for both classification of discrete emotion categories
and prediction of dimensional emotional attributes. The results demonstrate
that the combination of PEFT methods surpasses full finetuning with a
significant reduction in the number of trainable parameters. Furthermore, a
two-stage adaptation strategy is proposed to adapt models trained on acted
emotion data, which is more readily available, to make the model more adept at
capturing natural emotional expressions. Both intra- and cross-corpus
experiments validate the efficacy of the proposed approach in enhancing the
performance on both the source and target domains.
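As one concrete example from the PEFT family studied, here is a minimal sketch of a residual bottleneck adapter trained on top of a frozen backbone; the dimensions and placement are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of a bottleneck adapter on a frozen layer (PyTorch).
# Dimensions and placement are illustrative assumptions.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual down-project -> nonlinearity -> up-project adapter."""
    def __init__(self, dim, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual connection

dim = 768
backbone_layer = nn.Linear(dim, dim)        # stand-in for one pretrained layer
for p in backbone_layer.parameters():
    p.requires_grad = False                 # backbone stays frozen

adapter = BottleneckAdapter(dim)            # only these weights are trained
x = torch.randn(2, 50, dim)                 # (batch, frames, features)
out = adapter(backbone_layer(x))

trainable = sum(p.numel() for p in adapter.parameters())
print(f"trainable adapter params: {trainable}")   # far fewer than the backbone
```

Because only the small adapter is updated, training is cheaper and less prone to overfitting on small emotion corpora, which is the motivation the abstract describes.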
Can Contextual Biasing Remain Effective with Whisper and GPT-2?
End-to-end automatic speech recognition (ASR) and large language models, such
as Whisper and GPT-2, have recently been scaled to use vast amounts of training
data. Despite the large amount of training data, infrequent content words that
occur in a particular task may still exhibit poor ASR performance, with
contextual biasing a possible remedy. This paper investigates the effectiveness
of neural contextual biasing for Whisper combined with GPT-2. Specifically,
this paper proposes integrating an adapted tree-constrained pointer generator
(TCPGen) component for Whisper and a dedicated training scheme to dynamically
adjust the final output without modifying any Whisper model parameters.
Experiments across three datasets show a considerable reduction in errors on
biasing words with a biasing list of 1000 words. Contextual biasing was more
effective when applied to domain-specific data, and it can boost the performance
of Whisper and GPT-2 without losing their generality.
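The TCPGen-style adjustment of the final output can be sketched as an interpolation between the frozen model's distribution and a pointer distribution masked to word pieces that are valid continuations in the biasing prefix tree. The pointer scores and generation probability below are illustrative stand-ins for the learned components described in the paper.

```python
# Minimal sketch of TCPGen-style output interpolation: the frozen model's
# distribution is mixed with a pointer distribution restricted to word pieces
# allowed by the biasing prefix tree. p_gen and the pointer scores are
# illustrative stand-ins for learned quantities.
import numpy as np

VOCAB = ["_he", "_said", "_Zyg", "munt", "_quietly"]

def biased_distribution(p_model, valid_ids, p_gen):
    """final = (1 - p_gen) * p_model + p_gen * p_ptr, with p_ptr tree-masked."""
    mask = np.zeros_like(p_model)
    mask[valid_ids] = 1.0
    ptr_scores = p_model * mask              # stand-in for learned tree attention
    p_ptr = ptr_scores / ptr_scores.sum()
    return (1.0 - p_gen) * p_model + p_gen * p_ptr

p_model = np.array([0.30, 0.40, 0.05, 0.05, 0.20])  # frozen Whisper-like output
valid_ids = [2]                                     # tree allows "_Zyg" (rare word)
p_final = biased_distribution(p_model, valid_ids, p_gen=0.3)
print(dict(zip(VOCAB, p_final.round(3))))           # "_Zyg" probability is boosted
```

Since the interpolation happens on the output distribution, the underlying Whisper parameters stay untouched, matching the abstract's claim that generality is preserved.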
Affect Recognition in Conversations Using Large Language Models
Affect recognition, encompassing emotions, moods, and feelings, plays a
pivotal role in human communication. In the realm of conversational artificial
intelligence (AI), the ability to discern and respond to human affective cues
is a critical factor for creating engaging and empathetic interactions. This
study delves into the capacity of large language models (LLMs) to recognise
human affect in conversations, with a focus on both open-domain chit-chat
dialogues and task-oriented dialogues. Leveraging three diverse datasets,
namely IEMOCAP, EmoWOZ, and DAIC-WOZ, covering a spectrum of dialogues from
casual conversations to clinical interviews, we evaluated and compared LLMs'
performance in affect recognition. Our investigation explores the zero-shot and
few-shot capabilities of LLMs through in-context learning (ICL) as well as
their model capacities through task-specific fine-tuning. Additionally, this
study takes into account the potential impact of automatic speech recognition
(ASR) errors on LLM predictions. With this work, we aim to shed light on the
extent to which LLMs can replicate human-like affect recognition capabilities
in conversations.
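To illustrate the in-context learning setup, the following is a minimal sketch of assembling a zero- or few-shot prompt for per-utterance emotion classification; the label set and prompt wording are assumptions for illustration, not the study's prompts.

```python
# Minimal sketch of building a zero-/few-shot ICL prompt for emotion
# recognition in a dialogue. Labels and wording are illustrative assumptions.
LABELS = ["neutral", "happy", "sad", "angry", "excited", "frustrated"]

def build_prompt(dialogue, target_turn_idx, examples=()):
    """Format a dialogue so an LLM classifies the emotion of one marked turn."""
    lines = [f"Classify the emotion of the utterance marked with '>>'. "
             f"Answer with one of: {', '.join(LABELS)}."]
    for ex_dialogue, ex_idx, ex_label in examples:   # optional few-shot block
        lines += [_render(ex_dialogue, ex_idx), f"Emotion: {ex_label}", ""]
    lines += [_render(dialogue, target_turn_idx), "Emotion:"]
    return "\n".join(lines)

def _render(dialogue, idx):
    # Mark the target turn with '>>' so the model knows which one to label.
    return "\n".join(
        f"{'>> ' if i == idx else ''}{spk}: {text}"
        for i, (spk, text) in enumerate(dialogue)
    )

dialogue = [("A", "I finally got the job!"), ("B", "That is wonderful news.")]
print(build_prompt(dialogue, target_turn_idx=0))
```

In the few-shot condition, labelled example dialogues are passed via `examples`; in the zero-shot condition the instruction and target dialogue alone form the prompt.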