187 research outputs found
Skydiver: A Spiking Neural Network Accelerator Exploiting Spatio-Temporal Workload Balance
Spiking Neural Networks (SNNs) have been developed as a promising alternative to
Artificial Neural Networks (ANNs) due to their more realistic brain-inspired
computing models. SNNs exhibit sparse neuron firing over time, i.e.,
spatio-temporal sparsity, which makes them well suited to energy-efficient
hardware inference. However, exploiting the spatio-temporal sparsity of SNNs in
hardware leads to unpredictable and unbalanced workloads, degrading the energy
efficiency. In this work, we propose an FPGA-based convolutional SNN
accelerator called Skydiver that exploits spatio-temporal workload balance. We
propose the Approximate Proportional Relation Construction (APRC) method, which
predicts the relative workload channel-wise, and a Channel-Balanced
Workload Schedule (CBWS) method that increases the hardware workload-balance ratio
to over 90%. Skydiver was implemented on a Xilinx XC7Z045 FPGA and verified on
image segmentation and MNIST classification tasks. Results show throughput
improvements of 1.4X and 1.2X on the two tasks, respectively. Skydiver achieved
22.6 KFPS throughput and 42.4 uJ/image prediction energy on the classification
task at 98.5% accuracy.
Comment: Accepted to be published in the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 202
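The channel-balanced scheduling idea can be illustrated with a small sketch. This is not the paper's CBWS implementation; the greedy rule, the workload numbers, and the balance-ratio definition below are illustrative assumptions — per-channel workload estimates (such as APRC might predict) are dealt out to processing elements (PEs) so that no PE is left idle while others are overloaded.

```python
# Hypothetical sketch of channel-balanced workload scheduling in the spirit
# of Skydiver's CBWS. All names and numbers are illustrative, not from the paper.

def schedule_channels(workloads, num_pes):
    """Assign each channel (by index) to the currently least-loaded PE,
    placing heavy channels first (longest-processing-time-first)."""
    pe_loads = [0] * num_pes
    assignment = {}
    for ch in sorted(range(len(workloads)), key=lambda c: -workloads[c]):
        pe = min(range(num_pes), key=lambda p: pe_loads[p])
        assignment[ch] = pe
        pe_loads[pe] += workloads[ch]
    return assignment, pe_loads

def balance_ratio(pe_loads):
    """Mean PE load over max PE load; 1.0 means perfectly balanced."""
    return sum(pe_loads) / (len(pe_loads) * max(pe_loads))

# Example: uneven per-channel spike workloads spread across 4 PEs.
loads = [90, 10, 55, 40, 70, 20, 65, 30]
assignment, pe_loads = schedule_channels(loads, num_pes=4)
print(balance_ratio(pe_loads))  # → 0.95
```

Even this naive greedy rule pushes the balance ratio above the 90% mark the paper targets on this toy input; the real method additionally has to predict the workloads before they are known.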
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation
While Diffusion Generative Models have achieved great success on image
generation tasks, how to efficiently and effectively incorporate them into
speech generation especially translation tasks remains a non-trivial problem.
Specifically, due to the low information density of speech data, the
transformed discrete speech unit sequence is much longer than the corresponding
text transcription, posing significant challenges to existing auto-regressive
models. Furthermore, naively applying discrete diffusion to the speech unit
sequence while disregarding the continuous space structure is suboptimal and
significantly degrades generation performance. In this paper, we
propose a novel diffusion model by applying the diffusion forward process in
the \textit{continuous} speech representation space, while employing the
diffusion backward process in the \textit{discrete} speech unit space. In this
way, we preserve the semantic structure of the continuous speech representation
space in the diffusion process and integrate the continuous and discrete
diffusion models. We conduct extensive experiments on the textless direct
speech-to-speech translation task, where the proposed method achieves
comparable results to the computationally intensive auto-regressive baselines
(500 steps on average) with significantly fewer decoding steps (50 steps).
Comment: Accepted in EMNLP 2023 main conference
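The core idea — noising in the continuous representation space while snapping each backward estimate onto the discrete unit codebook — can be sketched in a few lines. This is an illustrative toy, not the paper's model: the two-dimensional codebook, the noise schedule, and the trivial "denoiser" are all assumptions standing in for learned components.

```python
import math
import random

# Toy sketch of a hybrid diffusion step in the spirit of DiffS2UT:
# forward process in the *continuous* space, backward step projected
# onto the *discrete* speech-unit codebook.

CODEBOOK = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0), 3: (1.0, 1.0)}

def forward_noise(x, alpha_bar, rng):
    """q(x_t | x_0): scale the clean continuous embedding, add Gaussian noise."""
    return tuple(math.sqrt(alpha_bar) * xi + math.sqrt(1 - alpha_bar) * rng.gauss(0, 1)
                 for xi in x)

def nearest_unit(x):
    """Project a continuous vector onto the nearest discrete unit."""
    return min(CODEBOOK, key=lambda u: sum((a - b) ** 2 for a, b in zip(x, CODEBOOK[u])))

def backward_step(x_t, alpha_bar):
    """One toy backward step: rescale toward the data manifold (stand-in for a
    trained denoiser), then quantize the estimate to a discrete unit."""
    x0_hat = tuple(xi / math.sqrt(alpha_bar) for xi in x_t)
    return nearest_unit(x0_hat)

rng = random.Random(0)
x0 = CODEBOOK[3]                                  # clean embedding of unit 3
x_t = forward_noise(x0, alpha_bar=0.9, rng=rng)   # mildly noised continuous vector
print(backward_step(x_t, alpha_bar=0.9))
```

The point of the projection is that the backward trajectory never leaves the set of valid speech units, while distances are still measured in the semantically structured continuous space.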
FrameFire: Enabling Efficient Spiking Neural Network Inference for Video Segmentation
Fast video recognition is essential for real-time scenarios, e.g., autonomous driving. However, applying existing Deep Neural Networks (DNNs) to individual high-resolution images is expensive due to large model sizes. Spiking Neural Networks (SNNs) have been developed as a promising alternative to DNNs due to their more realistic brain-inspired computing models. SNNs exhibit sparse neuron firing over time, i.e., spatio-temporal sparsity, which makes them well suited to energy-efficient computation. However, exploiting the spatio-temporal sparsity of SNNs in hardware leads to unpredictable and unbalanced workloads, degrading energy efficiency. In this work, we therefore propose an SNN accelerator called FrameFire for efficient video processing. We introduce a Keyframe-dominated Workload Balance Schedule (KWBS) method. It accelerates the image recognition network with sparse keyframes, then records and analyzes the current workload distribution on hardware to facilitate scheduling workloads in subsequent frames. FrameFire was implemented on a Xilinx XC7Z035 FPGA and verified on video segmentation tasks. The results show that throughput is improved by 1.7× with the KWBS method. FrameFire achieved 1.04 KFPS throughput and 1.15 mJ/frame recognition energy.
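The keyframe-dominated idea — measure the workload distribution once on a keyframe, then reuse it to pre-schedule the temporally similar frames that follow — can be sketched as below. This is a hypothetical illustration, not the paper's KWBS: the spike counts, the "snake" dealing rule, and the PE count are all assumptions.

```python
# Hypothetical sketch in the spirit of FrameFire's KWBS: profile per-channel
# spike workload on a keyframe, then build a static schedule reused for
# subsequent frames until the next keyframe.

def profile_keyframe(spike_counts):
    """Rank channels by measured keyframe workload, heaviest first."""
    return sorted(range(len(spike_counts)), key=lambda c: -spike_counts[c])

def snake_schedule(ranked_channels, num_pes):
    """Deal ranked channels to PEs in back-and-forth ('snake') order so each
    PE receives a mix of heavy and light channels."""
    schedule = [[] for _ in range(num_pes)]
    for i, ch in enumerate(ranked_channels):
        lap, pos = divmod(i, num_pes)
        pe = pos if lap % 2 == 0 else num_pes - 1 - pos
        schedule[pe].append(ch)
    return schedule

# Keyframe spike counts per channel (illustrative); later frames reuse this.
key_counts = [80, 15, 60, 45, 70, 25, 55, 35]
ranked = profile_keyframe(key_counts)
schedule = snake_schedule(ranked, num_pes=4)
loads = [sum(key_counts[c] for c in pes) for pes in schedule]
print(schedule, loads)
```

Because consecutive video frames are highly correlated, a schedule derived from one sparse keyframe stays close to balanced on the frames that follow — the premise behind profiling only at keyframes.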
CEAT: Continual Expansion and Absorption Transformer for Non-Exemplar Class-Incremental Learning
In real-world applications, dynamic scenarios require models to learn new tasks
continuously without forgetting old knowledge. Experience-Replay methods store a
subset of the old images for joint training, but under stricter privacy
protection, storing old images becomes infeasible, leading to a more severe
plasticity-stability dilemma and classifier bias. To meet these challenges, we
propose a new architecture named the Continual Expansion and Absorption
Transformer (CEAT). The model learns novel knowledge by extending
expanded-fusion layers in parallel with the frozen previous parameters. After
each task ends, we losslessly absorb the extended parameters into the backbone,
ensuring that the number of parameters remains constant. To improve the learning
ability of the model, we design a novel prototype contrastive loss that reduces
the overlap between old and new classes in the feature space. Besides, to
address the classifier's bias towards new classes, we propose a novel approach
that generates pseudo-features to correct the classifier. We evaluate our method
on three standard Non-Exemplar Class-Incremental Learning (NECIL) benchmarks.
Extensive experiments demonstrate that our model significantly improves on
previous works, achieving gains of 5.38%, 5.20%, and 4.92% on CIFAR-100,
TinyImageNet, and ImageNet-Subset, respectively.
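A prototype contrastive loss of the general kind described above can be sketched as an InfoNCE-style objective over class prototypes. This is an assumption-laden illustration, not CEAT's exact formulation: the cosine similarity, temperature, and toy prototypes are all stand-ins.

```python
import math

# Illustrative prototype contrastive loss: pull a feature toward its own
# class prototype and push it away from all other prototypes, reducing
# class overlap in feature space. Not CEAT's exact loss.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def prototype_contrastive_loss(feature, label, prototypes, temperature=0.1):
    """-log softmax over prototype similarities (InfoNCE-style)."""
    logits = [cosine(feature, p) / temperature for p in prototypes]
    m = max(logits)  # subtract the max to stabilize the softmax
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[label]

# A feature near its own prototype incurs a lower loss than one near another class.
protos = [(1.0, 0.0), (0.0, 1.0)]
near = prototype_contrastive_loss((0.9, 0.1), 0, protos)
far = prototype_contrastive_loss((0.1, 0.9), 0, protos)
print(near < far)  # → True
```

Minimizing this loss simultaneously tightens each class around its prototype and enlarges the margins between old and new classes, which is the overlap-reduction effect the abstract describes.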
Genetic code expansion in Pseudomonas putida KT2440
Pseudomonas putida KT2440 is an emerging microbial chassis for bio-based chemical production from renewable feedstocks and for environmental bioremediation. However, tools for studying, engineering, and modulating protein complexes and biosynthetic enzymes in this organism are largely underdeveloped. Genetic code expansion for the incorporation of unnatural amino acids (unAAs) into proteins can advance such efforts and, furthermore, enable additional control of the strain's biological processes. In this work, we established the orthogonality of two widely used archaeal tRNA synthetase and tRNA pairs in KT2440. Following optimization of the decoding systems, four unAAs were incorporated into proteins in response to a UAG stop codon at 34.6-78% efficiency. In addition, we demonstrated the utility of genetic code expansion through the incorporation of a photocrosslinking amino acid, p-benzoyl-L-phenylalanine (pBpa), into glutathione S-transferase (GstA) and a chemosensory response regulator (CheY) for protein-protein interaction studies in KT2440. This work reports the first successful genetic code expansion in KT2440. Given the diverse structures and functions of the unAAs that have been incorporated into proteins using the archaeal systems, our research lays a solid foundation for future work to study and enhance the biological functions of KT2440.
Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training
Audio-visual active speaker detection (AV-ASD) aims to identify which visible
face is speaking in a scene with one or more persons. Most existing AV-ASD
methods prioritize capturing speech-lip correspondence. However, there is a
noticeable gap in addressing the challenges from real-world AV-ASD scenarios.
Due to the presence of low-quality, noisy videos in such cases, AV-ASD systems
without a selective listening ability cannot effectively filter out
disruptive voice components from mixed audio inputs. In this paper, we propose
a Multi-modal Speaker Extraction-to-Detection framework named `MuSED', which is
pre-trained with audio-visual target speaker extraction to learn the denoising
ability, and is then fine-tuned on the AV-ASD task. Meanwhile, to better
capture multi-modal information and cope with real-world problems such as
missing modalities, MuSED operates directly in the time domain and integrates
a multi-modal plus-and-minus augmentation strategy. Our experiments
demonstrate that MuSED substantially outperforms the state-of-the-art AV-ASD
methods and achieves 95.6% mAP on the AVA-ActiveSpeaker dataset, 98.3% AP on
the ASW dataset, and 97.9% F1 on the Columbia AV-ASD dataset, respectively. We
will publicly release the code in due course.
Comment: 10 pages
Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends
Natural Language Processing (NLP) aims to analyze text or speech with
computational techniques. It serves applications in domains such as
healthcare, commerce, and education. In particular, NLP has been widely
applied to the education domain and its applications have enormous potential to
help teaching and learning. In this survey, we review recent advances in NLP
with a focus on solving problems relevant to the education domain. In detail,
we begin by introducing the related background and the real-world scenarios
in education where NLP techniques could contribute. Then, we present a taxonomy
of NLP in the education domain and highlight typical NLP applications including
question answering, question construction, automated assessment, and error
correction. Next, we illustrate the task definition, challenges, and
corresponding cutting-edge techniques based on the above taxonomy. In
particular, LLM-involved methods are included for discussion due to the wide
usage of LLMs in diverse NLP applications. After that, we showcase some
off-the-shelf demonstrations in this domain. Finally, we conclude with six
promising directions for future research: more datasets in the education
domain, controllable usage of LLMs, difficulty-level control,
interpretable educational NLP, methods with adaptive learning, and integrated
systems for education. We organize all relevant datasets and papers in a
publicly available GitHub repository for reference:
https://github.com/LiXinyuan1015/NLP-for-Education
Research on the Design Methods for Green Renovation of Existing Buildings in Lingnan Region
China’s urbanization has entered a new stage with the promotion of the “Carbon Peaking and Carbon Neutrality Goals” and the “Urban Renewal Strategy”. Problems such as poor comfort, high energy consumption, and unsuitable functions of existing buildings have attracted extensive attention from society. The climate-adapted human environment created by traditional buildings in the Lingnan region offers insights for the green transformation of buildings in this area. This paper summarizes the wisdom embodied in the climate-adaptive construction of traditional Lingnan buildings and proposes a green transformation design scheme that meets the requirements of energy efficiency and comfort, providing a reference for the green renovation design of existing buildings.