187 research outputs found

    Skydiver: A Spiking Neural Network Accelerator Exploiting Spatio-Temporal Workload Balance

    Full text link
    Spiking Neural Networks (SNNs) have been developed as a promising alternative to Artificial Neural Networks (ANNs) due to their more realistic, brain-inspired computing models. SNNs exhibit sparse neuron firing over time, i.e., spatio-temporal sparsity, which makes them well suited to energy-efficient hardware inference. However, exploiting the spatio-temporal sparsity of SNNs in hardware leads to unpredictable and unbalanced workloads, degrading energy efficiency. In this work, we propose an FPGA-based convolutional SNN accelerator called Skydiver that exploits spatio-temporal workload balance. We propose an Approximate Proportional Relation Construction (APRC) method that predicts the relative workload channel-wise, and a Channel-Balanced Workload Schedule (CBWS) method that raises the hardware workload balance ratio to over 90%. Skydiver was implemented on a Xilinx XC7Z045 FPGA and verified on image segmentation and MNIST classification tasks. Results show throughput improvements of 1.4X and 1.2X on the two tasks. Skydiver achieved 22.6 KFPS throughput and 42.4 uJ/Image prediction energy on the classification task with 98.5% accuracy. Comment: Accepted to be published in the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 202
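
    The abstract describes two pieces: predicting relative per-channel workloads (APRC) and scheduling channels so the parallel hardware stays balanced (CBWS). The sketch below is only a rough, hypothetical illustration of channel-balanced scheduling as a greedy longest-processing-time assignment; the function name, inputs, and the greedy heuristic are assumptions and do not reproduce the paper's CBWS method.

        import heapq

        def channel_balanced_schedule(predicted_work, num_units):
            # Assumed input: predicted_work[c] is the relative workload of channel c,
            # standing in for the estimate an APRC-style predictor would supply.
            # Min-heap of (accumulated_work, unit_id): the least-loaded unit is popped first.
            units = [(0.0, u) for u in range(num_units)]
            heapq.heapify(units)
            assignment = {u: [] for u in range(num_units)}
            # Place the heaviest channels first so the final loads stay close together.
            for channel in sorted(range(len(predicted_work)),
                                  key=lambda c: predicted_work[c], reverse=True):
                load, unit = heapq.heappop(units)
                assignment[unit].append(channel)
                heapq.heappush(units, (load + predicted_work[channel], unit))
            loads = {unit: load for load, unit in units}
            balance_ratio = min(loads.values()) / max(loads.values())  # 1.0 = perfectly balanced
            return assignment, balance_ratio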

    DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation

    Full text link
    While Diffusion Generative Models have achieved great success on image generation tasks, how to efficiently and effectively incorporate them into speech generation, especially translation tasks, remains a non-trivial problem. Specifically, due to the low information density of speech data, the transformed discrete speech unit sequence is much longer than the corresponding text transcription, posing significant challenges to existing auto-regressive models. Furthermore, it is not optimal to naively apply discrete diffusion to the speech unit sequence while disregarding the continuous space structure, which degrades the generation performance significantly. In this paper, we propose a novel diffusion model that applies the diffusion forward process in the continuous speech representation space, while employing the diffusion backward process in the discrete speech unit space. In this way, we preserve the semantic structure of the continuous speech representation space in the diffusion process and integrate the continuous and discrete diffusion models. We conduct extensive experiments on the textless direct speech-to-speech translation task, where the proposed method achieves results comparable to the computationally intensive auto-regressive baselines (500 steps on average) with significantly fewer decoding steps (50 steps). Comment: Accepted in EMNLP 2023 main conference
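
    The core idea in the abstract is a forward process over continuous speech representations combined with a backward process anchored to discrete speech units. As a heavily simplified, hypothetical sketch, the snippet below pairs a generic DDPM-style forward step with a nearest-neighbour mapping from denoised continuous vectors back to unit ids; the names, shapes, and the rounding rule are assumptions, not the paper's actual formulation.

        import torch

        def forward_noising(x0, t, alphas_cumprod):
            # Generic Gaussian forward diffusion step applied to continuous
            # speech representations x0 of shape (batch, length, dim).
            a_bar = alphas_cumprod[t].view(-1, 1, 1)
            noise = torch.randn_like(x0)
            return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise, noise

        def continuous_to_units(x_denoised, unit_embeddings):
            # Map denoised continuous representations back to discrete speech units
            # by nearest-neighbour lookup in a unit embedding table of shape (vocab, dim).
            batch = x_denoised.size(0)
            dists = torch.cdist(x_denoised, unit_embeddings.unsqueeze(0).expand(batch, -1, -1))
            return dists.argmin(dim=-1)  # (batch, length) discrete unit ids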

    FrameFire: Enabling Efficient Spiking Neural Network Inference for Video Segmentation

    Get PDF
    Fast video recognition is essential for real-time scenarios, e.g., autonomous driving. However, applying existing Deep Neural Networks (DNNs) to individual high-resolution images is expensive due to large model sizes. Spiking Neural Networks (SNNs) have been developed as a promising alternative to DNNs due to their more realistic, brain-inspired computing models. SNNs exhibit sparse neuron firing over time, i.e., spatio-temporal sparsity, which makes them well suited to energy-efficient computation. However, exploiting the spatio-temporal sparsity of SNNs in hardware leads to unpredictable and unbalanced workloads, degrading energy efficiency. In this work, we therefore propose an SNN accelerator called FrameFire for efficient video processing. We introduce a Keyframe-dominated Workload Balance Schedule (KWBS) method. It accelerates the image recognition network with sparse keyframes, then records and analyzes the current workload distribution on hardware to facilitate scheduling workloads in subsequent frames. FrameFire is implemented on a Xilinx XC7Z035 FPGA and verified on video segmentation tasks. The results show that throughput is improved by 1.7× with the KWBS method. FrameFire achieved 1.04 KFPS throughput and 1.15 mJ/frame recognition energy.
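
    As described, KWBS profiles the workload on sparse keyframes and reuses that information to schedule the frames that follow. A minimal, hypothetical sketch of this keyframe-then-reuse loop is shown below; every callback (run_and_profile, run_with_schedule, is_keyframe) is an assumed stand-in for the accelerator's actual profiling and dispatch hooks, and the round-robin assignment is only illustrative.

        def kwbs_schedule(frames, is_keyframe, run_and_profile, run_with_schedule, num_units):
            # Profile workload on keyframes, then reuse the derived channel-to-unit
            # assignment for subsequent frames until the next keyframe.
            schedule, outputs = None, []
            for i, frame in enumerate(frames):
                if schedule is None or is_keyframe(i):
                    out, per_channel_work = run_and_profile(frame)  # record workload on hardware
                    # Heaviest channels first, dealt out round-robin across processing units.
                    order = sorted(range(len(per_channel_work)),
                                   key=lambda c: per_channel_work[c], reverse=True)
                    schedule = [order[u::num_units] for u in range(num_units)]
                else:
                    out = run_with_schedule(frame, schedule)  # reuse keyframe-derived balance
                outputs.append(out)
            return outputs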

    CEAT: Continual Expansion and Absorption Transformer for Non-Exemplar Class-Incremental Learning

    Full text link
    In real-world applications, dynamic scenarios require models to learn new tasks continuously without forgetting old knowledge. Experience-replay methods store a subset of the old images for joint training. Under stricter privacy protection, storing old images becomes infeasible, which leads to a more severe plasticity-stability dilemma and classifier bias. To meet these challenges, we propose a new architecture, named Continual Expansion and Absorption Transformer (CEAT). The model learns novel knowledge by extending expanded-fusion layers in parallel with the frozen previous parameters. After a task ends, we losslessly absorb the extended parameters into the backbone so that the number of parameters remains constant. To improve the learning ability of the model, we design a novel prototype contrastive loss that reduces the overlap between old and new classes in the feature space. Besides, to address the classifier bias towards new classes, we propose a novel approach to generate pseudo-features to correct the classifier. We evaluate our method on three standard Non-Exemplar Class-Incremental Learning (NECIL) benchmarks. Extensive experiments demonstrate that our model achieves a significant improvement over previous works, with gains of 5.38%, 5.20%, and 4.92% on CIFAR-100, TinyImageNet, and ImageNet-Subset, respectively.
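
    The abstract mentions a prototype contrastive loss that reduces overlap between old and new classes in feature space. A common, generic formulation of such a loss is sketched below, assuming L2-normalized features and one stored prototype per class; this is only an illustration of the idea, not CEAT's exact loss.

        import torch.nn.functional as F

        def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
            # Pull each feature toward its own class prototype and push it away from
            # all other prototypes (old-class prototypes act as negatives).
            feats = F.normalize(features, dim=-1)      # (batch, dim)
            protos = F.normalize(prototypes, dim=-1)   # (num_classes, dim)
            logits = feats @ protos.t() / temperature  # cosine-similarity logits
            return F.cross_entropy(logits, labels)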

    Genetic code expansion in Pseudomonas putida KT2440

    Get PDF
    Pseudomonas putida KT2440 is an emerging microbial chassis for bio-based chemical production from renewable feedstocks and for environmental bioremediation. However, tools for studying, engineering, and modulating protein complexes and biosynthetic enzymes in this organism are largely underdeveloped. Genetic code expansion for the incorporation of unnatural amino acids (unAAs) into proteins can advance such efforts and, furthermore, enable additional control of the strain's biological processes. In this work, we established the orthogonality of two widely used archaeal tRNA synthetase and tRNA pairs in KT2440. Following optimization of the decoding systems, four unAAs were incorporated into proteins in response to a UAG stop codon at 34.6-78% efficiency. In addition, we demonstrated the utility of genetic code expansion by incorporating a photocrosslinking amino acid, p-benzoyl-L-phenylalanine (pBpa), into glutathione S-transferase (GstA) and a chemosensory response regulator (CheY) for protein-protein interaction studies in KT2440. This work reports the first successful genetic code expansion in KT2440. Given the diverse structures and functions of unAAs that have been added to protein synthesis using the archaeal systems, our work lays a solid foundation for future studies to probe and enhance the biological functions of KT2440.

    Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training

    Full text link
    Audio-visual active speaker detection (AV-ASD) aims to identify which visible face is speaking in a scene with one or more persons. Most existing AV-ASD methods prioritize capturing speech-lip correspondence. However, there is a noticeable gap in addressing the challenges of real-world AV-ASD scenarios. Due to the presence of low-quality, noisy videos in such cases, AV-ASD systems without a selective listening ability fall short of effectively filtering out disruptive voice components from mixed audio inputs. In this paper, we propose a Multi-modal Speaker Extraction-to-Detection framework named MuSED, which is pre-trained with audio-visual target speaker extraction to learn a denoising ability and then fine-tuned on the AV-ASD task. Meanwhile, to better capture the multi-modal information and deal with real-world problems such as missing modality, MuSED is modelled in the time domain directly and integrates a multi-modal plus-and-minus augmentation strategy. Our experiments demonstrate that MuSED substantially outperforms state-of-the-art AV-ASD methods, achieving 95.6% mAP on the AVA-ActiveSpeaker dataset, 98.3% AP on the ASW dataset, and 97.9% F1 on the Columbia AV-ASD dataset. We will publicly release the code in due course. Comment: 10 pages
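
    Only the two-stage idea, pre-training an audio-visual model on target speaker extraction and then fine-tuning it for active speaker detection, is taken from the abstract; the sketch below is a hypothetical illustration of that pipeline, and the class name, feature dimension, and encoder interface are all assumptions rather than MuSED's actual architecture.

        import torch.nn as nn

        class ActiveSpeakerHead(nn.Module):
            # Wrap an audio-visual encoder (pre-trained on target speaker extraction)
            # with a small head that scores each frame as speaking / not speaking.
            def __init__(self, av_encoder, feat_dim=256):
                super().__init__()
                self.av_encoder = av_encoder        # stage 1: extraction pre-training
                self.head = nn.Linear(feat_dim, 1)  # stage 2: AV-ASD fine-tuning

            def forward(self, audio, video):
                feats = self.av_encoder(audio, video)  # (batch, time, feat_dim)
                return self.head(feats).squeeze(-1)    # (batch, time) speaking logits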

    Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends

    Full text link
    Natural Language Processing (NLP) aims to analyze text or speech with computational techniques. It serves applications in domains such as healthcare, commerce, and education. In particular, NLP has been widely applied to the education domain, where its applications have enormous potential to support teaching and learning. In this survey, we review recent advances in NLP with a focus on problems relevant to the education domain. In detail, we begin by introducing the related background and the real-world educational scenarios where NLP techniques can contribute. We then present a taxonomy of NLP in the education domain and highlight typical NLP applications, including question answering, question construction, automated assessment, and error correction. Next, we describe the task definitions, challenges, and corresponding cutting-edge techniques based on this taxonomy. In particular, LLM-based methods are discussed owing to the wide usage of LLMs in diverse NLP applications. After that, we showcase some off-the-shelf demonstrations in this domain. Finally, we conclude with six promising directions for future research, including more datasets in the education domain, controllable usage of LLMs, intervention of difficulty-level control, interpretable educational NLP, methods with adaptive learning, and integrated systems for education. We organize all relevant datasets and papers in an openly available GitHub repository for easier review: https://github.com/LiXinyuan1015/NLP-for-Education.

    Research on the Design Methods for Green Renovation of Existing Buildings in Lingnan Region

    Get PDF
    China’s urbanization has entered a new stage with the promotion of the “Carbon Peaking and Carbon Neutrality Goals” and the “Urban Renewal Strategy”. Problems of existing buildings such as poor comfort, high energy consumption, and unsuitable functional layouts have attracted extensive attention from society. The climate-adapted human environment created by traditional buildings in the Lingnan region offers insights for the green transformation of buildings in this area. This paper summarizes the wisdom embodied in the climate-adaptive construction of traditional Lingnan buildings and proposes a green transformation design scheme that meets the requirements of energy efficiency and comfort, providing a reference for the green renovation design of existing buildings.