IMPROVED DESIGN OF DTW AND GMM CASCADED ARABIC SPEAKER RECOGNITION SYSTEM
In this paper, we discuss the design, implementation, and assessment of a two-stage Arabic speaker recognition system that aims to recognize a target Arabic speaker among several people. The first stage uses an improved DTW (Dynamic Time Warping) algorithm, and the second stage uses an SA-KM-based GMM (Gaussian Mixture Model). MFCC (Mel Frequency Cepstral Coefficients) features and their difference forms are extracted from the sample speech as acoustic features. DTW provides the three most likely speakers, and these candidates are then conveyed to the GMM training process. A specified similarity assessment algorithm, the KL distance, is applied to find the best match with the target speaker. Experimental results show that the text-independent recognition rate of the cascaded system reaches 90 percent.
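The first stage is conventional enough to sketch. Below is a minimal, hypothetical version of the DTW scoring step in Python: it computes an alignment cost between MFCC sequences and keeps the three lowest-cost enrolled speakers for the GMM stage. Names such as `templates` and `top3_candidates` are illustrative assumptions, not from the paper.

```python
import numpy as np

def dtw_cost(a, b):
    """Classic dynamic-time-warping cost between two feature sequences
    a (n x d) and b (m x d), using Euclidean frame distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def top3_candidates(query, templates):
    """Rank enrolled speakers by DTW cost against the query utterance
    and keep the three best candidates for the GMM stage.
    `templates` is a hypothetical dict: speaker id -> MFCC array."""
    scores = {spk: dtw_cost(query, t) for spk, t in templates.items()}
    return sorted(scores, key=scores.get)[:3]
```

In a real system the MFCC matrices would come from a front-end such as a filterbank-plus-DCT pipeline; here they are plain NumPy arrays.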
Initial Task Allocation for Multi-Human Multi-Robot Teams with Attention-based Deep Reinforcement Learning
Multi-human multi-robot teams have great potential for complex and
large-scale tasks through the collaboration of humans and robots with diverse
capabilities and expertise. To operate such highly heterogeneous teams
efficiently and maximize team performance in a timely manner, sophisticated
initial task allocation strategies that consider individual differences across team members
and tasks are required. While existing works have shown promising results in
reallocating tasks based on agent state and performance, the neglect of the
inherent heterogeneity of the team hinders their effectiveness in realistic
scenarios. In this paper, we present a novel formulation of the initial task
allocation problem in multi-human multi-robot teams as a contextual
multi-attribute decision-making process and propose an attention-based deep
reinforcement learning approach. We introduce a cross-attribute attention
module to encode the latent and complex dependencies of multiple attributes in
the state representation. We conduct a case study in a massive threat
surveillance scenario and demonstrate the strengths of our model. Comment: Accepted to IROS202
On the Representation of Causal Background Knowledge and its Applications in Causal Inference
Causal background knowledge about the existence or the absence of causal
edges and paths is frequently encountered in observational studies. The shared
directed edges and links of a subclass of Markov equivalent DAGs refined due to
background knowledge can be represented by a causal maximally partially
directed acyclic graph (MPDAG). In this paper, we first provide a sound and
complete graphical characterization of causal MPDAGs and give a minimal
representation of a causal MPDAG. Then, we introduce a novel representation
called direct causal clause (DCC) to represent all types of causal background
knowledge in a unified form. Using DCCs, we study the consistency and
equivalency of causal background knowledge and show that any causal background
knowledge set can be equivalently decomposed into a causal MPDAG plus a minimal
residual set of DCCs. Polynomial-time algorithms are also provided for checking
the consistency and equivalency of causal background knowledge and for finding
the decomposed MPDAG and residual DCCs. Finally, with causal background
knowledge, we prove a sufficient and
necessary condition to identify causal effects and surprisingly find that the
identifiability of causal effects only depends on the decomposed MPDAG. We also
develop a local IDA-type algorithm to estimate the possible values of an
unidentifiable effect. Simulations suggest that causal background knowledge can
significantly improve the identifiability of causal effects.
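The claim that background knowledge refines a Markov equivalence class can be made concrete with a small sketch. The snippet below implements only Meek's first orientation rule (if a → b, b − c is undirected, and a and c are non-adjacent, orient b → c), one of the propagation steps that MPDAG completion relies on; it is an illustrative toy, not the paper's DCC decomposition.

```python
def meek_rule_1(directed, undirected):
    """Repeatedly apply Meek's rule 1 until no more edges orient.
    `directed` is a set of (tail, head) pairs; `undirected` is a set
    of frozensets {x, y}. Returns updated copies of both."""
    directed, undirected = set(directed), set(undirected)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(directed):
            for e in list(undirected):
                if b not in e:
                    continue
                c = next(iter(e - {b}))
                adjacent = ((a, c) in directed or (c, a) in directed
                            or frozenset({a, c}) in undirected)
                if c != a and not adjacent:
                    # a -> b, b - c, and a, c non-adjacent: orient b -> c.
                    undirected.remove(e)
                    directed.add((b, c))
                    changed = True
    return directed, undirected
```

Starting from the single piece of background knowledge a → b on the chain a − b − c − d, the rule cascades and orients the whole chain, which is the flavor of refinement the abstract describes.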
NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference
The inherent diversity of computation types within individual deep neural
network (DNN) models necessitates a corresponding variety of computation units
within hardware processors, leading to a significant constraint on computation
efficiency during neural network execution. In this study, we introduce
NeuralMatrix, a framework that transforms the computation of entire DNNs into
linear matrix operations, effectively enabling their execution with one
general-purpose matrix multiplication (GEMM) accelerator. By surmounting the
constraints posed by the diverse computation types required by individual
network models, this approach provides both generality, allowing a wide range
of DNN models to be executed with a single GEMM accelerator, and
application-specific levels of acceleration without extra special function
units; both benefits are validated on mainstream DNNs and their variant
models. Comment: 12 pages, 4 figures. Submitted to the 11th International Conference on
Learning Representations
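One way to read the core idea: if every nonlinear activation is replaced by a piecewise-linear fit, the network reduces to multiply-add work that a GEMM-style datapath already provides. The sketch below fits GELU with linear segments in NumPy; the segment count and the lookup scheme are assumptions for illustration, not the paper's exact transformation.

```python
import numpy as np

def make_pwl(f, lo, hi, n_seg):
    """Precompute per-segment slopes and intercepts of a
    piecewise-linear fit to f on [lo, hi]."""
    xs = np.linspace(lo, hi, n_seg + 1)
    ys = f(xs)
    slopes = np.diff(ys) / np.diff(xs)
    intercepts = ys[:-1] - slopes * xs[:-1]
    return xs, slopes, intercepts

def pwl_apply(x, xs, slopes, intercepts):
    """Evaluate the fit with only index lookup plus fused multiply-add,
    i.e. the primitives a GEMM-oriented accelerator already has."""
    idx = np.clip(np.searchsorted(xs, x, side="right") - 1,
                  0, len(slopes) - 1)
    return slopes[idx] * x + intercepts[idx]

# tanh-form GELU as the nonlinearity to linearize.
gelu = lambda v: 0.5 * v * (1 + np.tanh(np.sqrt(2 / np.pi)
                                        * (v + 0.044715 * v ** 3)))
xs, s, b = make_pwl(gelu, -6.0, 6.0, 256)
x = np.linspace(-5.0, 5.0, 1000)
err = np.max(np.abs(pwl_apply(x, xs, s, b) - gelu(x)))
```

With 256 segments the worst-case error over [-5, 5] is far below typical inference quantization noise, which is what makes this style of substitution plausible.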
Imidazole-Based pH-Sensitive Convertible Liposomes for Anticancer Drug Delivery
In efforts to enhance the activity of liposomal drugs against solid tumors, three novel lipids that carry imidazole-based headgroups of incremental basicity were prepared and incorporated into the membrane of PEGylated liposomes containing doxorubicin (DOX) to render pH-sensitive convertible liposomes (ICL). The imidazole lipids were designed to protonate and cluster with negatively charged phosphatidylethanolamine-polyethylene glycol when the pH drops from 7.4 to 6.0, thereby triggering ICL in the acidic tumor interstitium. Upon the drop of pH, ICL gained more positive surface charges, displayed lipid phase separation in TEM and DSC, and aggregated with cell membrane-mimetic model liposomes. The drop of pH also enhanced DOX release from ICL consisting of one of the imidazole lipids, sn-2-((2,3-dihexadecyloxypropyl)thio)-5-methyl-1H-imidazole. ICL demonstrated superior activity against monolayer cells and several 3D multicellular spheroids (MCS) compared with the analogous PEGylated, pH-insensitive liposomes containing DOX, which served as a control and clinical benchmark. The presence of cholesterol in ICL enhanced their colloidal stability but diminished their pH-sensitivity. ICL with the most basic imidazole lipid showed the highest activity in monolayer HeLa cells; ICL with the imidazole lipid of medium basicity showed the highest anticancer activity in 3D MCS. ICL that balances the needs of tissue penetration, cell binding, and drug release would yield optimal activity against solid tumors.
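The trigger depends on how much of each imidazole headgroup protonates as the pH falls from 7.4 to 6.0, which the Henderson-Hasselbalch relation quantifies. The pKa values below are hypothetical placeholders spanning "incremental basicity"; the abstract does not report the actual values.

```python
def protonated_fraction(pKa, pH):
    """Henderson-Hasselbalch: fraction of a weak base carrying a
    positive charge at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

# Hypothetical pKa values for the three headgroups of increasing basicity.
for pKa in (6.0, 6.5, 7.0):
    charged_at_blood_pH = protonated_fraction(pKa, 7.4)  # small fraction
    charged_in_tumor = protonated_fraction(pKa, 6.0)     # much larger
```

For a headgroup with pKa 6.5, the charged fraction rises from roughly 11% at pH 7.4 to about 76% at pH 6.0, which is the kind of swing that could drive the clustering and conversion behavior described above.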
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Direct speech-to-speech translation (S2ST) with discrete self-supervised
representations has achieved remarkable accuracy, but is unable to preserve the
speaker timbre of the source speech during translation. Meanwhile, the scarcity
of high-quality speaker-parallel data poses a challenge for learning style
transfer between source and target speech. We propose an S2ST framework with an
acoustic language model based on discrete units from a self-supervised model
and a neural codec for style transfer. The acoustic language model leverages
self-supervised in-context learning, acquiring the ability for style transfer
without relying on any speaker-parallel data, thereby overcoming the issue of
data scarcity. By using extensive training data, our model achieves zero-shot
cross-lingual style transfer on previously unseen source languages. Experiments
show that our model generates translated speech with high fidelity and style
similarity. Audio samples are available at http://stylelm.github.io/ . Comment: 5 pages, 1 figure. Submitted to ICASSP 202
Intelligent Reflecting Surface Aided Multi-Tier Hybrid Computing
The digital twin edge network (DITEN) aims to integrate mobile edge computing
(MEC) and digital twin (DT) to provide real-time system configuration and
flexible resource allocation for the sixth-generation network. This paper
investigates an intelligent reflecting surface (IRS)-aided multi-tier hybrid
computing system that can achieve mutual benefits for DT and MEC in the DITEN.
For the first time, this paper presents the opportunity to realize the
network-wide convergence of DT and MEC. Specifically, in the considered system,
over-the-air computation (AirComp) is employed to monitor the status of the DT
system, while MEC is performed with the assistance of DT to provide low-latency
computing services. Besides, the IRS is utilized to enhance signal transmission
and mitigate interference among heterogeneous nodes. We propose a framework for
designing the hybrid computing system, aiming to maximize the sum computation
rate under communication and computation resources constraints. To tackle the
non-convex optimization problem, alternating optimization and successive convex
approximation techniques are leveraged to decouple the variables and transform
the problem into a more tractable form. Simulation results verify the
effectiveness of the proposed algorithm and demonstrate the IRS can
significantly improve the system performance with appropriate phase shift
configurations. Moreover, the results indicate that the DT-assisted MEC system
can precisely balance local computing and task offloading, since real-time
system status can be obtained with the help of DT. This paper proposes the
network-wide integration of DT and MEC and then demonstrates, through analysis
and numerical results, the necessity of DT for achieving optimal performance in
DITEN systems.
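The decoupling step, alternating optimization, can be shown in miniature: fix one block of variables, solve the now-convex subproblem in the other block, and repeat. The toy below does this for rank-1 matrix factorization in NumPy; it is a generic illustration of the technique, not the paper's resource-allocation problem.

```python
import numpy as np

def als_rank1(M, iters=50, seed=0):
    """Alternating least squares for min_{u,v} ||M - u v^T||_F^2.
    The joint problem is non-convex, but with v fixed the optimal u is
    M v / (v^T v) in closed form, and symmetrically for v, so each
    block update solves a convex subproblem exactly."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[1])
    for _ in range(iters):
        u = M @ v / (v @ v)     # update block u with v fixed
        v = M.T @ u / (u @ u)   # update block v with u fixed
    return u, v
```

On an exactly rank-1 matrix the alternation recovers the factorization; in the paper's setting the blocks are beamforming, phase-shift, and offloading variables rather than u and v, but the decoupling pattern is the same.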
Better Zero-Shot Reasoning with Role-Play Prompting
Modern large language models (LLMs), such as ChatGPT, exhibit a remarkable
capacity for role-playing, enabling them to embody not only human characters
but also non-human entities like a Linux terminal. This versatility allows them
to simulate complex human-like interactions and behaviors within various
contexts, as well as to emulate specific objects or systems. While these
capabilities have enhanced user engagement and introduced novel modes of
interaction, the influence of role-playing on LLMs' reasoning abilities remains
underexplored. In this study, we introduce a strategically designed role-play
prompting methodology and assess its performance under the zero-shot setting
across twelve diverse reasoning benchmarks, encompassing arithmetic,
commonsense reasoning, symbolic reasoning, and more. Leveraging models such as
ChatGPT and Llama 2, our empirical results illustrate that role-play prompting
consistently surpasses the standard zero-shot approach across most datasets.
Notably, accuracy on AQuA rises from 53.5% to 63.8%, and on Last Letter from
23.8% to 84.2%. Beyond enhancing contextual understanding, we posit that
role-play prompting serves as an implicit Chain-of-Thought (CoT) trigger,
thereby improving the quality of reasoning. By comparing our approach with the
Zero-Shot-CoT technique, which prompts the model to "think step by step", we
further demonstrate that role-play prompting can generate a more effective CoT.
This highlights its potential to augment the reasoning capabilities of LLMs.
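Concretely, a role-play prompt differs from the standard zero-shot prompt only in a persona preamble prepended to the question. The wording below is an invented stand-in for illustration; the paper designs its prompts per task.

```python
def zero_shot_prompt(question):
    """Standard zero-shot prompt: just the question."""
    return f"Q: {question}\nA:"

def role_play_prompt(question):
    """Role-play prompt: a hypothetical persona preamble followed by
    the same question. This wording is illustrative, not the paper's."""
    role = ("From now on, you are an excellent math teacher who always "
            "explains problems to students accurately and step by step.")
    return f"{role}\nQ: {question}\nA:"
```

The hypothesis the abstract advances is that the persona acts as an implicit CoT trigger, so the same question elicits a more deliberate answer without any "think step by step" instruction.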
Husformer: A Multi-Modal Transformer for Multi-Modal Human State Recognition
Human state recognition is a critical topic with pervasive and important
applications in human-machine systems. Multi-modal fusion, the combination of
metrics from multiple data sources, has been shown to be a sound method for
improving the recognition performance. However, while promising results have
been reported by recent multi-modal-based models, they generally fail to
leverage the sophisticated fusion strategies that would model sufficient
cross-modal interactions when producing the fusion representation; instead,
current methods rely on lengthy and inconsistent data preprocessing and feature
crafting. To address this limitation, we propose an end-to-end multi-modal
transformer framework for multi-modal human state recognition called
Husformer. Specifically, we propose to use cross-modal transformers, which
inspire one modality to reinforce itself through directly attending to latent
relevance revealed in other modalities, to fuse different modalities while
ensuring sufficient awareness of the cross-modal interactions introduced.
Subsequently, we utilize a self-attention transformer to further prioritize
contextual information in the fusion representation. Using two such attention
mechanisms enables effective and adaptive adjustments to noise and
interruptions in multi-modal signals during the fusion process and in relation
to high-level features. Extensive experiments on two human emotion corpora
(DEAP and WESAD) and two cognitive workload datasets (MOCAS and CogLoad)
demonstrate that in the recognition of human state, our Husformer outperforms
both state-of-the-art multi-modal baselines and the use of a single modality by
a large margin, especially when dealing with raw multi-modal signals. We also
conducted an ablation study to show the benefits of each component in
Husformer.
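The cross-modal mechanism described above, one modality attending to latent relevance in another, is at its core attention with queries from modality A and keys/values from modality B. The single-head NumPy sketch below illustrates that structure; the dimensions and names are assumptions, not the Husformer implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(x_q, x_kv, Wq, Wk, Wv):
    """One attention head where modality A (x_q, shape Tq x d) attends
    to modality B (x_kv, shape Tk x d): queries come from A, keys and
    values from B, so A's representation is reinforced by whatever is
    relevant in B."""
    Q, K, V = x_q @ Wq, x_kv @ Wk, x_kv @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # Tq x Tk relevance map
    return softmax(scores, axis=-1) @ V       # Tq x d fused output
```

Stacking such heads in both directions between every modality pair, then passing the result through a self-attention transformer, yields the fusion-then-prioritize pipeline the abstract describes.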