IMPROVED DESIGN OF DTW AND GMM CASCADED ARABIC SPEAKER RECOGNITION SYSTEM
In this paper, we discuss the design, implementation, and assessment of a two-stage Arabic speaker recognition system that aims to recognize a target Arabic speaker among several people. The first stage uses an improved DTW (Dynamic Time Warping) algorithm, and the second stage uses an SA-KM-based GMM (Gaussian Mixture Model). MFCC (Mel Frequency Cepstral Coefficients) features and their difference forms are extracted from the sample speech as acoustic features. DTW provides the three most likely speakers, and these candidates are then conveyed to the GMM training process. A specified similarity assessment algorithm, the KL distance, is applied to find the best match with the target speaker. Experimental results show that the text-independent recognition rate of the cascaded system reaches 90 percent.
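The first stage is conventional enough to sketch. Below is a minimal, hypothetical version of the DTW scoring step in Python: it computes an alignment cost between MFCC sequences and keeps the three lowest-cost enrolled speakers for the GMM stage. Names such as `templates` and `top3_candidates` are illustrative assumptions, not from the paper.

```python
import numpy as np

def dtw_cost(a, b):
    """Classic dynamic-time-warping cost between two feature sequences
    a (n x d) and b (m x d), using Euclidean frame distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def top3_candidates(query, templates):
    """Rank enrolled speakers by DTW cost against the query utterance
    and keep the three best candidates for the GMM stage.
    `templates` is a hypothetical dict: speaker id -> MFCC array."""
    scores = {spk: dtw_cost(query, t) for spk, t in templates.items()}
    return sorted(scores, key=scores.get)[:3]
```

In a real system the MFCC matrices would come from a front-end such as a filterbank-plus-DCT pipeline; here they are plain NumPy arrays.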
Initial Task Allocation for Multi-Human Multi-Robot Teams with Attention-based Deep Reinforcement Learning
Multi-human multi-robot teams have great potential for complex and
large-scale tasks through the collaboration of humans and robots with diverse
capabilities and expertise. To operate such highly heterogeneous teams
efficiently and maximize team performance in a timely manner, sophisticated
initial task allocation strategies that consider individual differences across team members
and tasks are required. While existing works have shown promising results in
reallocating tasks based on agent state and performance, the neglect of the
inherent heterogeneity of the team hinders their effectiveness in realistic
scenarios. In this paper, we present a novel formulation of the initial task
allocation problem in multi-human multi-robot teams as a contextual
multi-attribute decision-making process and propose an attention-based deep
reinforcement learning approach. We introduce a cross-attribute attention
module to encode the latent and complex dependencies of multiple attributes in
the state representation. We conduct a case study in a massive threat
surveillance scenario and demonstrate the strengths of our model. Comment: Accepted to IROS202
On the Representation of Causal Background Knowledge and its Applications in Causal Inference
Causal background knowledge about the existence or the absence of causal
edges and paths is frequently encountered in observational studies. The shared
directed edges and links of a subclass of Markov equivalent DAGs refined due to
background knowledge can be represented by a causal maximally partially
directed acyclic graph (MPDAG). In this paper, we first provide a sound and
complete graphical characterization of causal MPDAGs and give a minimal
representation of a causal MPDAG. Then, we introduce a novel representation
called direct causal clause (DCC) to represent all types of causal background
knowledge in a unified form. Using DCCs, we study the consistency and
equivalency of causal background knowledge and show that any causal background
knowledge set can be equivalently decomposed into a causal MPDAG plus a minimal
residual set of DCCs. Polynomial-time algorithms are also provided for checking
the consistency and equivalency of causal background knowledge and for finding
the decomposed MPDAG and residual DCCs. Finally, with causal background
knowledge, we prove a sufficient and
necessary condition to identify causal effects and surprisingly find that the
identifiability of causal effects only depends on the decomposed MPDAG. We also
develop a local IDA-type algorithm to estimate the possible values of an
unidentifiable effect. Simulations suggest that causal background knowledge can
significantly improve the identifiability of causal effects.
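The claim that background knowledge refines a Markov equivalence class can be made concrete with a small sketch. The snippet below implements only Meek's first orientation rule (if a → b, b − c is undirected, and a and c are non-adjacent, orient b → c), one of the propagation steps that MPDAG completion relies on; it is an illustrative toy, not the paper's DCC decomposition.

```python
def meek_rule_1(directed, undirected):
    """Repeatedly apply Meek's rule 1 until no more edges orient.
    `directed` is a set of (tail, head) pairs; `undirected` is a set
    of frozensets {x, y}. Returns updated copies of both."""
    directed, undirected = set(directed), set(undirected)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(directed):
            for e in list(undirected):
                if b not in e:
                    continue
                c = next(iter(e - {b}))
                adjacent = ((a, c) in directed or (c, a) in directed
                            or frozenset({a, c}) in undirected)
                if c != a and not adjacent:
                    # a -> b, b - c, and a, c non-adjacent: orient b -> c.
                    undirected.remove(e)
                    directed.add((b, c))
                    changed = True
    return directed, undirected
```

Starting from the single piece of background knowledge a → b on the chain a − b − c − d, the rule cascades and orients the whole chain, which is the flavor of refinement the abstract describes.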
NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference
The inherent diversity of computation types within individual deep neural
network (DNN) models necessitates a corresponding variety of computation units
within hardware processors, leading to a significant constraint on computation
efficiency during neural network execution. In this study, we introduce
NeuralMatrix, a framework that transforms the computation of entire DNNs into
linear matrix operations, effectively enabling their execution with one
general-purpose matrix multiplication (GEMM) accelerator. By surmounting the
constraints posed by the diverse computation types required by individual
network models, this approach provides both generality, allowing a wide range
of DNN models to be executed with a single GEMM accelerator, and
application-specific levels of acceleration without extra special function
units; both benefits are validated on mainstream DNNs and their variant
models. Comment: 12 pages, 4 figures. Submitted to the 11th International Conference on
Learning Representations
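One way to read the core idea: if every nonlinear activation is replaced by a piecewise-linear fit, the network reduces to multiply-add work that a GEMM-style datapath already provides. The sketch below fits GELU with linear segments in NumPy; the segment count and the lookup scheme are assumptions for illustration, not the paper's exact transformation.

```python
import numpy as np

def make_pwl(f, lo, hi, n_seg):
    """Precompute per-segment slopes and intercepts of a
    piecewise-linear fit to f on [lo, hi]."""
    xs = np.linspace(lo, hi, n_seg + 1)
    ys = f(xs)
    slopes = np.diff(ys) / np.diff(xs)
    intercepts = ys[:-1] - slopes * xs[:-1]
    return xs, slopes, intercepts

def pwl_apply(x, xs, slopes, intercepts):
    """Evaluate the fit with only index lookup plus fused multiply-add,
    i.e. the primitives a GEMM-oriented accelerator already has."""
    idx = np.clip(np.searchsorted(xs, x, side="right") - 1,
                  0, len(slopes) - 1)
    return slopes[idx] * x + intercepts[idx]

# tanh-form GELU as the nonlinearity to linearize.
gelu = lambda v: 0.5 * v * (1 + np.tanh(np.sqrt(2 / np.pi)
                                        * (v + 0.044715 * v ** 3)))
xs, s, b = make_pwl(gelu, -6.0, 6.0, 256)
x = np.linspace(-5.0, 5.0, 1000)
err = np.max(np.abs(pwl_apply(x, xs, s, b) - gelu(x)))
```

With 256 segments the worst-case error over [-5, 5] is far below typical inference quantization noise, which is what makes this style of substitution plausible.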
Imidazole-Based pH-Sensitive Convertible Liposomes for Anticancer Drug Delivery
In efforts to enhance the activity of liposomal drugs against solid tumors, three novel lipids that carry imidazole-based headgroups of incremental basicity were prepared and incorporated into the membrane of PEGylated liposomes containing doxorubicin (DOX) to render pH-sensitive convertible liposomes (ICL). The imidazole lipids were designed to protonate and cluster with negatively charged phosphatidylethanolamine-polyethylene glycol when the pH drops from 7.4 to 6.0, thereby triggering ICL in the acidic tumor interstitium. Upon the drop of pH, ICL gained more positive surface charges, displayed lipid phase separation in TEM and DSC, and aggregated with cell membrane-mimetic model liposomes. The drop of pH also enhanced DOX release from ICL consisting of one of the imidazole lipids, sn-2-((2,3-dihexadecyloxypropyl)thio)-5-methyl-1H-imidazole. ICL demonstrated superior activity against monolayer cells and several 3D multicellular spheroids (MCS) compared with the analogous PEGylated, pH-insensitive liposomes containing DOX, which served as a control and clinical benchmark. The presence of cholesterol in ICL enhanced their colloidal stability but diminished their pH-sensitivity. ICL with the most basic imidazole lipid showed the highest activity in monolayer HeLa cells; ICL with the imidazole lipid of medium basicity showed the highest anticancer activity in 3D MCS. ICL that balances the needs of tissue penetration, cell binding, and drug release would yield optimal activity against solid tumors.
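The trigger depends on how much of each imidazole headgroup protonates as the pH falls from 7.4 to 6.0, which the Henderson-Hasselbalch relation quantifies. The pKa values below are hypothetical placeholders spanning "incremental basicity"; the abstract does not report the actual values.

```python
def protonated_fraction(pKa, pH):
    """Henderson-Hasselbalch: fraction of a weak base carrying a
    positive charge at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

# Hypothetical pKa values for the three headgroups of increasing basicity.
for pKa in (6.0, 6.5, 7.0):
    charged_at_blood_pH = protonated_fraction(pKa, 7.4)  # small fraction
    charged_in_tumor = protonated_fraction(pKa, 6.0)     # much larger
```

For a headgroup with pKa 6.5, the charged fraction rises from roughly 11% at pH 7.4 to about 76% at pH 6.0, which is the kind of swing that could drive the clustering and conversion behavior described above.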
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Direct speech-to-speech translation (S2ST) with discrete self-supervised
representations has achieved remarkable accuracy, but is unable to preserve the
speaker timbre of the source speech during translation. Meanwhile, the scarcity
of high-quality speaker-parallel data poses a challenge for learning style
transfer between source and target speech. We propose an S2ST framework with an
acoustic language model based on discrete units from a self-supervised model
and a neural codec for style transfer. The acoustic language model leverages
self-supervised in-context learning, acquiring the ability for style transfer
without relying on any speaker-parallel data, thereby overcoming the issue of
data scarcity. By using extensive training data, our model achieves zero-shot
cross-lingual style transfer on previously unseen source languages. Experiments
show that our model generates translated speech with high fidelity and style
similarity. Audio samples are available at http://stylelm.github.io/ . Comment: 5 pages, 1 figure. Submitted to ICASSP 202
Intelligent Reflecting Surface Aided Multi-Tier Hybrid Computing
The digital twin edge network (DITEN) aims to integrate mobile edge computing
(MEC) and digital twin (DT) to provide real-time system configuration and
flexible resource allocation for the sixth-generation network. This paper
investigates an intelligent reflecting surface (IRS)-aided multi-tier hybrid
computing system that can achieve mutual benefits for DT and MEC in the DITEN.
For the first time, this paper presents the opportunity to realize the
network-wide convergence of DT and MEC. Specifically, in the considered system,
over-the-air computation (AirComp) is employed to monitor the status of the DT
system, while MEC is performed with the assistance of DT to provide low-latency
computing services. Besides, the IRS is utilized to enhance signal transmission
and mitigate interference among heterogeneous nodes. We propose a framework for
designing the hybrid computing system, aiming to maximize the sum computation
rate under communication and computation resources constraints. To tackle the
non-convex optimization problem, alternating optimization and successive convex
approximation techniques are leveraged to decouple the variables and transform
the problem into a more tractable form. Simulation results verify the
effectiveness of the proposed algorithm and demonstrate the IRS can
significantly improve the system performance with appropriate phase shift
configurations. Moreover, the results indicate that the DT-assisted MEC system
can precisely balance local computing and task offloading, since real-time
system status can be obtained with the help of DT. This paper proposes the
network-wide integration of DT and MEC and then demonstrates, through analysis
and numerical results, the necessity of DT for achieving optimal performance in
DITEN systems.
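The decoupling step, alternating optimization, can be shown in miniature: fix one block of variables, solve the now-convex subproblem in the other block, and repeat. The toy below does this for rank-1 matrix factorization in NumPy; it is a generic illustration of the technique, not the paper's resource-allocation problem.

```python
import numpy as np

def als_rank1(M, iters=50, seed=0):
    """Alternating least squares for min_{u,v} ||M - u v^T||_F^2.
    The joint problem is non-convex, but with v fixed the optimal u is
    M v / (v^T v) in closed form, and symmetrically for v, so each
    block update solves a convex subproblem exactly."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[1])
    for _ in range(iters):
        u = M @ v / (v @ v)     # update block u with v fixed
        v = M.T @ u / (u @ u)   # update block v with u fixed
    return u, v
```

On an exactly rank-1 matrix the alternation recovers the factorization; in the paper's setting the blocks are beamforming, phase-shift, and offloading variables rather than u and v, but the decoupling pattern is the same.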
Better Zero-Shot Reasoning with Role-Play Prompting
Modern large language models (LLMs), such as ChatGPT, exhibit a remarkable
capacity for role-playing, enabling them to embody not only human characters
but also non-human entities like a Linux terminal. This versatility allows them
to simulate complex human-like interactions and behaviors within various
contexts, as well as to emulate specific objects or systems. While these
capabilities have enhanced user engagement and introduced novel modes of
interaction, the influence of role-playing on LLMs' reasoning abilities remains
underexplored. In this study, we introduce a strategically designed role-play
prompting methodology and assess its performance under the zero-shot setting
across twelve diverse reasoning benchmarks, encompassing arithmetic,
commonsense reasoning, symbolic reasoning, and more. Leveraging models such as
ChatGPT and Llama 2, our empirical results illustrate that role-play prompting
consistently surpasses the standard zero-shot approach across most datasets.
Notably, accuracy on AQuA rises from 53.5% to 63.8%, and on Last Letter from
23.8% to 84.2%. Beyond enhancing contextual understanding, we posit that
role-play prompting serves as an implicit Chain-of-Thought (CoT) trigger,
thereby improving the quality of reasoning. By comparing our approach with the
Zero-Shot-CoT technique, which prompts the model to "think step by step", we
further demonstrate that role-play prompting can generate a more effective CoT.
This highlights its potential to augment the reasoning capabilities of LLMs.
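Concretely, a role-play prompt differs from the standard zero-shot prompt only in a persona preamble prepended to the question. The wording below is an invented stand-in for illustration; the paper designs its prompts per task.

```python
def zero_shot_prompt(question):
    """Standard zero-shot prompt: just the question."""
    return f"Q: {question}\nA:"

def role_play_prompt(question):
    """Role-play prompt: a hypothetical persona preamble followed by
    the same question. This wording is illustrative, not the paper's."""
    role = ("From now on, you are an excellent math teacher who always "
            "explains problems to students accurately and step by step.")
    return f"{role}\nQ: {question}\nA:"
```

The hypothesis the abstract advances is that the persona acts as an implicit CoT trigger, so the same question elicits a more deliberate answer without any "think step by step" instruction.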
Husformer: A Multi-Modal Transformer for Multi-Modal Human State Recognition
Human state recognition is a critical topic with pervasive and important
applications in human-machine systems. Multi-modal fusion, the combination of
metrics from multiple data sources, has been shown to be a sound method for
improving the recognition performance. However, while promising results have
been reported by recent multi-modal-based models, they generally fail to
leverage the sophisticated fusion strategies that would model sufficient
cross-modal interactions when producing the fusion representation; instead,
current methods rely on lengthy and inconsistent data preprocessing and feature
crafting. To address this limitation, we propose an end-to-end multi-modal
transformer framework for multi-modal human state recognition called
Husformer. Specifically, we propose to use cross-modal transformers, which
inspire one modality to reinforce itself through directly attending to latent
relevance revealed in other modalities, to fuse different modalities while
ensuring sufficient awareness of the cross-modal interactions introduced.
Subsequently, we utilize a self-attention transformer to further prioritize
contextual information in the fusion representation. Using two such attention
mechanisms enables effective and adaptive adjustments to noise and
interruptions in multi-modal signals during the fusion process and in relation
to high-level features. Extensive experiments on two human emotion corpora
(DEAP and WESAD) and two cognitive workload datasets (MOCAS and CogLoad)
demonstrate that in the recognition of human state, our Husformer outperforms
both state-of-the-art multi-modal baselines and the use of a single modality by
a large margin, especially when dealing with raw multi-modal signals. We also
conducted an ablation study to show the benefits of each component in
Husformer.
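The cross-modal mechanism described above, one modality attending to latent relevance in another, is at its core attention with queries from modality A and keys/values from modality B. The single-head NumPy sketch below illustrates that structure; the dimensions and names are assumptions, not the Husformer implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(x_q, x_kv, Wq, Wk, Wv):
    """One attention head where modality A (x_q, shape Tq x d) attends
    to modality B (x_kv, shape Tk x d): queries come from A, keys and
    values from B, so A's representation is reinforced by whatever is
    relevant in B."""
    Q, K, V = x_q @ Wq, x_kv @ Wk, x_kv @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # Tq x Tk relevance map
    return softmax(scores, axis=-1) @ V       # Tq x d fused output
```

Stacking such heads in both directions between every modality pair, then passing the result through a self-attention transformer, yields the fusion-then-prioritize pipeline the abstract describes.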