ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment
The objective of stylized speech-driven facial animation is to create
animations that encapsulate specific emotional expressions. Existing methods
often depend on pre-established emotional labels or facial expression
templates, which may lack the flexibility needed to convey user intent
accurately. In this research, we introduce a technique that enables
control of arbitrary styles by using natural language as emotion prompts.
This technique offers benefits in both flexibility and user-friendliness.
To realize this objective, we first construct a Text-Expression Alignment
Dataset (TEAD), in which each facial expression is paired with several
prompt-like descriptions. We propose an innovative automatic
annotation method, supported by Large Language Models (LLMs), to expedite the
dataset construction, thereby eliminating the substantial expense of manual
annotation. Following this, we utilize TEAD to train a CLIP-based model, termed
ExpCLIP, which encodes text and facial expressions into semantically aligned
style embeddings. The embeddings are subsequently integrated into the facial
animation generator to yield expressive and controllable facial animations.
Given the limited diversity of facial emotions in existing speech-driven facial
animation training data, we further introduce an effective Expression Prompt
Augmentation (EPA) mechanism to enable the animation generator to support
unprecedented richness in style control. Comprehensive experiments show
that our method achieves expressive facial animation generation and offers
enhanced flexibility in conveying the desired style.
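ExpCLIP is described as a CLIP-based model that encodes text and facial expressions into semantically aligned embeddings. The abstract does not state the training objective, so the sketch below assumes the standard symmetric contrastive (InfoNCE) loss used by CLIP, with random vectors standing in for the outputs of the text and expression encoders. The function names and the temperature of 0.07 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable log-softmax along one axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_loss(text_emb: np.ndarray, expr_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric contrastive (InfoNCE) loss in the style of CLIP (assumed, not the paper's code).

    Row i of text_emb and row i of expr_emb form the positive pair; every other
    row in the batch acts as a negative. The encoders themselves are out of scope.
    """
    t = l2_normalize(text_emb)
    e = l2_normalize(expr_emb)
    logits = t @ e.T / temperature            # (N, N) scaled cosine similarities
    idx = np.arange(len(t))
    loss_text_to_expr = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_expr_to_text = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((loss_text_to_expr + loss_expr_to_text) / 2)

# Toy check with random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
aligned = text + 0.05 * rng.normal(size=(8, 16))   # each expression matches its text
mismatched = np.roll(aligned, 1, axis=0)           # shift rows so every pair is wrong
print(clip_loss(text, aligned), clip_loss(text, mismatched))
```

Correctly paired embeddings yield a much lower loss than shifted pairs; minimizing this quantity is what pulls a description and its matching expression toward the same region of the shared embedding space.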
DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection
Graph Anomaly Detection (GAD) has recently become an active research topic
owing to its practical and theoretical value. Because GAD is application-driven
and anomalous samples are rare, enriching the variety of its datasets is
fundamental work. This paper therefore presents DGraph, a real-world dynamic
graph in the finance domain. DGraph overcomes many limitations of current GAD
datasets. It contains about 3M nodes, 4M dynamic edges, and 1M ground-truth
nodes. We provide a comprehensive analysis of DGraph, revealing that
anomalous nodes and normal nodes generally differ in structure, neighbor
distribution, and temporal dynamics. The analysis also suggests that unlabeled
nodes are essential for detecting fraudsters. Furthermore, we conduct
extensive experiments on DGraph. These observations and experiments demonstrate
that DGraph can drive GAD research forward and enable in-depth exploration of
anomalous nodes. Comment: 9 pages
Prompt-based Node Feature Extractor for Few-shot Learning on Text-Attributed Graphs
Text-attributed graphs (TAGs), such as social networks and citation networks,
are common in the real world; their nodes are represented by textual
descriptions. Currently, mainstream machine learning methods on TAGs
involve a two-stage modeling approach: (1) unsupervised node feature extraction
with pre-trained language models (PLMs); and (2) supervised learning using
Graph Neural Networks (GNNs). However, we observe that these representations,
despite large-scale pre-training, do not significantly improve performance
when only a limited number of training samples is available. The main issue is that
existing methods have not effectively integrated information from the graph and
downstream tasks simultaneously. In this paper, we propose a novel framework
called G-Prompt, which combines a graph adapter and task-specific prompts to
extract node features. First, G-Prompt introduces a learnable GNN layer
(i.e., an adapter) at the end of the PLM, which is fine-tuned to better
capture the masked tokens by taking graph neighborhood information into account.
After the adapter is trained, G-Prompt incorporates task-specific prompts to
obtain interpretable node representations for the downstream task.
Experimental results demonstrate that the proposed method outperforms current
state-of-the-art (SOTA) methods on few-shot node classification. More
importantly, in zero-shot settings, the G-Prompt embeddings not only
provide better task interpretability than vanilla PLMs but also achieve
performance comparable to fully supervised baselines. Comment: Under review
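The graph adapter is described only as a learnable GNN layer placed after the PLM. As a rough illustration of what such a layer computes, the sketch below assumes a GraphSAGE-style mean aggregator: each node's PLM vector is combined with the average of its neighbors' vectors. The aggregation scheme, the ReLU, and the weight names `w_self` and `w_neigh` are hypothetical choices; the masked-token fine-tuning and the prompting stage are not shown.

```python
import numpy as np

def graph_adapter(h: np.ndarray, adj: np.ndarray,
                  w_self: np.ndarray, w_neigh: np.ndarray) -> np.ndarray:
    """One mean-aggregation GNN layer applied on top of PLM node vectors (assumed design).

    h:       (N, d) node representations produced by a PLM
    adj:     (N, N) symmetric 0/1 adjacency matrix
    w_self:  (d, d) weight for the node's own vector (hypothetical name)
    w_neigh: (d, d) weight for the neighbor average (hypothetical name)
    """
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ h) / np.maximum(deg, 1)      # isolated nodes get a zero neighbor term
    return np.maximum(h @ w_self + neigh_mean @ w_neigh, 0.0)  # ReLU

# Path graph 0-1-2 plus an isolated node 3; identity weights keep the arithmetic visible.
h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]], dtype=float)
eye = np.eye(2)
out = graph_adapter(h, adj, eye, eye)
print(out)  # rows: [1, 1], [1, 1.5], [1, 2], [2, 0]
```

Node 1, for example, keeps its own vector [0, 1] and adds the mean of nodes 0 and 2, [1, 0.5]; the isolated node 3 passes through unchanged. In a trained adapter the weights would be learned rather than fixed to the identity.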
Integrated photonics modular arithmetic processor
Integrated photonics computing has emerged as a promising approach to
overcome the limitations of electronic processors in the post-Moore era,
capitalizing on the advantages of photonic systems. However, present
integrated photonics computing systems struggle to achieve
high-precision calculations, which limits their potential
applications, and their heavy reliance on analog-to-digital (AD) and
digital-to-analog (DA) conversion interfaces undermines their performance. Here
we propose an innovative photonic computing architecture featuring scalable
calculation precision and a novel photonic conversion interface. By leveraging
Residue Number System (RNS) theory, a high-precision calculation is
decomposed into multiple low-precision modular arithmetic operations executed
through optical phase manipulation. These operations interact directly with the
digital system via our proposed optical digital-to-phase converter (ODPC) and
optical phase-to-digital converter (OPDC). We experimentally demonstrate
a calculation precision of 9 bits and verify the feasibility of the
ODPC/OPDC photonic interface. This approach paves the way toward freeing
photonic computing from the constraints imposed by limited precision and AD/DA
converters. Comment: 23 pages, 9 figures
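The RNS idea in this abstract can be illustrated electronically: an integer is split into residues modulo several pairwise-coprime moduli, each residue channel is processed independently with no carries, and the result is rebuilt with the Chinese Remainder Theorem. The moduli below are hypothetical, chosen only to be pairwise coprime, and the mapping of a residue r (mod m) to the phase 2πr/m is an assumed illustration of phase encoding, not the paper's ODPC/OPDC design.

```python
import math

# Hypothetical pairwise-coprime moduli; their product (2520) is the dynamic range.
MODULI = (5, 7, 8, 9)
RANGE = math.prod(MODULI)

def to_rns(x: int) -> tuple:
    # Split one integer into small independent residues.
    return tuple(x % m for m in MODULI)

def rns_add(a: tuple, b: tuple) -> tuple:
    # Each channel works on its own residue; no carries pass between channels.
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def rns_mul(a: tuple, b: tuple) -> tuple:
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(residues: tuple) -> int:
    # Chinese Remainder Theorem: rebuild the unique integer in [0, RANGE).
    x = 0
    for r, m in zip(residues, MODULI):
        mi = RANGE // m
        x += r * mi * pow(mi, -1, m)   # pow(..., -1, m) is the modular inverse (Python 3.8+)
    return x % RANGE

def residue_to_phase(r: int, m: int) -> float:
    # Illustrative (assumed) encoding: residue r (mod m) as the phase 2*pi*r/m.
    return 2 * math.pi * r / m

a, b = 37, 61
print(from_rns(rns_mul(to_rns(a), to_rns(b))))  # 2257, i.e. 37 * 61
print(from_rns(rns_add(to_rns(a), to_rns(b))))  # 98

# Adding phases modulo 2*pi reproduces addition modulo m (here 3 + 5 = 1 mod 7).
phase_sum = (residue_to_phase(3, 7) + residue_to_phase(5, 7)) % (2 * math.pi)
print(math.isclose(phase_sum, residue_to_phase(1, 7)))  # True
```

The design point is that every channel only ever handles numbers smaller than its modulus, so low-precision hardware suffices per channel, while the combined dynamic range grows with the product of the moduli. Results must stay below that product (2520 here) to be reconstructed uniquely.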
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
Recent years have witnessed significant advancements in self-supervised
learning (SSL) methods for speech-processing tasks. Various speech-based SSL
models have been developed and show promising performance on a range of
downstream tasks, including speech recognition. However, existing speech-based
SSL models share a common problem: high computational cost, which may
hinder their potential application and in-depth academic research. To address
this issue, we first analyze the computational cost of different modules during
HuBERT pre-training and then introduce a stack of efficiency optimizations,
which we call Fast-HuBERT. The proposed Fast-HuBERT can be
trained in 1.1 days with 8 V100 GPUs on the Librispeech 960h benchmark without
performance degradation, a 5.2x speedup compared to the original
implementation. Moreover, we explore two well-studied techniques in
Fast-HuBERT and demonstrate consistent improvements, as reported in previous
work.
Online near-infrared analysis coupled with MWPLS and SiPLS models for the multi-ingredient and multi-phase extraction of licorice (Gancao)
Additional file 1. Table S1. The sampling intervals in different extraction phases. Table S2. The HPLC results of different indicators. Table S3. The evaluation parameters of PLS and SiPLS models.
Closed-loop Controlled Brillouin Optical Time-Domain Analysis
A closed-loop controlled BOTDA distributed optical fibre sensor is proposed for tracking fast temperature-strain evolution. The measurement time is reduced by two orders of magnitude compared with classical BOTDA sensing, while keeping the same accuracy and measurement conditions.
Increasing robustness of bipolar pulse coding in Brillouin distributed fiber sensors
The robustness of bipolar pulse coding against pump depletion in Brillouin distributed fiber sensors is investigated theoretically and experimentally. The analysis shows that the effectiveness of bipolar coding in Brillouin sensing can be strongly affected by the power imbalance between the -1 and +1 elements that results from depletion and amplification of the coded pump pulses. To increase robustness against these detrimental effects and to alleviate the probe power limitation imposed by pump depletion, a technique using a three-tone probe is proposed. Experimental results demonstrate that this method allows the probe power to be increased by more than 12.5 dB compared with the existing single-tone probe implementation. This large power increase, together with the 13.5 dB signal-to-noise ratio enhancement provided by 512-bit bipolar Golay codes, enables low-uncertainty measurements (< 0.9 MHz) of the local Brillouin peak gain frequency over a real remoteness of 100 km, using a 200 km-long fiber loop and 2 m spatial resolution. The method achieves a record figure-of-merit of 380'000.
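Why bipolar Golay coding works can be seen in an idealized, strictly linear model: the two codes of a complementary pair have aperiodic autocorrelations whose sidelobes cancel exactly when summed, so correlating each coded trace with its own code and adding the results recovers the single-pulse response scaled by 2L. The recursive construction below is the textbook one; the toy response `h` is made up, and the √L coding-gain model is our assumption, which happens to reproduce the 13.5 dB quoted for 512 bits. The sketch deliberately ignores pump depletion: the abstract's point is that depletion and amplification make the -1 and +1 elements unequal in power, which breaks the exact cancellation shown here.

```python
import numpy as np

def golay_pair(n_doublings: int):
    """Bipolar (+1/-1) Golay complementary pair of length 2**n_doublings,
    built by the recursive concatenation a' = [a, b], b' = [a, -b]."""
    a = np.array([1])
    b = np.array([1])
    for _ in range(n_doublings):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

a, b = golay_pair(9)          # 512-bit codes, the length quoted in the abstract
L = len(a)

# Complementary property: the two autocorrelations sum to 2L at zero lag, 0 elsewhere.
acf_sum = np.correlate(a, a, mode="full") + np.correlate(b, b, mode="full")

# Ideal linear model (assumption): each coded trace is the single-pulse response h convolved with the code.
h = np.array([0.0, 1.0, 3.0, 2.0, 0.5])          # toy single-pulse response (made up)
trace_a = np.convolve(a, h)
trace_b = np.convolve(b, h)

# Decoding: correlate each trace with its own code (convolve with the reversed code) and add.
decoded = np.convolve(trace_a, a[::-1]) + np.convolve(trace_b, b[::-1])
recovered = decoded[L - 1 : L - 1 + len(h)] / (2 * L)

# Assumed coding-gain model for bipolar Golay codes: sqrt(L) improvement in SNR.
gain_db = 10 * np.log10(np.sqrt(L))
print(L, acf_sum[L - 1], np.count_nonzero(acf_sum), recovered, round(gain_db, 1))
```

Under these assumptions, `recovered` equals `h` exactly and every other sample of `decoded` is zero. If the -1 elements were transmitted with a different power than the +1 elements, the two autocorrelations would no longer cancel and residual sidelobes would appear, which is the failure mode the three-tone probe is meant to mitigate.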
- …