Design of Quantum error correcting code for biased error on heavy-hexagon structure
The surface code is an error-correcting method that can be applied to the
implementation of a usable quantum computer. At present, a promising candidate
for a usable quantum computer is based on superconducting qubits, specifically
transmons. Because errors in transmon-based quantum computers are biased toward
Z-type errors, the tailored surface code and the XZZX code have been developed
to deal with this type of error. Although these codes have been suggested for
lattice structures, the transmon-based quantum computers developed by IBM have
a heavy-hexagon structure, so it is natural to ask how the tailored surface
code and the XZZX code can be implemented on the heavy-hexagon structure. In
this study, we provide a method for implementing the tailored surface code and
the XZZX code on a heavy-hexagon structure. Even when there is no bias, we
obtain … as the threshold of the tailored surface code, which is much better
than the thresholds of … and … for the surface code and the XZZX code,
respectively. Furthermore, even when a decoder that does not make the best use
of the syndromes is used, the thresholds of the tailored surface code and the
XZZX code increase as the bias of the Z error increases. Finally, we show that
in the case of infinite bias, the threshold of the surface code is …, while the
thresholds of the tailored surface code and the XZZX code are … and …,
respectively.
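As a rough illustration of the biased-noise model discussed in this abstract, the sketch below samples single-qubit Pauli errors whose Z component dominates by a bias factor eta; the parameter names and the overall setup are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not from the paper): sampling Pauli errors with Z bias.
# A common convention defines the bias eta = p_Z / (p_X + p_Y); the total
# physical error rate p and eta below are illustrative parameters only.
import random

def sample_biased_pauli(p, eta):
    """Return 'I', 'X', 'Y', or 'Z' for one qubit under Z-biased noise."""
    p_xy = p / (2 * (eta + 1))      # p_X = p_Y
    p_z = p - 2 * p_xy              # remaining error probability goes to Z
    r = random.random()
    if r < p_z:
        return "Z"
    if r < p_z + p_xy:
        return "X"
    if r < p_z + 2 * p_xy:
        return "Y"
    return "I"

# Example: at p = 0.05 and eta = 100, Z errors dominate;
# eta -> infinity recovers the pure-dephasing (infinite-bias) limit.
errors = [sample_biased_pauli(0.05, 100) for _ in range(10_000)]
print({k: errors.count(k) for k in "IXYZ"})
```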
On a Question of Wintner Concerning the Sequence of Integers Composed of Primes from a Given Set
We answer a question of Wintner
concerning the sequence of integers
composed of primes from a given set.
The results generalize and further develop the answer to Wintner’s question
due to Tijdeman.
Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation
Talking face generation is the challenging task of synthesizing a natural and
realistic face whose lip movements are accurately synchronized with given
audio. Due to co-articulation, where an isolated phone is influenced by the
preceding or following phones, the articulation of a phone varies with the
phonetic context. Therefore, modeling lip motion with the phonetic context can
generate more spatio-temporally aligned lip movement. In this respect, we
investigate the role of phonetic context in generating lip motion for talking
face generation. We propose the Context-Aware Lip-Sync framework (CALS), which
explicitly leverages phonetic context to generate lip movement for the target
face. CALS comprises an Audio-to-Lip module and a Lip-to-Face module. The
former is pretrained with masked learning to map each phone to a
contextualized lip motion unit. The contextualized lip motion unit then guides
the latter in synthesizing a target identity with context-aware lip motion.
Through extensive experiments, we verify that simply exploiting the phonetic
context in the proposed CALS framework effectively enhances spatio-temporal
alignment. We also demonstrate the extent to which the phonetic context assists
in lip synchronization and find the effective window size for lip generation to
be approximately 1.2 seconds.
Comment: Accepted at ICASSP 202
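The masked pretraining of the Audio-to-Lip module described above can be pictured with a generic masked-prediction loop; the sketch below is an illustrative stand-in, where the module structure, the transformer encoder, the dimensions, and the regression target are all assumptions rather than the paper's implementation.

```python
# Illustrative sketch only: generic masked prediction over a phone-aligned
# audio sequence, loosely mirroring "masked learning" pretraining.
# All module names, dimensions, and the loss target are assumptions.
import torch
import torch.nn as nn

class MaskedAudioToLip(nn.Module):
    def __init__(self, audio_dim=80, lip_dim=64, hidden=256):
        super().__init__()
        self.proj = nn.Linear(audio_dim, hidden)
        self.mask_token = nn.Parameter(torch.zeros(hidden))
        enc_layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.to_lip = nn.Linear(hidden, lip_dim)     # contextualized lip motion unit

    def forward(self, audio_feats, mask):
        # audio_feats: (B, T, audio_dim); mask: (B, T) bool, True = masked frame
        x = self.proj(audio_feats)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        return self.to_lip(self.encoder(x))          # (B, T, lip_dim)

# Pretraining step: predict lip-motion targets only at masked positions.
model = MaskedAudioToLip()
audio = torch.randn(2, 100, 80)
lip_target = torch.randn(2, 100, 64)
mask = torch.rand(2, 100) < 0.15
pred = model(audio, mask)
loss = nn.functional.mse_loss(pred[mask], lip_target[mask])
loss.backward()
```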
Reprogramming Audio-driven Talking Face Synthesis into Text-driven
In this paper, we propose a method to reprogram pre-trained audio-driven
talking face synthesis models so that they can operate with text inputs. Since
an audio-driven talking face synthesis model takes speech audio as input, a
speech recording must be prepared in advance in order to generate a talking
avatar with the desired speech content. However, recording audio for every
video to be generated is burdensome. To alleviate this problem, we propose a
novel method that embeds input text into the learned audio latent space of the
pre-trained audio-driven model. To this end, we design a Text-to-Audio
Embedding Module (TAEM) which is guided to learn to map a given text input to
the audio latent features. Moreover, to model the speaker characteristics lying
in the audio features, we propose to inject a visual speaker embedding,
obtained from a single face image, into the TAEM. After training, we can
synthesize talking face videos from either text or speech audio.
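One way to picture the text-to-audio-latent mapping described above is the toy module below; the encoder choices, dimensions, and the way the visual speaker embedding is injected are assumptions for illustration, not the paper's TAEM design.

```python
# Toy sketch (assumptions only): map text tokens plus a visual speaker
# embedding into the audio latent space of a frozen audio-driven model.
import torch
import torch.nn as nn

class ToyTextToAudioEmbedding(nn.Module):
    def __init__(self, vocab_size=1000, spk_dim=128, audio_latent_dim=256, hidden=256):
        super().__init__()
        self.text_emb = nn.Embedding(vocab_size, hidden)
        self.text_enc = nn.GRU(hidden, hidden, batch_first=True)
        self.spk_proj = nn.Linear(spk_dim, hidden)    # visual speaker embedding from a face image
        self.out = nn.Linear(hidden, audio_latent_dim)

    def forward(self, text_ids, spk_emb):
        # text_ids: (B, L) token ids; spk_emb: (B, spk_dim) from a face encoder
        h, _ = self.text_enc(self.text_emb(text_ids))
        h = h + self.spk_proj(spk_emb).unsqueeze(1)   # inject speaker characteristics
        return self.out(h)                            # (B, L, audio_latent_dim)

# Training would regress these outputs onto latents produced by the frozen
# audio encoder of the pre-trained talking face model (e.g., with an MSE loss).
taem = ToyTextToAudioEmbedding()
fake_latents = taem(torch.randint(0, 1000, (2, 20)), torch.randn(2, 128))
print(fake_latents.shape)  # torch.Size([2, 20, 256])
```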
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
This paper proposes a novel lip reading framework, especially for low-resource
languages, which have not been well addressed in the previous literature. Since
low-resource languages do not have enough video-text paired data to train a
model powerful enough to capture both lip movements and language, developing
lip reading models for such languages is regarded as challenging. To mitigate
this challenge, we try to learn general speech knowledge, the ability to model
lip movements, from a high-resource language through the prediction of speech
units. It is known that different languages partially share common phonemes;
thus, general speech knowledge learned from one language can be extended to
other languages. Then, we try to learn language-specific knowledge, the ability
to model language, by proposing a Language-specific Memory-augmented Decoder
(LMDecoder). The LMDecoder saves language-specific audio features into memory
banks and can be trained on audio-text paired data, which is more easily
accessible than video-text paired data. Therefore, with the LMDecoder, we can
transform the input speech units into language-specific audio features and
translate them into text by utilizing the learned rich language knowledge.
Finally, by combining general speech knowledge and language-specific knowledge,
we can efficiently develop lip reading models even for low-resource languages.
Through extensive experiments on five languages, English, Spanish, French,
Italian, and Portuguese, the effectiveness of the proposed method is evaluated.
Comment: Accepted at ICCV 202
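The "prediction of speech units" used above to learn general speech knowledge can be illustrated by a simple classification objective over discrete units; the visual encoder, the source of the unit labels (e.g., clustered self-supervised audio features), and all dimensions below are assumptions, not details from the paper.

```python
# Illustrative sketch (assumptions only): learn "general speech knowledge"
# by predicting discrete speech units from lip-video features.
import torch
import torch.nn as nn

NUM_UNITS = 200          # size of the discrete speech-unit vocabulary (assumed)
VIDEO_DIM = 512          # per-frame visual feature dimension (assumed)

class UnitPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.temporal = nn.Conv1d(VIDEO_DIM, 256, kernel_size=3, padding=1)
        self.head = nn.Linear(256, NUM_UNITS)

    def forward(self, video_feats):
        # video_feats: (B, T, VIDEO_DIM) frame-level lip features
        x = self.temporal(video_feats.transpose(1, 2)).transpose(1, 2)
        return self.head(torch.relu(x))               # (B, T, NUM_UNITS) logits

model = UnitPredictor()
feats = torch.randn(4, 75, VIDEO_DIM)                 # ~3 s of video at 25 fps
unit_labels = torch.randint(0, NUM_UNITS, (4, 75))    # frame-aligned unit ids
logits = model(feats)
loss = nn.functional.cross_entropy(logits.reshape(-1, NUM_UNITS), unit_labels.reshape(-1))
loss.backward()
```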
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Visual Speech Recognition (VSR) is the task of predicting spoken words from
silent lip movements. VSR is regarded as a challenging task because lip
movements alone provide insufficient information. In this paper, we propose an
Audio Knowledge empowered Visual Speech Recognition framework (AKVSR) that
complements the insufficient speech information of the visual modality by using
the audio modality. Different from previous methods, the proposed AKVSR 1)
utilizes rich audio knowledge encoded by a large-scale pretrained audio model,
2) saves the linguistic information of the audio knowledge in a compact audio
memory by discarding the non-linguistic information from the audio through
quantization, and 3) includes an Audio Bridging Module which can find the
best-matched audio features from the compact audio memory, making training
possible without audio inputs once the compact audio memory has been composed.
We validate the effectiveness of the proposed method through extensive
experiments and achieve new state-of-the-art performance on the widely used
LRS2 and LRS3 datasets.
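A rough picture of building a compact audio memory by quantization and then retrieving the best-matched entry is sketched below; the codebook size, similarity measure, and feature dimensions are assumptions for illustration rather than the AKVSR implementation.

```python
# Rough sketch (assumptions only): a compact "audio memory" built by
# quantizing audio features into a small codebook, then queried with a
# visual feature via best-match retrieval.
import numpy as np

rng = np.random.default_rng(0)
audio_feats = rng.normal(size=(1000, 64))    # synthetic stand-in for pretrained audio features

# 1) Quantize: a few k-means-style iterations yield a compact codebook,
#    discarding fine variation in the audio features.
K = 32
codebook = audio_feats[rng.choice(len(audio_feats), K, replace=False)]
for _ in range(10):
    dists = ((audio_feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    for k in range(K):
        members = audio_feats[assign == k]
        if len(members):
            codebook[k] = members.mean(axis=0)

# 2) Bridging: a query (standing in for a projected visual feature)
#    retrieves its best-matched slot from the compact audio memory,
#    so no raw audio is needed at this stage.
def retrieve(query):
    sims = codebook @ query / (np.linalg.norm(codebook, axis=1) * np.linalg.norm(query) + 1e-8)
    return codebook[sims.argmax()]

query = rng.normal(size=64)                  # assumed to live in the same space
print(retrieve(query).shape)                 # (64,)
```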
Empirical estimation of beach-face slope and its use for warning of berm erosion
Typical berm erosion and accretion are closely related to the beach-face slope. An empirical equation for predicting the beach-face slope is proposed. The beach-face slope is expressed as a function of the wave period and the bed sediment grain size. The coefficients in the equation are obtained from three sets of carefully chosen laboratory data through a multiple linear regression with two independent variables using SPSS version 22. The computed correlation coefficient is as high as 0.983, which is believed to justify the validity of the present formulation. A shore profile is split into the beach face and the underwater bed profile in the surf zone, and is described with two straight lines. The possibility of using the beach-face slope strategically for warning of future berm erosion at the site is proposed.
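Since the abstract does not reproduce the fitted equation or its coefficients, the snippet below only illustrates the kind of two-variable multiple linear regression described, here with synthetic data and NumPy in place of SPSS; the linear form of the fit (slope regressed on wave period and grain size) is an assumption.

```python
# Illustration only: a two-predictor multiple linear regression of
# beach-face slope on wave period T and grain size d50, using synthetic
# data in place of the laboratory datasets and NumPy in place of SPSS.
import numpy as np

rng = np.random.default_rng(1)
T = rng.uniform(4.0, 12.0, 60)            # wave period [s] (synthetic)
d50 = rng.uniform(0.2, 1.0, 60)           # median grain size [mm] (synthetic)
slope = 0.02 + 0.004 * T + 0.05 * d50 + rng.normal(0, 0.005, 60)  # made-up "measurements"

X = np.column_stack([np.ones_like(T), T, d50])        # intercept + two predictors
coef, *_ = np.linalg.lstsq(X, slope, rcond=None)
pred = X @ coef

# Multiple correlation coefficient R (the paper reports R = 0.983 on real data).
R = np.corrcoef(pred, slope)[0, 1]
print("coefficients:", coef)
print("R =", round(R, 3))
```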
SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory
The challenge of talking face generation from speech lies in aligning
information from two different modalities, audio and video, such that the mouth
region corresponds to the input audio. Previous methods either exploit audio-visual
representation learning or leverage intermediate structural information such as
landmarks and 3D models. However, they struggle to synthesize fine details of
the lips varying at the phoneme level as they do not sufficiently provide
visual information of the lips at the video synthesis step. To overcome this
limitation, our work proposes Audio-Lip Memory that brings in visual
information of the mouth region corresponding to input audio and enforces
fine-grained audio-visual coherence. It stores lip motion features from
sequential ground truth images in the value memory and aligns them with
corresponding audio features so that they can be retrieved using audio input at
inference time. Therefore, using the retrieved lip motion features as visual
hints, it can easily correlate audio with visual dynamics in the synthesis
step. By analyzing the memory, we demonstrate that unique lip features are
stored in each memory slot at the phoneme level, capturing subtle lip motion
based on memory addressing. In addition, we introduce a visual-visual
synchronization loss, which can enhance lip-syncing performance when used along
with the audio-visual synchronization loss in our model. Extensive experiments are
performed to verify that our method generates high-quality video with mouth
shapes that best align with the input audio, outperforming previous
state-of-the-art methods.
Comment: Accepted at AAAI 2022 (Oral)
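The value-memory retrieval described above follows the familiar key-value attention pattern; the sketch below shows that pattern in isolation, with the memory size, feature dimensions, and softmax addressing all being assumptions rather than the SyncTalkFace implementation.

```python
# Sketch of key-value memory addressing (assumptions only): audio features
# address a memory whose values hold lip-motion features, so lip hints can
# be retrieved from audio alone at inference time.
import torch
import torch.nn as nn

class ToyAudioLipMemory(nn.Module):
    def __init__(self, slots=96, audio_dim=128, lip_dim=256):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(slots, audio_dim))    # addressed by audio features
        self.values = nn.Parameter(torch.randn(slots, lip_dim))    # store lip-motion features

    def forward(self, audio_feat):
        # audio_feat: (B, audio_dim); soft addressing over the memory slots
        addr = torch.softmax(audio_feat @ self.keys.t(), dim=-1)   # (B, slots)
        return addr @ self.values                                  # (B, lip_dim) retrieved lip hint

memory = ToyAudioLipMemory()
lip_hint = memory(torch.randn(8, 128))
print(lip_hint.shape)  # torch.Size([8, 256])
```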
DeepSoCS: A Neural Scheduler for Heterogeneous System-on-Chip (SoC) Resource Scheduling
In this paper, we present a novel scheduling solution for a class of
System-on-Chip (SoC) systems where heterogeneous chip resources (DSP, FPGA,
GPU, etc.) must be efficiently scheduled for continuously arriving hierarchical
jobs whose tasks are represented by a directed acyclic graph. Traditionally,
heuristic algorithms have been widely used in many resource scheduling domains,
and Heterogeneous Earliest Finish Time (HEFT) has been the dominant
state-of-the-art technique across a broad range of heterogeneous resource
scheduling domains for many years. Despite their long-standing popularity,
HEFT-like algorithms are known to be vulnerable to a small amount of noise
added to the environment. Our Deep Reinforcement Learning (DRL)-based SoC
Scheduler (DeepSoCS), capable of learning the "best" task ordering under
dynamic environment changes, overcomes the brittleness of rule-based schedulers
such as HEFT and achieves significantly higher performance across different
types of jobs. We describe the DeepSoCS design process using a real-time
heterogeneous SoC scheduling emulator, discuss major challenges, and present
two novel neural network design features that lead to outperforming HEFT: (i)
hierarchical job- and task-graph embedding; and (ii) efficient use of real-time
task information in the state space. Furthermore, we introduce effective
techniques to address two fundamental challenges present in our environment:
delayed consequences and joint actions. Through an extensive simulation study,
we show that DeepSoCS achieves significantly better job execution times than
HEFT, with a higher level of robustness under realistic noise conditions. We
conclude with a discussion of potential improvements to the DeepSoCS neural
scheduler.
Comment: 18 pages, Accepted by Electronics 202
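As context for the HEFT baseline mentioned above, the snippet below sketches HEFT's upward-rank task prioritization on a toy task graph; the graph, costs, and communication times are made up, the processor-assignment phase is omitted, and this is a generic textbook rendering rather than the emulator or DeepSoCS code.

```python
# Generic sketch of HEFT's upward-rank task prioritization on a toy DAG.
# The task graph, costs, and communication times below are made up; the
# full HEFT processor-assignment phase is omitted for brevity.
from functools import lru_cache

# succ[task] -> list of (child, average communication cost)
succ = {
    "A": [("B", 2), ("C", 3)],
    "B": [("D", 1)],
    "C": [("D", 2)],
    "D": [],
}
avg_compute = {"A": 5, "B": 4, "C": 6, "D": 3}   # mean cost over heterogeneous PEs

@lru_cache(maxsize=None)
def upward_rank(task):
    """rank_u(t) = w(t) + max over children c of (comm(t, c) + rank_u(c))."""
    children = succ[task]
    if not children:
        return avg_compute[task]
    return avg_compute[task] + max(comm + upward_rank(c) for c, comm in children)

# HEFT schedules tasks in decreasing upward rank, then greedily picks the
# processor giving the earliest finish time (that second phase is not shown).
order = sorted(succ, key=upward_rank, reverse=True)
print([(t, upward_rank(t)) for t in order])
```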