In-Band Disparity Compensation for Multiview Image Compression and View Synthesis

Authors: Anantrasirichai N; Bull DR; Canagarajah CN; Redmill DW
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2010

Decision-Feedback Detection Strategy for Nonlinear Frequency-Division Multiplexing

Authors: Civelli Stella; Forestieri Enrico; Secondini Marco
Publication venue: 'The Optical Society'
Publication date: 01/01/2018

By exploiting a causality property of the nonlinear Fourier transform, a novel decision-feedback detection strategy for nonlinear frequency-division multiplexing (NFDM) systems is introduced. The performance of the proposed strategy is investigated both by simulations and by theoretical bounds and approximations, showing that it achieves a considerable performance improvement compared to previously adopted techniques in terms of Q-factor. The obtained improvement demonstrates that, by tailoring the detection strategy to the peculiar properties of the nonlinear Fourier transform, it is possible to boost the performance of NFDM systems and overcome current limitations imposed by the use of more conventional detection techniques suitable for the linear regime.

arXiv.org e-Print Archive

Crossref

Archivio della ricerca della Scuola Superiore Sant'Anna

Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

Authors: Chou Ju-chieh; Hsu Po-chun; Lee Hung-yi; Lee Lin-shan; Yeh Cheng-chieh
Publication date: 09/08/2018

Speaking rate refers to the average number of phonemes within some unit time, while the rhythmic patterns refer to duration distributions for realizations of different phonemes within different phonetic structures. Both are key components of prosody in speech, which differs from speaker to speaker. Models like the cycle-consistent adversarial network (Cycle-GAN) and the variational auto-encoder (VAE) have been successfully applied to voice conversion tasks without parallel data. However, due to the neural network architectures and feature vectors chosen for these approaches, the length of the predicted utterance has to be fixed to that of the input utterance, which limits the flexibility in mimicking the speaking rates and rhythmic patterns of the target speaker. On the other hand, sequence-to-sequence learning models have been used to remove this length constraint, but they need parallel training data. In this paper, we propose an approach that uses a sequence-to-sequence model trained with unsupervised Cycle-GAN to perform the transformation between the phoneme posteriorgram sequences of different speakers. In this way, the length constraint mentioned above is removed to offer rhythm-flexible voice conversion without requiring parallel data. Preliminary evaluation on two datasets showed very encouraging results.

Comment: 8 pages, 6 figures, Submitted to SLT 201
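The two prosody quantities defined in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the alignment data, the phoneme labels, and the function names are all hypothetical, and durations are grouped per phoneme only, ignoring the surrounding phonetic structure that the abstract also mentions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical forced-alignment output: (phoneme, start_sec, end_sec).
# The labels and timings are invented for illustration only.
alignment = [
    ("HH", 0.00, 0.08),
    ("AH", 0.08, 0.20),
    ("L", 0.20, 0.27),
    ("OW", 0.27, 0.50),
    ("AH", 0.50, 0.60),
]


def speaking_rate(segments):
    """Average number of phonemes per second over the aligned span."""
    total_time = segments[-1][2] - segments[0][1]
    return len(segments) / total_time


def durations_by_phoneme(segments):
    """Collect the observed durations (in seconds) of each phoneme."""
    durations = defaultdict(list)
    for phoneme, start, end in segments:
        durations[phoneme].append(end - start)
    return dict(durations)


rate = speaking_rate(alignment)
mean_durations = {
    phoneme: mean(values)
    for phoneme, values in durations_by_phoneme(alignment).items()
}
```

Here `rate` is the speaking rate in phonemes per second, and `mean_durations` is a crude one-number summary of each phoneme's duration distribution. A fixed-length conversion model preserves both from the source utterance, which is the limitation the abstract describes.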

arXiv.org e-Print Archive

Crossref

New Constructions of Zero-Correlation Zone Sequences

Authors: Chen Ching-Wei; Liu Yen-Cheng; Su Yu T.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/02/2013

In this paper, we propose three classes of systematic approaches for constructing zero correlation zone (ZCZ) sequence families. In most cases, these approaches are capable of generating sequence families that achieve the upper bounds on the family size (K) and the ZCZ width (T) for a given sequence period (N). Our approaches can produce various binary and polyphase ZCZ families with desired parameters (N,K,T) and alphabet size. They also provide additional tradeoffs amongst the above four system parameters and are less constrained by the alphabet size. Furthermore, the constructed families have nested-like property that can be either decomposed or combined to constitute smaller or larger ZCZ sequence sets. We make detailed comparisons with related works and present some extended properties. For each approach, we provide examples to numerically illustrate the proposed construction procedure.

Comment: 37 pages, submitted to IEEE Transactions on Information Theor

arXiv.org e-Print Archive

Crossref

Time-scale and pitch modifications of speech signals and resynthesis from the discrete short-time Fourier transform

Authors: He Haiyan; Veldhuis Raymond
Publication venue: Elsevier
Publication date: 01/01/1996

The modification methods described in this paper combine characteristics of PSOLA-based methods and algorithms that resynthesize speech from its short-time Fourier magnitude only. The starting point is a short-time Fourier representation of the signal. In the case of duration modification, portions, in voiced speech corresponding to pitch periods, are removed from or inserted in this representation. In the case of pitch modification, pitch periods are shortened or extended in this representation, and a number of pitch periods is inserted or removed, respectively. Since it is an important tool for both duration and pitch modification, the resynthesis-from-short-time-Fourier-magnitude-only method of Griffin and Lim (1984) and Griffin et al. (1984) is reviewed and adapted. Duration and pitch modification methods and their results are presented.

CiteSeerX

Pure OAI Repository

University of Twente Research Information