Self-similarity parameter estimation and reproduction property for non-Gaussian Hermite processes
We consider the class of all the Hermite processes $(Z_t^{(q,H)})_{t \ge 0}$ of order $q \in \mathbb{N}^{*}$ and with Hurst parameter $H \in (\tfrac{1}{2}, 1)$. The process $Z^{(q,H)}$ is $H$-self-similar, it has
stationary increments and it exhibits long-range dependence identical to that
of fractional Brownian motion (fBm). For $q = 1$, $Z^{(1,H)}$ is fBm, which is
Gaussian; for $q = 2$, $Z^{(2,H)}$ is the Rosenblatt process, which lives in the
second Wiener chaos; for any $q \ge 2$, $Z^{(q,H)}$ is a process in the $q$th
Wiener chaos. We study the variations of $Z^{(q,H)}$ for any $q$, by using
multiple Wiener-It\^{o} stochastic integrals and Malliavin calculus. We prove
a reproduction property for this class of processes in the sense that the terms
appearing in the chaotic decomposition of their variations give rise to other
Hermite processes of different orders and with different Hurst parameters. We
apply our results to construct a strongly consistent estimator for the
self-similarity parameter $H$ from discrete observations of $Z^{(q,H)}$; the
asymptotics of this estimator, after appropriate normalization, are proved to
be distributed like a Rosenblatt random variable (value at time $1$ of a
Rosenblatt process) with the corresponding self-similarity parameter.
Comment: To appear in "Communications on Stochastic Analysis"
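The abstract's variations-based estimator can be illustrated in its simplest setting, the fBm case ($q = 1$): the expected sum of squared increments of an fBm on $[0,1]$ sampled at $n$ points equals $n^{1-2H}$, so solving for $H$ gives a consistent estimate. A minimal sketch under that assumption (the function names `fbm_cholesky` and `hurst_qv` are illustrative, not from the paper, and the Cholesky sampler is a generic simulation method, not the authors'):

```python
import numpy as np

def fbm_cholesky(n, H, rng):
    """Sample fractional Brownian motion at t_i = i/n via Cholesky
    factorization of its exact covariance matrix."""
    t = np.arange(1, n + 1) / n
    cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))  # jitter for stability
    return np.concatenate(([0.0], L @ rng.standard_normal(n)))  # X_0 = 0

def hurst_qv(path):
    """Quadratic-variation estimator: for fBm on [0,1] with n increments,
    E[sum of squared increments] = n^(1-2H), so H = (1 - log V / log n)/2."""
    n = len(path) - 1
    v = np.sum(np.diff(path) ** 2)
    return 0.5 * (1.0 - np.log(v) / np.log(n))

rng = np.random.default_rng(0)
path = fbm_cholesky(1000, 0.7, rng)
h_hat = hurst_qv(path)
```

For $q \ge 2$ the paper's point is precisely that such estimators remain strongly consistent but, unlike the Gaussian case, the normalized fluctuations are Rosenblatt-distributed rather than normal.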
Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation
Despite advances in multilingual neural machine translation (MNMT), we argue
that there are still two major challenges in this area: data imbalance and
representation degeneration. The data imbalance problem refers to the imbalance
in the amount of parallel corpora for all language pairs, especially for
long-tail languages (i.e., very low-resource languages). The representation
degeneration problem refers to the problem of encoded tokens tending to appear
only in a small subspace of the full space available to the MNMT model. To
solve these two issues, we propose Bi-ACL, a framework that uses only
target-side monolingual data and a bilingual dictionary to improve the
performance of the MNMT model. We define two modules, named bidirectional
autoencoder and bidirectional contrastive learning, which we combine with an
online constrained beam search and a curriculum learning sampling strategy.
Extensive experiments show that our proposed method is more effective both in
long-tail languages and in high-resource languages. We also demonstrate that
our approach is capable of transferring knowledge between domains and languages
in zero-shot scenarios.
Comment: Accepted to Findings of EMNLP 2023; statistical significance tests
added. Code available at https://github.com/lavine-lmu/Bi-AC
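The abstract does not spell out the form of the bidirectional contrastive learning objective; a common choice for pulling paired sentence embeddings together while pushing apart in-batch negatives is an InfoNCE-style loss. A generic sketch under that assumption (the name `info_nce` and the temperature `tau` are illustrative, not Bi-ACL's actual implementation):

```python
import numpy as np

def info_nce(src_emb, tgt_emb, tau=0.1):
    """InfoNCE-style contrastive loss: row i of src_emb and row i of
    tgt_emb form a positive (translation) pair; every other row in the
    batch serves as an in-batch negative."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    logits = src @ tgt.T / tau                    # (batch, batch) cosine sims
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_prob)))     # NLL of the matched pairs

rng = np.random.default_rng(0)
src = rng.standard_normal((8, 64))
loss_aligned = info_nce(src, src + 0.01 * rng.standard_normal((8, 64)))
loss_shuffled = info_nce(src, src[::-1].copy())
```

With correctly aligned pairs the loss is near zero, while misaligned (shuffled) pairs drive it up, which is the signal that shapes the shared representation space across languages.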