The translation of brain dynamics into natural language is pivotal for
brain-computer interfaces (BCIs), a field that has seen substantial growth in
recent years. With the swift advancement of large language models, such as
ChatGPT, the need to bridge the gap between the brain and language becomes
increasingly pressing. Current methods, however, require eye-tracking fixations
or event markers to segment brain dynamics into word-level features, which can
restrict the practical application of these systems. These event markers may
not be readily available or could be challenging to acquire during real-time
inference, and the sequence of eye fixations may not align with the order of
spoken words. To tackle these issues, we introduce a novel framework, DeWave,
that integrates discrete encoding sequences into open-vocabulary EEG-to-text
translation tasks. DeWave uses a quantized variational encoder to derive a
discrete codex encoding and aligns it with pre-trained language models. This
discrete codex representation offers two advantages: 1) it alleviates the
order mismatch between eye fixations and spoken words through text-EEG
contrastive alignment training, and 2) it reduces the interference caused by
individual differences in EEG waves through an invariant discrete codex.
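The codex lookup itself follows the familiar VQ-VAE recipe of
nearest-neighbour quantization against a learned codebook. Below is a minimal
sketch, assuming a PyTorch implementation; names such as DiscreteCodex,
codex_size, and dim are illustrative placeholders, not identifiers from the
released code.

import torch
import torch.nn as nn

class DiscreteCodex(nn.Module):
    """Nearest-neighbour vector quantization of continuous EEG features."""

    def __init__(self, codex_size: int = 512, dim: int = 768):
        super().__init__()
        self.codebook = nn.Embedding(codex_size, dim)
        nn.init.uniform_(self.codebook.weight, -1.0 / codex_size, 1.0 / codex_size)

    def forward(self, z: torch.Tensor):
        # z: (batch, seq_len, dim) continuous features from the EEG encoder.
        flat = z.reshape(-1, z.size(-1))                    # (batch*seq_len, dim)
        dist = torch.cdist(flat, self.codebook.weight)      # distance to every codex entry
        indices = dist.argmin(dim=-1).view(z.shape[:-1])    # discrete codes, (batch, seq_len)
        z_q = self.codebook(indices)                        # quantized embeddings
        # Straight-through estimator: copy gradients past the non-differentiable argmin.
        z_q = z + (z_q - z).detach()
        return z_q, indices

# Usage: the quantized embeddings would then be aligned with a pre-trained
# language model (shapes here are dummy values for illustration).
codex = DiscreteCodex()
eeg_feats = torch.randn(2, 10, 768)
z_q, codes = codex(eeg_feats)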
Our model achieves 41.35 BLEU-1 and 33.71 Rouge-F on the ZuCo dataset,
surpassing the previous baseline (40.1 and 31.7) by 3.06% and 6.34%,
respectively.
Furthermore, this work is the first to translate entire periods of EEG signals
without word-level order markers (e.g., eye fixations), scoring 20.5 BLEU-1
and 29.5 Rouge-1 on the ZuCo dataset. Code and the final paper will be made
public soon.