Communication-efficient Personalized Federated Edge Learning for Massive MIMO CSI Feedback
Deep learning (DL)-based channel state information (CSI) feedback has
received significant research attention in recent years. However, previous
research has overlooked the potential privacy disclosure problem caused by the
transmission of CSI datasets during the training process. In this work, we
introduce a federated edge learning (FEEL)-based training framework for
DL-based CSI feedback. This approach differs from the conventional centralized
learning (CL)-based framework in which the CSI datasets are collected at the
base station (BS) before training. Instead, each user equipment (UE) trains a
local autoencoder network and exchanges model parameters with the BS. This
approach provides better protection for data privacy compared to CL. To further
reduce communication overhead in FEEL, we quantize the uplink and downlink
model transmissions to different bit widths according to their influence on
feedback performance. Additionally, since the heterogeneity of CSI datasets across
UEs can degrade the performance of the FEEL-based framework, we introduce a
personalization strategy to improve feedback performance. This strategy allows
for local fine-tuning to adapt the global model to the channel characteristics
of each UE. Simulation results indicate that the proposed personalized
FEEL-based training framework can significantly improve the performance of
DL-based CSI feedback while reducing communication overhead.
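The round structure described in this abstract can be made concrete with a short sketch. The following is a minimal NumPy illustration, not the paper's implementation: quantize_uniform, the four-UE setup, and the 4-bit uplink / 8-bit downlink widths are all hypothetical placeholders standing in for the paper's quantization and training design.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Uniformly quantize a weight vector to 2**bits levels over its range."""
    lo, hi = float(w.min()), float(w.max())
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    return lo + np.round((w - lo) / step) * step

def fed_avg(local_models: list) -> np.ndarray:
    """Average the quantized local models uploaded by the UEs."""
    return np.mean(local_models, axis=0)

rng = np.random.default_rng(0)
global_w = rng.normal(size=128)                       # stand-in for autoencoder weights
uploads = []
for _ in range(4):                                    # four UEs train locally
    local_w = global_w + 0.01 * rng.normal(size=128)  # stand-in for local SGD steps
    uploads.append(quantize_uniform(local_w, bits=4))     # low-bit uplink
global_w = quantize_uniform(fed_avg(uploads), bits=8)     # higher-bit downlink broadcast

# Personalization: each UE fine-tunes the broadcast model on its own CSI data.
personal_w = global_w + 0.01 * rng.normal(size=128)   # stand-in for local fine-tuning
```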
Speech-to-speech Translation between Untranscribed Unknown Languages
In this paper, we explore a method for training speech-to-speech translation
models without any transcription or linguistic supervision. Our proposed
method consists of two steps. First, we generate discrete representations of
speech via unsupervised term discovery with a discrete quantized autoencoder.
Second, we train a sequence-to-sequence model that directly maps
source-language speech to the target language's discrete representation. The
method can generate target speech directly, without any auxiliary steps or
pre-training that requires source or target transcriptions. To the best of
our knowledge, this is the first work to perform pure speech-to-speech
translation between untranscribed unknown languages.
Comment: Accepted at IEEE ASRU 2019. Web page with more samples and details:
https://sp2code-translation-v1.netlify.com
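To make the first step concrete, here is a minimal NumPy sketch of the discrete-unit idea: continuous target-speech features are snapped to their nearest codebook entries, and the resulting index sequence is what the sequence-to-sequence model would be trained to predict. The codebook size, feature dimension, and all variable names are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 80))   # 256 learned units x 80-dim speech features
frames = rng.normal(size=(120, 80))     # target-language utterance, 120 frames

# Vector quantization: assign each frame to its nearest codebook entry.
dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
unit_ids = dists.argmin(axis=1)         # discrete representation, shape (120,)

# unit_ids is the transcription-free target sequence a sequence-to-sequence
# model learns to predict directly from source-language speech.
```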
Can Language Models Learn to Listen?
We present a framework for generating appropriate facial responses from a
listener in dyadic social interactions based on the speaker's words. Given an
input transcription of the speaker's words with their timestamps, our approach
autoregressively predicts the listener's response: a sequence of listener
facial gestures, quantized using a VQ-VAE. Since gesture is a language
component, we propose treating the quantized atomic motion elements as
additional language token inputs to a transformer-based large language model.
Initializing our transformer with the weights of a language model pre-trained
only on text results in significantly higher quality listener responses than
training a transformer from scratch. We show that our generated listener motion
is fluent and reflective of language semantics through quantitative metrics and
a qualitative user study. In our evaluation, we analyze the model's ability to
utilize temporal and semantic aspects of spoken text.
Comment: ICCV 2023. Project page: https://people.eecs.berkeley.edu/~evonne_ng/projects/text2listen/
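As a rough illustration of treating motion codes as extra language tokens, the sketch below extends a pre-trained causal LM's vocabulary with one token per VQ-VAE code, using Hugging Face transformers APIs. The base model ("gpt2"), the codebook size, and the <motion_i> token names are stand-ins, not the paper's actual setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

NUM_MOTION_CODES = 256                  # hypothetical VQ-VAE codebook size

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # stand-in text-pretrained LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Register one new token per quantized atomic motion element.
motion_tokens = [f"<motion_{i}>" for i in range(NUM_MOTION_CODES)]
tokenizer.add_tokens(motion_tokens)
model.resize_token_embeddings(len(tokenizer))

# Speaker words and listener motion codes now share one token stream, so the
# text-pretrained transformer can be fine-tuned to autoregressively predict
# motion tokens conditioned on the timestamped transcript tokens.
```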
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
The advent of large language models, enabling flexibility through
instruction-driven approaches, has revolutionized many traditional generative
tasks, but large models for 3D data, particularly in comprehensively handling
3D shapes with other modalities, are still under-explored. By enabling
instruction-based shape generation, versatile multimodal generative shape
models could significantly benefit fields such as 3D virtual construction
and network-aided design. In this work, we present ShapeGPT, a shape-included
multi-modal framework to leverage strong pre-trained language models to address
multiple shape-relevant tasks. Specifically, ShapeGPT employs a
word-sentence-paragraph framework that discretizes continuous shapes into
shape words, assembles these words into shape sentences, and integrates
shapes with instructional text to form multi-modal paragraphs. To learn this
shape-language model, we use a three-stage training scheme, including shape
representation, multimodal alignment, and instruction-based generation, to
align shape-language codebooks and learn the intricate correlations among these
modalities. Extensive experiments demonstrate that ShapeGPT achieves comparable
performance across shape-relevant tasks, including text-to-shape,
shape-to-text, shape completion, and shape editing.
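A toy sketch of the word-sentence-paragraph idea follows, under the assumption that shape words are simply VQ codebook indices rendered as special tokens; the token format and prompt layout here are invented for illustration and are not ShapeGPT's actual scheme.

```python
# VQ indices produced by a shape autoencoder act as "shape words".
shape_words = [417, 12, 305, 88]   # hypothetical codebook indices

# A "shape sentence" wraps the shape words in delimiter tokens.
shape_sentence = "<shape> " + " ".join(f"<s_{w}>" for w in shape_words) + " </shape>"

# Splicing the shape sentence into instruction text yields a multimodal paragraph.
instruction = "Generate a chair with four thin legs."
multimodal_paragraph = f"{instruction} {shape_sentence}"
print(multimodal_paragraph)
# Generate a chair with four thin legs. <shape> <s_417> <s_12> <s_305> <s_88> </shape>
```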
Disentanglement via Latent Quantization
In disentangled representation learning, a model is asked to tease apart a
dataset's underlying sources of variation and represent them independently of
one another. Since the model is provided with no ground truth information about
these sources, inductive biases take a paramount role in enabling
disentanglement. In this work, we construct an inductive bias towards
compositionally encoding and decoding data by enforcing a harsh communication
bottleneck. Concretely, we do this by (i) quantizing the latent space into
learnable discrete codes with a separate scalar codebook per dimension and (ii)
applying strong model regularization via an unusually high weight decay.
Intuitively, the quantization forces the encoder to use a small number of
latent values across many datapoints, which in turn enables the decoder to
assign a consistent meaning to each value. Regularization then serves to drive
the model towards this parsimonious strategy. We demonstrate the broad
applicability of this approach by adding it to both basic data-reconstructing
(vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models. In
order to reliably assess these models, we also propose InfoMEC, new metrics for
disentanglement that are cohesively grounded in information theory and fix
well-established shortcomings in previous metrics. Together with
regularization, latent quantization dramatically improves the modularity and
explicitness of learned representations on a representative suite of benchmark
datasets. In particular, our quantized-latent autoencoder (QLAE) consistently
outperforms strong methods from prior work in these key disentanglement
properties without compromising data reconstruction.
Comment: 20 pages, 8 figures; code available at https://github.com/kylehkhsu/disentangl
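A hedged PyTorch sketch of the core quantization step may help: each latent dimension gets its own small learnable scalar codebook, values are snapped to the nearest code, and gradients pass through via the straight-through estimator. All sizes are illustrative, and the paper's auxiliary losses are omitted; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn as nn

class LatentQuantizer(nn.Module):
    """One learnable scalar codebook per latent dimension (illustrative sizes)."""
    def __init__(self, n_dims: int = 10, n_values: int = 12):
        super().__init__()
        self.codebooks = nn.Parameter(torch.randn(n_dims, n_values))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, n_dims). Pick the nearest code independently per dimension.
        dists = (z.unsqueeze(-1) - self.codebooks.unsqueeze(0)) ** 2
        idx = dists.argmin(dim=-1)                                  # (batch, n_dims)
        z_q = torch.gather(self.codebooks.expand(z.shape[0], -1, -1),
                           2, idx.unsqueeze(-1)).squeeze(-1)
        # Straight-through estimator: forward pass uses z_q, gradients flow to z.
        # (Codebook/commitment losses from the paper are omitted for brevity.)
        return z + (z_q - z).detach()

quantizer = LatentQuantizer()
z = torch.randn(4, 10, requires_grad=True)
z_q = quantizer(z)                          # quantized latents, shape (4, 10)

# The "strong regularization" is just an unusually high weight decay:
optimizer = torch.optim.AdamW(quantizer.parameters(), lr=1e-3, weight_decay=0.1)
```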
Evaluation and analysis of the orbital maneuvering vehicle video system
The work accomplished in the summer of 1989 in association with the NASA/ASEE Summer Faculty Research Fellowship Program at Marshall Space Flight Center is summarized. The task involved study of the Orbital Maneuvering Vehicle (OMV) video compression scheme. This included reviewing the expected scenes to be compressed by the flight vehicle, learning the error characteristics of the communication channel, monitoring the CLASS tests, and assisting in the development of test procedures and interface hardware for the bit error rate lab being developed at MSFC to test the VCU/VRU. Numerous comments and suggestions were made during the fellowship period regarding the design and testing of the OMV video system. Unfortunately, the program appears at this point to be troubled from an expense perspective and is in fact in danger of being scaled back, if not cancelled altogether. This makes technical improvements prohibitive and cost-reduction measures necessary. Fortunately, some cost-reduction possibilities and some significant technical improvements that should cost very little were identified.
Counterspeeches up my sleeve! Intent Distribution Learning and Persistent Fusion for Intent-Conditioned Counterspeech Generation
Counterspeech has been demonstrated to be an efficacious approach for
combating hate speech. While various conventional and controlled approaches
have been studied in recent years to generate counterspeech, a counterspeech
with a certain intent may not be sufficient in every scenario. Due to the
complex and multifaceted nature of hate speech, utilizing multiple forms of
counter-narratives with varying intents may be advantageous in different
circumstances. In this paper, we explore intent-conditioned counterspeech
generation. First, we develop IntentCONAN, a diversified intent-specific
counterspeech dataset with 6,831 counterspeeches conditioned on five intents:
informative, denouncing, question, positive, and humour. Subsequently, we
propose QUARC, a two-stage framework for intent-conditioned counterspeech
generation. QUARC leverages vector-quantized representations learned for each
intent category, along with PerFuMe, a novel fusion module that incorporates
intent-specific information into the model. Our evaluation demonstrates that
QUARC outperforms several baselines by an average of 10% across evaluation
metrics. An extensive human evaluation further supports our hypothesis,
finding better and more appropriate responses than comparative systems.
Comment: ACL 202
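As a rough illustration of the first stage's idea, the sketch below maintains a separate codebook per intent and snaps encoder states to the nearest codes of the chosen intent. All names and sizes here are hypothetical; QUARC's actual architecture, including the PerFuMe fusion module, is richer than this.

```python
import numpy as np

INTENTS = ["informative", "denouncing", "question", "positive", "humour"]
rng = np.random.default_rng(0)
codebooks = {name: rng.normal(size=(64, 128)) for name in INTENTS}  # 64 codes each

def quantize(hidden: np.ndarray, intent: str) -> np.ndarray:
    """Snap hidden states to the nearest codes of the chosen intent's codebook."""
    book = codebooks[intent]
    idx = ((hidden[:, None, :] - book[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    return book[idx]

hidden = rng.normal(size=(16, 128))            # encoder states for a hate post
conditioned = quantize(hidden, "denouncing")   # intent-specific representation
```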