Communication-efficient Personalized Federated Edge Learning for Massive MIMO CSI Feedback
Deep learning (DL)-based channel state information (CSI) feedback has
received significant research attention in recent years. However, previous
research has overlooked the potential privacy disclosure problem caused by the
transmission of CSI datasets during the training process. In this work, we
introduce a federated edge learning (FEEL)-based training framework for
DL-based CSI feedback. This approach differs from the conventional centralized
learning (CL)-based framework in which the CSI datasets are collected at the
base station (BS) before training. Instead, each user equipment (UE) trains a
local autoencoder network and exchanges model parameters with the BS. This
approach provides better protection for data privacy compared to CL. To further
reduce communication overhead in FEEL, we quantize the uplink and downlink
model transmissions to different bit widths according to their influence on
feedback performance. Additionally, since the heterogeneity of CSI datasets across
UEs can degrade the performance of the FEEL-based framework, we introduce a
personalization strategy to improve feedback performance. This strategy allows
for local fine-tuning to adapt the global model to the channel characteristics
of each UE. Simulation results indicate that the proposed personalized
FEEL-based training framework can significantly improve the performance of
DL-based CSI feedback while reducing communication overhead.
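The round structure described in this abstract can be made concrete with a short sketch. The following is a minimal NumPy illustration, not the paper's implementation: quantize_uniform, the four-UE setup, and the 4-bit uplink / 8-bit downlink widths are all hypothetical placeholders standing in for the paper's quantization and training design.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Uniformly quantize a weight vector to 2**bits levels over its range."""
    lo, hi = float(w.min()), float(w.max())
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    return lo + np.round((w - lo) / step) * step

def fed_avg(local_models: list) -> np.ndarray:
    """Average the quantized local models uploaded by the UEs."""
    return np.mean(local_models, axis=0)

rng = np.random.default_rng(0)
global_w = rng.normal(size=128)                       # stand-in for autoencoder weights
uploads = []
for _ in range(4):                                    # four UEs train locally
    local_w = global_w + 0.01 * rng.normal(size=128)  # stand-in for local SGD steps
    uploads.append(quantize_uniform(local_w, bits=4))     # low-bit uplink
global_w = quantize_uniform(fed_avg(uploads), bits=8)     # higher-bit downlink broadcast

# Personalization: each UE fine-tunes the broadcast model on its own CSI data.
personal_w = global_w + 0.01 * rng.normal(size=128)   # stand-in for local fine-tuning
```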
Speech-to-speech Translation between Untranscribed Unknown Languages
In this paper, we explore a method for training speech-to-speech translation
models without any transcription or linguistic supervision. Our proposed
method consists of two steps. First, we generate discrete representations of
speech via unsupervised term discovery with a discrete quantized autoencoder.
Second, we train a sequence-to-sequence model that directly maps
source-language speech to the target language's discrete representation. The
method can generate target speech directly, without any auxiliary steps or
pre-training that requires source or target transcriptions. To the best of
our knowledge, this is the first work to perform pure speech-to-speech
translation between untranscribed unknown languages.
Comment: Accepted at IEEE ASRU 2019. Web page with more samples and details:
https://sp2code-translation-v1.netlify.com
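To make the first step concrete, here is a minimal NumPy sketch of the discrete-unit idea: continuous target-speech features are snapped to their nearest codebook entries, and the resulting index sequence is what the sequence-to-sequence model would be trained to predict. The codebook size, feature dimension, and all variable names are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 80))   # 256 learned units x 80-dim speech features
frames = rng.normal(size=(120, 80))     # target-language utterance, 120 frames

# Vector quantization: assign each frame to its nearest codebook entry.
dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
unit_ids = dists.argmin(axis=1)         # discrete representation, shape (120,)

# unit_ids is the transcription-free target sequence a sequence-to-sequence
# model learns to predict directly from source-language speech.
```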
Can Language Models Learn to Listen?
We present a framework for generating appropriate facial responses from a
listener in dyadic social interactions based on the speaker's words. Given an
input transcription of the speaker's words with their timestamps, our approach
autoregressively predicts the listener's response: a sequence of listener
facial gestures, quantized using a VQ-VAE. Since gesture is a language
component, we propose treating the quantized atomic motion elements as
additional language token inputs to a transformer-based large language model.
Initializing our transformer with the weights of a language model pre-trained
only on text results in significantly higher quality listener responses than
training a transformer from scratch. We show that our generated listener motion
is fluent and reflective of language semantics through quantitative metrics and
a qualitative user study. In our evaluation, we analyze the model's ability to
utilize temporal and semantic aspects of spoken text.
Comment: ICCV 2023. Project page: https://people.eecs.berkeley.edu/~evonne_ng/projects/text2listen/
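As a rough illustration of treating motion codes as extra language tokens, the sketch below extends a pre-trained causal LM's vocabulary with one token per VQ-VAE code, using Hugging Face transformers APIs. The base model ("gpt2"), the codebook size, and the <motion_i> token names are stand-ins, not the paper's actual setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

NUM_MOTION_CODES = 256                  # hypothetical VQ-VAE codebook size

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # stand-in text-pretrained LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Register one new token per quantized atomic motion element.
motion_tokens = [f"<motion_{i}>" for i in range(NUM_MOTION_CODES)]
tokenizer.add_tokens(motion_tokens)
model.resize_token_embeddings(len(tokenizer))

# Speaker words and listener motion codes now share one token stream, so the
# text-pretrained transformer can be fine-tuned to autoregressively predict
# motion tokens conditioned on the timestamped transcript tokens.
```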
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
The advent of large language models, enabling flexibility through
instruction-driven approaches, has revolutionized many traditional generative
tasks, but large models for 3D data, particularly in comprehensively handling
3D shapes with other modalities, are still under-explored. By enabling
instruction-based shape generation, versatile multimodal generative shape
models could significantly benefit fields such as 3D virtual construction
and network-aided design. In this work, we present ShapeGPT, a shape-included
multi-modal framework to leverage strong pre-trained language models to address
multiple shape-relevant tasks. Specifically, ShapeGPT employs a
word-sentence-paragraph framework that discretizes continuous shapes into
shape words, assembles these words into shape sentences, and integrates
shapes with instructional text to form multi-modal paragraphs. To learn this
shape-language model, we use a three-stage training scheme, including shape
representation, multimodal alignment, and instruction-based generation, to
align shape-language codebooks and learn the intricate correlations among these
modalities. Extensive experiments demonstrate that ShapeGPT achieves comparable
performance across shape-relevant tasks, including text-to-shape,
shape-to-text, shape completion, and shape editing.
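A toy sketch of the word-sentence-paragraph idea follows, under the assumption that shape words are simply VQ codebook indices rendered as special tokens; the token format and prompt layout here are invented for illustration and are not ShapeGPT's actual scheme.

```python
# VQ indices produced by a shape autoencoder act as "shape words".
shape_words = [417, 12, 305, 88]   # hypothetical codebook indices

# A "shape sentence" wraps the shape words in delimiter tokens.
shape_sentence = "<shape> " + " ".join(f"<s_{w}>" for w in shape_words) + " </shape>"

# Splicing the shape sentence into instruction text yields a multimodal paragraph.
instruction = "Generate a chair with four thin legs."
multimodal_paragraph = f"{instruction} {shape_sentence}"
print(multimodal_paragraph)
# Generate a chair with four thin legs. <shape> <s_417> <s_12> <s_305> <s_88> </shape>
```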
Disentanglement via Latent Quantization
In disentangled representation learning, a model is asked to tease apart a
dataset's underlying sources of variation and represent them independently of
one another. Since the model is provided with no ground truth information about
these sources, inductive biases take a paramount role in enabling
disentanglement. In this work, we construct an inductive bias towards
compositionally encoding and decoding data by enforcing a harsh communication
bottleneck. Concretely, we do this by (i) quantizing the latent space into
learnable discrete codes with a separate scalar codebook per dimension and (ii)
applying strong model regularization via an unusually high weight decay.
Intuitively, the quantization forces the encoder to use a small number of
latent values across many datapoints, which in turn enables the decoder to
assign a consistent meaning to each value. Regularization then serves to drive
the model towards this parsimonious strategy. We demonstrate the broad
applicability of this approach by adding it to both basic data-reconstructing
(vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models. In
order to reliably assess these models, we also propose InfoMEC, new metrics for
disentanglement that are cohesively grounded in information theory and fix
well-established shortcomings in previous metrics. Together with
regularization, latent quantization dramatically improves the modularity and
explicitness of learned representations on a representative suite of benchmark
datasets. In particular, our quantized-latent autoencoder (QLAE) consistently
outperforms strong methods from prior work in these key disentanglement
properties without compromising data reconstruction.
Comment: 20 pages, 8 figures; code available at https://github.com/kylehkhsu/disentangl
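A hedged PyTorch sketch of the core quantization step may help: each latent dimension gets its own small learnable scalar codebook, values are snapped to the nearest code, and gradients pass through via the straight-through estimator. All sizes are illustrative, and the paper's auxiliary losses are omitted; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn as nn

class LatentQuantizer(nn.Module):
    """One learnable scalar codebook per latent dimension (illustrative sizes)."""
    def __init__(self, n_dims: int = 10, n_values: int = 12):
        super().__init__()
        self.codebooks = nn.Parameter(torch.randn(n_dims, n_values))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, n_dims). Pick the nearest code independently per dimension.
        dists = (z.unsqueeze(-1) - self.codebooks.unsqueeze(0)) ** 2
        idx = dists.argmin(dim=-1)                                  # (batch, n_dims)
        z_q = torch.gather(self.codebooks.expand(z.shape[0], -1, -1),
                           2, idx.unsqueeze(-1)).squeeze(-1)
        # Straight-through estimator: forward pass uses z_q, gradients flow to z.
        # (Codebook/commitment losses from the paper are omitted for brevity.)
        return z + (z_q - z).detach()

quantizer = LatentQuantizer()
z = torch.randn(4, 10, requires_grad=True)
z_q = quantizer(z)                          # quantized latents, shape (4, 10)

# The "strong regularization" is just an unusually high weight decay:
optimizer = torch.optim.AdamW(quantizer.parameters(), lr=1e-3, weight_decay=0.1)
```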
Evaluation and analysis of the orbital maneuvering vehicle video system
The work accomplished in the summer of 1989 in association with the NASA/ASEE Summer Faculty Research Fellowship Program at Marshall Space Flight Center is summarized. The task involved study of the Orbital Maneuvering Vehicle (OMV) video compression scheme. This included reviewing the expected scenes to be compressed by the flight vehicle, learning the error characteristics of the communication channel, monitoring the CLASS tests, and assisting in the development of test procedures and interface hardware for the bit error rate lab being developed at MSFC to test the VCU/VRU. Numerous comments and suggestions were made during the fellowship period regarding the design and testing of the OMV video system. Unfortunately, the program appears at this point to be troubled from an expense perspective and is in fact in danger of being scaled back, if not cancelled altogether. This makes technical improvements prohibitive and cost-reduction measures necessary. Fortunately, some cost-reduction possibilities and some significant technical improvements that should cost very little were identified.
Counterspeeches up my sleeve! Intent Distribution Learning and Persistent Fusion for Intent-Conditioned Counterspeech Generation
Counterspeech has been demonstrated to be an efficacious approach for
combating hate speech. While various conventional and controlled approaches
have been studied in recent years to generate counterspeech, a counterspeech
with a certain intent may not be sufficient in every scenario. Due to the
complex and multifaceted nature of hate speech, utilizing multiple forms of
counter-narratives with varying intents may be advantageous in different
circumstances. In this paper, we explore intent-conditioned counterspeech
generation. First, we develop IntentCONAN, a diversified intent-specific
counterspeech dataset with 6,831 counterspeeches conditioned on five intents:
informative, denouncing, question, positive, and humour. Subsequently, we
propose QUARC, a two-stage framework for intent-conditioned counterspeech
generation. QUARC leverages vector-quantized representations learned for each
intent category, along with PerFuMe, a novel fusion module that incorporates
intent-specific information into the model. Our evaluation demonstrates that
QUARC outperforms several baselines by an average of 10% across evaluation
metrics. An extensive human evaluation further supports our hypothesis,
finding better and more appropriate responses than comparative systems.
Comment: ACL 202
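As a rough illustration of the first stage's idea, the sketch below maintains a separate codebook per intent and snaps encoder states to the nearest codes of the chosen intent. All names and sizes here are hypothetical; QUARC's actual architecture, including the PerFuMe fusion module, is richer than this.

```python
import numpy as np

INTENTS = ["informative", "denouncing", "question", "positive", "humour"]
rng = np.random.default_rng(0)
codebooks = {name: rng.normal(size=(64, 128)) for name in INTENTS}  # 64 codes each

def quantize(hidden: np.ndarray, intent: str) -> np.ndarray:
    """Snap hidden states to the nearest codes of the chosen intent's codebook."""
    book = codebooks[intent]
    idx = ((hidden[:, None, :] - book[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    return book[idx]

hidden = rng.normal(size=(16, 128))            # encoder states for a hate post
conditioned = quantize(hidden, "denouncing")   # intent-specific representation
```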