CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models
As an indispensable ingredient of intelligence, commonsense reasoning is
crucial for large language models (LLMs) in real-world scenarios. In this
paper, we propose CORECODE, a dataset that contains abundant commonsense
knowledge manually annotated on dyadic dialogues, to evaluate the commonsense
reasoning and commonsense conflict detection capabilities of Chinese LLMs. We
categorize commonsense knowledge in everyday conversations into three
dimensions: entity, event, and social interaction. For easy and consistent
annotation, we standardize the form of commonsense knowledge annotation in
open-domain dialogues as "domain: slot = value". A total of 9 domains and 37
slots are defined to capture diverse commonsense knowledge. With these
pre-defined domains and slots, we collect 76,787 commonsense knowledge
annotations from 19,700 dialogues through crowdsourcing. To evaluate and
enhance the commonsense reasoning capability for LLMs on the curated dataset,
we establish a series of dialogue-level reasoning and detection tasks,
including commonsense knowledge filling, commonsense knowledge generation,
commonsense conflict phrase detection, domain identification, slot
identification, and event causal inference. A wide variety of existing
open-source Chinese LLMs are evaluated with these tasks on our dataset.
Experimental results demonstrate that these models struggle to predict the rich commonsense content annotated in CORECODE; even ChatGPT achieves only 0.275 and 0.084 accuracy on the domain identification and slot identification tasks under the zero-shot setting. We release the data and code of CORECODE at https://github.com/danshi777/CORECODE to promote commonsense reasoning evaluation and the study of LLMs in the context of daily conversations.
Comment: AAAI 202
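As a rough illustration of the "domain: slot = value" annotation form described above, the following Python sketch shows how such an annotation string might be represented and parsed. The dialogue content, domain, and slot names here are hypothetical and are not taken from the released dataset.

from dataclasses import dataclass

@dataclass
class CommonsenseAnnotation:
    domain: str  # one of the 9 pre-defined domains (hypothetical name below)
    slot: str    # one of the 37 pre-defined slots (hypothetical name below)
    value: str   # free-text commonsense value written by annotators

def parse_annotation(text: str) -> CommonsenseAnnotation:
    """Parse a string of the form 'domain: slot = value'."""
    domain, rest = text.split(":", 1)
    slot, value = rest.split("=", 1)
    return CommonsenseAnnotation(domain.strip(), slot.strip(), value.strip())

# Hypothetical annotation attached to a dialogue turn about borrowing an umbrella.
ann = parse_annotation("event: cause = it started to rain heavily")
print(ann.domain, ann.slot, ann.value)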
Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer
Humans can effortlessly modify various prosodic attributes, such as the
placement of stress and the intensity of sentiment, to convey a specific
emotion while maintaining consistent linguistic content. Motivated by this
capability, we propose EmoAug, a novel style transfer model designed to enhance
emotional expression and tackle the data scarcity issue in speech emotion
recognition tasks. EmoAug consists of a semantic encoder and a paralinguistic
encoder that represent verbal and non-verbal information respectively.
Additionally, a decoder reconstructs speech signals by conditioning on the
aforementioned two information flows in an unsupervised fashion. Once training
is completed, EmoAug enriches expressions of emotional speech with different
prosodic attributes, such as stress, rhythm and intensity, by feeding different
styles into the paralinguistic encoder. EmoAug also enables us to generate a similar number of samples for each class, addressing the data imbalance issue.
Experimental results on the IEMOCAP dataset demonstrate that EmoAug can
successfully transfer different speaking styles while retaining the speaker
identity and semantic content. Furthermore, we train a SER model with data
augmented by EmoAug and show that the augmented model not only surpasses the
state-of-the-art supervised and self-supervised methods but also overcomes
overfitting problems caused by data imbalance. Some audio samples can be found
on our demo website.
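To make the two-stream design above concrete, here is a minimal PyTorch-style sketch of a semantic encoder and a paralinguistic encoder feeding a reconstruction decoder, with augmentation performed by pairing a content utterance with a different style utterance. Layer types, feature sizes, and module names are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class EmoAugSketch(nn.Module):
    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.semantic_encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.paralinguistic_encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(2 * hidden, feat_dim, batch_first=True)

    def forward(self, content_feats, style_feats):
        # Verbal (content) stream: frame-level semantic representations.
        content, _ = self.semantic_encoder(content_feats)
        # Non-verbal (style) stream: one utterance-level style vector.
        _, style = self.paralinguistic_encoder(style_feats)
        style = style[-1].unsqueeze(1).expand(-1, content.size(1), -1)
        # Reconstruct acoustic features conditioned on both streams.
        recon, _ = self.decoder(torch.cat([content, style], dim=-1))
        return recon

# Augmentation: keep the content utterance, swap in a different style utterance.
model = EmoAugSketch()
content = torch.randn(4, 120, 80)   # batch of mel-spectrogram-like features
style = torch.randn(4, 120, 80)     # reference utterances carrying other prosody
augmented = model(content, style)   # (4, 120, 80)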
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Human speech can be characterized by different components, including semantic
content, speaker identity and prosodic information. Significant progress has
been made in disentangling representations for semantic content and speaker
identity in Automatic Speech Recognition (ASR) and speaker verification tasks
respectively. However, it remains an open and challenging research question to
extract prosodic information because of the intrinsic association of different
attributes, such as timbre and rhythm, and because of the need for supervised
training schemes to achieve robust large-scale and speaker-independent ASR. The
aim of this paper is to address the disentanglement of emotional prosody from
speech based on unsupervised reconstruction. Specifically, we identify, design,
implement and integrate three crucial components in our proposed speech
reconstruction model Prosody2Vec: (1) a unit encoder that transforms speech
signals into discrete units for semantic content, (2) a pretrained speaker
verification model to generate speaker identity embeddings, and (3) a trainable
prosody encoder to learn prosody representations. We first pretrain the
Prosody2Vec representations on unlabelled emotional speech corpora, then
fine-tune the model on specific datasets to perform Speech Emotion Recognition
(SER) and Emotional Voice Conversion (EVC) tasks. Both objective (weighted and
unweighted accuracies) and subjective (mean opinion score) evaluations on the
EVC task suggest that Prosody2Vec effectively captures general prosodic
features that can be smoothly transferred to other emotional speech. In
addition, our SER experiments on the IEMOCAP dataset reveal that the prosody
features learned by Prosody2Vec are complementary and beneficial for the
performance of widely used speech pretraining models and surpass the
state-of-the-art methods when combining Prosody2Vec with HuBERT representations.
Comment: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing
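As a small sketch of how the learned prosody features might be combined with HuBERT representations for SER, as reported above, utterance-level embeddings from the two models can be concatenated and fed to a classifier. The feature dimensions and the classifier head below are assumptions, not the configuration used in the paper.

import torch
import torch.nn as nn

hubert_dim, prosody_dim, num_emotions = 768, 128, 4  # assumed sizes

classifier = nn.Sequential(
    nn.Linear(hubert_dim + prosody_dim, 256),
    nn.ReLU(),
    nn.Linear(256, num_emotions),
)

# Utterance-level embeddings (e.g. mean-pooled over frames).
hubert_emb = torch.randn(8, hubert_dim)     # from a pretrained HuBERT model
prosody_emb = torch.randn(8, prosody_dim)   # from the trained prosody encoder

logits = classifier(torch.cat([hubert_emb, prosody_emb], dim=-1))  # (8, 4)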
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
3D dense captioning requires a model to translate its understanding of an
input 3D scene into several captions associated with different object regions.
Existing methods adopt a sophisticated "detect-then-describe" pipeline, which
builds explicit relation modules upon a 3D detector with numerous hand-crafted
components. While these methods have achieved initial success, the cascade
pipeline tends to accumulate errors because of duplicated and inaccurate box
estimations and messy 3D scenes. In this paper, we first propose Vote2Cap-DETR,
a simple-yet-effective transformer framework that decouples the decoding
process of caption generation and object localization through parallel
decoding. Moreover, we argue that object localization and description
generation require different levels of scene understanding, which could be
challenging for a shared set of queries to capture. To this end, we propose an
advanced version, Vote2Cap-DETR++, which decouples the queries into
localization and caption queries to capture task-specific features.
Additionally, we introduce the iterative spatial refinement strategy to vote
queries for faster convergence and better localization performance. We also
insert additional spatial information to the caption head for more accurate
descriptions. Without bells and whistles, extensive experiments on two commonly
used datasets, ScanRefer and Nr3D, demonstrate Vote2Cap-DETR and
Vote2Cap-DETR++ surpass conventional "detect-then-describe" methods by a large
margin. Code will be made available at
https://github.com/ch3cook-fdu/Vote2Cap-DETR
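The parallel-decoding idea with decoupled query sets can be sketched roughly as below: localization queries and caption queries attend to the same encoded scene tokens through one transformer decoder and feed task-specific heads. The query counts, feature sizes, and heads are placeholders, not the released Vote2Cap-DETR++ architecture.

import torch
import torch.nn as nn

d_model, n_loc, n_cap, vocab = 256, 256, 128, 3000   # assumed sizes

decoder_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=4)

loc_queries = nn.Parameter(torch.randn(n_loc, d_model))   # localization queries
cap_queries = nn.Parameter(torch.randn(n_cap, d_model))   # caption queries
box_head = nn.Linear(d_model, 6)                          # toy box head: center + size
word_head = nn.Linear(d_model, vocab)                     # toy captioning head

scene_tokens = torch.randn(2, 1024, d_model)              # encoded point-cloud features
queries = torch.cat([loc_queries, cap_queries], dim=0).unsqueeze(0).repeat(2, 1, 1)
decoded = decoder(queries, scene_tokens)                  # shared parallel decoding
boxes = box_head(decoded[:, :n_loc])                      # (2, n_loc, 6)
caption_logits = word_head(decoded[:, n_loc:])            # (2, n_cap, vocab)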
Frame Pairwise Distance Loss for Weakly-supervised Sound Event Detection
Weakly-supervised learning has emerged as a promising approach to leverage
limited labeled data in various domains by bridging the gap between fully
supervised methods and unsupervised techniques. Acquisition of strong
annotations for detecting sound events is prohibitively expensive, making
weakly supervised learning a more cost-effective and broadly applicable
alternative. To enhance recognition in weakly-supervised sound event detection, we introduce a Frame Pairwise Distance (FPD) loss branch, complemented with a minimal amount of synthesized
data. The corresponding sampling and label processing strategies are also
proposed. Two distinct distance metrics are employed to evaluate the proposed
approach. Finally, the method is validated on the DCASE 2023 Task 4 dataset. The experimental results corroborate the efficacy of this approach.
Comment: Submitted to ICASSP 202
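The abstract does not spell out the FPD formulation, so the sketch below only illustrates the general idea of a frame pairwise distance loss: frame embeddings that share an event label are pulled together and pairs with different labels are pushed apart, under one of two assumed distance metrics (Euclidean or cosine), which may differ from the two metrics used in the paper.

import torch
import torch.nn.functional as F

def frame_pairwise_distance_loss(frames, labels, metric="euclidean", margin=1.0):
    # frames: (T, D) frame embeddings, labels: (T,) frame-level event labels
    if metric == "euclidean":
        dist = torch.cdist(frames, frames)                                  # (T, T)
    else:  # cosine distance
        dist = 1.0 - F.cosine_similarity(frames.unsqueeze(1), frames.unsqueeze(0), dim=-1)
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    pull = same * dist.pow(2)                          # same-class pairs pulled close
    push = (1 - same) * F.relu(margin - dist).pow(2)   # different-class pairs pushed apart
    return (pull + push).mean()

loss = frame_pairwise_distance_loss(torch.randn(50, 64), torch.randint(0, 3, (50,)))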
Expression of catalytically active matrix metalloproteinase‐1 in dermal fibroblasts induces collagen fragmentation and functional alterations that resemble aged human skin
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/99047/1/acel12089.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/99047/2/acel12089-sup-0001-FigS1-S4.pd
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
The advent of large language models, enabling flexibility through
instruction-driven approaches, has revolutionized many traditional generative
tasks, but large models for 3D data, particularly in comprehensively handling
3D shapes with other modalities, are still under-explored. By achieving
instruction-based shape generation, versatile multimodal generative shape
models can significantly benefit various fields like 3D virtual construction
and network-aided design. In this work, we present ShapeGPT, a shape-included
multi-modal framework to leverage strong pre-trained language models to address
multiple shape-relevant tasks. Specifically, ShapeGPT employs a
word-sentence-paragraph framework to discretize continuous shapes into shape words, assembles these words into shape sentences, and integrates shapes with instructional text into multi-modal paragraphs. To learn this
shape-language model, we use a three-stage training scheme, including shape
representation, multimodal alignment, and instruction-based generation, to
align shape-language codebooks and learn the intricate correlations among these
modalities. Extensive experiments demonstrate that ShapeGPT achieves comparable
performance across shape-relevant tasks, including text-to-shape,
shape-to-text, shape completion, and shape editing.
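A toy illustration of the word-sentence-paragraph framework described above: continuous shape latents are quantized against a codebook into discrete "shape words", which are then assembled into a shape sentence and embedded in an instruction paragraph. The codebook size, latent dimension, and token formatting below are assumptions, not the paper's tokenizer.

import torch

codebook = torch.randn(512, 64)          # 512 shape words, 64-dim latents (assumed)
shape_latents = torch.randn(16, 64)      # 16 local latents from a shape encoder

# Nearest-neighbour quantization: each latent becomes a discrete shape word id.
dists = torch.cdist(shape_latents, codebook)        # (16, 512)
shape_word_ids = dists.argmin(dim=-1)               # (16,)

# Assemble a "shape sentence" and embed it in an instruction paragraph.
shape_sentence = " ".join(f"<shape_{i}>" for i in shape_word_ids.tolist())
paragraph = f"Generate a chair similar to this one: {shape_sentence}"
print(paragraph[:80])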
Gray matter density reduction associated with adjuvant chemotherapy in older women with breast cancer
PURPOSE:
The purpose of this study was to evaluate longitudinal changes in brain gray matter density (GMD) before and after adjuvant chemotherapy in older women with breast cancer.
METHODS:
We recruited 16 women aged ≥ 60 years with stage I-III breast cancer receiving adjuvant chemotherapy (CT) and 15 age- and sex-matched healthy controls (HC). The CT group underwent brain MRI and NIH Toolbox for Cognition testing prior to adjuvant chemotherapy (time point 1, TP1) and within 1 month after chemotherapy (time point 2, TP2). The HC group underwent the same assessments at matched intervals. GMD was evaluated with voxel-based morphometry.
RESULTS:
The mean age was 67 years in the CT group and 68.5 years in the HC group. There was a significant GMD reduction within the chemotherapy group from TP1 to TP2. Compared to the HC group, the CT group displayed statistically significantly greater GMD reductions from TP1 to TP2 in brain regions involving the left anterior cingulate gyrus, right insula, and left middle temporal gyrus (family-wise error (FWE)-corrected p < 0.05). The baseline GMD in the left insula was positively correlated with the baseline list-sorting working memory score in the HC group (FWE-corrected p < 0.05). No correlations between changes in GMD and changes in cognitive testing scores from TP1 to TP2 reached significance at the FWE-corrected p < 0.05 threshold.
CONCLUSIONS:
Our findings indicate that GMD reductions were associated with adjuvant chemotherapy in older women with breast cancer. Future studies are needed to understand the clinical significance of these neuroimaging findings. This study is registered on ClinicalTrials.gov (NCT01992432).
Intrinsic brain activity changes associated with adjuvant chemotherapy in older women with breast cancer: a pilot longitudinal study
Purpose
Older cancer patients are at increased risk of cancer-related cognitive impairment. The purpose of this study was to assess the alterations in intrinsic brain activity associated with adjuvant chemotherapy in older women with breast cancer.
Methods
The chemotherapy treatment (CT) group included sixteen women aged ≥ 60 years (range 60–82 years) with stage I-III breast cancer, who underwent both resting-state functional magnetic resonance imaging (rs-fMRI) and neuropsychological testing with the NIH Toolbox for Cognition before adjuvant chemotherapy, at time point 1 (TP1), and again within 1 month after completing chemotherapy, at time point 2 (TP2). Fourteen age- and sex-matched healthy controls (HC) underwent the same assessments at matched intervals. Three voxel-wise rs-fMRI parameters were computed at each time point: amplitude of low-frequency fluctuation (ALFF), fractional ALFF (fALFF), and regional homogeneity (ReHo). We assessed the changes in rs-fMRI parameters from TP1 to TP2 for each group, the group differences in these changes (CT group vs. HC group), and the group differences in the baseline rs-fMRI parameters. In addition, correlation analyses between the rs-fMRI parameters and neuropsychological testing scores were performed.
Results
In the CT group, one brain region, which included parts of the bilateral subcallosal gyri and right anterior cingulate gyrus, displayed increased ALFF from TP1 to TP2 (cluster-level corrected p = 0.024); another brain region in the left precuneus displayed decreased fALFF from TP1 to TP2 (cluster-level corrected p = 0.025). No significant changes in the rs-fMRI parameters from TP1 to TP2 were observed in the HC group. Although ALFF and fALFF alterations were observed only in the CT group, none of the between-group differences in rs-fMRI parameter changes reached statistical significance.
Conclusions
Our findings of ALFF and fALFF alterations in the chemotherapy-treated women suggest that adjuvant chemotherapy may affect intrinsic brain activity in older women with breast cancer.
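For reference, ALFF and fALFF are standard quantities with simple definitions: ALFF is the mean amplitude of a voxel's BOLD signal within a low-frequency band (commonly 0.01-0.08 Hz), and fALFF is the low-frequency amplitude divided by the amplitude summed over the full frequency range. The sketch below uses typical band limits and repetition time, which are not necessarily those of this study.

import numpy as np

def alff_falff(timeseries, tr=2.0, low=0.01, high=0.08):
    n = len(timeseries)
    freqs = np.fft.rfftfreq(n, d=tr)                       # frequencies in Hz
    amplitude = np.abs(np.fft.rfft(timeseries - timeseries.mean()))
    band = (freqs >= low) & (freqs <= high)                # low-frequency band
    alff = amplitude[band].mean()                          # mean low-frequency amplitude
    falff = amplitude[band].sum() / (amplitude.sum() + 1e-12)  # fractional ALFF
    return alff, falff

alff, falff = alff_falff(np.random.randn(200))  # one voxel's BOLD time series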
Effects of chemotherapy on aging white matter microstructure: a longitudinal diffusion tensor imaging study
Objective: We aimed to use diffusion tensor imaging (DTI) to detect alterations in white matter microstructure in older patients with breast cancer receiving chemotherapy.
Methods: We recruited women aged ≥ 60 years with stage I-III breast cancer (chemotherapy [CT] group; n = 19) to undergo two study assessments: at baseline and within one month after chemotherapy. Each assessment consisted of a brain magnetic resonance imaging scan with DTI and neuropsychological (NP) testing using the National Institutes of Health (NIH) Toolbox Cognition Battery. An age- and sex-matched group of healthy controls (HC, n = 14) underwent the same assessments at matched intervals. Four DTI parameters (fractional anisotropy [FA], mean diffusivity [MD], axial diffusivity [AD], and radial diffusivity [RD]) were calculated and correlated with NP testing scores.
Results: For the CT group but not the HC group, we detected statistically significant increases in MD and RD in the genu of the corpus callosum from time point 1 to time point 2 (p < 0.05).
Conclusions: We identified alterations in white matter microstructure in older women with breast cancer undergoing chemotherapy. These findings may potentially serve as neuroimaging biomarkers for identifying cognitive impairment in older adults with cancer.
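The four DTI parameters named above are standard functions of the diffusion tensor's eigenvalues; the short sketch below gives their textbook definitions, independent of this study's processing pipeline.

import numpy as np

def dti_scalars(l1, l2, l3):
    # Eigenvalues ordered lambda1 >= lambda2 >= lambda3.
    md = (l1 + l2 + l3) / 3.0                          # mean diffusivity
    ad = l1                                            # axial diffusivity
    rd = (l2 + l3) / 2.0                               # radial diffusivity
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = np.sqrt(1.5 * num / den) if den > 0 else 0.0  # fractional anisotropy
    return fa, md, ad, rd

print(dti_scalars(1.7e-3, 0.4e-3, 0.3e-3))  # typical white-matter values (mm^2/s)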