
    CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models

    As an indispensable ingredient of intelligence, commonsense reasoning is crucial for large language models (LLMs) in real-world scenarios. In this paper, we propose CORECODE, a dataset that contains abundant commonsense knowledge manually annotated on dyadic dialogues, to evaluate the commonsense reasoning and commonsense conflict detection capabilities of Chinese LLMs. We categorize commonsense knowledge in everyday conversations into three dimensions: entity, event, and social interaction. For easy and consistent annotation, we standardize the form of commonsense knowledge annotation in open-domain dialogues as "domain: slot = value". A total of 9 domains and 37 slots are defined to capture diverse commonsense knowledge. With these pre-defined domains and slots, we collect 76,787 commonsense knowledge annotations from 19,700 dialogues through crowdsourcing. To evaluate and enhance the commonsense reasoning capability of LLMs on the curated dataset, we establish a series of dialogue-level reasoning and detection tasks, including commonsense knowledge filling, commonsense knowledge generation, commonsense conflict phrase detection, domain identification, slot identification, and event causal inference. A wide variety of existing open-source Chinese LLMs are evaluated with these tasks on our dataset. Experimental results demonstrate that these models are not competent at predicting CORECODE's plentiful reasoning content, and even ChatGPT could only achieve 0.275 and 0.084 accuracy on the domain identification and slot identification tasks under the zero-shot setting. We release the data and code of CORECODE at https://github.com/danshi777/CORECODE to promote commonsense reasoning evaluation and study of LLMs in the context of daily conversations.
    Comment: AAAI 202
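    To make the "domain: slot = value" annotation form above concrete, here is a minimal Python sketch of how such an annotation could be represented and parsed; the dataclass, field names, and the example domain/slot/value string are invented for illustration and are not the dataset's actual schema (see the CORECODE repository for that).

```python
# Hypothetical sketch of the "domain: slot = value" annotation form; the
# concrete domain/slot names below are placeholders, not CORECODE's schema.
from dataclasses import dataclass

@dataclass
class CommonsenseAnnotation:
    dialogue_id: str
    domain: str   # one of the pre-defined domains (9 in CORECODE)
    slot: str     # one of the pre-defined slots (37 in CORECODE)
    value: str    # free-text value grounded in the dialogue

def parse_annotation(dialogue_id: str, line: str) -> CommonsenseAnnotation:
    """Parse a single 'domain: slot = value' annotation string."""
    domain, rest = line.split(":", 1)
    slot, value = rest.split("=", 1)
    return CommonsenseAnnotation(dialogue_id, domain.strip(), slot.strip(), value.strip())

# Example with an invented annotation string (not taken from the dataset).
ann = parse_annotation("dlg_0001", "event: cause = the speaker missed the bus")
print(ann.domain, ann.slot, ann.value)
```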

    Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer

    Humans can effortlessly modify various prosodic attributes, such as the placement of stress and the intensity of sentiment, to convey a specific emotion while maintaining consistent linguistic content. Motivated by this capability, we propose EmoAug, a novel style transfer model designed to enhance emotional expression and tackle the data scarcity issue in speech emotion recognition tasks. EmoAug consists of a semantic encoder and a paralinguistic encoder that represent verbal and non-verbal information, respectively. Additionally, a decoder reconstructs speech signals by conditioning on the aforementioned two information flows in an unsupervised fashion. Once training is completed, EmoAug enriches the expression of emotional speech with different prosodic attributes, such as stress, rhythm and intensity, by feeding different styles into the paralinguistic encoder. EmoAug also enables us to generate similar numbers of samples for each class to address the data imbalance issue. Experimental results on the IEMOCAP dataset demonstrate that EmoAug can successfully transfer different speaking styles while retaining the speaker identity and semantic content. Furthermore, we train an SER model with data augmented by EmoAug and show that the resulting model not only surpasses the state-of-the-art supervised and self-supervised methods but also overcomes overfitting problems caused by data imbalance. Some audio samples can be found on our demo website.

    Disentangling Prosody Representations with Unsupervised Speech Reconstruction

    Human speech can be characterized by different components, including semantic content, speaker identity and prosodic information. Significant progress has been made in disentangling representations for semantic content and speaker identity in Automatic Speech Recognition (ASR) and speaker verification tasks, respectively. However, extracting prosodic information remains an open and challenging research question because of the intrinsic association of different attributes, such as timbre and rhythm, and because of the need for supervised training schemes to achieve robust, large-scale and speaker-independent ASR. The aim of this paper is to address the disentanglement of emotional prosody from speech based on unsupervised reconstruction. Specifically, we identify, design, implement and integrate three crucial components in our proposed speech reconstruction model Prosody2Vec: (1) a unit encoder that transforms speech signals into discrete units for semantic content, (2) a pretrained speaker verification model to generate speaker identity embeddings, and (3) a trainable prosody encoder to learn prosody representations. We first pretrain the Prosody2Vec representations on unlabelled emotional speech corpora, then fine-tune the model on specific datasets to perform Speech Emotion Recognition (SER) and Emotional Voice Conversion (EVC) tasks. Both objective (weighted and unweighted accuracies) and subjective (mean opinion score) evaluations on the EVC task suggest that Prosody2Vec effectively captures general prosodic features that can be smoothly transferred to other emotional speech. In addition, our SER experiments on the IEMOCAP dataset reveal that the prosody features learned by Prosody2Vec are complementary and beneficial for the performance of widely used speech pretraining models and surpass the state-of-the-art methods when combining Prosody2Vec with HuBERT representations.
    Comment: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing
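    As a schematic illustration of the three components itemized above, the hedged PyTorch sketch below wires a discrete-unit content embedding, a pretrained speaker embedding, and a trainable prosody encoder into a reconstruction decoder; every module choice, layer size, and name is an assumption for illustration only, not the authors' implementation.

```python
# Schematic, non-faithful sketch of a three-stream reconstruction model in the
# spirit of the description above; all dimensions and modules are placeholders.
import torch
import torch.nn as nn

class ProsodyReconstructionSketch(nn.Module):
    def __init__(self, n_units=100, unit_dim=256, spk_dim=192, pros_dim=128, n_mels=80):
        super().__init__()
        self.unit_emb = nn.Embedding(n_units, unit_dim)                  # (1) discrete content units
        self.prosody_enc = nn.GRU(n_mels, pros_dim, batch_first=True)    # (3) trainable prosody encoder
        self.decoder = nn.GRU(unit_dim + spk_dim + pros_dim, n_mels, batch_first=True)

    def forward(self, units, spk_emb, mel):
        """units: (B, T) unit ids; spk_emb: (B, spk_dim) from a pretrained
        speaker-verification model (2); mel: (B, T, n_mels) reference speech."""
        content = self.unit_emb(units)                                   # (B, T, unit_dim)
        prosody, _ = self.prosody_enc(mel)                               # (B, T, pros_dim)
        spk = spk_emb.unsqueeze(1).expand(-1, units.size(1), -1)         # broadcast speaker identity
        recon, _ = self.decoder(torch.cat([content, spk, prosody], dim=-1))
        return recon                                                     # reconstructed mel frames

# Toy forward pass with random inputs.
model = ProsodyReconstructionSketch()
out = model(torch.randint(0, 100, (2, 50)), torch.randn(2, 192), torch.randn(2, 50, 80))
print(out.shape)  # torch.Size([2, 50, 80])
```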

    Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning

    3D dense captioning requires a model to translate its understanding of an input 3D scene into several captions associated with different object regions. Existing methods adopt a sophisticated "detect-then-describe" pipeline, which builds explicit relation modules upon a 3D detector with numerous hand-crafted components. While these methods have achieved initial success, the cascade pipeline tends to accumulate errors because of duplicated and inaccurate box estimations and messy 3D scenes. In this paper, we first propose Vote2Cap-DETR, a simple-yet-effective transformer framework that decouples the decoding process of caption generation and object localization through parallel decoding. Moreover, we argue that object localization and description generation require different levels of scene understanding, which could be challenging for a shared set of queries to capture. To this end, we propose an advanced version, Vote2Cap-DETR++, which decouples the queries into localization and caption queries to capture task-specific features. Additionally, we introduce an iterative spatial refinement strategy for vote queries to achieve faster convergence and better localization performance. We also feed additional spatial information into the caption head for more accurate descriptions. Without bells and whistles, extensive experiments on two commonly used datasets, ScanRefer and Nr3D, demonstrate that Vote2Cap-DETR and Vote2Cap-DETR++ surpass conventional "detect-then-describe" methods by a large margin. Code will be made available at https://github.com/ch3cook-fdu/Vote2Cap-DETR

    Frame Pairwise Distance Loss for Weakly-supervised Sound Event Detection

    Weakly-supervised learning has emerged as a promising approach to leverage limited labeled data in various domains by bridging the gap between fully supervised methods and unsupervised techniques. Acquiring strong annotations for sound event detection is prohibitively expensive, making weakly supervised learning a more cost-effective and broadly applicable alternative. To improve recognition performance in weakly supervised sound event detection, we introduce a Frame Pairwise Distance (FPD) loss branch, complemented with a minimal amount of synthesized data. The corresponding sampling and label processing strategies are also proposed. Two distinct distance metrics are employed to evaluate the proposed approach. Finally, the method is validated on the DCASE 2023 task4 dataset. The experimental results corroborate the efficacy of this approach.
    Comment: Submitted to ICASSP 202
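    The abstract does not give the FPD formulation, so the sketch below is a speculative example of what a frame pairwise distance loss could look like: a margin-based objective over sampled pairs of frame embeddings, with either Euclidean or cosine distance standing in for the two metrics mentioned above. The contrastive form, the sampling, and all names are assumptions, not the paper's definition.

```python
# Speculative sketch of a frame pairwise distance loss; the margin-based form
# and the names below are illustrative assumptions, not the paper's method.
import torch
import torch.nn.functional as F

def frame_pairwise_distance_loss(frames_a, frames_b, same_label, metric="euclidean", margin=1.0):
    """frames_a, frames_b: (N, D) embeddings of sampled frame pairs.
    same_label: (N,) floats, 1.0 if a pair shares an event label, else 0.0."""
    if metric == "euclidean":
        dist = F.pairwise_distance(frames_a, frames_b)           # (N,)
    else:  # cosine distance as a second metric
        dist = 1.0 - F.cosine_similarity(frames_a, frames_b)     # (N,)
    pos = same_label * dist.pow(2)                               # pull same-label frames together
    neg = (1.0 - same_label) * F.relu(margin - dist).pow(2)      # push others beyond the margin
    return (pos + neg).mean()

# Toy example: 8 sampled frame pairs with 128-dim embeddings.
a, b = torch.randn(8, 128), torch.randn(8, 128)
y = torch.randint(0, 2, (8,)).float()
print(frame_pairwise_distance_loss(a, b, y))
```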

    Expression of catalytically active matrix metalloproteinase‐1 in dermal fibroblasts induces collagen fragmentation and functional alterations that resemble aged human skin

    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/99047/1/acel12089.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/99047/2/acel12089-sup-0001-FigS1-S4.pd

    ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model

    The advent of large language models, enabling flexibility through instruction-driven approaches, has revolutionized many traditional generative tasks, but large models for 3D data, particularly in comprehensively handling 3D shapes with other modalities, are still under-explored. By achieving instruction-based shape generation, versatile multimodal generative shape models can significantly benefit various fields like 3D virtual construction and network-aided design. In this work, we present ShapeGPT, a shape-included multi-modal framework that leverages strong pre-trained language models to address multiple shape-relevant tasks. Specifically, ShapeGPT employs a word-sentence-paragraph framework that discretizes continuous shapes into shape words, assembles these words into shape sentences, and integrates shapes with instructional text into multi-modal paragraphs. To learn this shape-language model, we use a three-stage training scheme, including shape representation, multimodal alignment, and instruction-based generation, to align shape-language codebooks and learn the intricate correlations among these modalities. Extensive experiments demonstrate that ShapeGPT achieves comparable performance across shape-relevant tasks, including text-to-shape, shape-to-text, shape completion, and shape editing.
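    To make the "shape word" step above concrete, here is a minimal, hedged sketch of quantizing continuous shape features against a codebook into discrete tokens that a language model can consume; the codebook size, feature dimensions, and names are invented and do not reflect ShapeGPT's actual tokenizer.

```python
# Illustrative nearest-codebook quantization of shape features into "shape words";
# sizes and names are invented, not ShapeGPT's implementation.
import torch

def shapes_to_words(shape_feats: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """shape_feats: (N, D) continuous per-patch shape features;
    codebook: (K, D) learned code vectors. Returns (N,) token ids."""
    dists = torch.cdist(shape_feats, codebook)   # (N, K) pairwise distances
    return dists.argmin(dim=1)                   # nearest code index per patch

# Toy example: 16 shape patches, 8-dim features, a 32-entry codebook.
feats, book = torch.randn(16, 8), torch.randn(32, 8)
print(shapes_to_words(feats, book).tolist())     # a "shape sentence" of discrete tokens
```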

    Gray matter density reduction associated with adjuvant chemotherapy in older women with breast cancer

    PURPOSE: The purpose of this study was to evaluate longitudinal changes in brain gray matter density (GMD) before and after adjuvant chemotherapy in older women with breast cancer. METHODS: We recruited 16 women aged ≥ 60 years with stage I-III breast cancers receiving adjuvant chemotherapy (CT) and 15 age- and sex-matched healthy controls (HC). The CT group underwent brain MRI and the NIH Toolbox for Cognition testing prior to adjuvant chemotherapy (time point 1, TP1) and within 1 month after chemotherapy (time point 2, TP2). The HC group underwent the same assessments at matched intervals. GMD was evaluated with voxel-based morphometry. RESULTS: The mean age was 67 years in the CT group and 68.5 years in the HC group. There was a significant GMD reduction within the chemotherapy group from TP1 to TP2. Compared to the HC group, the CT group displayed statistically significantly greater GMD reductions from TP1 to TP2 in brain regions involving the left anterior cingulate gyrus, right insula, and left middle temporal gyrus (family-wise error [FWE]-corrected p < 0.05). The baseline GMD in the left insula was positively correlated with the baseline list-sorting working memory score in the HC group (FWE-corrected p < 0.05). No significant correlation was observed between the changes in GMD and the changes in cognitive testing scores from TP1 to TP2 (FWE-corrected p < 0.05). CONCLUSIONS: Our findings indicate that GMD reductions were associated with adjuvant chemotherapy in older women with breast cancer. Future studies are needed to understand the clinical significance of the neuroimaging findings. This study is registered on ClinicalTrials.gov (NCT01992432).
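    As a hedged illustration of the longitudinal design described above (within-group change from TP1 to TP2, then a between-group comparison of those changes), the sketch below runs a simple two-sample t-test on invented per-subject GMD change scores; the study's actual analysis used voxel-based morphometry with family-wise error correction, so this is only a schematic stand-in.

```python
# Schematic stand-in for comparing GMD change between groups; data and names are
# invented, and the study's voxel-wise VBM pipeline is not reproduced here.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical mean GMD per subject at the two time points (arbitrary units).
ct_tp1, ct_tp2 = rng.normal(0.55, 0.02, 16), rng.normal(0.54, 0.02, 16)  # chemotherapy group
hc_tp1, hc_tp2 = rng.normal(0.55, 0.02, 15), rng.normal(0.55, 0.02, 15)  # healthy controls

ct_change = ct_tp2 - ct_tp1   # within-subject change, CT group
hc_change = hc_tp2 - hc_tp1   # within-subject change, HC group

# Welch's two-sample t-test on the change scores (CT vs. HC).
t, p = stats.ttest_ind(ct_change, hc_change, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")
```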

    Intrinsic brain activity changes associated with adjuvant chemotherapy in older women with breast cancer: a pilot longitudinal study

    Purpose: Older cancer patients are at increased risk of cancer-related cognitive impairment. The purpose of this study was to assess the alterations in intrinsic brain activity associated with adjuvant chemotherapy in older women with breast cancer. Methods: The chemotherapy treatment (CT) group included sixteen women aged ≥ 60 years (range 60–82 years) with stage I-III breast cancers, who underwent both resting-state functional magnetic resonance imaging (rs-fMRI) and neuropsychological testing with the NIH Toolbox for Cognition before adjuvant chemotherapy, at time point 1 (TP1), and again within 1 month after completing chemotherapy, at time point 2 (TP2). Fourteen age- and sex-matched healthy controls (HC) underwent the same assessments at matched intervals. Three voxel-wise rs-fMRI parameters, amplitude of low-frequency fluctuation (ALFF), fractional ALFF (fALFF), and regional homogeneity (ReHo), were computed at each time point. The changes in rs-fMRI parameters from TP1 to TP2 for each group, the group differences in those changes (the CT group vs. the HC group), and the group difference in the baseline rs-fMRI parameters were assessed. In addition, correlative analysis between the rs-fMRI parameters and neuropsychological testing scores was also performed. Results: In the CT group, one brain region, which included parts of the bilateral subcallosal gyri and right anterior cingulate gyrus, displayed increased ALFF from TP1 to TP2 (cluster-level corrected p = 0.024); another brain region in the left precuneus displayed decreased fALFF from TP1 to TP2 (cluster-level corrected p = 0.025). No significant changes in the rs-fMRI parameters from TP1 to TP2 were observed in the HC group. Although ALFF and fALFF alterations were observed only in the CT group, none of the between-group differences in rs-fMRI parameter changes reached statistical significance. Conclusions: Our results of ALFF and fALFF alterations in the chemotherapy-treated women suggest that adjuvant chemotherapy may affect intrinsic brain activity in older women with breast cancer.
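    For readers unfamiliar with the rs-fMRI measures above, the numpy sketch below shows how ALFF and fALFF are conventionally computed for a single voxel time series (the FFT amplitude within a 0.01-0.08 Hz band, and its fraction of the full-band amplitude); the band limits and details follow common practice rather than necessarily this study's exact pipeline, and ReHo (a neighbourhood concordance measure) is omitted.

```python
# Conventional single-voxel ALFF/fALFF computation; band limits follow common
# practice (0.01-0.08 Hz) and may differ from the study's exact pipeline.
import numpy as np

def alff_falff(ts, tr, low=0.01, high=0.08):
    """ts: 1-D BOLD time series for one voxel; tr: repetition time in seconds."""
    ts = ts - ts.mean()
    freqs = np.fft.rfftfreq(ts.size, d=tr)
    amp = np.abs(np.fft.rfft(ts)) / ts.size            # amplitude spectrum
    band = (freqs >= low) & (freqs <= high)
    alff = amp[band].mean()                             # low-frequency amplitude
    falff = amp[band].sum() / (amp[1:].sum() + 1e-12)   # fraction of full-band amplitude (DC excluded)
    return alff, falff

# Example: 200 volumes acquired with TR = 2 s (random data for illustration).
rng = np.random.default_rng(0)
print(alff_falff(rng.standard_normal(200), tr=2.0))
```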

    Effects of chemotherapy on aging white matter microstructure: a longitudinal diffusion tensor imaging study

    Objective: We aimed to use diffusion tensor imaging (DTI) to detect alterations in white matter microstructure in older patients with breast cancer receiving chemotherapy. Methods: We recruited women aged ≥ 60 years with stage I-III breast cancer (chemotherapy [CT] group; n = 19) to undergo two study assessments: at baseline and within one month after chemotherapy. Each assessment consisted of a brain magnetic resonance imaging scan with DTI and neuropsychological (NP) testing using the National Institutes of Health (NIH) Toolbox Cognition Battery. An age- and sex-matched group of healthy controls (HC, n = 14) underwent the same assessments at matched intervals. Four DTI parameters (fractional anisotropy [FA], mean diffusivity [MD], axial diffusivity [AD], and radial diffusivity [RD]) were calculated and correlated with NP testing scores. Results: For the CT group but not the HC group, we detected statistically significant increases in MD and RD in the genu of the corpus callosum from time point 1 to time point 2 at p < 0.05. Conclusions: We identified alterations in white matter microstructure in older women with breast cancer undergoing chemotherapy. These findings may potentially serve as neuroimaging biomarkers for identifying cognitive impairment in older adults with cancer.
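    For context on the four DTI parameters named above, the numpy sketch below derives them from the diffusion tensor eigenvalues using their standard definitions; these are textbook formulas for illustration, not the study's processing code.

```python
# Standard DTI scalar definitions from the tensor eigenvalues (illustrative only).
import numpy as np

def dti_scalars(l1, l2, l3):
    """l1 >= l2 >= l3: eigenvalues of the diffusion tensor at one voxel."""
    md = (l1 + l2 + l3) / 3.0                       # mean diffusivity
    ad = l1                                         # axial diffusivity
    rd = (l2 + l3) / 2.0                            # radial diffusivity
    num = np.sqrt((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
    den = np.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2) + 1e-12
    fa = np.sqrt(1.5) * num / den                   # fractional anisotropy
    return fa, md, ad, rd

# Example eigenvalues (mm^2/s) typical of healthy white matter.
print(dti_scalars(1.7e-3, 0.4e-3, 0.3e-3))
```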