
    Embracing Goudiao 句鑃 and the Land of Wuyue 吴越: Cultural Voice and Historical Connections in Contemporary Music Composition

    This thesis examines the complex relationship between cultural heritage, land, and historical instruments in contemporary music composition, with a focus on the Wuyue 吴越 region in southeastern China. The study explores the potential of the goudiao 句鑃, a historical struck bronze idiophone associated with Wuyue, for creating new music that reflects the cultural voice and historical connections of the area. As a native of Wuyue, the author also actively incorporates personal cultural memory and engages with Wuyue's traditional musical languages to create a collection of new music that resonates with her as a cultural bearer. The thesis employs a combination of theoretical research, fieldwork, spectral analysis, performance, and compositional practice to investigate how the goudiao can function as both a symbol representing Wuyue's cultural identity and a source of inspiration for contemporary compositions. It also considers the roles played by emotional geography and the concept of "everywhen" in shaping the nuanced musical expressions arising from cultural heritage. From a musicological standpoint, this research contributes to the realm of goudiao research and highlights the intrinsic value of cultural heritage, history, and geographic perspectives in understanding the creative processes underlying a nuanced interpretation of musical traditions. Compositionally, the thesis establishes a model for integrating cultural heritage into contemporary compositions and illuminates the possibilities of engaging with historical instruments. Additionally, the study addresses a gap in previous scholarship: the lack of compositions by Wuyue composers that draw inspiration from the historical discoveries associated with Wuyue culture.

    Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

    When applying a pre-trained 2D-to-3D human pose lifting model to an unseen target dataset, large performance degradation is commonly encountered due to domain shift. We observe that the degradation is caused by two factors: 1) the large distribution gap in the global positions of poses between the source and target datasets, owing to differing camera parameters and settings, and 2) the insufficient diversity of local pose structures seen during training. To this end, we combine global adaptation and local generalization in PoseDA, a simple yet effective framework for unsupervised domain adaptation in 3D human pose estimation. Global adaptation aligns the global positions of poses from the source domain to the target domain with a proposed global position alignment (GPA) module. Local generalization enhances the diversity of the 2D-3D pose mapping with a local pose augmentation (LPA) module, which follows an adversarial training scheme consisting of 1) an augmentation generator that produces the parameters of pre-defined pose transformations and 2) an anchor discriminator that ensures the realism and quality of the augmented data. These modules bring significant performance improvements without introducing additional learnable parameters, and the approach is applicable to almost all 2D-3D lifting models. PoseDA achieves an MPJPE of 61.3 mm on MPI-INF-3DHP under a cross-dataset evaluation setup, improving upon the previous state-of-the-art method by 10.2%.
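    To make the global-adaptation idea above concrete, here is a minimal sketch, assuming source and target 2D pose arrays of shape (N, J, 2): it re-centres and rescales source poses so that their root positions and overall scale match the statistics of the unlabelled target set. The function and variable names are illustrative, not taken from the PoseDA codebase, and the paper's GPA module may differ in detail.

```python
# Hypothetical sketch of the global-position-alignment idea: shift and rescale
# source 2D poses so their root positions and scales match the statistics
# observed in the (unlabelled) target dataset. Names are illustrative.
import numpy as np

def align_global_positions(src_poses_2d, tgt_poses_2d, root_idx=0):
    """src_poses_2d, tgt_poses_2d: arrays of shape (N, J, 2)."""
    src_roots = src_poses_2d[:, root_idx]            # (N, 2) root positions
    tgt_roots = tgt_poses_2d[:, root_idx]

    # Per-dataset statistics of the root position.
    src_mu, src_sigma = src_roots.mean(0), src_roots.std(0) + 1e-8
    tgt_mu, tgt_sigma = tgt_roots.mean(0), tgt_roots.std(0) + 1e-8

    # Scale statistics: mean distance of joints from the root.
    src_scale = np.linalg.norm(src_poses_2d - src_roots[:, None], axis=-1).mean()
    tgt_scale = np.linalg.norm(tgt_poses_2d - tgt_roots[:, None], axis=-1).mean()

    # Express each source pose relative to its root, rescale to the target
    # scale, then place it at a root position remapped into the target
    # root-position distribution.
    rel = (src_poses_2d - src_roots[:, None]) * (tgt_scale / src_scale)
    new_roots = (src_roots - src_mu) / src_sigma * tgt_sigma + tgt_mu
    return rel + new_roots[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(128, 17, 2))
    tgt = rng.normal(loc=3.0, scale=2.0, size=(256, 17, 2))
    aligned = align_global_positions(src, tgt)
    print(aligned.shape)  # (128, 17, 2): ready for the 2D-to-3D lifting model
```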

    UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning

    In recent times, there has been growing interest in developing effective perception techniques for combining information from multiple modalities. This involves aligning features obtained from diverse sources to enable more efficient training with larger datasets and constraints, as well as leveraging the wealth of information contained in each modality. 2D and 3D Human Pose Estimation (HPE) are two critical perceptual tasks in computer vision with numerous downstream applications, such as action recognition, human-computer interaction, and object tracking. Yet there are few instances where the correlation between images and 2D/3D human poses has been clearly researched using a contrastive paradigm. In this paper, we propose UniHPE, a unified Human Pose Estimation pipeline that aligns features from all three modalities, i.e., 2D human pose estimation, lifting-based 3D human pose estimation, and image-based 3D human pose estimation, in the same pipeline. To align more than two modalities at the same time, we propose a novel singular-value-based contrastive learning loss, which better aligns the different modalities and further boosts performance. In our evaluation, UniHPE achieves remarkable performance: an MPJPE of 50.5 mm on the Human3.6M dataset and a PA-MPJPE of 51.6 mm on the 3DPW dataset. Our proposed method holds immense potential to advance the field of computer vision and contribute to various applications.
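    As a rough illustration of aligning embeddings from several pose modalities with a contrastive objective, the sketch below uses a plain pairwise InfoNCE loss between image, 2D-pose, and 3D-pose embeddings. It is not the singular-value-based loss proposed in the paper; the encoder outputs, dimensions, and names are assumptions.

```python
# Minimal, hypothetical sketch of multi-modal contrastive alignment using a
# standard InfoNCE loss over each pair of modalities. The paper's loss instead
# couples all modalities at once via singular values; this is only a stand-in.
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of embeddings of shape (B, D)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def multimodal_alignment_loss(image_emb, pose2d_emb, pose3d_emb):
    # Align every pair of modalities and average the three pairwise losses.
    return (info_nce(image_emb, pose2d_emb) +
            info_nce(image_emb, pose3d_emb) +
            info_nce(pose2d_emb, pose3d_emb)) / 3.0

if __name__ == "__main__":
    B, D = 32, 256
    loss = multimodal_alignment_loss(torch.randn(B, D),
                                     torch.randn(B, D),
                                     torch.randn(B, D))
    print(loss.item())
```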

    Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation

    Learning-based methods have dominated 3D human pose estimation (HPE), with significantly better performance than traditional optimization-based methods on most benchmarks. Nonetheless, 3D HPE in the wild remains the biggest challenge for learning-based models, whether based on 2D-3D lifting, image-to-3D, or diffusion, since the trained networks implicitly learn camera intrinsic parameters and domain-specific 3D human pose distributions and estimate poses by statistical average. Optimization-based methods, on the other hand, estimate results case by case, which allows them to predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to address cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M with a minMPJPE of 51.4 mm without training on any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on the 3DPW dataset with a PA-MPJPE of 42.6 mm under cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW.
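    The following sketch shows the general optimization-with-a-prior idea in simplified form: a 3D pose is refined so that its pinhole projection matches detected 2D keypoints, with a pluggable prior term standing in for the pretrained diffusion model that ZeDO actually uses. The camera model, the placeholder prior, and all names are assumptions, not the authors' implementation.

```python
# Hypothetical, stripped-down illustration of optimizing a 3D pose against 2D
# evidence plus a prior term (a stand-in for a diffusion-model prior).
import torch

def project(pose_3d, focal=1000.0, cx=512.0, cy=512.0):
    """Simple pinhole projection of (J, 3) camera-space joints to pixels."""
    z = pose_3d[:, 2:3].clamp(min=1e-3)
    return pose_3d[:, :2] / z * focal + torch.tensor([cx, cy])

def optimize_pose(pose_init, kpts_2d, prior_fn, steps=200, lr=1e-2, w_prior=0.1):
    pose = pose_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        reproj = ((project(pose) - kpts_2d) ** 2).mean()   # 2D evidence term
        loss = reproj + w_prior * prior_fn(pose)           # prior keeps pose plausible
        loss.backward()
        opt.step()
    return pose.detach()

if __name__ == "__main__":
    J = 17
    init = torch.randn(J, 3) * 0.2 + torch.tensor([0.0, 0.0, 3.0])
    target_2d = project(torch.randn(J, 3) * 0.2 + torch.tensor([0.0, 0.0, 3.0]))
    # Placeholder prior: softly constrain the overall spread of the joints.
    prior = lambda p: (p.std() - 0.25) ** 2
    refined = optimize_pose(init, target_2d, prior)
    print(refined.shape)  # (17, 3)
```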

    A Density-Guided Temporal Attention Transformer for Indiscernible Object Counting in Underwater Video

    Dense object counting, or crowd counting, has come a long way thanks to recent developments in the vision community. However, indiscernible object counting, which aims to count targets that blend into their surroundings, remains a challenge. Image-based object counting datasets have been the mainstream among currently available public datasets. We therefore propose a large-scale dataset called YoutubeFish-35, which contains 35 sequences of high-definition, high-frame-rate video and more than 150,000 annotated center points across a selected variety of scenes. For benchmarking purposes, we select three mainstream methods for dense object counting and carefully evaluate them on the newly collected dataset. We also propose TransVidCount, a new strong baseline that combines density and regression branches along the temporal domain in a unified framework and effectively tackles indiscernible object counting, with state-of-the-art performance on the YoutubeFish-35 dataset.
    Accepted by ICASSP 2024 (IEEE International Conference on Acoustics, Speech, and Signal Processing).
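    As a toy illustration of the density-plus-regression idea, the sketch below pairs a per-frame density head (whose spatial sum gives a count) with a direct count-regression head over temporally mixed features. It is an assumption-laden stand-in, not the TransVidCount architecture, and a 1D convolution replaces the paper's temporal attention.

```python
# Hypothetical dual-branch counter over video clips: a density branch and a
# count-regression branch share a backbone, with simple temporal mixing.
import torch
import torch.nn as nn

class DualBranchCounter(nn.Module):
    def __init__(self, in_ch=3, feat=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        # Temporal mixing across frames (stand-in for temporal attention).
        self.temporal = nn.Conv1d(feat, feat, kernel_size=3, padding=1)
        self.density_head = nn.Conv2d(feat, 1, 1)   # per-pixel density
        self.count_head = nn.Linear(feat, 1)         # direct count regression

    def forward(self, clip):                                  # clip: (B, T, C, H, W)
        B, T, C, H, W = clip.shape
        f = self.backbone(clip.reshape(B * T, C, H, W))       # (B*T, F, H, W)
        g = f.mean(dim=(2, 3)).reshape(B, T, -1)              # (B, T, F) pooled
        g = self.temporal(g.transpose(1, 2)).transpose(1, 2)  # mix over time
        density = self.density_head(f).reshape(B, T, H, W)
        count_reg = self.count_head(g).squeeze(-1)            # (B, T)
        count_den = density.sum(dim=(2, 3))                   # (B, T) from density
        return density, 0.5 * (count_reg + count_den)         # fused count

if __name__ == "__main__":
    model = DualBranchCounter()
    density, counts = model(torch.randn(2, 8, 3, 64, 64))
    print(density.shape, counts.shape)   # (2, 8, 64, 64) and (2, 8)
```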