108 research outputs found
Embracing Goudiao 句鑃 and the Land of Wuyue 吴越: Cultural Voice and Historical Connections in Contemporary Music Composition
This thesis examines the complex relationship between cultural heritage, land, and historical instruments in contemporary music composition, with a focus on the Wuyue 吴越 region in southeastern China. The study explores the potential of the goudiao 句鑃, a historical bronze struck idiophone associated with Wuyue, for creating new music that reflects the cultural voice and historical connections of the area. As a native of Wuyue, the author also actively incorporates personal cultural memory and engages with Wuyue's traditional musical languages to create a collection of new music that resonates with her as a cultural bearer.
The thesis employs a combination of theoretical research, fieldwork, spectral analysis, performance, and compositional practice to investigate how the goudiao can function as both a symbol representing Wuyue's cultural identity and a source of inspiration for contemporary compositions. It also considers the roles played by emotional geography and the concept of "everywhen" in shaping the nuanced musical expressions arising from cultural heritage.
From a musicological standpoint, this research contributes to the realm of goudiao research and highlights the intrinsic value of cultural heritage, history, and geographic perspectives in understanding the creative processes underlying a nuanced interpretation of musical traditions. Compositionally, the thesis establishes a model for integrating cultural heritage into contemporary compositions and illuminates the possibilities of engaging with historical instruments. Additionally, the study addresses a gap in previous studies, characterised by a lack of compositions by Wuyue composers drawing inspiration from the historical discoveries associated with Wuyue culture.
Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
When applying a pre-trained 2D-to-3D human pose lifting model to a target
unseen dataset, large performance degradation is commonly encountered due to
domain shift issues. We observe that the degradation is caused by two factors:
1) the large distribution gap over global positions of poses between the source
and target datasets due to variant camera parameters and settings, and 2) the
deficient diversity of local structures of poses in training. To this end, we
combine global adaptation and local generalization in
PoseDA, a simple yet effective framework of unsupervised domain
adaptation for 3D human pose estimation. Specifically, global adaptation aims
to align global positions of poses from the source domain to the target domain
with a proposed global position alignment (GPA) module. And local
generalization is designed to enhance the diversity of 2D-3D pose mapping with
a local pose augmentation (LPA) module. These modules bring significant
performance improvement without introducing additional learnable parameters. In
addition, we propose local pose augmentation (LPA) to enhance the diversity of
3D poses following an adversarial training scheme consisting of 1) an
augmentation generator that generates the parameters of pre-defined pose
transformations and 2) an anchor discriminator to ensure the realism and
quality of the augmented data. Our approach is applicable to almost all
2D-3D lifting models. PoseDA achieves 61.3 mm of MPJPE on MPI-INF-3DHP
under a cross-dataset evaluation setup, improving upon the previous
state-of-the-art method by 10.2%.
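For readers unfamiliar with the metric quoted above, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints, usually reported in millimetres. The following is a minimal sketch of that definition only; the function name `mpjpe` and the toy two-joint skeleton are illustrative assumptions and are not taken from the PoseDA paper or its code.

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error for one pose.

    pred, gt: sequences of (x, y, z) joint coordinates in the same unit
    (millimetres in the benchmarks quoted above) and the same joint order.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and have equal length")
    # Average of the per-joint Euclidean distances.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

# Toy example (hypothetical data): two joints, off by 5 mm and 12 mm.
gt_pose = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
pred_pose = [(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]
error_mm = mpjpe(pred_pose, gt_pose)  # (5 + 12) / 2 = 8.5
```

Benchmark protocols typically average this value over all frames of a test set, and often align the root joint first; those steps are omitted here.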
UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning
In recent times, there has been a growing interest in developing effective
perception techniques for combining information from multiple modalities. This
involves aligning features obtained from diverse sources to enable more
efficient training with larger datasets and constraints, as well as leveraging
the wealth of information contained in each modality. 2D and 3D Human Pose
Estimation (HPE) are two critical perceptual tasks in computer vision, which
have numerous downstream applications, such as Action Recognition,
Human-Computer Interaction, Object tracking, etc. Yet, there are limited
instances where the correlation between Image and 2D/3D human pose has been
clearly researched using a contrastive paradigm. In this paper, we propose
UniHPE, a unified Human Pose Estimation pipeline, which aligns features from
all three modalities, i.e., 2D human pose estimation, lifting-based and
image-based 3D human pose estimation, in the same pipeline. To align more than
two modalities at the same time, we propose a novel singular value based
contrastive learning loss, which better aligns different modalities and further
boosts the performance. In our evaluation, UniHPE achieves remarkable
performance metrics: MPJPE mm on the Human3.6M dataset and PAMPJPE
mm on the 3DPW dataset. Our proposed method holds immense potential to
advance the field of computer vision and contribute to various applications.
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation
Learning-based methods have dominated the 3D human pose estimation (HPE)
tasks with significantly better performance in most benchmarks than traditional
optimization-based methods. Nonetheless, 3D HPE in the wild is still the
biggest challenge of learning-based models, whether with 2D-3D lifting,
image-to-3D, or diffusion-based methods, since the trained networks implicitly
learn camera intrinsic parameters and domain-based 3D human pose distributions
and estimate poses by statistical average. On the other hand, the
optimization-based methods estimate results case-by-case, which can predict
more diverse and sophisticated human poses in the wild. By combining the
advantages of optimization-based and learning-based methods, we propose the
Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the
problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO
achieves state-of-the-art (SOTA) performance on Human3.6M as minMPJPE mm
without training with any 2D-3D or image-3D pairs. Moreover, our
single-hypothesis ZeDO achieves SOTA performance on 3DPW dataset with PA-MPJPE
mm on cross-dataset evaluation, which even outperforms learning-based
methods trained on 3DPW.
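The minMPJPE figure mentioned for the multi-hypothesis setting is commonly understood as the error of the best hypothesis: several candidate poses are produced for one input, and the lowest MPJPE among them is reported. A minimal sketch under that assumption follows; the names `mpjpe` and `min_mpjpe` and the toy data are hypothetical and do not come from the ZeDO implementation.

```python
import math

def mpjpe(pred, gt):
    """Average Euclidean distance between matching joints of two poses."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def min_mpjpe(hypotheses, gt):
    """Lowest MPJPE among several candidate poses for the same input."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    return min(mpjpe(h, gt) for h in hypotheses)

# Toy example (hypothetical data): a one-joint pose with two hypotheses.
gt_pose = [(0.0, 0.0, 0.0)]
candidates = [
    [(10.0, 0.0, 0.0)],  # 10 mm away from the ground truth
    [(0.0, 3.0, 4.0)],   # 5 mm away from the ground truth
]
best_mm = min_mpjpe(candidates, gt_pose)  # 5.0
```

Because the minimum is selected using the ground truth, this metric describes the best achievable hypothesis rather than the error of a single deployed prediction.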
A Density-Guided Temporal Attention Transformer for Indiscernible Object Counting in Underwater Video
Dense object counting or crowd counting has come a long way thanks to the
recent development in the vision community. However, indiscernible object
counting, which aims to count the number of targets that are blended with
respect to their surroundings, has been a challenge. Image-based object
counting datasets have been the mainstream of the current publicly available
datasets. Therefore, we propose a large-scale dataset called YoutubeFish-35,
which contains a total of 35 sequences of high-definition videos with high
frame-per-second and more than 150,000 annotated center points across a
selected variety of scenes. For benchmarking purposes, we select three
mainstream methods for dense object counting and carefully evaluate them on the
newly collected dataset. We propose TransVidCount, a new strong baseline that
combines density and regression branches along the temporal domain in a unified
framework and can effectively tackle indiscernible object counting with
state-of-the-art performance on YoutubeFish-35 dataset.
Comment: Accepted by ICASSP 2024 (IEEE International Conference on Acoustics,
Speech, and Signal Processing)