An Interpretability Framework for Similar Case Matching
Similar Case Matching (SCM) plays a pivotal role in the legal system by
facilitating the efficient identification of similar cases for legal
professionals. While previous research has primarily concentrated on enhancing
the performance of SCM models, the aspect of interpretability has been
neglected. To bridge this gap, this study proposes an integrated pipeline
framework for interpretable SCM. The framework comprises four modules: judicial
feature sentence identification, case matching, feature sentence alignment, and
conflict resolution. In contrast to current SCM methods, our framework first
extracts feature sentences within a legal case that contain essential
information. Then it conducts case matching based on these extracted features.
Subsequently, our framework aligns the corresponding sentences in two legal
cases to provide evidence of similarity. In instances where the results of case
matching and feature sentence alignment exhibit conflicts, the conflict
resolution module resolves these inconsistencies. The experimental results show
the effectiveness of our proposed framework, establishing a new benchmark for
interpretable SCM.
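The pipeline above decomposes interpretable SCM into four sequential stages. Below is a minimal Python sketch of that control flow, with a toy lexical-overlap heuristic standing in for the learned modules; every function name, threshold, and heuristic here is an illustrative assumption, not the authors' released system.

```python
def extract_feature_sentences(case_text):
    """Stage 1: judicial feature sentence identification. A real system
    would use a trained classifier; this stand-in keeps every sentence."""
    return [s.strip() for s in case_text.split(".") if s.strip()]

def jaccard(a, b):
    """Word-overlap similarity, a toy stand-in for learned scoring."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def match_cases(feats_a, feats_b, threshold=0.2):
    """Stage 2: case matching over the extracted feature sentences."""
    return jaccard(" ".join(feats_a), " ".join(feats_b)) >= threshold

def align_sentences(feats_a, feats_b, min_sim=0.3):
    """Stage 3: align corresponding sentences as evidence of similarity."""
    pairs = []
    for sa in feats_a:
        best = max(feats_b, key=lambda sb: jaccard(sa, sb), default=None)
        if best is not None and jaccard(sa, best) >= min_sim:
            pairs.append((sa, best, round(jaccard(sa, best), 2)))
    return pairs

def resolve_conflict(matched, pairs):
    """Stage 4: reconcile the matching verdict with the alignment
    evidence; in this toy rule, the alignment evidence wins."""
    return bool(pairs), pairs

case_a = "The defendant stole a car. The court found theft."
case_b = "The accused stole a bicycle. The court found theft."
feats_a = extract_feature_sentences(case_a)
feats_b = extract_feature_sentences(case_b)
print(resolve_conflict(match_cases(feats_a, feats_b),
                       align_sentences(feats_a, feats_b)))
```

In the authors' framework each stage would be a trained model; the sketch only shows how the stages hand off: extracted feature sentences feed both the matcher and the aligner, and the final verdict reconciles the two.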
Adaptive Communications in Collaborative Perception with Domain Alignment for Autonomous Driving
Collaborative perception among multiple connected and autonomous vehicles can
greatly enhance perception capabilities by allowing vehicles to exchange
supplementary information via communications. Despite advances in previous
approaches, challenges still remain due to channel variations and data
heterogeneity among collaborative vehicles. To address these issues, we propose
ACC-DA, a channel-aware collaborative perception framework to dynamically
adjust the communication graph and minimize the average transmission delay
while mitigating the side effects of data heterogeneity. Our contributions are
threefold. First, we design a transmission delay minimization method, which
constructs the communication graph and minimizes the transmission delay
according to the channel state information. We then propose an adaptive
data reconstruction mechanism, which can dynamically adjust the rate-distortion
trade-off to enhance perception efficiency. Moreover, it minimizes the temporal
redundancy during data transmissions. Finally, we conceive a domain alignment
scheme that aligns the data distributions of different vehicles, mitigating the
domain gap between them and improving performance on the target task.
Comprehensive experiments demonstrate the effectiveness of
our method compared with existing state-of-the-art approaches.
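As a rough illustration of the first contribution, the sketch below builds a communication graph by keeping only the links whose estimated transmission delay (payload size divided by achievable channel rate) fits a latency budget. The greedy rule, rates, and variable names are assumptions for illustration, not the ACC-DA optimization itself.

```python
def build_comm_graph(rates_mbps, msg_bits, delay_budget_s):
    """rates_mbps: {(tx, rx): achievable rate in Mbit/s};
    msg_bits: {tx: payload size in bits}.
    Keeps links whose delay msg_bits / rate fits the budget."""
    graph = set()
    for (tx, rx), rate in rates_mbps.items():
        delay = msg_bits[tx] / (rate * 1e6)   # seconds to send tx's payload
        if delay <= delay_budget_s:
            graph.add((tx, rx))
    return graph

# Toy example: v1 has a good channel to v2, v3 a poor one.
rates = {("v1", "v2"): 20.0, ("v3", "v2"): 2.0}   # Mbit/s
sizes = {"v1": 4_000_000, "v3": 4_000_000}        # 4 Mbit of features each
print(build_comm_graph(rates, sizes, delay_budget_s=0.5))
# -> {('v1', 'v2')}: 0.2 s fits the budget, 2.0 s does not
```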
FSD: An Initial Chinese Dataset for Fake Song Detection
Singing voice synthesis and singing voice conversion have significantly
advanced, revolutionizing musical experiences. However, the rise of "Deepfake
Songs" generated by these technologies raises concerns about authenticity.
Unlike Audio Deepfake Detection (ADD), the field of song deepfake detection
lacks specialized datasets or methods for song authenticity verification. In
this paper, we initially construct a Chinese Fake Song Detection (FSD) dataset
to investigate the field of song deepfake detection. The fake songs in the FSD
dataset are generated by five state-of-the-art singing voice synthesis and
singing voice conversion methods. Our initial experiments on FSD reveal that
existing speech-trained ADD models are ineffective for song deepfake
detection. We therefore employ the FSD dataset to train ADD
models. We subsequently evaluate these models under two scenarios: one with the
original songs and another with separated vocal tracks. Experimental results
show that song-trained ADD models exhibit a 38.58% reduction in average equal
error rate compared to speech-trained ADD models on the FSD test set.
ReliTalk: Relightable Talking Portrait Generation from a Single Video
Recent years have witnessed great progress in creating vivid audio-driven
portraits from monocular videos. However, how to seamlessly adapt the created
video avatars to other scenarios with different backgrounds and lighting
conditions remains unsolved. Meanwhile, existing relighting studies mostly
rely on dynamically lit or multi-view data, which are too expensive
for creating video portraits. To bridge this gap, we propose ReliTalk, a novel
framework for relightable audio-driven talking portrait generation from
monocular videos. Our key insight is to decompose the portrait's reflectance
from implicitly learned audio-driven facial normals and images. Specifically,
we involve 3D facial priors derived from audio features to predict delicate
normal maps through implicit functions. These initially predicted normals then
play a crucial part in reflectance decomposition by dynamically estimating the
lighting condition of the given video. Moreover, the stereoscopic face
representation is refined using the identity-consistent loss under simulated
multiple lighting conditions, addressing the ill-posed problem caused by
limited views available from a single monocular video. Extensive experiments
validate the superiority of our proposed framework on both real and synthetic
datasets. Our code is released at https://github.com/arthur-qiu/ReliTalk.
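To make the decomposition concrete: once reflectance (albedo) and normals are separated, the portrait can be re-rendered under a new light. The sketch below uses a single directional Lambertian light as a simplified stand-in for the paper's lighting model; the array names and the ambient term are assumptions for illustration.

```python
import numpy as np

def relight(albedo, normals, light_dir, ambient=0.1):
    """albedo: (H, W, 3) reflectance in [0, 1].
    normals: (H, W, 3) unit surface normals.
    light_dir: (3,) direction toward the light source."""
    l = np.asarray(light_dir, dtype=np.float64)
    l /= np.linalg.norm(l)
    shading = np.clip(normals @ l, 0.0, None)        # (H, W) Lambert term
    return np.clip(albedo * (ambient + shading[..., None]), 0.0, 1.0)

# Toy frame: flat gray albedo, normals facing the camera (+z).
albedo = np.full((4, 4, 3), 0.5)
normals = np.zeros((4, 4, 3)); normals[..., 2] = 1.0
frame = relight(albedo, normals, light_dir=(0.0, 0.0, 1.0))
print(frame[0, 0])   # -> [0.55 0.55 0.55]
```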