Artificial Intelligence for Multimedia Signal Processing
Artificial intelligence technologies are actively applied to broadcasting and multimedia processing. Research has been conducted across a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years there have been attempts to improve the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, media creation, processing, editing, and scenario creation are very important areas of research in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and to look up a
possible solution accordingly.
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, substantial local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve the efficiency. In terms of application
scenarios and paradigms, local mechanisms have different characteristics. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
Categorization of local mechanisms in each field is summarized. Then,
advantages and disadvantages for every category are analyzed deeply, leaving
room for exploration. Finally, future research directions for local
mechanisms that may benefit future work are discussed. To the best of
our knowledge, this is the first survey of local mechanisms in computer
vision. We hope that this survey can shed light on future research in the
computer vision field.
Progressive Cross-camera Soft-label Learning for Semi-supervised Person Re-identification
In this paper, we focus on the semi-supervised person re-identification
(Re-ID) case, which only has the intra-camera (within-camera) labels but not
inter-camera (cross-camera) labels. In real-world applications, these
intra-camera labels can be captured far more readily than cross-camera
labels, using tracking algorithms or a few manual annotations. In this case, it is
very difficult to explore the relationships between cross-camera persons in the
training stage due to the lack of cross-camera label information. To deal with
this issue, we propose a novel Progressive Cross-camera Soft-label Learning
(PCSL) framework for the semi-supervised person Re-ID task, which can generate
cross-camera soft-labels and utilize them to optimize the network. Concretely,
we calculate an affinity matrix based on person-level features and adapt it
to produce the similarities between cross-camera persons (i.e., cross-camera
soft-labels). To exploit these soft-labels to train the network, we investigate
the weighted cross-entropy loss and the weighted triplet loss from the
classification and discrimination perspectives, respectively. Particularly, the
proposed framework alternately generates progressive cross-camera soft-labels
and gradually improves feature representations in the whole learning course.
Extensive experiments on five large-scale benchmark datasets show that PCSL
significantly outperforms the state-of-the-art unsupervised methods that employ
labeled source domains or the images generated by the GAN-based models.
Furthermore, the proposed method even has a competitive performance with
respect to deep supervised Re-ID methods.
Comment: Accepted by IEEE Transactions on Circuits and Systems for Video
Technology (TCSVT)
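The soft-label idea in the PCSL abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: cosine similarity as the affinity, a softmax with a hypothetical `temperature` parameter to turn affinities into soft-labels, and the function names `soft_labels` and `weighted_cross_entropy`. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_labels(query, gallery, temperature=0.1):
    """Turn affinities between one person-level feature and the person
    features of another camera into a probability vector (assumed softmax)."""
    scaled = [cosine(query, g) / temperature for g in gallery]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(probs, weights, eps=1e-12):
    """Cross-entropy in which each class term is weighted by its soft-label."""
    return -sum(w * math.log(p + eps) for w, p in zip(weights, probs))


# Hypothetical person-level features from a second camera.
camera_b = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
labels = soft_labels([0.9, 0.1], camera_b)

# Hypothetical classifier output over the same three identities.
predicted = [0.8, 0.1, 0.1]
loss = weighted_cross_entropy(predicted, labels)
```

In an alternating scheme like the one the abstract describes, the soft-labels would be recomputed as the features improve; this sketch shows a single step only.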