1,335 research outputs found

    Artificial Intelligence for Multimedia Signal Processing

    Get PDF
    Artificial intelligence technologies are also actively applied to broadcasting and multimedia processing technologies. A lot of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and these attempts have been made in the past two to three years to improve image, video, speech, and other data compression efficiency in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and creating scenarios are very important areas of research in multimedia processing and engineering. This book contains a collection of some topics broadly across advanced computational intelligence algorithms and technologies for emerging multimedia signal processing as: Computer vision field, speech/sound/text processing, and content analysis/information mining

    Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective

    Get PDF
    This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition. Specifically, it categorises the cross-dataset recognition into seventeen problems based on a set of carefully chosen data and label attributes. Such a problem-oriented taxonomy has allowed us to examine how different transfer learning approaches tackle each problem and how well each problem has been researched to date. The comprehensive problem-oriented review of the advances in transfer learning with respect to the problem has not only revealed the challenges in transfer learning for visual recognition, but also the problems (e.g. eight of the seventeen problems) that have been scarcely studied. This survey not only presents an up-to-date technical review for researchers, but also a systematic approach and a reference for a machine learning practitioner to categorise a real problem and to look up for a possible solution accordingly

    Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work

    Full text link
    Inspired by the fact that human brains can emphasize discriminative parts of the input and suppress irrelevant ones, substantial local mechanisms have been designed to boost the development of computer vision. They can not only focus on target parts to learn discriminative local representations, but also process information selectively to improve the efficiency. In terms of application scenarios and paradigms, local mechanisms have different characteristics. In this survey, we provide a systematic review of local mechanisms for various computer vision tasks and approaches, including fine-grained visual recognition, person re-identification, few-/zero-shot learning, multi-modal learning, self-supervised learning, Vision Transformers, and so on. Categorization of local mechanisms in each field is summarized. Then, advantages and disadvantages for every category are analyzed deeply, leaving room for exploration. Finally, future research directions about local mechanisms have also been discussed that may benefit future works. To the best our knowledge, this is the first survey about local mechanisms on computer vision. We hope that this survey can shed light on future research in the computer vision field

    Progressive Cross-camera Soft-label Learning for Semi-supervised Person Re-identification

    Full text link
    In this paper, we focus on the semi-supervised person re-identification (Re-ID) case, which only has the intra-camera (within-camera) labels but not inter-camera (cross-camera) labels. In real-world applications, these intra-camera labels can be readily captured by tracking algorithms or few manual annotations, when compared with cross-camera labels. In this case, it is very difficult to explore the relationships between cross-camera persons in the training stage due to the lack of cross-camera label information. To deal with this issue, we propose a novel Progressive Cross-camera Soft-label Learning (PCSL) framework for the semi-supervised person Re-ID task, which can generate cross-camera soft-labels and utilize them to optimize the network. Concretely, we calculate an affinity matrix based on person-level features and adapt them to produce the similarities between cross-camera persons (i.e., cross-camera soft-labels). To exploit these soft-labels to train the network, we investigate the weighted cross-entropy loss and the weighted triplet loss from the classification and discrimination perspectives, respectively. Particularly, the proposed framework alternately generates progressive cross-camera soft-labels and gradually improves feature representations in the whole learning course. Extensive experiments on five large-scale benchmark datasets show that PCSL significantly outperforms the state-of-the-art unsupervised methods that employ labeled source domains or the images generated by the GAN-based models. Furthermore, the proposed method even has a competitive performance with respect to deep supervised Re-ID methods.Comment: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT
    • …
    corecore