
    Image Classification Using Bag-of-Visual-Words Model

    Recently, with the explosive growth of digital technologies, image collections have grown rapidly in size. Supervised image classification has been widely applied in many domains to organize, search, and retrieve images, but traditional feature extraction approaches yield poor classification accuracy. The Bag-of-Visual-Words (BoVW) model, inspired by the Bag-of-Words model in document classification, therefore represents images with local descriptors for image classification, and it performs well in several fields. This research provides empirical evidence that the BoVW model outperforms traditional feature extraction approaches for both binary and multi-class image classification. Furthermore, the research reveals that the size of the visual vocabulary chosen when building the BoVW model affects the accuracy of image classification.
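
    As a rough illustration of the pipeline this abstract describes, a minimal BoVW implementation clusters local descriptors into a visual vocabulary and represents each image as a normalised word histogram. The sketch below assumes OpenCV's ORB detector and scikit-learn's KMeans; the descriptor choice and vocabulary size are illustrative, not the paper's setup.

        # Minimal Bag-of-Visual-Words sketch: local descriptors -> k-means
        # vocabulary -> per-image histogram -> linear SVM.
        import cv2
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.svm import LinearSVC

        def extract_descriptors(image_paths):
            orb = cv2.ORB_create()
            per_image = []
            for path in image_paths:
                img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
                _, desc = orb.detectAndCompute(img, None)
                per_image.append(desc if desc is not None else np.empty((0, 32)))
            return per_image

        def build_vocabulary(per_image, k=200):
            # k is the visual vocabulary size whose effect the paper studies
            all_desc = np.vstack([d for d in per_image if len(d)])
            return KMeans(n_clusters=k, n_init=10).fit(all_desc.astype(np.float32))

        def bovw_histogram(desc, vocab):
            hist = np.zeros(vocab.n_clusters)
            if len(desc):
                for w in vocab.predict(desc.astype(np.float32)):
                    hist[w] += 1
                hist /= hist.sum()  # normalise so descriptor count does not dominate
            return hist

        # Training (train_paths and train_labels are assumed to exist):
        # per_image = extract_descriptors(train_paths)
        # vocab = build_vocabulary(per_image)
        # X = np.array([bovw_histogram(d, vocab) for d in per_image])
        # clf = LinearSVC().fit(X, train_labels)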

    Fairer Evaluation of Zero Shot Action Recognition in Videos

    Zero-shot learning (ZSL) for human action recognition (HAR) aims to recognise video action classes that have never been seen during model training. This is achieved by building mappings between visual and semantic embeddings, where the visual embeddings are typically provided by a pre-trained deep neural network (DNN). The premise of ZSL is that the training and testing classes should be disjoint. In the parallel domain of ZSL for image input, the widespread poor evaluation protocol of pre-training on ZSL test classes has been highlighted; this is akin to providing a sneak preview of the evaluation classes. In this work, we investigate the extent to which this evaluation protocol has been used in ZSL research for human action recognition. We show that in the field of ZSL for HAR, accuracies for overlapping classes are boosted by between 5.75% and 51.94%, depending on the visual and semantic features used, as a result of this flawed evaluation protocol. To help other researchers avoid this problem in the future, we provide annotated versions of the relevant benchmark ZSL test datasets in the HAR field, UCF101 and HMDB51, highlighting overlaps with pre-training datasets in the field.
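
    The audit the paper calls for can be approximated by checking each ZSL test class against the class list of the dataset used to pre-train the visual-embedding DNN. A hedged sketch, assuming plain class-name lists and a simple normalisation rule (the paper's annotations will be more careful about near-synonymous class names):

        # Flag ZSL test classes that also appear in the DNN pre-training
        # dataset (e.g. UCF101/HMDB51 test splits vs. a Kinetics class list).
        def normalise(name):
            return name.lower().replace("_", " ").strip()

        def overlapping_classes(zsl_test_classes, pretraining_classes):
            pre = {normalise(c) for c in pretraining_classes}
            return sorted(c for c in zsl_test_classes if normalise(c) in pre)

        # e.g. overlapping_classes(ucf101_test_split, kinetics_classes)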

    Combining Text and Image Knowledge with GANs for Zero-Shot Action Recognition in Videos

    The recognition of actions in videos is an active research area in machine learning, relevant to domains such as health monitoring, security, and social media analysis. Zero-Shot Action Recognition (ZSAR) is a challenging problem in which models are trained to identify action classes that have not been seen during the training process. According to the literature, the most promising ZSAR approaches make use of Generative Adversarial Networks (GANs), which can synthesise visual embeddings for unseen classes conditioned on either textual information or images related to the class labels. In this paper, we propose a Dual-GAN approach based on the VAEGAN model to show that fusing visual and textual knowledge sources is an effective way to improve ZSAR performance. We conduct empirical ZSAR experiments on the UCF101 dataset, applying the following embedding fusion methods for combining text-driven and image-driven information: averaging, summation, maximum, and minimum. Our best result with the Dual-GAN model is achieved with the maximum embedding fusion approach, which yields an average accuracy of 46.37%, an improvement of at least 5.37% over the leading approaches.
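
    The four fusion methods named above are simple element-wise operations over the two generated embeddings. A minimal sketch (real embeddings would come from the two conditional generators):

        import numpy as np

        def fuse(text_emb, image_emb, method="maximum"):
            # element-wise fusion of the text-driven and image-driven embeddings
            ops = {
                "averaging": lambda a, b: (a + b) / 2.0,
                "summation": np.add,
                "maximum": np.maximum,   # the best-performing rule in the paper
                "minimum": np.minimum,
            }
            return ops[method](text_emb, image_emb)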

    Enhancing Zero‑Shot Action Recognition in Videos by Combining GANs with Text and Images

    Zero-shot action recognition (ZSAR) tackles the problem of recognising actions that have not been seen by the model during the training phase. Various techniques have been used to achieve ZSAR in the field of human action recognition (HAR) in videos, and techniques based on generative adversarial networks (GANs) are the most promising in terms of performance. GANs are trained to generate representations of unseen videos conditioned on information related to the unseen classes, such as class-label embeddings. In this paper, we present an approach based on combining information from two different GANs, both of which generate a visual representation of unseen classes. Our dual-GAN approach leverages two separate knowledge sources related to the unseen classes: class-label texts and images related to the class label obtained from Google Images. The visual embeddings of the unseen classes generated by the two GANs are merged and used to train a classifier in a supervised-learning fashion for ZSAR classification. Our methodology is based on the idea that using more, and richer, knowledge sources to generate unseen-class representations leads to higher downstream accuracy when classifying unseen classes. The experimental results show that our dual-GAN approach outperforms state-of-the-art methods on the two benchmark HAR datasets, HMDB51 and UCF101. Additionally, we present a comprehensive discussion and analysis of the experimental results for both datasets to understand the nuances of each approach at a class level. Finally, we examine the impact of the number of visual embeddings generated by the two GANs on the accuracy of the models.
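
    The downstream step this abstract describes reduces to ordinary supervised training once the two generators have synthesised embeddings for the unseen classes. A sketch under assumed APIs (the generator objects and their sample() method are placeholders; a simple softmax classifier stands in for the paper's model):

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def build_training_set(text_gen, image_gen, unseen_classes, n_per_class=100):
            # pool embeddings synthesised by both generators for each class
            X, y = [], []
            for label, cls in enumerate(unseen_classes):
                X.append(text_gen.sample(cls, n_per_class))   # assumed API
                X.append(image_gen.sample(cls, n_per_class))  # assumed API
                y += [label] * (2 * n_per_class)
            return np.vstack(X), np.array(y)

        # X, y = build_training_set(text_gan, image_gan, unseen_classes)
        # clf = LogisticRegression(max_iter=1000).fit(X, y)

    Here n_per_class corresponds to the number of generated visual embeddings whose effect on accuracy the paper examines.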

    Human Action Recognition in Videos Using Transfer Learning

    A variety of systems focus on detecting the actions and activities performed by humans, such as video surveillance and health monitoring systems. However, published labelled human action datasets for training supervised machine learning models are limited in number and expensive to produce. Transfer learning can help address this issue for action recognition by re-using the knowledge of existing trained models in combination with minimal training data from the new target domain. Our focus in this paper is an investigation of video feature representations and machine learning algorithms for transfer learning for action recognition in videos in a multi-class setting. Using four labelled datasets from the human action domain, we apply two SVM-based transfer-learning algorithms: the adaptive support vector machine (A-SVM) and the projective model transfer SVM (PMT-SVM). For features, we compare the performance of two widely used video representations, space-time interest points (STIP) with Histograms of Oriented Gradients (HOG) and Histograms of Optical Flow (HOF), and improved dense trajectories (iDT), to explore which is more suitable for action recognition from videos using transfer learning. Our results show that A-SVM and PMT-SVM can help transfer action knowledge across multiple datasets with limited labelled training data; that A-SVM outperforms PMT-SVM when the target dataset is derived from realistic non-lab environments; and that iDT has the greater ability to support transfer learning in action recognition.
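
    For context, the adaptive SVM is usually framed as learning a perturbation of a source classifier's decision function: the target classifier is f(x) = f^a(x) + w^T phi(x), where f^a comes from the source-domain model and w is fitted on the small labelled target set. A sketch of the standard objective, stated here from the adaptive-SVM literature rather than from this paper:

        \min_{w,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i}\xi_{i}
        \quad\text{s.t.}\quad y_{i}\bigl(f^{a}(x_{i}) + w^{\top}\phi(x_{i})\bigr) \ge 1 - \xi_{i},
        \qquad \xi_{i} \ge 0 .

    Regularising the norm of w keeps the adapted classifier close to the source one, which is why a handful of labelled target examples can suffice.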

    Zero-Shot Action Recognition with Knowledge Enhanced Generative Adversarial Networks

    Zero-Shot Action Recognition (ZSAR) aims to recognise action classes in videos that have never been seen during model training. In some approaches, ZSAR is achieved by generating visual features for unseen classes from the semantic information of the unseen class labels using generative adversarial networks (GANs); the problem is thereby converted to standard supervised learning, since visual features for the unseen classes become accessible, alleviating the lack of labelled samples for those classes. In addition, objects appearing in action instances can be used to create enriched semantics for action classes and thereby increase the accuracy of ZSAR. In this paper, we consider using, in addition to the label, objects related to that action label. For example, the objects ‘horse’ and ‘saddle’ are highly related to the action ‘Horse Riding’ and can bring additional semantic meaning. We aim to improve the GAN-based framework by incorporating object-based semantic information related to the class label in three ways: replacing the class labels with objects, appending objects to the class, and averaging objects with the class. We then evaluate performance on a subset of the popular UCF101 dataset. Our experimental results demonstrate that our approach is valid: when appropriate objects are incorporated into the action classes, the baseline is improved by 4.93%.
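
    The three label-enrichment rules can be sketched with generic word vectors. Here embed() is an assumed lookup (e.g. word2vec/GloVe), and whether "appending" happens at the text level or, as here, by concatenating vectors is an illustrative assumption:

        import numpy as np

        def enrich(class_label, objects, embed, mode="average"):
            label_vec = embed(class_label)
            obj_vec = np.mean([embed(o) for o in objects], axis=0)
            if mode == "replace":   # objects replace the class label
                return obj_vec
            if mode == "append":    # objects appended to the class label
                return np.concatenate([label_vec, obj_vec])
            if mode == "average":   # objects averaged with the class label
                return (label_vec + obj_vec) / 2.0
            raise ValueError(mode)

        # the paper's own example:
        # enrich("Horse Riding", ["horse", "saddle"], embed, mode="average")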

    LIGHT: Joint Individual Building Extraction and Height Estimation from Satellite Images through a Unified Multitask Learning Network

    Building extraction and height estimation are two important basic tasks in remote sensing image interpretation and are widely used in urban planning, real-world 3D construction, and other fields. Most existing research treats the two tasks as independent studies, so height information cannot be fully used to improve the accuracy of building extraction, and vice versa. In this work, we combine individuaL buIlding extraction and heiGHt estimation through a unified multiTask learning network (LIGHT) for the first time, which simultaneously outputs a height map, bounding boxes, and a segmentation mask of buildings. Specifically, LIGHT consists of an instance segmentation branch and a height estimation branch. In particular, to effectively unify multi-scale feature branches and alleviate the feature gap between branches, we propose a Gated Cross Task Interaction (GCTI) module that can efficiently perform feature interaction between the branches. Experiments on the DFC2023 dataset show that LIGHT achieves superior performance, and that our GCTI module with a ResNet101 backbone can significantly improve multitask learning performance by 2.8% AP50 and 6.5% δ1, respectively.
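
    The abstract does not spell out the GCTI internals, but a generic gated cross-task exchange between the two branches might look like the following PyTorch sketch; the 1x1-convolution gate and the residual mixing are illustrative assumptions, not the paper's exact design:

        import torch
        import torch.nn as nn

        class GatedCrossTaskInteraction(nn.Module):
            """Gate each branch's features with a mask computed from both."""
            def __init__(self, channels):
                super().__init__()
                self.gate = nn.Sequential(
                    nn.Conv2d(2 * channels, channels, kernel_size=1),
                    nn.Sigmoid(),
                )

            def forward(self, seg_feat, height_feat):
                g = self.gate(torch.cat([seg_feat, height_feat], dim=1))
                # each branch receives a gated copy of the other's features
                return seg_feat + g * height_feat, height_feat + (1 - g) * seg_feat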

    Indium-Containing Visible-Light-Driven (VLD) Photocatalysts for Solar Energy Conversion and Environment Remediation

    Indium-containing visible-light-driven (VLD) photocatalysts, including indium-containing oxides, sulfides, hydroxides, and other categories, have attracted considerable attention due to their high catalytic activity for oxidation and reduction under visible-light irradiation. This chapter therefore concentrates on indium-containing nano-structured materials that demonstrate useful activity under solar excitation in fields concerned with the elimination of pollutants, partial oxidation and the vaporization of chemical compounds, water splitting, and CO2 reduction processes. Indium-containing photocatalysts can extend the light absorption range and improve photocatalytic activity through doping, heterogeneous structures, the loading of promoters, and morphology regulation. A number of synthetic and modification techniques for adjusting the band structure to harvest visible light and improve charge separation in photocatalysis are discussed. The preparation, properties, and potential applications of indium-containing nano-structured materials used as photocatalysts are systematically summarized, which is beneficial for understanding the mechanism and developing potential applications.

    MARTE/pCCSL: Modeling and Refining Stochastic Behaviors of CPSs with Probabilistic Logical Clocks

    Best Paper Award
    Cyber-Physical Systems (CPSs) are networks of heterogeneous embedded systems immersed within a physical environment. Several ad hoc frameworks and mathematical models have been studied to deal with the challenging issues raised by CPSs. In this paper, we explore a more standards-based approach that relies on SysML/MARTE to capture different aspects of CPSs, including structure, behaviors, clock constraints, and non-functional properties. The novelty of our work lies in the use of logical clocks and MARTE/CCSL to drive and coordinate different models. To capture the stochastic behaviors of CPSs, we propose an extension of CCSL, called pCCSL, in which logical clocks are adorned with stochastic properties. Possible variants are explored using Statistical Model Checking (SMC) via a transformation from the MARTE/pCCSL models into Stochastic Hybrid Automata. The whole process is illustrated through a case study of an energy-aware building, in which the system is modeled in SysML/MARTE/pCCSL and different variants are explored through SMC to help expose the best alternative solutions.
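
    A toy way to picture one probabilistic logical clock: a child clock that, whenever its parent ticks, ticks with probability p, i.e. a stochastic refinement of CCSL subclocking. This is purely illustrative; pCCSL itself is defined over MARTE/CCSL models, not Python:

        import random

        def simulate(p=0.7, steps=20, seed=0):
            random.seed(seed)
            parent, child = [], []
            for step in range(steps):
                parent.append(step)       # the parent clock ticks every step
                if random.random() < p:
                    child.append(step)    # the child ticks on a random subset
            return parent, child

        # every child tick coincides with a parent tick, so child is a
        # subclock of parent; p is the adorning stochastic property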