24 research outputs found

    Zero-Shot Action Recognition with Knowledge Enhanced Generative Adversarial Networks

    Zero-Shot Action Recognition (ZSAR) aims to recognise action classes in videos that have never been seen during model training. In some approaches, ZSAR has been achieved by generating visual features for unseen classes based on the semantic information of the unseen class labels using generative adversarial networks (GANs). The problem is thereby converted to standard supervised learning, since visual features for the unseen classes become accessible, which alleviates the lack of labelled samples for those classes. In addition, objects appearing in the action instances can be used to create enriched semantics for action classes and therefore increase the accuracy of ZSAR. In this paper, we consider using, in addition to the class label, objects related to that label. For example, the objects ‘horse’ and ‘saddle’ are highly related to the action ‘Horse Riding’, and these objects can bring additional semantic meaning. We aim to improve the GAN-based framework by incorporating object-based semantic information related to the class label with three approaches: replacing the class labels with objects, appending objects to the class labels, and averaging objects with the class labels. We then evaluate the performance on a subset of the popular UCF101 dataset. Our experimental results demonstrate that the approach is effective: including appropriate objects in the action classes improves the baseline by 4.93%.
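
    The three label/object combination strategies mentioned in the abstract above can be illustrated with a short sketch. This is a minimal illustration assuming precomputed semantic embeddings (e.g. word-vector representations) for the class label and its related objects; the function name, embedding dimension and the exact combination formulas are assumptions, not the paper's code.

    import numpy as np

    def combine_semantics(label_emb, object_embs, mode="average"):
        """Build the semantic vector used to condition the GAN.
        label_emb   : (d,) embedding of the action label, e.g. 'Horse Riding'
        object_embs : (k, d) embeddings of related objects, e.g. 'horse', 'saddle'
        mode        : 'replace' | 'append' | 'average'
        """
        objects_mean = object_embs.mean(axis=0)
        if mode == "replace":    # use only the objects' semantics
            return objects_mean
        if mode == "append":     # concatenate label and object semantics
            return np.concatenate([label_emb, objects_mean])
        if mode == "average":    # average label and object semantics
            return (label_emb + objects_mean) / 2.0
        raise ValueError(f"unknown mode: {mode}")

    # Example with random stand-in embeddings (d = 300, two related objects).
    rng = np.random.default_rng(0)
    label = rng.normal(size=300)
    objects = rng.normal(size=(2, 300))
    conditioning = combine_semantics(label, objects, mode="append")  # shape (600,)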

    Combining Text and Image Knowledge with GANs for Zero-Shot Action Recognition in Videos

    The recognition of actions in videos is an active research area in machine learning, relevant to multiple domains such as health monitoring, security and social media analysis. Zero-Shot Action Recognition (ZSAR) is a challenging problem in which models are trained to identify action classes that have not been seen during the training process. According to the literature, the most promising ZSAR approaches make use of Generative Adversarial Networks (GANs). GANs can synthesise visual embeddings for unseen classes conditioned on either textual information or images related to the class labels. In this paper, we propose a Dual-GAN approach based on the VAEGAN model to show that fusing visual and textual knowledge sources is an effective way to improve ZSAR performance. We conduct empirical ZSAR experiments with our approach on the UCF101 dataset, applying the following embedding fusion methods for combining text-driven and image-driven information: averaging, summation, maximum, and minimum. Our best result from the Dual-GAN model is achieved with the maximum embedding fusion approach, which yields an average accuracy of 46.37%, an improvement of at least 5.37% over the leading approaches.
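
    A minimal sketch of the four element-wise fusion operators named in the abstract, applied to visual embeddings synthesised by the text-conditioned and image-conditioned generators. The array shapes, feature dimension and function name are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def fuse(text_feats, image_feats, method="maximum"):
        """Fuse two (n, d) arrays of generated visual embeddings element-wise."""
        ops = {
            "averaging": lambda a, b: (a + b) / 2.0,
            "summation": np.add,
            "maximum":   np.maximum,
            "minimum":   np.minimum,
        }
        return ops[method](text_feats, image_feats)

    # Stand-in data: 500 synthetic samples with an assumed 2048-d feature size.
    rng = np.random.default_rng(1)
    text_feats = rng.normal(size=(500, 2048))
    image_feats = rng.normal(size=(500, 2048))
    fused = fuse(text_feats, image_feats, method="maximum")  # best-performing method in the abstract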

    Enhancing Zero-Shot Action Recognition in Videos by Combining GANs with Text and Images

    Zero-shot action recognition (ZSAR) tackles the problem of recognising actions that have not been seen by the model during the training phase. Various techniques have been used to achieve ZSAR in the field of human action recognition (HAR) in videos, and those based on generative adversarial networks (GANs) are the most promising in terms of performance. GANs are trained to generate representations of unseen videos conditioned on information related to the unseen classes, such as class label embeddings. In this paper, we present an approach that combines information from two different GANs, both of which generate a visual representation of unseen classes. Our dual-GAN approach leverages two separate knowledge sources related to the unseen classes: class-label texts and images related to the class label obtained from Google Images. The visual embeddings of the unseen classes generated by the two GANs are merged and used to train a classifier in a supervised fashion for ZSAR classification. Our methodology is based on the idea that using more and richer knowledge sources to generate unseen-class representations leads to higher downstream accuracy when classifying unseen classes. The experimental results show that our dual-GAN approach outperforms state-of-the-art methods on two benchmark HAR datasets: HMDB51 and UCF101. Additionally, we present a comprehensive discussion and analysis of the experimental results for both datasets to understand the nuances of each approach at a class level. Finally, we examine the impact of the number of visual embeddings generated by the two GANs on the accuracy of the models.
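
    The downstream step described above (merge the synthetic embeddings from the two GANs, train a supervised classifier, then evaluate on real unseen-class features) could look roughly like the sketch below. The merge here is simple pooling of the two sample sets, one of several plausible choices; the classifier, data shapes and names are assumptions for illustration only.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_zsar_classifier(text_gan_feats, image_gan_feats, synth_labels,
                              real_unseen_feats, real_unseen_labels):
        """text_gan_feats, image_gan_feats: (n, d) synthetic features per source;
        synth_labels: (n,) unseen-class ids for each synthetic sample."""
        # Merge the two knowledge sources by pooling their synthetic samples.
        X = np.vstack([text_gan_feats, image_gan_feats])
        y = np.concatenate([synth_labels, synth_labels])
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        return clf.score(real_unseen_feats, real_unseen_labels)

    # Toy usage with random stand-in features for three unseen classes.
    rng = np.random.default_rng(2)
    labels = rng.integers(0, 3, size=300)
    acc = train_zsar_classifier(rng.normal(size=(300, 512)), rng.normal(size=(300, 512)),
                                labels, rng.normal(size=(90, 512)), rng.integers(0, 3, size=90))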

    Detecting Road Intersections from Satellite Images using Convolutional Neural Networks

    Automatic detection of road intersections is an important task in various domains such as navigation, route planning, traffic prediction, and road network extraction. Road intersections range from simple three-way T-junctions to complex large-scale junctions with many branches. The location of intersections is an important consideration for vulnerable road users, such as People with Blindness or Visual Impairment (PBVI) or children. Route planning applications, however, do not give information about the location of intersections, as this information is not available at scale. As a first step towards solving this problem, a mechanism for automatically mapping road intersection locations is required, ideally using a globally available data source.

    Real-time bidding campaigns optimization using user profile settings

    Real-time bidding (RTB) is nowadays one of the most promising systems in the online advertising ecosystem. In this study, the performance of RTB campaigns is improved by optimising the parameters of the users' profiles and the publishers' websites. Most studies on optimising RTB campaigns focus on the bidding strategy, i.e., estimating the best value for each bid. This research, however, focuses on optimising RTB campaigns by finding configurations that maximise both the number of impressions and the average profitability of the visits. An online campaign configuration generally consists of a set of parameters along with their values, such as {Browser = Chrome, Country = Germany, Age = 20–40, Gender = Woman}. The experiments show that when advertisers' required visits are low, it is easy to find configurations with high average profitability, but as the required number of visits increases, the average profitability diminishes. Additionally, configuration optimisation has been combined with other strategies to further increase the campaigns' profitability. In particular, the presented study considers the following complementary strategies: (1) selecting multiple configurations with a small number of visits rather than a unique configuration with a large number of visits, (2) discarding visits according to certain cost and profitability thresholds, (3) analysing a reduced space of the dataset and extrapolating the solution over the whole dataset, and (4) increasing the search space by including solutions below the required number of visits. RTB and other advertising platforms could offer advertisers the developed campaign optimisation methodology to make their campaigns more profitable.
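
    The abstract's notion of a campaign configuration and its average profitability can be made concrete with a small sketch. It assumes a bid/visit log held in a pandas DataFrame with hypothetical columns 'browser', 'country', 'gender', 'cost' and 'revenue'; the column names and profit definition are illustrative, not taken from the study.

    import pandas as pd

    def evaluate_configuration(bids: pd.DataFrame, config: dict):
        """Return the number of matching visits and their average profit."""
        mask = pd.Series(True, index=bids.index)
        for column, allowed in config.items():
            allowed = allowed if isinstance(allowed, (list, tuple, set)) else [allowed]
            mask &= bids[column].isin(allowed)
        matched = bids[mask]
        profit = matched["revenue"] - matched["cost"]
        return len(matched), profit.mean() if len(matched) else 0.0

    # Example configuration in the spirit of the abstract.
    config = {"browser": "Chrome", "country": "Germany", "gender": "Woman"}
    bids = pd.DataFrame({
        "browser": ["Chrome", "Firefox", "Chrome"],
        "country": ["Germany", "Germany", "Spain"],
        "gender":  ["Woman", "Woman", "Man"],
        "cost":    [0.8, 0.5, 0.6],
        "revenue": [1.4, 0.9, 0.7],
    })
    visits, avg_profit = evaluate_configuration(bids, config)  # -> (1, ~0.6)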

    Forecasting COVID-19 Cases Using Dynamic Time Warping and Incremental Machine Learning Methods

    The investment of time and resources in developing better strategies is key to dealing with future pandemics. In this work, we recreated the situation of COVID-19 across the year 2020, when the pandemic started spreading worldwide. We conducted experiments to predict the coronavirus cases for the 50 countries with the most cases during 2020, comparing the performance of state-of-the-art machine learning algorithms, such as long short-term memory networks, against that of online incremental machine learning algorithms. To find the best strategy, we tested three different approaches. In the first approach (single-country), we trained each model using data only from the country we were predicting. In the second (multiple-country), we trained a model using the data from all 50 countries and used that model to predict each of the 50 countries. In the third approach, we first applied clustering to find the nine countries most similar to the country that we were predicting. We consider two countries to be similar if the differences between the curves that represent their COVID-19 time series are small. To do so, we used time series similarity measures (TSSM) such as Euclidean Distance (ED) and Dynamic Time Warping (DTW). TSSM return a real value representing the distance between two time series, which can be interpreted as how similar they are. We then trained the models with the data from the nine most similar countries together with the country being predicted. We used the ARIMA model as a baseline for our results. Results show that the idea of using TSSM is a very effective approach: with ED, the RMSE obtained in the single-country and multiple-country approaches was reduced by 74.21% and 74.70%, respectively, and with DTW, by 74.89% and 75.36%. The main advantage of our methodology is that it is very simple and fast to apply since it is based only on time series data, as opposed to more complex methodologies that require a deep and thorough study of the parameters involved in the spread of the virus and their corresponding values. We made our code public to allow other researchers to explore our proposed methodology.
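
    The country-selection step described above (measure how close two case curves are, then keep the nine most similar countries) can be sketched as follows. The DTW shown is the basic dynamic-programming formulation and the Euclidean distance assumes aligned, equal-length series; variable names are illustrative and this is not the published code.

    import numpy as np

    def euclidean(a, b):
        # Assumes the two series are aligned and of equal length.
        return np.linalg.norm(np.asarray(a) - np.asarray(b))

    def dtw(a, b):
        """Basic O(n*m) dynamic time warping distance between two 1-D series."""
        a, b = np.asarray(a), np.asarray(b)
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def most_similar(target, others, k=9, metric=dtw):
        """others: dict mapping country name -> case time series."""
        distances = {name: metric(target, series) for name, series in others.items()}
        return sorted(distances, key=distances.get)[:k]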

    Detecting Road Intersections from Satellite Images using Convolutional Neural Networks

    The location of intersections is an important consideration for vulnerable road users, such as People with Blindness or Visual Impairment (PBVI) or children. Route planning applications, however, do not give information about the location of intersections, as this information is not available at scale. In this paper, we propose a deep learning framework to automatically detect the location of intersections from satellite images using convolutional neural networks. For this purpose, we labelled 7,342 Google Maps images from Washington, DC, USA to create a dataset. This dataset covers a region of 58.98 km² and has 7,548 intersections. We then applied a recent object detection model (EfficientDet) to detect the location of intersections. Experiments based on the road network in Washington, DC, show that the accuracy of our model is within 5 metres for 88.6% of the predicted intersections. Most of our predicted intersection centres (approximately 80%) are within 2 metres of the ground-truth centre. Using hybrid images, we obtained an average recall of 76.5% and an average precision of 82.8%, computed for values of Intersection over Union (IoU) from 0.5 to 0.95 in steps of 0.05. We have published an automation script to enable the reproduction of our dataset by other researchers.
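
    Two of the evaluation quantities reported above, the Intersection over Union between a predicted and a ground-truth box and the distance between their centres in metres, can be computed as in the sketch below. The ground resolution of 0.3 metres per pixel is an assumption for illustration, not a value stated in the paper.

    import math

    def iou(box_a, box_b):
        """Axis-aligned IoU; boxes as (xmin, ymin, xmax, ymax) in pixels."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter) if inter else 0.0

    def centre_distance_m(box_a, box_b, metres_per_pixel=0.3):
        # metres_per_pixel is an assumed ground resolution, not the paper's value.
        ca = ((box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2)
        cb = ((box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2)
        return math.dist(ca, cb) * metres_per_pixel

    pred, gt = (100, 100, 160, 160), (104, 98, 162, 158)
    print(round(iou(pred, gt), 3), round(centre_distance_m(pred, gt), 2))  # ~0.848, ~1.08 m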

    ZeChipC: Time Series Interpolation Method Based on Lebesgue Sampling

    In this paper, we present an interpolation method based on Lebesgue sampling that could help to develop systems based on time series more efficiently. Our method can transmit time series, frequently used in health monitoring, with the same level of accuracy but using much less data. It is based on Lebesgue sampling, which collects information depending on the values of the signal (e.g., the signal output is sampled when it crosses specific limits). Lebesgue sampling therefore contains additional information about the shape of the signal between two sampled points, and using this information allows generating an interpolated signal closer to the original one. In our contribution, we propose a novel time-series interpolation method designed explicitly for Lebesgue sampling, called ZeChipC. ZeChipC is a combination of Zero-Order Hold (ZOH) and Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) interpolation and includes new functionality to adapt the reconstructed signal to concave/convex regions. The proposed method has been compared with state-of-the-art interpolation methods using Lebesgue sampling and has offered higher average performance.
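
    A small sketch of the two ingredients named above: Lebesgue (level-crossing) sampling of a signal, followed by a plain PCHIP reconstruction over the retained points with SciPy. ZeChipC itself combines zero-order hold with PCHIP and adds concave/convex adjustment that is not reproduced here; the threshold and the test signal are illustrative.

    import numpy as np
    from scipy.interpolate import PchipInterpolator

    def lebesgue_sample(t, x, delta=0.5):
        """Keep a sample whenever the signal has moved by at least `delta`
        since the last kept sample (plus the first and last points)."""
        keep = [0]
        for i in range(1, len(x)):
            if abs(x[i] - x[keep[-1]]) >= delta:
                keep.append(i)
        if keep[-1] != len(x) - 1:
            keep.append(len(x) - 1)
        return t[keep], x[keep]

    t = np.linspace(0, 10, 500)
    x = np.sin(t) + 0.1 * t
    ts, xs = lebesgue_sample(t, x, delta=0.4)       # far fewer points than 500
    reconstructed = PchipInterpolator(ts, xs)(t)    # PCHIP over the kept samples
    mean_abs_error = np.mean(np.abs(reconstructed - x))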

    Multivariate feature ranking of gene expression data

    Gene expression datasets are usually of high dimensionality and therefore require efficient and effective methods for identifying the relative importance of their attributes. Due to the huge size of the search space of possible solutions, attribute subset evaluation feature selection methods tend not to be applicable, so in these scenarios feature ranking methods are used. Most of the feature ranking methods described in the literature are univariate, so they do not detect interactions between factors. In this paper we propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency, which we have applied to three gene expression classification problems. We statistically prove that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF and Significance, as well as feature selection methods of attribute subset evaluation based on correlation and consistency with a multi-objective evolutionary search strategy.
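
    One plausible way to turn pairwise correlation into a multivariate feature ranking, in the spirit of the abstract, is to score each feature by its correlation with the class penalised by its average correlation with the other features, so that redundant features rank lower. The sketch below follows that assumption and is not necessarily the paper's exact formulation.

    import numpy as np

    def pairwise_correlation_ranking(X, y):
        """X: (n_samples, n_features) array, y: (n_samples,) numeric class labels.
        Returns feature indices sorted from most to least relevant."""
        n_features = X.shape[1]
        corr = np.abs(np.corrcoef(X, rowvar=False))                  # feature-feature |r|
        relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
        redundancy = (corr.sum(axis=1) - 1.0) / (n_features - 1)      # exclude self-correlation
        score = relevance - redundancy
        return np.argsort(-score)

    # Toy usage: features 3 and 7 drive the class, so they should rank near the top.
    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 20))
    y = (X[:, 3] + 0.5 * X[:, 7] + rng.normal(scale=0.5, size=200) > 0).astype(float)
    ranking = pairwise_correlation_ranking(X, y)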