
    Cross-validating Image Description Datasets and Evaluation Metrics

    The task of automatically generating sentential descriptions of image content has become increasingly popular in recent years, resulting in the development of large-scale image description datasets and the proposal of various metrics for evaluating image description generation systems. However, little work has been done to analyse and understand either the datasets or the metrics. In this paper, we propose using a leave-one-out cross-validation (LOOCV) process as a means to analyse multiply annotated, human-authored image description datasets and the various evaluation metrics, i.e. evaluating one image description against the other human-authored descriptions of the same image. Such an evaluation process affords various insights into the image description datasets and evaluation metrics, such as the variation of image descriptions within and across datasets and what the metrics capture. We compute and analyse (i) human upper-bound performance; (ii) rank correlation between metric pairs across datasets; and (iii) lower-bound performance, obtained by comparing a set of descriptions of one image to a sentence that does not describe that image. Interesting observations are made about the evaluation metrics and image description datasets, and we conclude that such cross-validation methods are extremely useful for assessing and gaining insights into image description datasets and evaluation metrics for image descriptions.
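The leave-one-out process described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `unigram_f1` metric is a toy stand-in chosen here for brevity (the abstract does not name the metrics used), and the function names `loocv_upper_bound` and `lower_bound` are hypothetical.

```python
from collections import Counter
from statistics import mean


def unigram_f1(candidate, references):
    """Toy stand-in metric (an assumption, not from the paper):
    best unigram-overlap F1 of the candidate against any single reference."""
    cand = Counter(candidate.lower().split())
    best = 0.0
    for ref in references:
        ref_counts = Counter(ref.lower().split())
        overlap = sum((cand & ref_counts).values())  # clipped token matches
        if overlap == 0:
            continue
        precision = overlap / sum(cand.values())
        recall = overlap / sum(ref_counts.values())
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def loocv_upper_bound(descriptions, metric=unigram_f1):
    """Hold out each human description in turn, score it against the
    remaining descriptions of the same image, and average the scores."""
    scores = []
    for i, held_out in enumerate(descriptions):
        references = descriptions[:i] + descriptions[i + 1:]
        scores.append(metric(held_out, references))
    return mean(scores)


def lower_bound(descriptions, unrelated_sentence, metric=unigram_f1):
    """Score a sentence that describes a different image against this
    image's descriptions, giving a rough lower-bound reference point."""
    return metric(unrelated_sentence, descriptions)


if __name__ == "__main__":
    # Hypothetical descriptions, invented purely for illustration.
    image_descriptions = [
        "a dog runs across the grass",
        "a brown dog running on grass",
        "the dog runs through a field",
    ]
    print(loocv_upper_bound(image_descriptions))
    print(lower_bound(image_descriptions, "two planes parked at an airport"))
```

Any sentence-level metric with the same `(candidate, references)` signature could be passed in place of the toy metric, which is what allows several metrics to be compared on the same held-out splits.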

    Towards Informing an Intuitive Mission Planning Interface for Autonomous Multi-Asset Teams via Image Descriptions

    Establishing a basis for certification of autonomous systems using trust and trustworthiness is the focus of Autonomy Teaming and TRAjectories for Complex Trusted Operational Reliability (ATTRACTOR). The Human-Machine Interface (HMI) team is working to capture and utilize the multitude of ways in which humans are already comfortable communicating mission goals and to translate them into an intuitive mission planning interface. Several input/output modalities (speech/audio, typing/text, touch, and gesture) are being considered and investigated in the context of human-machine teaming for the ATTRACTOR design reference mission (DRM) of Search and Rescue or, more generally, intelligence, surveillance, and reconnaissance (ISR). The first of these investigations, the Human Informed Natural-language GANs Evaluation (HINGE) data collection effort, is aimed at building an image description database to train a Generative Adversarial Network (GAN). In addition to building an image description database, the HMI team was interested in whether, and how, modality (spoken vs. written) affects different aspects of the image description given. The results will be analyzed to better inform the design of an interface for mission planning.