
    Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability

    Video segmentation encompasses a wide range of problem formulations, e.g., object, scene, actor-action and multimodal video segmentation, each delineating task-specific scene components with pixel-level masks. Recently, approaches in this research area have shifted from ConvNet-based to transformer-based models. In addition, various interpretability approaches have appeared for transformer models and for video temporal dynamics, motivated by growing interest in basic scientific understanding, model diagnostics and the societal implications of real-world deployment. Previous surveys mainly focused on ConvNet models for a subset of video segmentation tasks or on transformers for classification tasks. Moreover, a component-wise discussion of transformer-based video segmentation models has not yet received due focus, and previous reviews of interpretability methods concentrated on transformers for classification, while analysis of the temporal-dynamics modelling capabilities of video models has received less attention. In this survey, we address the above with a thorough discussion of the various categories of video segmentation, a component-wise discussion of state-of-the-art transformer-based models, and a review of related interpretability methods. We first introduce the different video segmentation task categories, their objectives, specific challenges and benchmark datasets. Next, we provide a component-wise review of recent transformer-based models and document the state of the art on different video segmentation tasks. Subsequently, we discuss post-hoc and ante-hoc interpretability methods for transformer models, as well as interpretability methods for understanding the role of the temporal dimension in video models. Finally, we conclude with future research directions.
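    As a concrete illustration of the temporal-interpretability flavour such a survey covers, the hedged sketch below probes a video classifier post hoc by occluding one frame at a time and measuring the drop in the target score. The toy model, tensor shapes and function names are illustrative assumptions, not methods from the survey.

```python
# Minimal sketch (not from the survey): frame-occlusion sensitivity as a
# post-hoc probe of how much each time step contributes to a video model's
# prediction. The toy model is a stand-in; any clip classifier mapping
# (B, C, T, H, W) tensors to class logits could be probed the same way.
import torch
import torch.nn as nn

class ToyClipClassifier(nn.Module):
    """Stand-in video classifier: global average pool + linear head."""
    def __init__(self, in_channels=3, num_classes=10):
        super().__init__()
        self.head = nn.Linear(in_channels, num_classes)

    def forward(self, clip):                      # clip: (B, C, T, H, W)
        feats = clip.mean(dim=(2, 3, 4))          # pool over T, H, W -> (B, C)
        return self.head(feats)                   # (B, num_classes)

@torch.no_grad()
def frame_occlusion_scores(model, clip, target_class):
    """Score each frame by the drop in the target logit when it is zeroed out."""
    base = model(clip)[0, target_class].item()
    scores = []
    for t in range(clip.shape[2]):
        occluded = clip.clone()
        occluded[:, :, t] = 0.0                   # occlude frame t
        scores.append(base - model(occluded)[0, target_class].item())
    return scores                                 # higher = frame mattered more

model = ToyClipClassifier()
clip = torch.randn(1, 3, 8, 32, 32)               # one 8-frame RGB clip
print(frame_occlusion_scores(model, clip, target_class=0))
```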

    Gaining Insight into Determinants of Physical Activity using Bayesian Network Learning

    Open Access (preprint and publisher's versions available). BNAIC/BeneLearn 202

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in planar homography, used to propose regions of interest in which to find objects, and recursive Bayesian filtering, used to integrate observations over time. The proposal is evaluated on six virtual indoor environments, covering the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves recall and F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction (58.8%) of the object categorization entropy when compared to a two-stage video object detection method used as a baseline, at the cost of a small time overhead (120 ms) and a precision loss (0.92).
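    To make the mechanism concrete, the following is a minimal sketch, not the paper's implementation: it propagates a detected box through a planar homography to obtain a region-of-interest proposal in the next frame, then fuses per-class detector confidences over time with a recursive Bayesian update. The translation-only homography, class count and confidence values are illustrative assumptions.

```python
# Hedged sketch (not the paper's code): homography-based ROI proposal plus a
# simple recursive Bayesian update over class probabilities.
import numpy as np

def warp_box(box, H):
    """Map an axis-aligned box (x1, y1, x2, y2) through a 3x3 homography H."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1, 1], [x2, y1, 1], [x2, y2, 1], [x1, y2, 1]], float).T
    warped = H @ corners
    warped = warped[:2] / warped[2]               # perspective divide
    xs, ys = warped
    return (xs.min(), ys.min(), xs.max(), ys.max())  # enclosing box = ROI proposal

def bayes_update(prior, likelihood):
    """One step of recursive Bayesian filtering over class probabilities."""
    post = prior * likelihood
    return post / post.sum()

# Example: propagate a box with a small translation-only homography and fuse
# two noisy per-class observations (3 classes, assumed values).
H = np.array([[1, 0, 5], [0, 1, -2], [0, 0, 1]], float)
roi = warp_box((10, 20, 50, 60), H)
belief = np.full(3, 1 / 3)                        # uniform prior over classes
for obs in ([0.6, 0.3, 0.1], [0.7, 0.2, 0.1]):    # detector confidences per frame
    belief = bayes_update(belief, np.asarray(obs))
print(roi, belief)
```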

    IberSPEECH 2020: XI Jornadas en Tecnología del Habla and VII Iberian SLTech

    IberSPEECH2020 is a two-day event bringing together the best researchers and practitioners in speech and language technologies for Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, presentations of projects, laboratory activities, recent PhD theses, discussion panels, a round table, and awards for the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions, distributed among 5 oral sessions, a PhD session, and a projects session. To ensure the quality of all contributions, each submitted paper was reviewed by three members of the scientific review committee. All papers in the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, the Czech Republic, Ukraine and Slovenia). Furthermore, it has been confirmed that extensions of selected papers will be published as a special issue of the journal Applied Sciences, “IberSPEECH 2020: Speech and Language Technologies for Iberian Languages”, published by MDPI with full open access. In addition to the regular paper sessions, the IberSPEECH2020 scientific program features the ALBAYZIN evaluation challenge session. Red Española de Tecnologías del Habla. Universidad de Valladolid.

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model deals with face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of non-verbal social signals in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model allows one to quantitatively evaluate the quality of the interaction using statistical tools that measure how effective the recognition phase is. In this paper we cast this theory in the setting where one of the interactants is a robot; in this case, the recognition phases performed by the robot and by the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal considered.
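    As a hedged illustration of how the recognition phase might be evaluated statistically (the paper's exact measures are not reproduced here), the sketch below scores agreement between the gaze targets a human addressed and those the robot recognized, using accuracy and Cohen's kappa; the label coding and toy sequences are assumptions.

```python
# Hedged sketch (not the paper's evaluation code): chance-corrected agreement
# between intended and recognized social signals as a proxy for how effective
# the recognition phase is.
import numpy as np

def recognition_quality(intended, recognized):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    intended = np.asarray(intended)
    recognized = np.asarray(recognized)
    p_o = np.mean(intended == recognized)          # observed agreement
    labels = np.union1d(intended, recognized)
    p_e = sum(np.mean(intended == c) * np.mean(recognized == c) for c in labels)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
    return p_o, kappa

# Example gaze targets over a short interaction (0 = partner, 1 = object, 2 = elsewhere).
intended   = [0, 0, 1, 2, 1, 0, 2, 1]
recognized = [0, 0, 1, 1, 1, 0, 2, 2]
print(recognition_quality(intended, recognized))
```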

    The blessings of explainable AI in operations & maintenance of wind turbines

    Wind turbines play an integral role in generating clean energy, but regularly suffer from operational inconsistencies and failures, leading to unexpected downtimes and significant Operations & Maintenance (O&M) costs. Condition-Based Monitoring (CBM) has been utilised in the past to monitor operational inconsistencies in turbines by applying signal processing techniques to vibration data. The last decade has witnessed growing interest in leveraging Supervisory Control & Data Acquisition (SCADA) data from turbine sensors for CBM. Machine Learning (ML) techniques have been utilised to predict incipient faults in turbines and to forecast vital operational parameters with high accuracy by leveraging SCADA data and alarm logs. More recently, Deep Learning (DL) methods have outperformed conventional ML techniques, particularly for anomaly prediction. Despite demonstrating immense promise in the transition to Artificial Intelligence (AI), such models are generally black boxes that cannot provide rationales behind their predictions, hampering the ability of turbine operators to rely on automated decision making. We aim to help combat this challenge by providing a novel perspective on Explainable AI (XAI) for trustworthy decision support.
    This thesis revolves around three key strands of XAI: DL, Natural Language Generation (NLG) and Knowledge Graphs (KGs), which are investigated using data from an operational turbine. We leverage DL and NLG to predict incipient faults and alarm events in the turbine in natural language, and to generate human-intelligible O&M strategies to assist engineers in fixing or averting the faults. We also propose specialised DL models which can predict causal relationships in SCADA features as well as quantify the importance of vital parameters leading to failures. The thesis culminates with an interactive Question-Answering (QA) system for automated reasoning that leverages multimodal domain-specific information from a KG, allowing engineers to retrieve O&M strategies with natural language questions. By helping make turbines more reliable, we envisage wider adoption of wind energy sources towards tackling climate change.
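    As a hedged illustration of quantifying the importance of operational parameters (not the thesis's specialised DL models), the sketch below applies generic permutation importance to a synthetic SCADA-like dataset: each feature is shuffled in turn, and the resulting increase in prediction error indicates how much the model relied on it. The feature names, target and linear model are illustrative assumptions standing in for real turbine data.

```python
# Hedged sketch: model-agnostic permutation importance on synthetic
# SCADA-style features (assumed names: wind speed, rotor speed, gearbox temp).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for SCADA features and a power-like target.
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

# Fit a simple least-squares model (the "black box" could be any model).
Xb = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
predict = lambda M: np.column_stack([M, np.ones(len(M))]) @ coef
mse = lambda a, b: np.mean((a - b) ** 2)

baseline = mse(y, predict(X))
for j, name in enumerate(["wind_speed", "rotor_speed", "gearbox_temp"]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])           # break the feature-target link
    print(name, mse(y, predict(Xp)) - baseline)    # importance = error increase
```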