
    Reverse Domain Adaptation for Indoor Camera Pose Regression

    Synthetic images have been used to mitigate the scarcity of annotated data for training deep learning approaches, followed by domain adaptation to reduce the gap between synthetic and real images. One such approach uses Generative Adversarial Networks (GANs) such as CycleGAN to bridge the domain gap: the synthetic images are translated into real-looking synthetic images that are then used to train the deep learning models. In this article, we explore the less intuitive alternative strategy of domain adaptation in the reverse direction, i.e., real-to-synthetic adaptation. We train the deep learning models on synthetic data directly, and then, during inference, we apply domain adaptation to convert the real images into synthetic-looking real images using CycleGAN. This strategy reduces the amount of data conversion required during training, can potentially generate artefact-free images compared with the harder synthetic-to-real case, and can improve the performance of deep learning models. We demonstrate the success of this strategy in indoor localisation by experimenting with camera pose regression. The experimental results show that the proposed domain adaptation improves localisation accuracy compared to synthetic-to-real adaptation.
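    A minimal sketch of the inference-time pipeline described above, assuming PyTorch; `cyclegan_real2syn` (a CycleGAN generator trained for real-to-synthetic translation) and `pose_net` (a pose regressor trained purely on synthetic images) are hypothetical names introduced here for illustration, not part of any library or of the paper's released code:

```python
import torch

def localise(real_image: torch.Tensor,
             cyclegan_real2syn: torch.nn.Module,
             pose_net: torch.nn.Module) -> torch.Tensor:
    """Regress a camera pose from a single real image at inference time."""
    with torch.no_grad():
        # Inference-time domain adaptation: translate the real image into a
        # synthetic-looking one, matching the domain the regressor saw in training.
        synthetic_like = cyclegan_real2syn(real_image.unsqueeze(0))
        # The pose regressor outputs, e.g., a 3-D translation plus a 4-D quaternion.
        pose = pose_net(synthetic_like)
    return pose.squeeze(0)
```

    Note that, under this design, the translation cost is paid once per query image at inference, rather than over the entire synthetic training set.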

    An Outlook into the Future of Egocentric Vision

    What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward-facing cameras and digital overlays, is expected to be integrated into our everyday lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through examples the limitations of current technology. We then provide a mapping between this future and previously defined research tasks. For each task, we survey its seminal works, current state-of-the-art methodologies and available datasets, then reflect on shortcomings that limit its applicability to future research. Note that this survey focuses on software models for egocentric vision, independent of any specific hardware. The paper concludes with recommendations for areas of immediate exploration so as to unlock our path to the future of always-on, personalised and life-enhancing egocentric vision. Comment: We invite comments, suggestions and corrections here: https://openreview.net/forum?id=V3974SUk1

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing and, in particular, on the Brunswick model. Originally, the Brunswick model deals with face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of non-verbal social signals, in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model makes it possible to quantitatively evaluate the quality of the interaction using statistical tools that measure how effective the recognition phase is. In this paper we cast this theory in a setting where one of the interactants is a robot; in this case, the recognition phases performed by the robot and by the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal to be considered.
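    As an illustrative sketch of the kind of statistical check the Brunswick model calls for, the hypothetical snippet below scores a robot's recognition phase against annotated ground-truth gaze targets using scikit-learn; the label set and choice of metrics are assumptions for illustration, not taken from the paper:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Annotated gaze targets (ground truth) vs. the robot's recognised estimates.
ground_truth = ["human", "object", "human", "away", "object"]
recognised   = ["human", "object", "away",  "away", "object"]

# Raw agreement and chance-corrected agreement of the recognition phase.
print("accuracy:", accuracy_score(ground_truth, recognised))
print("kappa:   ", cohen_kappa_score(ground_truth, recognised))
```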

    Fine Art Pattern Extraction and Recognition

    This is a reprint of articles from the Special Issue published online in the open access journal Journal of Imaging (ISSN 2313-433X) (available at: https://www.mdpi.com/journal/jimaging/special_issues/faper2020).

    PERICLES Deliverable 4.3:Content Semantics and Use Context Analysis Techniques

    The current deliverable summarises the work conducted within task T4.3 of WP4, focusing on the extraction and the subsequent analysis of semantic information from digital content, which is imperative for its preservability. More specifically, the deliverable defines content semantic information from a visual and textual perspective, explains how this information can be exploited in long-term digital preservation, and proposes novel approaches for extracting this information in a scalable manner. Additionally, the deliverable discusses novel techniques for retrieving and analysing the context of use of digital objects. Although this topic has not been extensively studied in the existing literature, we believe use context is vital in augmenting the semantic information and maintaining the usability and preservability of the digital objects, as well as their ability to be accurately interpreted as initially intended.

    Colour coded

    This 300 word publication, to be published by the Society of Dyers and Colourists (SDC), is a collection of the best papers from a four-year European project that has considered colour from the perspective of both the arts and the sciences. The notion of art and science, and the crossovers between the two, resulted in an application and funding for cross-disciplinary research to host a series of training events between 2006 and 2010: Marie Curie Conferences & Training Courses (SCF), Call Identifier FP6-Mobility-4, Euros 532,363.80, CREATE – Colour Research for European Advanced Technology Employment. The research crossovers between the fields of art, science and technology were also a subject initiated through Bristol's Festival of Ideas events in May 2009. The author coordinated and chaired an event during which the C. P. Snow lecture 'The Two Cultures' (1959) was re-presented by the actor Simon Cook, followed by a lecture by Raymond Tallis on the notion of the polymath. The CREATE project has a worldwide impact for researchers, academics and scientists. Between January and October 2009, the site received 221,414 visits. The most popular route into the site is via the welcome page. The main groups of visitors originate in the UK (including Northern Ireland), Italy, France, Finland, Norway, Hungary, the USA and Spain. A basic percentage breakdown of the traffic over ten months indicates: USA 15%; UK 16%; Italy 13%; France 12%; Hungary 10%; Spain 6%; Finland 9%; Norway 5%. The remaining approximately 14% of visitors are from other countries, including Belgium, The Netherlands and Germany (approx. 3%). A discussion group has been initiated by the author as part of the CREATE project to facilitate an ongoing dialogue between artists and scientists: http://createcolour.ning.com/group/artandscience; www.create.uwe.ac.uk. Related papers to this research: a report on the CREATE Italian event, 'Colour in cultural heritage'; C. Parraman, A. Rizzi, 'Developing the CREATE network in Europe', in Colour in Art, Design and Nature, Edinburgh, 24 October 2008; C. Parraman, 'Mixing and describing colour', CREATE (Training event 1), France, 2008.

    Modeling Visual Rhetoric and Semantics in Multimedia

    Recent advances in machine learning have enabled computer vision algorithms to model complicated visual phenomena with accuracies unthinkable a mere decade ago. Their high performance on a plethora of vision-related tasks has enabled computer vision researchers to begin to move beyond traditional visual recognition problems to tasks requiring higher-level image understanding. However, most computer vision research still focuses on describing what images, text, or other media literally portray. In contrast, in this dissertation we focus on learning how and why such content is portrayed. Rather than viewing media for its content, we recast the problem as understanding visual communication and visual rhetoric. For example, the same content may be portrayed in different ways in order to present the story the author wishes to convey. We thus seek to model not only the content of the media, but also its authorial intent and latent messaging. Understanding how and why visual content is portrayed a certain way requires understanding higher-level abstract semantic concepts which are themselves latent within visual media. By latent, we mean the concept is not readily visually accessible within a single image (e.g. right vs. left political bias), in contrast to explicit visual semantic concepts such as objects. Specifically, we study the problems of modeling photographic style (how professional photographers portray their subjects), understanding visual persuasion in image advertisements, modeling political bias in multimedia (image and text) news articles, and learning cross-modal semantic representations. While most past research in vision and natural language processing studies the case where visual content and paired text are highly aligned (as in the case of image captions), we target the case where each modality conveys complementary information to tell a larger story. We particularly focus on the problem of learning cross-modal representations from multimedia exhibiting weak alignment between the image and text modalities. A variety of techniques are presented which improve the modeling of multimedia rhetoric in real-world data and enable more robust artificially intelligent systems.