    Multimedia information technology and the annotation of video

    The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, overload of data will cause lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning.

    Robotic perception and control for a demolition task in unstructured environments

    The construction industry is a capital-intensive sector that has steadily turned towards mechanized and automated solutions in the last few decades. However, due to some specificities of this field, it is still technologically behind other sectors, such as manufacturing: there is room for improvement that could lead to economic, technical, and social benefits. In this work we focus on demolition robotics: taking the task of demolishing a wall as a case study (related to the needs of an industrial partner of our laboratory), we propose a mockup for studying perceptual and control aspects in a scaled-down representative scenario. The thesis deals with several aspects of the demolition task, ranging from perception to planning to human-robot interaction (HRI). In addition to a conceptual framework, we propose some new approaches to scene segmentation and situational awareness in unstructured environments, as well as an intuitive on-site HRI paradigm.

    Exploring EEG for Object Detection and Retrieval

    This paper explores the potential for using Brain Computer Interfaces (BCI) as a relevance feedback mechanism in content-based image retrieval. We investigate whether it is possible to capture useful EEG signals to detect whether relevant objects are present in a dataset of realistic and complex images. We perform several experiments using a rapid serial visual presentation (RSVP) of images at different rates (5Hz and 10Hz) on 8 users with different degrees of familiarization with BCI and the dataset. We then use the feedback from the BCI and mouse-based interfaces to retrieve localized objects in a subset of TRECVid images. We show that it is indeed possible to detect such objects in complex images and also that users with prior knowledge of the dataset or experience with the RSVP outperform others. When the users have limited time to annotate the images (100 seconds in our experiments), both interfaces are comparable in performance. Comparing our best users in a retrieval task, we found that EEG-based relevance feedback outperforms mouse-based feedback. The realistic and complex image dataset differentiates our work from previous studies on EEG for image retrieval. Comment: This preprint is the full version of a short paper accepted at the ACM International Conference on Multimedia Retrieval (ICMR) 2015 (Shanghai, China)

    Open Set Logo Detection and Retrieval

    Current logo retrieval research focuses on closed set scenarios. We argue that the logo domain is too large for this strategy and requires an open set approach. To foster research in this direction, a large-scale logo dataset, called Logos in the Wild, is collected and released to the public. A typical open set logo retrieval application is, for example, assessing the effectiveness of advertisement in sports event broadcasts. Given a query sample in the form of a logo image, the task is to find all further occurrences of this logo in a set of images or videos. Currently, common logo retrieval approaches are unsuitable for this task because of their closed world assumption. Thus, an open set logo retrieval method is proposed in this work which allows searching for previously unseen logos by a single query sample. A two-stage concept with separate logo detection and comparison is proposed, where both modules are based on task-specific CNNs. If trained with the Logos in the Wild data, significant performance improvements are observed, especially compared with state-of-the-art closed set approaches. Comment: accepted at VISAPP 201

    TRECVID 2003 - an overview

    Component-based Attention for Large-scale Trademark Retrieval

    The demand for large-scale trademark retrieval (TR) systems has significantly increased to combat the rise in international trademark infringement. Unfortunately, the ranking accuracy of current approaches using either hand-crafted or pre-trained deep convolutional neural network (DCNN) features is inadequate for large-scale deployments. We show in this paper that the ranking accuracy of TR systems can be significantly improved by incorporating hard and soft attention mechanisms, which direct attention to critical information such as figurative elements and reduce attention given to distracting and uninformative elements such as text and background. Our proposed approach achieves state-of-the-art results on a challenging large-scale trademark dataset. Comment: Fix typos related to authors' information