616 research outputs found

    Listen, Look, and Gotcha: Instant Video Search with Mobile Phones by Layered Audio-Video Indexing *

    ABSTRACT Mobile video is quickly becoming a mass consumer phenomenon. More and more people use their smartphones to search and browse video content while on the move. In this paper, we develop an instant mobile video search system through which users can discover videos by simply pointing their phones at a screen to capture a few seconds of what they are watching. The system indexes large-scale video data in the cloud using a new layered audio-video indexing approach, and extracts light-weight joint audio-video signatures in real time to perform progressive search on mobile devices. Unlike most existing mobile video search applications, which simply send the original video query to the cloud, the proposed system is one of the first attempts at instant and progressive video search that leverages the light-weight computing capacity of mobile devices. The system is characterized by four unique properties: 1) a joint audio-video signature to deal with the large aural and visual variances in query videos captured by mobile phones, 2) layered audio-video indexing to holistically exploit the complementary nature of audio and video signals, 3) light-weight fingerprinting to comply with mobile processing capacity, and 4) a progressive query process that significantly reduces computational cost and improves the user experience: the search can stop at any time once a confident result is achieved. We collected 1,400 query videos captured by 25 mobile users against a dataset of 600 hours of video. Experiments show that our system outperforms state-of-the-art methods, achieving 90.79% precision when the query video is shorter than 10 seconds and 70.07% even when it is shorter than 5 seconds.
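The progressive query process described in the abstract can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the per-second fingerprint chunks, the inverted-index layout (`index` mapping a chunk to per-video match scores) and the averaging confidence rule are all simplifying assumptions.

```python
def progressive_search(chunks, index, threshold=0.9):
    """Send fingerprint chunks one at a time and stop as soon as the
    best candidate's average evidence clears the confidence threshold.

    `chunks` are per-second query fingerprints; `index` is a hypothetical
    inverted index mapping a chunk to {video_id: match_score}."""
    scores = {}
    for i, chunk in enumerate(chunks, start=1):
        # Accumulate evidence for every video this chunk matches.
        for video_id, score in index.get(chunk, {}).items():
            scores[video_id] = scores.get(video_id, 0.0) + score
        if scores:
            best_id, best = max(scores.items(), key=lambda kv: kv[1])
            if best / i >= threshold:   # confident enough: stop early
                return best_id, i       # result plus seconds of query used
    return (max(scores, key=scores.get) if scores else None), len(chunks)
```

With a toy index such as `{"c1": {"vidA": 0.95}}`, a query beginning with chunk `"c1"` returns after a single chunk, mirroring the anytime-stop behaviour the abstract describes.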

    High Dynamic Range Images Coding: Embedded and Multiple Description

    The aim of this work is to highlight and discuss a new paradigm for representing high-dynamic-range (HDR) images that can be used both for coding the image and for describing its multimedia content. In particular, the approach defines a new representation domain that, unlike the classical compressed domain, makes it possible to identify and exploit content metadata. Information related to the content is used to control both the encoding and the decoding process and is embedded directly in the compressed data stream. First, thanks to the proposed solution, the content description can be accessed quickly without fully decoding the compressed stream. This ensures a significant improvement in the performance of search and retrieval systems, for example in the semantic browsing of image databases. Further benefits can be envisaged in the management and distribution of multimedia content, because directly embedding content metadata preserves the consistency between the content stream and its description without relying on external frameworks such as MPEG-21. The paradigm may also be extended to multiple description coding, where different representations of the HDR image can be generated according to its content. The advantages of the proposed method appear at several levels, for instance in redundancy reduction. Moreover, the descriptors extracted from the compressed data stream can be used actively in complex applications, such as fast retrieval of similar images from huge databases.
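The idea of a content description that is readable without decoding the payload can be illustrated with a minimal sketch. This is not the paper's codec: zlib stands in for the HDR encoder, and a length-prefixed JSON header stands in for the embedded descriptors.

```python
import json
import struct
import zlib

def encode_with_metadata(pixels: bytes, metadata: dict) -> bytes:
    """Prepend a length-prefixed metadata block to the compressed payload,
    so descriptors are readable without decoding the image data."""
    meta = json.dumps(metadata).encode("utf-8")
    return struct.pack(">I", len(meta)) + meta + zlib.compress(pixels)

def read_metadata(stream: bytes) -> dict:
    """Access the content description without touching the compressed pixels."""
    (n,) = struct.unpack(">I", stream[:4])
    return json.loads(stream[4:4 + n].decode("utf-8"))

def decode_pixels(stream: bytes) -> bytes:
    """Full decode: skip the metadata header, then decompress the payload."""
    (n,) = struct.unpack(">I", stream[:4])
    return zlib.decompress(stream[4 + n:])
```

A search-and-retrieval system would call only `read_metadata`, which is why the embedded description can be "quickly accessed" relative to a full decode.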

    Big data analytics and processing for urban surveillance systems

    Urban surveillance systems will become more demanding as cities move toward smart-city intelligence. Big data analytics and processing for urban surveillance systems are increasingly important research areas because massive data volumes are generated continuously all over the world. This thesis focuses on several challenging big data issues in urban surveillance systems. First, we propose several simple yet efficient video data recording algorithms for urban surveillance: the key idea is to record the important video frames while discarding the unimportant ones. Second, since the DCT-based JPEG standard suffers from block artifacts, we propose a very simple but effective method that yields better quality than widely used filters while consuming far fewer CPU resources. Third, we design a novel filter to detect either vehicle license plates or vehicles in images captured by digital camera imaging sensors; to our knowledge, this is the first filter of its kind for vehicle/license-plate detection. Fourth, we propose a novel grate filter to determine whether objects are present in the images captured by the cameras, so that background images can be updated whenever no object is detected. Finally, we combine image hashing with our novel density-scan method to retrieve similar duplicate images.
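The keep-important-frames idea can be sketched with a simple change-detection rule: keep a frame only when it differs enough from the last kept frame. The mean-absolute-difference test, the threshold value, and the flat per-pixel frame representation are assumptions for illustration, not the thesis's algorithm.

```python
def record_important_frames(frames, threshold=10.0):
    """Keep a frame only when its mean absolute pixel difference from the
    last kept frame reaches the threshold, discarding near-duplicates.

    `frames` is a list of flat pixel lists (e.g. grayscale intensities)."""
    kept = []
    last = None
    for frame in frames:
        if last is None:
            kept.append(frame)          # always record the first frame
            last = frame
            continue
        diff = sum(abs(a - b) for a, b in zip(frame, last)) / len(frame)
        if diff >= threshold:           # scene changed enough: record it
            kept.append(frame)
            last = frame
    return kept
```

Comparing against the last *kept* frame (rather than the previous frame) prevents slow drift from being recorded as a long run of near-identical frames.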

    Matching of prior textures by image compression for geological mapping and novelty detection

    We describe an image-comparison technique of Heidemann and Ritter (2008a, b) that uses image compression and is capable of (i) detecting novel textures in a series of images and (ii) alerting the user to the similarity of a new image to a previously observed texture. The technique has been implemented and tested using our Astrobiology Phone-cam system, which employs Bluetooth communication to send images to a local laptop server in the field for the image-compression analysis. We tested the system at a field site displaying a heterogeneous suite of sandstones, limestones, mudstones and coal beds; some of the rocks are partly covered with lichen. The image-matching procedure performed very well on the data from our field test, grouping all images of yellow lichens together and all images of a coal bed together, and achieving 91% accuracy for similarity detection. Such similarity detection could be employed to make maps of different geological units. Novelty-detection performance was also reasonably good (64% accuracy); such novelty detection may prove valuable in searching for new geological units of astrobiological interest. The current system is not directly intended for mapping and novelty detection at a second field site based on image-compression analysis of an image database from a first site, although it could be developed further toward this end. Furthermore, the image-comparison technique is unsupervised and cannot directly classify an image as containing a particular geological feature; such features were labelled post facto by the human geologists associated with this study in order to analyse the system's performance.
    By providing more advanced capabilities for similarity detection and novelty detection, this image-compression technique could be useful in giving more scientific autonomy to robotic planetary rovers, and in assisting human astronauts in their geological exploration and assessment.
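Compression-based image comparison of this kind is commonly formalised as the normalized compression distance (NCD), sketched below with zlib as the compressor. The abstract does not state that Heidemann and Ritter's technique uses exactly this formula, so treat it as an illustrative stand-in: if two images share textures, compressing them together costs little more than compressing one alone.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar data,
    near 1 for unrelated data."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))        # joint compression
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Novelty detection then reduces to a threshold test: an incoming image whose NCD to every stored texture exceeds some cutoff is flagged as novel, while a low NCD to a stored texture signals similarity.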

    Beyond the pixels: learning and utilising video compression features for localisation of digital tampering.

    Video compression is pervasive in digital society. With the rising use of deep convolutional neural networks (CNNs) in computer vision, video analysis and video tampering detection, it is important to investigate how patterns invisible to the human eye may influence modern computer vision techniques and how they can be used advantageously. This work thoroughly explores how video compression influences the accuracy of CNNs and shows that optimal performance is achieved when compression levels in the training set closely match those of the test set. A novel method is then developed, using CNNs, to derive compression features directly from the pixels of video frames. These features are shown to be readily usable for detecting inauthentic video content with good accuracy across multiple video tampering techniques. Moreover, the ability to explain these features allows predictions to be made about their effectiveness against future tampering methods. The problem is motivated by a novel investigation into recent video manipulation methods, which shows a consistent drive to produce convincing, photorealistic, manipulated or synthetic video. Humans, blind to the presence of video tampering, are also blind to its type. New detection techniques are therefore required and, to compensate for human limitations, they should be broadly applicable to multiple tampering types. This thesis details the steps necessary to develop and evaluate such techniques.
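As a hand-crafted point of comparison for the learned compression features, block-boundary energy is a classic signal of DCT block coding: compression leaves small luminance jumps along the 8x8 block grid. The sketch below is not the thesis's CNN method; it merely shows, under the assumption of grayscale frames as nested lists, the kind of boundary discontinuity a network could learn to pick up.

```python
def blockiness(gray, block=8):
    """Mean absolute luminance jump across block boundaries of a
    grayscale frame (list of rows). Higher values suggest heavier
    block-based compression; 0 for a perfectly smooth frame."""
    h, w = len(gray), len(gray[0])
    edges, total = 0.0, 0
    for y in range(h):                       # vertical block boundaries
        for x in range(block, w, block):
            edges += abs(gray[y][x] - gray[y][x - 1])
            total += 1
    for y in range(block, h, block):         # horizontal block boundaries
        for x in range(w):
            edges += abs(gray[y][x] - gray[y - 1][x])
            total += 1
    return edges / total if total else 0.0
```

Matching such a statistic between training and test material is one crude way to approximate the thesis's finding that compression levels of the two sets should agree.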

    Recent advances in deep learning for object detection

    Object detection is a fundamental visual recognition problem in computer vision and has been widely studied over the past decades. Visual object detection aims to find objects of certain target classes with precise localization in a given image and to assign each object instance a corresponding class label. Owing to the tremendous success of deep-learning-based image classification, object detection techniques using deep learning have been actively studied in recent years. In this paper, we give a comprehensive survey of recent advances in visual object detection with deep learning. Reviewing a large body of recent related work, we systematically analyze existing object detection frameworks and organize the survey into three major parts: (i) detection components, (ii) learning strategies, and (iii) applications and benchmarks. We cover in detail a variety of factors affecting detection performance, such as detector architectures, feature learning, proposal generation and sampling strategies. Finally, we discuss several future directions to facilitate and spur future research on visual object detection with deep learning. Keywords: Object Detection, Deep Learning, Deep Convolutional Neural Network
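One detection component such surveys cover, the post-processing of overlapping proposals, is standard enough to sketch: greedy non-maximum suppression (NMS) keeps the highest-scoring box and discards any proposal whose intersection-over-union (IoU) with a kept box exceeds a threshold. A minimal pure-Python version:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: visit boxes by descending score,
    keep a box only if it does not overlap an already-kept box too much.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```

The IoU threshold is the knob the survey's "sampling strategies" discussion turns on: lower values suppress more aggressively, higher values tolerate more overlapping detections.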

    Applications of deep learning in fish habitat monitoring: A tutorial and survey

    Marine ecosystems and their fish habitats are becoming increasingly important because of their integral role as a valuable food source and in conservation outcomes. Being remote and difficult to access, marine environments and fish habitats are often monitored with underwater cameras, which record videos and images for understanding fish life and ecology as well as for preserving the environment. Many permanent underwater camera systems are currently deployed around the globe, and numerous studies use temporary cameras to survey fish habitats. These cameras generate a massive volume of digital data that cannot be analysed efficiently by current manual processing methods, which rely on a human observer. Deep Learning (DL) is a cutting-edge Artificial Intelligence (AI) technology that has demonstrated unprecedented performance in analysing visual data. Despite its application to a myriad of domains, its use in underwater fish habitat monitoring remains underexplored. In this paper, we provide a tutorial covering the key concepts of DL, helping the reader gain a high-level understanding of how DL works, and explaining step by step how DL algorithms should be developed for challenging applications such as underwater fish monitoring. In addition, we provide a comprehensive survey of key deep learning techniques for fish habitat monitoring, including classification, counting, localisation, and segmentation. Furthermore, we survey publicly available underwater fish datasets and compare various DL techniques in the underwater fish monitoring domain. We also discuss challenges and opportunities in the emerging field of deep learning for fish habitat processing.
    This paper is written to serve as a tutorial for marine scientists who would like to gain a high-level understanding of DL, develop it for their own applications by following our step-by-step guidance, and see how it is evolving to facilitate their research efforts. At the same time, it is suitable for computer scientists who would like to survey state-of-the-art DL-based methodologies for fish habitat monitoring.
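The link between localisation and counting that the survey draws can be illustrated trivially: once a detector emits (species, confidence) pairs for a frame, counting reduces to a confidence-thresholded tally. The species names and the 0.5 cutoff below are made up for illustration.

```python
from collections import Counter

def count_fish(detections, min_score=0.5):
    """Turn a detector's per-frame (species, confidence) pairs into
    per-species counts, ignoring low-confidence detections."""
    return Counter(label for label, score in detections if score >= min_score)
```

In practice, a monitoring pipeline would aggregate these per-frame counts over time, and per-individual tracking would be needed to avoid counting the same fish across consecutive frames.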