26 research outputs found

    Improving 3D Shape Retrieval Methods based on Bag-of-Feature Approach by using Local Codebooks

    No full text
    Also available online at http://www.sersc.org/journals/IJFGCN/vol5_no4/3.pdfInternational audienceRecent investigations illustrate that view-based methods, with pose normalization pre-processing get better performances in retrieving rigid models than other approaches and still the most popular and practical methods in the field of 3D shape retrieval. In this paper we present an improvement of 3D shape retrieval methods based on bag-of features approach. These methods use this approach to integrate a set of features extracted from 2D views of the 3D objects using the SIFT (Scale Invariant Feature Transform) algorithm into histograms using vector quantization which is based on a global visual codebook. In order to improve the retrieval performances, we propose to associate to each 3D object its local visual codebook instead of a unique global codebook. The experimental results obtained on the Princeton Shape Benchmark database, for the BF-SIFT method proposed by Ohbuchi, et al., and CM-BOF proposed by Zhouhui, et al., show that the proposed approach performs better than the original approach

    Organising and structuring a visual diary using visual interest point detectors

    Get PDF
    As wearable cameras become more popular, researchers are increasingly focusing on novel applications to manage the large volume of data these devices produce. One such application is the construction of a Visual Diary from an individual’s photographs. Microsoft’s SenseCam, a device designed to passively record a Visual Diary and cover a typical day of the user wearing the camera, is an example of one such device. The vast quantity of images generated by these devices means that the management and organisation of these collections is not a trivial matter. We believe wearable cameras, such as SenseCam, will become more popular in the future and the management of the volume of data generated by these devices is a key issue. Although there is a significant volume of work in the literature in the object detection and recognition and scene classification fields, there is little work in the area of setting detection. Furthermore, few authors have examined the issues involved in analysing extremely large image collections (like a Visual Diary) gathered over a long period of time. An algorithm developed for setting detection should be capable of clustering images captured at the same real world locations (e.g. in the dining room at home, in front of the computer in the office, in the park, etc.). This requires the selection and implementation of suitable methods to identify visually similar backgrounds in images using their visual features. We present a number of approaches to setting detection based on the extraction of visual interest point detectors from the images. We also analyse the performance of two of the most popular descriptors - Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF).We present an implementation of a Visual Diary application and evaluate its performance via a series of user experiments. Finally, we also outline some techniques to allow the Visual Diary to automatically detect new settings, to scale as the image collection continues to grow substantially over time, and to allow the user to generate a personalised summary of their data

    Video metadata extraction in a videoMail system

    Get PDF
    Currently the world swiftly adapts to visual communication. Online services like YouTube and Vine show that video is no longer the domain of broadcast television only. Video is used for different purposes like entertainment, information, education or communication. The rapid growth of today’s video archives with sparsely available editorial data creates a big problem of its retrieval. The humans see a video like a complex interplay of cognitive concepts. As a result there is a need to build a bridge between numeric values and semantic concepts. This establishes a connection that will facilitate videos’ retrieval by humans. The critical aspect of this bridge is video annotation. The process could be done manually or automatically. Manual annotation is very tedious, subjective and expensive. Therefore automatic annotation is being actively studied. In this thesis we focus on the multimedia content automatic annotation. Namely the use of analysis techniques for information retrieval allowing to automatically extract metadata from video in a videomail system. Furthermore the identification of text, people, actions, spaces, objects, including animals and plants. Hence it will be possible to align multimedia content with the text presented in the email message and the creation of applications for semantic video database indexing and retrieving

    Sketch-based digital storyboards and floor plans for authoring computer-generated film pre-visuals

    Get PDF
    Pre-visualisation is an important tool for planning films during the pre-production phase of filmmaking. Existing pre-visualisation authoring tools do not effectively support the user in authoring pre-visualisations without impairing software usability. These tools require the user to either have programming skills, be experienced in modelling and animation, or use drag-and-drop style interfaces. These interaction methods do not intuitively fit with pre-production activities such as floor planning and storyboarding, and existing tools that apply a storyboarding metaphor do not automatically interpret user sketches. The goal of this research was to investigate how sketch-based user interfaces and methods from computer vision could be used for supporting pre-visualisation authoring using a storyboarding approach. The requirements for such a sketch-based storyboarding tool were determined from literature and an interview with Triggerfish Animation Studios. A framework was developed to support sketch-based pre-visualisation authoring using a storyboarding approach. Algorithms for describing user sketches, recognising objects and performing pose estimation were designed to automatically interpret user sketches. A proof of concept prototype implementation of this framework was evaluated in order to assess its usability benefit. It was found that the participants could author pre-visualisations effectively, efficiently and easily. The results of the usability evaluation also showed that the participants were satisfied with the overall design and usability of the prototype tool. The positive and negative findings of the evaluation were interpreted and combined with existing heuristics in order to create a set of guidelines for designing similar sketch-based pre-visualisation authoring tools that apply the storyboarding approach. The successful implementation of the proof of concept prototype tool provides practical evidence of the feasibility of sketch-based pre-visualisation authoring. The positive results from the usability evaluation established that sketch-based interfacing techniques can be used effectively with a storyboarding approach for authoring pre-visualisations without impairing software usability

    Sketch-based digital storyboards and floor plans for authoring computer-generated film pre-visuals

    Get PDF
    Pre-visualisation is an important tool for planning films during the pre-production phase of filmmaking. Existing pre-visualisation authoring tools do not effectively support the user in authoring pre-visualisations without impairing software usability. These tools require the user to either have programming skills, be experienced in modelling and animation, or use drag-and-drop style interfaces. These interaction methods do not intuitively fit with pre-production activities such as floor planning and storyboarding, and existing tools that apply a storyboarding metaphor do not automatically interpret user sketches. The goal of this research was to investigate how sketch-based user interfaces and methods from computer vision could be used for supporting pre-visualisation authoring using a storyboarding approach. The requirements for such a sketch-based storyboarding tool were determined from literature and an interview with Triggerfish Animation Studios. A framework was developed to support sketch-based pre-visualisation authoring using a storyboarding approach. Algorithms for describing user sketches, recognising objects and performing pose estimation were designed to automatically interpret user sketches. A proof of concept prototype implementation of this framework was evaluated in order to assess its usability benefit. It was found that the participants could author pre-visualisations effectively, efficiently and easily. The results of the usability evaluation also showed that the participants were satisfied with the overall design and usability of the prototype tool. The positive and negative findings of the evaluation were interpreted and combined with existing heuristics in order to create a set of guidelines for designing similar sketch-based pre-visualisation authoring tools that apply the storyboarding approach. The successful implementation of the proof of concept prototype tool provides practical evidence of the feasibility of sketch-based pre-visualisation authoring. The positive results from the usability evaluation established that sketch-based interfacing techniques can be used effectively with a storyboarding approach for authoring pre-visualisations without impairing software usability

    Proceedings of the 7th Sound and Music Computing Conference

    Get PDF
    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

    Analyzing Granger causality in climate data with time series classification methods

    Get PDF
    Attribution studies in climate science aim for scientifically ascertaining the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested

    Deep Binary Representation Learning for Single/Cross-Modal Data Retrieval

    Get PDF
    Data similarity search is widely regarded as a classic topic in the realms of computer vision, machine learning and data mining. Providing a certain query, the retrieval model sorts out the related candidates in the database according to their similarities, where representation learning methods and nearest-neighbour search apply. As matching data features in Hamming space is computationally cheaper than in Euclidean space, learning to hash and binary representations are generally appreciated in modern retrieval models. Recent research seeks solutions in deep learning to formulate the hash functions, showing great potential in retrieval performance. In this thesis, we gradually extend our research topics and contributions from unsupervised single-modal deep hashing to supervised cross-modal hashing _nally zero-shot hashing problems, addressing the following challenges in deep hashing. First of all, existing unsupervised deep hashing works are still not attaining leading retrieval performance compared with the shallow ones. To improve this, a novel unsupervised single-modal hashing model is proposed in this thesis, named Deep Variational Binaries (DVB). We introduce the popular conditional variational auto-encoders to formulate the encoding function. By minimizing the reconstruction error of the latent variables, the proposed model produces compact binary codes without training supervision. Experiments on benchmarked datasets show that our model outperform existing unsupervised hashing methods. The second problem is that current cross-modal hashing methods only consider holistic image representations and fail to model descriptive sentences, which is inappropriate to handle the rich semantics of informative cross-modal data for quality textual-visual search tasks. To handle this problem, we propose a supervised deep cross-modal hashing model called Textual-Visual Deep Binaries (TVDB). Region-based neural networks and recurrent neural networks are involved in the image encoding network in order to make e_ective use of visual information, while the text encoder is built using a convolutional neural network. We additionally introduce an e_cient in-batch optimization routine to train the network parameters. The proposed mode successfully outperforms state-of-the-art methods on large-scale datasets. Finally, existing hashing models fail when the categories of query data have never been seen during training. This scenario is further extended into a novel zero-shot cross-modal hashing task in this thesis, and a Zero-shot Sketch-Image Hashing (ZSIH) scheme is then proposed with graph convolution and stochastic neurons. Experiments show that the proposed ZSIH model signi_cantly outperforms existing hashing algorithms in the zero-shot retrieval task. Experiments suggest our proposed and novel hashing methods outperform state-of-the-art researches in single-modal and cross-modal data retrieval

    Smart video surveillance of pedestrians : fixed, aerial, and multi-camera methods

    Get PDF
    Crowd analysis from video footage is an active research topic in the field of computer vision. Crowds can be analaysed using different approaches, depending on their characteristics. Furthermore, analysis can be performed from footage obtained through different sources. Fixed CCTV cameras can be used, as well as cameras mounted on moving vehicles. To begin, a literature review is provided, where research works in the the fields of crowd analysis, as well as object and people tracking, occlusion handling, multi-view and sensor fusion, and multi-target tracking are analyses and compared, and their advantages and limitations highlighted. Following that, the three contributions of this thesis are presented: in a first study, crowds will be classified based on various cues (i.e. density, entropy), so that the best approaches to further analyse behaviour can be selected; then, some of the challenges of individual target tracking from aerial video footage will be tackled; finally, a study on the analysis of groups of people from multiple cameras is proposed. The analysis entails the movements of people and objects in the scene. The idea is to track as many people as possible within the crowd, and to be able to obtain knowledge from their movements, as a group, and to classify different types of scenes. An additional contribution of this thesis, are two novel datasets: on the one hand, a first set to test the proposed aerial video analysis methods; on the other, a second to validate the third study, that is, with groups of people recorded from multiple overlapping cameras performing different actions
    corecore