7 research outputs found

    LiveSketch: Query Perturbations for Guided Sketch-based Visual Search

    Get PDF
    LiveSketch is a novel algorithm for searching large image collections using hand-sketched queries. LiveSketch tackles the inherent ambiguity of sketch search by creating visual suggestions that augment the query as it is drawn, making query specification an iterative rather than one-shot process that helps disambiguate users' search intent. Our technical contributions are: a triplet convnet architecture that incorporates an RNN based variational autoencoder to search for images using vector (stroke-based) queries; real-time clustering to identify likely search intents (and so, targets within the search embedding); and the use of backpropagation from those targets to perturb the input stroke sequence, so suggesting alterations to the query in order to guide the search. We show improvements in accuracy and time-to-task over contemporary baselines using a 67M image corpus.Comment: Accepted to CVPR 201

    Object Pose Estimation in Monocular Image Using Modified FDCM

    Get PDF
    In this paper, a new method for object detection and pose estimation in a monocular image is proposed based on FDCM method. it can detect object with high speed running time, even if the object was under the partial occlusion or in bad illumination. In addition, It requires only single template without any training process. The Modied FDCM based on FDCM with improvments, the LSD method was used in MFDCM instead of the line tting method, besides the integral distance transform was replaced with a distance transform image, and using an angular Voronoi diagram. In addition, the search process depends on Line segments based search instead of the sliding window search in FDCM. The MFDCM was evaluated by comparing it with FDCM in dierent scenarios and with other four methods: COF, HALCON, LINE2D, and BOLD using D-textureless dataset. The comparison results show that MFDCM was at least 14 times faster than FDCM in tested scenarios. Furthermore, it has the highest correct detection rate among all tested method with small advantage from COF and BLOD methods, while it was a little slower than LINE2D which was the fasted method among compared methods. The results proves that MFDCM able to detect and pose estimation of the objects in the clear or clustered background from a monocular image with high speed running time, even if the object was under the partial occlusion which makes it robust and reliable for real-time applications

    Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality

    Get PDF
    Sketch and speech are intuitive interaction methods that convey complementary information and have been independently used for 3D model retrieval in virtual environments. While sketch has been shown to be an effective retrieval method, not all collections are easily navigable using this modality alone. We design a new challenging database for sketch comprised of 3D chairs where each of the components (arms, legs, seat, back) are independently colored. To overcome this, we implement a multimodal interface for querying 3D model databases within a virtual environment. We base the sketch on the state-of-the-art for 3D Sketch Retrieval, and use a Wizard-of-Oz style experiment to process the voice input. In this way, we avoid the complexities of natural language processing which frequently requires fine-tuning to be robust. We conduct two user studies and show that hybrid search strategies emerge from the combination of interactions, fostering the advantages provided by both modalities

    3D Sketching for Interactive Model Retrieval in Virtual Reality

    Get PDF
    We describe a novel method for searching 3D model collections using free-form sketches within a virtual environment as queries. As opposed to traditional sketch retrieval, our queries are drawn directly onto an example model. Using immersive virtual reality the user can express their query through a sketch that demonstrates the desired structure, color and texture. Unlike previous sketch-based retrieval methods, users remain immersed within the environment without relying on textual queries or 2D projections which can disconnect the user from the environment. We perform a test using queries over several descriptors, evaluating the precision in order to select the most accurate one. We show how a convolutional neural network (CNN) can create multi-view representations of colored 3D sketches. Using such a descriptor representation, our system is able to rapidly retrieve models and in this way, we provide the user with an interactive method of navigating large object datasets. Through a user study we demonstrate that by using our VR 3D model retrieval system, users can perform search more quickly and intuitively than with a naive linear browsing method. Using our system users can rapidly populate a virtual environment with specific models from a very large database, and thus the technique has the potential to be broadly applicable in immersive editing systems

    Interactive video asset retrieval using sketched queries

    No full text
    We present a new algorithm for searching video repositories using free-hand sketches. Our queries express both appearance (color, shape) and motion attributes, as well as semantic properties (ob-ject labels) enabling hybrid queries to be specified. Unlike existing sketch based video retrieval (SBVR) systems that enable hybrid queries of this form, we do not adopt a model fitting/optimization approach to match at query-time. Rather, we create an efficiently searchable index via a novel space-time descriptor that encapsu-lates all these properties. The real-time performance yielded by our indexing approach enables interactive refinement of search results within a relevance feedback (RF) framework; a unique contribution to SBVR. We evaluate our system over 700 sports footage clips ex-hibiting a variety of clutter and motion conditions, demonstrating significant accuracy and speed gains over the state of the art

    Application of Machine Learning within Visual Content Production

    Get PDF
    We are living in an era where digital content is being produced at a dazzling pace. The heterogeneity of contents and contexts is so varied that a numerous amount of applications have been created to respond to people and market demands. The visual content production pipeline is the generalisation of the process that allows a content editor to create and evaluate their product, such as a video, an image, a 3D model, etc. Such data is then displayed on one or more devices such as TVs, PC monitors, virtual reality head-mounted displays, tablets, mobiles, or even smartwatches. Content creation can be simple as clicking a button to film a video and then share it into a social network, or complex as managing a dense user interface full of parameters by using keyboard and mouse to generate a realistic 3D model for a VR game. In this second example, such sophistication results in a steep learning curve for beginner-level users. In contrast, expert users regularly need to refine their skills via expensive lessons, time-consuming tutorials, or experience. Thus, user interaction plays an essential role in the diffusion of content creation software, primarily when it is targeted to untrained people. In particular, with the fast spread of virtual reality devices into the consumer market, new opportunities for designing reliable and intuitive interfaces have been created. Such new interactions need to take a step beyond the point and click interaction typical of the 2D desktop environment. The interactions need to be smart, intuitive and reliable, to interpret 3D gestures and therefore, more accurate algorithms are needed to recognise patterns. In recent years, machine learning and in particular deep learning have achieved outstanding results in many branches of computer science, such as computer graphics and human-computer interface, outperforming algorithms that were considered state of the art, however, there are only fleeting efforts to translate this into virtual reality. In this thesis, we seek to apply and take advantage of deep learning models to two different content production pipeline areas embracing the following subjects of interest: advanced methods for user interaction and visual quality assessment. First, we focus on 3D sketching to retrieve models from an extensive database of complex geometries and textures, while the user is immersed in a virtual environment. We explore both 2D and 3D strokes as tools for model retrieval in VR. Therefore, we implement a novel system for improving accuracy in searching for a 3D model. We contribute an efficient method to describe models through 3D sketch via an iterative descriptor generation, focusing both on accuracy and user experience. To evaluate it, we design a user study to compare different interactions for sketch generation. Second, we explore the combination of sketch input and vocal description to correct and fine-tune the search for 3D models in a database containing fine-grained variation. We analyse sketch and speech queries, identifying a way to incorporate both of them into our system's interaction loop. Third, in the context of the visual content production pipeline, we present a detailed study of visual metrics. We propose a novel method for detecting rendering-based artefacts in images. It exploits analogous deep learning algorithms used when extracting features from sketches
    corecore