30,956 research outputs found

    Vision-based Pose Estimation for Augmented Reality : A Comparison Study

    Full text link
    Augmented reality aims to enrich our real world by inserting 3D virtual objects. In order to accomplish this goal, it is important that virtual elements are rendered and aligned in the real scene in an accurate and visually acceptable way. The solution of this problem can be related to a pose estimation and 3D camera localization. This paper presents a survey on different approaches of 3D pose estimation in augmented reality and gives classification of key-points-based techniques. The study given in this paper may help both developers and researchers in the field of augmented reality.Comment: IEEE International Conference on Pattern Analysis and Intelligent Systems PAIS'201

    MVC-3D: Adaptive Design Pattern for Virtual and Augmented Reality Systems

    Full text link
    In this paper, we present MVC-3D design pattern to develop virtual and augmented (or mixed) reality interfaces that use new types of sensors, modalities and implement specific algorithms and simulation models. The proposed pattern represents the extension of classic MVC pattern by enriching the View component (interactive View) and adding a specific component (Library). The results obtained on the development of augmented reality interfaces showed that the complexity of M, iV and C components is reduced. The complexity increases only on the Library component (L). This helps the programmers to well structure their models even if the interface complexity increases. The proposed design pattern is also used in a design process called MVC-3D in the loop that enables a seamless evolution from initial prototype to the final system

    cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey

    Full text link
    The paper gives futuristic challenges disscussed in the cvpaper.challenge. In 2015 and 2016, we thoroughly study 1,600+ papers in several conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV

    cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey

    Full text link
    The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers on computer vision, pattern recognition, and related fields. For this particular review, we focused on reading the ALL 602 conference papers presented at the CVPR2015, the premier annual computer vision event held in June 2015, in order to grasp the trends in the field. Further, we are proposing "DeepSurvey" as a mechanism embodying the entire process from the reading through all the papers, the generation of ideas, and to the writing of paper.Comment: Survey Pape

    Fingertip Detection and Tracking for Recognition of Air-Writing in Videos

    Full text link
    Air-writing is the process of writing characters or words in free space using finger or hand movements without the aid of any hand-held device. In this work, we address the problem of mid-air finger writing using web-cam video as input. In spite of recent advances in object detection and tracking, accurate and robust detection and tracking of the fingertip remains a challenging task, primarily due to small dimension of the fingertip. Moreover, the initialization and termination of mid-air finger writing is also challenging due to the absence of any standard delimiting criterion. To solve these problems, we propose a new writing hand pose detection algorithm for initialization of air-writing using the Faster R-CNN framework for accurate hand detection followed by hand segmentation and finally counting the number of raised fingers based on geometrical properties of the hand. Further, we propose a robust fingertip detection and tracking approach using a new signature function called distance-weighted curvature entropy. Finally, a fingertip velocity-based termination criterion is used as a delimiter to mark the completion of the air-writing gesture. Experiments show the superiority of the proposed fingertip detection and tracking algorithm over state-of-the-art approaches giving a mean precision of 73.1 % while achieving real-time performance at 18.5 fps, a condition which is of vital importance to air-writing. Character recognition experiments give a mean accuracy of 96.11 % using the proposed air-writing system, a result which is comparable to that of existing handwritten character recognition systems.Comment: 32 pages, 10 figures, 2 tables. Submitted to Journal of Expert Systems with Application

    Survey of Computer Vision and Machine Learning in Gastrointestinal Endoscopy

    Full text link
    This paper attempts to provide the reader a place to begin studying the application of computer vision and machine learning to gastrointestinal (GI) endoscopy. They have been classified into 18 categories. It should be be noted by the reader that this is a review from pre-deep learning era. A lot of deep learning based applications have not been covered in this thesis

    Face Recognition: A Novel Multi-Level Taxonomy based Survey

    Full text link
    In a world where security issues have been gaining growing importance, face recognition systems have attracted increasing attention in multiple application areas, ranging from forensics and surveillance to commerce and entertainment. To help understanding the landscape and abstraction levels relevant for face recognition systems, face recognition taxonomies allow a deeper dissection and comparison of the existing solutions. This paper proposes a new, more encompassing and richer multi-level face recognition taxonomy, facilitating the organization and categorization of available and emerging face recognition solutions; this taxonomy may also guide researchers in the development of more efficient face recognition solutions. The proposed multi-level taxonomy considers levels related to the face structure, feature support and feature extraction approach. Following the proposed taxonomy, a comprehensive survey of representative face recognition solutions is presented. The paper concludes with a discussion on current algorithmic and application related challenges which may define future research directions for face recognition.Comment: This paper is a preprint of a paper submitted to IET Biometrics. If accepted, the copy of record will be available at the IET Digital Librar

    Incorporating prior knowledge in medical image segmentation: a survey

    Full text link
    Medical image segmentation, the task of partitioning an image into meaningful parts, is an important step toward automating medical image analysis and is at the crux of a variety of medical imaging applications, such as computer aided diagnosis, therapy planning and delivery, and computer aided interventions. However, the existence of noise, low contrast and objects' complexity in medical images are critical obstacles that stand in the way of achieving an ideal segmentation system. Incorporating prior knowledge into image segmentation algorithms has proven useful for obtaining more accurate and plausible results. This paper surveys the different types of prior knowledge that have been utilized in different segmentation frameworks. We focus our survey on optimization-based methods that incorporate prior information into their frameworks. We review and compare these methods in terms of the types of prior employed, the domain of formulation (continuous vs. discrete), and the optimization techniques (global vs. local). We also created an interactive online database of existing works and categorized them based on the type of prior knowledge they use. Our website is interactive so that researchers can contribute to keep the database up to date. We conclude the survey by discussing different aspects of designing an energy functional for image segmentation, open problems, and future perspectives.Comment: Survey paper, 30 page

    Detection, Recognition and Tracking of Moving Objects from Real-time Video via Visual Vocabulary Model and Species Inspired PSO

    Full text link
    In this paper, we address the basic problem of recognizing moving objects in video images using Visual Vocabulary model and Bag of Words and track our object of interest in the subsequent video frames using species inspired PSO. Initially, the shadow free images are obtained by background modelling followed by foreground modeling to extract the blobs of our object of interest. Subsequently, we train a cubic SVM with human body datasets in accordance with our domain of interest for recognition and tracking. During training, using the principle of Bag of Words we extract necessary features of certain domains and objects for classification. Subsequently, matching these feature sets with those of the extracted object blobs that are obtained by subtracting the shadow free background from the foreground, we detect successfully our object of interest from the test domain. The performance of the classification by cubic SVM is satisfactorily represented by confusion matrix and ROC curve reflecting the accuracy of each module. After classification, our object of interest is tracked in the test domain using species inspired PSO. By combining the adaptive learning tools with the efficient classification of description, we achieve optimum accuracy in recognition of the moving objects. We evaluate our algorithm benchmark datasets: iLIDS, VIVID, Walking2, Woman. Comparative analysis of our algorithm against the existing state-of-the-art trackers shows very satisfactory and competitive results

    Fractional Local Neighborhood Intensity Pattern for Image Retrieval using Genetic Algorithm

    Full text link
    In this paper, a new texture descriptor named "Fractional Local Neighborhood Intensity Pattern" (FLNIP) has been proposed for content based image retrieval (CBIR). It is an extension of the Local Neighborhood Intensity Pattern (LNIP)[1]. FLNIP calculates the relative intensity difference between a particular pixel and the center pixel of a 3x3 window by considering the relationship with adjacent neighbors. In this work, the fractional change in the local neighborhood involving the adjacent neighbors has been calculated first with respect to one of the eight neighbors of the center pixel of a 3x3 window. Next, the fractional change has been calculated with respect to the center itself. The two values of fractional change are next compared to generate a binary bit pattern. Both sign and magnitude information are encoded in a single descriptor as it deals with the relative change in magnitude in the adjacent neighborhood i.e., the comparison of the fractional change. The descriptor is applied on four multi-resolution images -- one being the raw image and the other three being filtered gaussian images obtained by applying gaussian filters of different standard deviations on the raw image to signify the importance of exploring texture information at different resolutions in an image. The four sets of distances obtained between the query and the target image are then combined with a genetic algorithm based approach to improve the retrieval performance by minimizing the distance between similar class images. The performance of the method has been tested for image retrieval on four popular databases. The precision and recall values observed on these databases have been compared with recent state-of-art local patterns. The proposed method has shown a significant improvement over many other existing methods.Comment: MTAP, Springer(Minor Revision
    • …
    corecore