Vision-based Pose Estimation for Augmented Reality: A Comparison Study
Augmented reality aims to enrich our real world by inserting 3D virtual
objects. In order to accomplish this goal, it is important that virtual
elements are rendered and aligned in the real scene in an accurate and visually
acceptable way. Solving this problem amounts to pose estimation and 3D camera
localization. This paper presents a survey of different approaches to 3D pose
estimation in augmented reality and gives a classification of key-point-based
techniques. The study given in this paper
may help both developers and researchers in the field of augmented reality.
Comment: IEEE International Conference on Pattern Analysis and Intelligent Systems PAIS'201
MVC-3D: Adaptive Design Pattern for Virtual and Augmented Reality Systems
In this paper, we present MVC-3D design pattern to develop virtual and
augmented (or mixed) reality interfaces that use new types of sensors,
modalities and implement specific algorithms and simulation models. The
proposed pattern extends the classic MVC pattern by enriching the View
component (interactive View) and adding a specific component (Library). The
results obtained while developing augmented reality interfaces showed that the
complexity of the M, iV, and C components is reduced; complexity increases only
in the Library component (L). This helps programmers structure their models
well even as the interface complexity
increases. The proposed design pattern is also used in a design process called
MVC-3D in the loop that enables a seamless evolution from initial prototype to
the final system.
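The component split described in this abstract can be sketched as follows; all class and method names here are hypothetical illustrations, not taken from the paper:

```python
# Minimal sketch of the MVC-3D split described in the abstract.
# Class and method names are hypothetical illustrations.

class Library:
    """L component: holds sensor drivers, algorithms, and simulation
    models, keeping their complexity out of M, iV, and C."""
    def __init__(self):
        self.algorithms = {}

    def register(self, name, fn):
        self.algorithms[name] = fn

    def run(self, name, *args):
        return self.algorithms[name](*args)


class Model:
    """M component: application state only."""
    def __init__(self):
        self.pose = (0.0, 0.0, 0.0)


class Controller:
    """C component: routes events, delegating heavy work to the Library."""
    def __init__(self, model, library):
        self.model = model
        self.library = library

    def handle(self, data):
        self.model.pose = self.library.run("estimate_pose", data)


class InteractiveView:
    """iV component: renders the scene and forwards user/sensor input."""
    def __init__(self, controller):
        self.controller = controller

    def on_sensor_data(self, data):
        self.controller.handle(data)


# Wiring the four components together with a toy pose estimator:
lib = Library()
lib.register("estimate_pose", lambda data: (data[0], data[1], 0.0))
model = Model()
view = InteractiveView(Controller(model, lib))
view.on_sensor_data((1.0, 2.0))
print(model.pose)  # -> (1.0, 2.0, 0.0)
```

The point of the sketch is that swapping in a new sensor or estimation algorithm touches only the Library registration, leaving M, iV, and C unchanged.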
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper gives futuristic challenges discussed in the cvpaper.challenge. In
2015 and 2016, we thoroughly studied 1,600+ papers from several
conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers presented
at CVPR2015, the premier annual computer vision event held in June 2015, in
order to grasp the trends in the field. Further, we propose "DeepSurvey" as a
mechanism embodying the entire process, from reading all the papers, through
the generation of ideas, to the writing of a paper.
Comment: Survey Paper
Fingertip Detection and Tracking for Recognition of Air-Writing in Videos
Air-writing is the process of writing characters or words in free space using
finger or hand movements without the aid of any hand-held device. In this work,
we address the problem of mid-air finger writing using web-cam video as input.
In spite of recent advances in object detection and tracking, accurate and
robust detection and tracking of the fingertip remains a challenging task,
primarily due to the small dimensions of the fingertip. Moreover, the initialization
and termination of mid-air finger writing is also challenging due to the
absence of any standard delimiting criterion. To solve these problems, we
propose a new writing hand pose detection algorithm for initialization of
air-writing using the Faster R-CNN framework for accurate hand detection
followed by hand segmentation and finally counting the number of raised fingers
based on geometrical properties of the hand. Further, we propose a robust
fingertip detection and tracking approach using a new signature function called
distance-weighted curvature entropy. Finally, a fingertip velocity-based
termination criterion is used as a delimiter to mark the completion of the
air-writing gesture. Experiments show the superiority of the proposed fingertip
detection and tracking algorithm over state-of-the-art approaches, giving a
mean precision of 73.1% while achieving real-time performance at 18.5 fps, a
condition of vital importance for air-writing. Character recognition
experiments give a mean accuracy of 96.11% using the proposed air-writing
system, a result comparable to that of existing handwritten character
recognition systems.
Comment: 32 pages, 10 figures, 2 tables. Submitted to Journal of Expert Systems with Applications
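The velocity-based termination criterion described in this abstract can be sketched as follows; the threshold and window values below are illustrative assumptions, not the paper's parameters:

```python
import math

def gesture_ended(trajectory, fps=18.5, v_thresh=5.0, still_frames=5):
    """Return True once the fingertip velocity (pixels/second) has stayed
    below `v_thresh` for `still_frames` consecutive frames.
    Threshold and window are illustrative, not the paper's values."""
    if len(trajectory) < still_frames + 1:
        return False
    dt = 1.0 / fps
    for (x0, y0), (x1, y1) in zip(trajectory[-still_frames - 1:-1],
                                  trajectory[-still_frames:]):
        v = math.hypot(x1 - x0, y1 - y0) / dt
        if v >= v_thresh:
            return False
    return True

# A fingertip that moves, then holds still for several frames:
path = [(0, 0), (10, 0), (20, 0)] + [(20, 0)] * 6
print(gesture_ended(path))  # -> True, the gesture is delimited as complete
```

In a full air-writing pipeline this check would run after each new tracked fingertip position, closing the current character stroke when it fires.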
Survey of Computer Vision and Machine Learning in Gastrointestinal Endoscopy
This paper attempts to provide the reader with a place to begin studying the
application of computer vision and machine learning to gastrointestinal (GI)
endoscopy. The works have been classified into 18 categories. It should be
noted that this is a review from the pre-deep-learning era; many deep-learning-based
applications are not covered in this survey.
Face Recognition: A Novel Multi-Level Taxonomy based Survey
In a world where security issues have been gaining growing importance, face
recognition systems have attracted increasing attention in multiple application
areas, ranging from forensics and surveillance to commerce and entertainment.
To help understand the landscape and abstraction levels relevant for face
recognition systems, face recognition taxonomies allow a deeper dissection and
comparison of the existing solutions. This paper proposes a new, more
encompassing and richer multi-level face recognition taxonomy, facilitating the
organization and categorization of available and emerging face recognition
solutions; this taxonomy may also guide researchers in the development of more
efficient face recognition solutions. The proposed multi-level taxonomy
considers levels related to the face structure, feature support and feature
extraction approach. Following the proposed taxonomy, a comprehensive survey of
representative face recognition solutions is presented. The paper concludes
with a discussion on current algorithmic and application related challenges
which may define future research directions for face recognition.Comment: This paper is a preprint of a paper submitted to IET Biometrics. If
accepted, the copy of record will be available at the IET Digital Librar
Incorporating prior knowledge in medical image segmentation: a survey
Medical image segmentation, the task of partitioning an image into meaningful
parts, is an important step toward automating medical image analysis and is at
the crux of a variety of medical imaging applications, such as computer aided
diagnosis, therapy planning and delivery, and computer aided interventions.
However, noise, low contrast, and the complexity of objects in medical
images are critical obstacles that stand in the way of achieving an
ideal segmentation system. Incorporating prior knowledge into image
segmentation algorithms has proven useful for obtaining more accurate and
plausible results. This paper surveys the different types of prior knowledge
that have been utilized in different segmentation frameworks. We focus our
survey on optimization-based methods that incorporate prior information into
their frameworks. We review and compare these methods in terms of the types of
prior employed, the domain of formulation (continuous vs. discrete), and the
optimization techniques (global vs. local). We also created an interactive
online database of existing works and categorized them based on the type of
prior knowledge they use. Our website is interactive so that researchers can
contribute to keep the database up to date. We conclude the survey by
discussing different aspects of designing an energy functional for image
segmentation, open problems, and future perspectives.
Comment: Survey paper, 30 pages
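As a generic illustration of the optimization view this survey takes, a discrete segmentation energy typically combines a data term with a prior (here a Potts smoothness prior); the concrete costs and weight below are toy assumptions, not from the survey:

```python
def segmentation_energy(labels, data_cost, lam=1.0):
    """E(L) = sum_i D_i(l_i) + lam * sum_{(i,j)} [l_i != l_j]
    for a 1-D image: a unary data cost plus a Potts smoothness prior
    that penalizes label changes between neighboring pixels.
    Costs and lambda are toy values for illustration."""
    unary = sum(data_cost[i][l] for i, l in enumerate(labels))
    pairwise = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return unary + lam * pairwise

# Two candidate labelings of a 4-pixel signal with 2 labels;
# pixels prefer labels 0, 0, 1, 1 according to the data term:
costs = [[0, 5], [0, 5], [5, 0], [5, 0]]
print(segmentation_energy([0, 0, 1, 1], costs, lam=2.0))  # -> 2  (0 + 2*1)
print(segmentation_energy([0, 0, 0, 0], costs, lam=2.0))  # -> 10 (10 + 0)
```

Methods surveyed in the paper differ mainly in what the prior term encodes (shape, topology, appearance, atlas agreement) and in whether the resulting energy is minimized with continuous or discrete, global or local techniques.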
Detection, Recognition and Tracking of Moving Objects from Real-time Video via Visual Vocabulary Model and Species Inspired PSO
In this paper, we address the basic problem of recognizing moving objects in
video images using a Visual Vocabulary model with Bag of Words, and track our
object of interest in the subsequent video frames using species-inspired PSO.
Initially, shadow-free images are obtained by background modelling, followed
by foreground modelling to extract the blobs of our object of interest.
Subsequently, we train a cubic SVM with human body datasets in accordance with
our domain of interest for recognition and tracking. During training, using the
principle of Bag of Words we extract necessary features of certain domains and
objects for classification. Subsequently, matching these feature sets with
those of the extracted object blobs, obtained by subtracting the shadow-free
background from the foreground, we successfully detect our object of interest
in the test domain. The performance of the classification by the cubic SVM is
satisfactorily represented by a confusion matrix and an ROC curve, reflecting
the accuracy of each module. After classification, our object of interest is
tracked in the test domain using species-inspired PSO. By combining adaptive
learning tools with efficient classification of descriptions, we achieve
optimum accuracy in recognizing the moving objects. We evaluate our algorithm
on the benchmark datasets iLIDS, VIVID, Walking2, and Woman. Comparative
analysis of our algorithm against existing state-of-the-art trackers shows
very satisfactory and competitive results.
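The PSO-based tracking step in this abstract can be sketched generically as follows; the fitness function and all parameters are illustrative stand-ins, and this is plain PSO, not the species-inspired variant the paper proposes:

```python
import random

def pso_track(fitness, bounds, n_particles=20, iters=50,
              w=0.7, c1=1.5, c2=1.5):
    """Standard PSO over a 2-D search window (e.g. candidate object
    positions in a frame). `fitness` scores a position; higher is better.
    Plain PSO for illustration, not the paper's species-inspired variant."""
    random.seed(0)
    (lo_x, hi_x), (lo_y, hi_y) = bounds
    pos = [[random.uniform(lo_x, hi_x), random.uniform(lo_y, hi_y)]
           for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # per-particle best positions
    gbest = max(pbest, key=fitness)[:]           # swarm-wide best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(2):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
                if fitness(pbest[i]) > fitness(gbest):
                    gbest = pbest[i][:]
    return gbest

# Toy fitness: the object's appearance matches best at position (30, 40).
match = lambda p: -((p[0] - 30) ** 2 + (p[1] - 40) ** 2)
x, y = pso_track(match, bounds=((0, 100), (0, 100)))
print(round(x), round(y))  # converges near (30, 40)
```

In the tracking setting the fitness would compare the Bag-of-Words appearance features at a candidate position against the trained object model, so the swarm converges on the object's new location in each frame.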
Fractional Local Neighborhood Intensity Pattern for Image Retrieval using Genetic Algorithm
In this paper, a new texture descriptor named "Fractional Local Neighborhood
Intensity Pattern" (FLNIP) has been proposed for content based image retrieval
(CBIR). It is an extension of the Local Neighborhood Intensity Pattern
(LNIP) [1]. FLNIP calculates the relative intensity difference between a
particular pixel and the center pixel of a 3x3 window by considering the
relationship with adjacent neighbors. In this work, the fractional change in
the local neighborhood involving the adjacent neighbors has been calculated
first with respect to one of the eight neighbors of the center pixel of a 3x3
window. Next, the fractional change has been calculated with respect to the
center itself. The two values of fractional change are next compared to
generate a binary bit pattern. Both sign and magnitude information are encoded
in a single descriptor as it deals with the relative change in magnitude in the
adjacent neighborhood i.e., the comparison of the fractional change. The
descriptor is applied on four multi-resolution images: one being the raw
image and the other three being Gaussian-filtered images, obtained by applying
Gaussian filters of different standard deviations to the raw image, to signify
the importance of exploring texture information at different resolutions in an
image. The four sets of distances obtained between the query and the target
image are then combined with a genetic algorithm based approach to improve the
retrieval performance by minimizing the distance between similar class images.
The performance of the method has been tested for image retrieval on four
popular databases. The precision and recall values observed on these databases
have been compared with recent state-of-the-art local patterns. The proposed
method has shown a significant improvement over many other existing methods.
Comment: MTAP, Springer (Minor Revision)
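The bit-generation step described in this abstract can be sketched as follows for a single 3x3 window; the exact FLNIP formulation involving all adjacent neighbors is richer, so treat this as an illustrative simplification:

```python
def flnip_like_bits(window, eps=1e-6):
    """For each of the 8 neighbors of the center of a 3x3 window, compare
    the fractional intensity change w.r.t. that neighbor against the
    fractional change w.r.t. the center, emitting one bit per neighbor.
    A simplified illustration of the FLNIP idea, not the exact descriptor."""
    c = window[1][1]
    # The 8 neighbors in clockwise order; the next ring element plays
    # the role of an adjacent neighbor.
    ring = [window[0][0], window[0][1], window[0][2], window[1][2],
            window[2][2], window[2][1], window[2][0], window[1][0]]
    bits = []
    for i, n in enumerate(ring):
        adj = ring[(i + 1) % 8]
        frac_n = (adj - n) / (abs(n) + eps)   # change relative to neighbor
        frac_c = (adj - c) / (abs(c) + eps)   # change relative to center
        bits.append(1 if frac_n >= frac_c else 0)
    return bits

win = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
print(flnip_like_bits(win))  # -> [1, 1, 1, 0, 0, 0, 0, 1]
```

Because each bit compares two fractional changes rather than raw intensities, the resulting pattern folds both sign and relative-magnitude information into a single binary code, which is the property the abstract highlights.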