500 research outputs found
Plant image retrieval using color, shape and texture features
We present a content-based image retrieval system for plant image retrieval, intended especially for the house plant identification problem. A plant image consists of a collection of overlapping leaves and possibly flowers, which makes the problem challenging. We studied the suitability of various well-known color, shape and texture features for this problem and introduced some new texture matching techniques and shape features. Feature extraction is applied after segmenting the plant region from the background using the max-flow min-cut technique. Results on a database of 380 plant images belonging to 78 different types of plants show the promise of the proposed new techniques and the overall system: in 55% of the queries, the correct plant image is retrieved among the top-15 results. Furthermore, the accuracy goes up to 73% when a 132-image subset of well-segmented plant images is considered.
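The top-15 figure quoted in this abstract is a top-k hit rate: the share of queries for which the correct item appears among the first k retrieved results. The sketch below is only an illustration of that metric under assumed inputs; the function name `top_k_hit_rate` and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def top_k_hit_rate(
    ranked_labels: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of queries whose true label is in the first k results.

    ranked_labels[i] is the ranked list of labels returned for query i
    (best match first); true_labels[i] is the correct label for query i.
    """
    if len(ranked_labels) != len(true_labels):
        raise ValueError("one ranked list is required per query")
    if not true_labels:
        raise ValueError("at least one query is required")
    if k < 1:
        raise ValueError("k must be at least 1")

    hits = sum(
        1
        for ranked, truth in zip(ranked_labels, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical toy data: two queries, each with a ranked list of plant labels.
example_rankings = [["ficus", "orchid", "fern"], ["cactus", "fern", "ivy"]]
example_truth = ["orchid", "aloe"]
example_rate = top_k_hit_rate(example_rankings, example_truth, k=2)
```

With k set to 15 and one ranked list per query image, the same function would give the kind of percentage reported above, assuming a query counts as correct when any of the first k results has the right plant type.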
Biometric Person Identification Using Near-infrared Hand-dorsa Vein Images
Biometric recognition is becoming more and more important with the increasing demand for security, and more usable with the improvement of computer vision as well as pattern recognition technologies. Hand vein patterns have been recognised as a good biometric measure for personal identification due to many excellent characteristics, such as uniqueness and stability, as well as difficulty to copy or forge. This thesis covers all the research and development aspects of a biometric person identification system based on near-infrared hand-dorsa vein images.
Firstly, the design and realisation of an optimised vein image capture device is presented. In order to maximise the quality of the captured images with relatively low cost, the infrared illumination and imaging theory are discussed. Then a database containing 2040 images from 102 individuals, which were captured by this device, is introduced.
Secondly, image analysis and the customised image pre-processing methods are discussed. The consistency of the database images is evaluated using mean squared error (MSE) and peak signal-to-noise ratio (PSNR). Geometrical pre-processing, including shearing correction and region of interest (ROI) extraction, is introduced to improve image consistency. Image noise is evaluated using total variance (TV) values. Grey-level pre-processing, including grey-level normalisation, filtering and adaptive histogram equalisation, is applied to enhance vein patterns.
Thirdly, a gradient-based image segmentation algorithm is compared with popular algorithms from the literature, such as the Niblack and Threshold Image algorithms, to demonstrate its effectiveness in vein pattern extraction. Post-processing methods including morphological filtering and thinning are also presented.
Fourthly, feature extraction and recognition methods are investigated, with several new approaches based on keypoints and local binary patterns (LBP) proposed. Through comprehensive comparison with other approaches based on structure and texture features as well as performance evaluation using the database created with 2040 images, the proposed approach based on multi-scale partition LBP is shown to provide the best recognition performance with an identification rate of nearly 99%.
Finally, the whole hand-dorsa vein identification system is presented with a user interface for administration of user information and for person identification.
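The abstract above mentions mean squared error (MSE) and peak signal-to-noise ratio (PSNR) as image consistency measures. As a rough illustration only, the sketch below computes both using the standard definitions, with PSNR = 10 log10(MAX^2 / MSE). It assumes 8-bit grey-level images stored as nested lists of equal size; the function names and the toy images are hypothetical and do not reproduce the thesis's evaluation procedure.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized grey-level images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    count = sum(len(row) for row in a)
    if count == 0:
        raise ValueError("images must not be empty")
    total = sum(
        (pa - pb) ** 2
        for ra, rb in zip(a, b)
        for pa, pb in zip(ra, rb)
    )
    return total / count


def psnr(a: Image, b: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical 2x2 images that differ in a single pixel by 10 grey levels.
img_a = [[0, 0], [0, 0]]
img_b = [[0, 0], [0, 10]]
example_mse = mse(img_a, img_b)
example_psnr = psnr(img_a, img_b)
```

A lower MSE, and equivalently a higher PSNR, between two captures of the same hand would indicate more consistent imaging, under the assumption that the images are already aligned.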
Markerless deformation capture of hoverfly wings using multiple calibrated cameras
This thesis introduces an algorithm for the automated deformation capture of hoverfly
wings from multiple camera image sequences. The algorithm is capable of extracting
dense surface measurements, without the aid of fiducial markers, over an arbitrary number
of wingbeats of hovering flight and requires limited manual initialisation. A novel motion
prediction method, called the ‘normalised stroke model’, makes use of the similarity of adjacent
wing strokes to predict wing keypoint locations, which are then iteratively refined in
a stereo image registration procedure. Outlier removal, wing fitting and further refinement
using independently reconstructed boundary points complete the algorithm. It was tested
on two hovering data sets, as well as a challenging flight manoeuvre. By comparing the
3-d positions of keypoints extracted from these surfaces with those resulting from manual
identification, the accuracy of the algorithm is shown to approach that of a fully manual
approach. In particular, half of the algorithm-extracted keypoints were within 0.17 mm of
manually identified keypoints, approximately equal to the error of the manual identification
process. This algorithm is unique among purely image based flapping flight studies in the
level of automation it achieves, and its generality would make it applicable to wing tracking
of other insects.
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
We present the first real-time method to capture the full global 3D skeletal
pose of a human in a stable, temporally consistent manner using a single RGB
camera. Our method combines a new convolutional neural network (CNN) based pose
regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
formulation regresses 2D and 3D joint positions jointly in real time and does
not require tightly cropped input frames. A real-time kinematic skeleton
fitting method uses the CNN output to yield temporally stable 3D global pose
reconstructions on the basis of a coherent kinematic skeleton. This makes our
approach the first monocular RGB method usable in real-time applications such
as 3D character control; thus far, the only monocular methods for such
applications employed specialized RGB-D cameras. Our method's accuracy is
quantitatively on par with the best offline 3D monocular RGB pose estimation
methods. Our results are qualitatively comparable to, and sometimes better
than, results from monocular RGB-D approaches, such as the Kinect. However, we
show that our approach is more broadly applicable than RGB-D solutions, i.e. it
works for outdoor scenes, community videos, and low quality commodity RGB
cameras.
Comment: Accepted to SIGGRAPH 201
A Federated Approach for Fine-Grained Classification of Fashion Apparel
As online retail services proliferate and become pervasive in modern life,
applications for classifying fashion apparel features from image data are
becoming increasingly indispensable. Online retailers, from leading companies to
start-ups, can leverage such applications in order to increase profit margin
and enhance the consumer experience. Many notable schemes have been proposed to
classify fashion items; however, the majority have focused on classifying
basic-level categories, such as T-shirts, pants, skirts, shoes, bags, and so
forth. In contrast to most prior efforts, this paper aims to enable an in-depth
classification of fashion item attributes within the same category. Beginning
with a single dress, we seek to classify the type of dress hem, the hem length,
and the sleeve length. The proposed scheme is comprised of three major stages:
(a) localization of a target item from an input image using semantic
segmentation, (b) detection of human key points (e.g., point of shoulder) using
a pre-trained CNN and a bounding box, and (c) three phases to classify the
attributes using a combination of algorithmic approaches and deep neural
networks. The experimental results demonstrate that the proposed scheme is
highly effective, with all categories having average precision above 93.02%,
and outperforms existing Convolutional Neural Network (CNN)-based schemes.
Comment: 11 pages, 4 figures, 5 tables, submitted to IEEE ACCESS (under
review)
SiLK -- Simple Learned Keypoints
Keypoint detection & descriptors are foundational technologies for computer
vision tasks like image matching, 3D reconstruction and visual odometry.
Hand-engineered methods like Harris corners, SIFT, and HOG descriptors have
been used for decades; more recently, there has been a trend to introduce
learning in an attempt to improve keypoint detectors. On inspection, however,
the results are difficult to interpret; recent learning-based methods employ a
vast diversity of experimental setups and design choices: empirical results are
often reported using different backbones, protocols, datasets, types of
supervisions or tasks. Since these differences are often coupled together, it
raises a natural question on what makes a good learned keypoint detector. In
this work, we revisit the design of existing keypoint detectors by
deconstructing their methodologies and identifying the key components. We
re-design each component from first principles and propose Simple Learned
Keypoints (SiLK), which is fully differentiable, lightweight, and flexible.
Despite its simplicity, SiLK sets a new state of the art on Detection
Repeatability and Homography Estimation tasks on HPatches and the 3D Point-Cloud
Registration task on ScanNet, and achieves performance competitive with the
state of the art on camera pose estimation in the 2022 Image Matching Challenge and
ScanNet.
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and thus the constraints imposed on the type of video
that each technique is able to address. Making the hypotheses and
constraints explicit makes the framework particularly useful for selecting a method, given
an application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly with traditional ones, while
providing an insightful perspective on the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
tables