Quantization Selection of Colour Histogram Bins to Categorize the Colour Appearance of Landscape Paintings for Image Retrieval
In today's world, most images are digitized and kept in digital libraries for better organization and management. With the growth of information and communication technology, collection holders such as museums and cultural institutions have become increasingly interested in making their collections available anytime and anywhere for Image Retrieval (IR) activities such as browsing and searching. In a colour image retrieval application, images are retrieved according to user specifications of what they want, which can be based on many different concepts. We suggest an approach to categorize the colour appearance of whole-scene landscape painting images based on human colour perception. The colour features of an image are represented using a colour histogram. We then find the quantization bin count that generates optimum colour histograms for all categories of colour appearance, selected as the quantization with the highest F-Score, the harmonic mean of precision and recall. Colour appearance attributes in the CIELab colour model (L is lightness; a and b are colour-opponent dimensions) are used to generate colour appearance feature vectors, namely the saturation metric, lightness metric and multicoloured metric. For the categorization, we use the Nearest Neighbour (NN) method to detect classes using predefined colour appearance descriptor measures and pre-set thresholds. The experimental results show that quantizing the CIELab colour model into 11 uniform bins per component achieves the optimum result for all colour appearance categories.
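The bin-selection idea can be sketched in Python: build a CIELab histogram at a candidate bin count, then score the resulting categorization by the F-Score (harmonic mean of precision and recall). The channel ranges and helper names here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def cielab_histogram(lab_pixels, bins=11):
    """Colour histogram over CIELab with `bins` uniform bins per channel."""
    # Nominal channel ranges: L in [0, 100], a and b in [-128, 127].
    ranges = [(0.0, 100.0), (-128.0, 127.0), (-128.0, 127.0)]
    hist, _ = np.histogramdd(lab_pixels, bins=(bins, bins, bins), range=ranges)
    return hist.ravel() / max(hist.sum(), 1)  # normalise to a distribution

def f_score(precision, recall):
    """Harmonic mean of precision and recall, used to select the bin count."""
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Example: histogram of 1000 random "pixels" already converted to CIELab.
pixels = np.random.default_rng(0).uniform([0, -128, -128], [100, 127, 127], size=(1000, 3))
h = cielab_histogram(pixels)
print(h.shape)  # 11**3 = 1331 bins
```

In a full pipeline one would repeat this for several candidate bin counts and keep the one whose NN categorization yields the highest F-Score.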
Realtime Color Stereovision Processing
Recent developments in aviation have made micro air vehicles (MAVs) a reality. These featherweight, palm-sized, radio-controlled flying saucers embody the future of air-to-ground combat. No one has ever successfully implemented an autonomous control system for MAVs. Because MAVs are physically small with limited energy supplies, video signals offer superiority over radar for navigational applications. This research takes a step forward in real-time machine vision processing. It investigates techniques for implementing a real-time stereovision processing system using two miniature color cameras. The effects of poor-quality optics are overcome by a robust algorithm, which operates in real time and achieves frame rates up to 10 fps in ideal conditions. The vision system implements innovative work in the following five areas of vision processing: fast image registration preprocessing, object detection, feature correspondence, distortion-compensated ranging, and multi-scale nominal frequency-based object recognition. Results indicate that the system can provide adequate obstacle avoidance feedback for autonomous vehicle control. However, typical relative position errors are about 10%, too high for surveillance applications. The range of operation is also limited to between 6 and 30 m. The root of this limitation is imprecise feature correspondence: with perfect feature correspondence the range would extend to between 0.5 and 30 m. Stereo camera separation limits the near range, while optical resolution limits the far range. Image frame sizes are 160x120 pixels. Increasing this size will improve far-range characteristics but will also decrease frame rate. Image preprocessing proved to be less appropriate than precision camera alignment in this application. A proof of concept for object recognition shows promise for applications with more precise object detection. Future recommendations are offered in all five areas of vision processing.
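The near/far limits quoted above follow from the standard stereo ranging relation Z = f * B / d. A minimal sketch with assumed (not the thesis's) focal length and baseline values:

```python
# Stereo range from disparity: Z = f * B / d, where f is the focal length in
# pixels, B the baseline (camera separation) in metres, and d the disparity
# in pixels. The numbers below are illustrative, not the thesis's calibration.
def stereo_range(focal_px, baseline_m, disparity_px):
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# On a 160x120 image, small disparities give far range, large give near range.
f_px, B = 200.0, 0.15               # assumed focal length and baseline
print(stereo_range(f_px, B, 1.0))   # one-pixel disparity -> 30.0 m far limit
print(stereo_range(f_px, B, 60.0))  # large disparity -> 0.5 m near limit
```

This makes the stated trade-offs concrete: the far limit is set by the smallest disparity the optics can resolve, and the near limit by the largest disparity the camera separation produces.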
Matching and Predicting Street Level Images
The paradigm of matching images to a very large dataset has been used for numerous vision tasks and is a powerful one. If the image dataset is large enough, one can expect to find good matches of almost any image to the database, allowing label transfer [3, 15], and image editing or enhancement [6, 11]. Users of this approach will want to know how many images are required, and what features to use for finding semantically relevant matches. Furthermore, for navigation tasks or to exploit context, users will want to know the predictive quality of the dataset: can we predict the image that would be seen under changes in camera position?
We address these questions in detail for one category of images: street level views. We have a dataset of images taken from an enumeration of positions and viewpoints within Pittsburgh. We evaluate how well we can match those images, using images from non-Pittsburgh cities, and how well we can predict the images that would be seen under changes in camera position. We compare performance for these tasks for eight different feature sets, finding a feature set that outperforms the others (HOG). A combination of all the features performs better in the prediction task than any individual feature. We used Amazon Mechanical Turk workers to rank the matches and predictions of different algorithm conditions by comparing each one to the selection of a random image. This approach can evaluate the efficacy of different feature sets and parameter settings for the matching paradigm with other image categories.
Funding: United States Dept. of Defense (ARDA VACE); United States National Geospatial-Intelligence Agency (NEGI-1582-04-0004); United States National Geospatial-Intelligence Agency (MURI Grant N00014-06-1-0734); France, Agence nationale de la recherche (project HFIBMR, ANR-07-BLAN-0331-01); Institut national de recherche en informatique et en automatique (France); Xerox Fellowship Program
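As a rough illustration of the winning HOG feature family, a whole-image gradient-orientation histogram can be computed as below. Real HOG additionally pools orientations over cells and normalises over blocks, so this simplified version is only a sketch:

```python
import numpy as np

def orientation_histogram(image, n_bins=9):
    """Coarse HOG-style descriptor: a histogram of gradient orientations,
    weighted by gradient magnitude, over the whole image. (Full HOG also
    pools over cells and normalises over blocks.)"""
    gy, gx = np.gradient(image.astype(float))     # row (y) and column (x) gradients
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
    s = hist.sum()
    return hist / s if s else hist

img = np.tile(np.arange(32.0), (32, 1))           # horizontal intensity ramp
h = orientation_histogram(img)
print(h.argmax())  # gradient points along x, so orientation bin 0 dominates
```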
Automated Detection and Counting of Pedestrians on an Urban Roadside
This thesis implements an automated system that counts pedestrians with 85% accuracy. Two approaches have been considered and evaluated in terms of count accuracy, cost and ease of deployment. The first approach employs the Autoscope Solo Terra, a traffic camera widely used to monitor vehicular traffic. The Solo Terra supports an image processing-based detector that counts the number of objects crossing user-defined areas in the captured image. Because the count is updated based on the amount of movement across the selected regions rather than on recognising individual pedestrians, a second approach has been considered that uses a histogram of oriented gradients (HoG), an advanced vision-based algorithm proposed by Dalal et al., which distinguishes a pedestrian from a non-pedestrian based on the omega shape formed by the head and shoulders of a human being. The implemented detection software processes video frames streamed from a low-cost digital camera. The frames are divided into sub-regions, which are scanned for an omega shape whenever movement is detected in those regions. It has been found that the HoG-based approach degrades in performance due to occlusion under dense pedestrian traffic conditions, whereas the Solo Terra approach appears to be more robust; however, both undercounts and overcounts were encountered with the Solo Terra. To combat the disadvantages of both approaches, they were integrated into a single system in which the count is incremented predominantly using the Solo Terra and the HoG-based approach corrects the count under certain conditions. A preliminary prototype of the integrated system has been verified.
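The motion-gated scanning described above, where sub-regions are scanned for the omega shape only when movement is detected in them, can be sketched as follows. The grid size and threshold are illustrative values, not the thesis's parameters:

```python
import numpy as np

def motion_gated_regions(prev, curr, grid=(4, 4), thresh=10.0):
    """Return indices of sub-regions with enough frame-to-frame movement;
    only these would be scanned for the head-and-shoulders (omega) shape."""
    diff = np.abs(curr.astype(float) - prev.astype(float))
    h, w = diff.shape
    gh, gw = h // grid[0], w // grid[1]
    active = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            block = diff[r*gh:(r+1)*gh, c*gw:(c+1)*gw]
            if block.mean() > thresh:
                active.append((r, c))
    return active

prev = np.zeros((120, 160))
curr = prev.copy()
curr[0:30, 0:40] = 255                   # movement only in the top-left region
print(motion_gated_regions(prev, curr))  # [(0, 0)]
```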
DART: Distribution Aware Retinal Transform for Event-based Cameras
We introduce a generic visual descriptor, termed the Distribution Aware
Retinal Transform (DART), which encodes the structural context using log-polar
grids for event cameras. The DART descriptor is applied to four different
problems, namely object classification, tracking, detection and feature
matching: (1) The DART features are directly employed as local descriptors in a
bag-of-features classification framework and testing is carried out on four
standard event-based object datasets (N-MNIST, MNIST-DVS, CIFAR10-DVS,
NCaltech-101). (2) Extending the classification system, tracking is
demonstrated using two key novelties: (i) For overcoming the low-sample problem
for the one-shot learning of a binary classifier, statistical bootstrapping is
leveraged with online learning; (ii) To achieve tracker robustness, the scale
and rotation equivariance property of the DART descriptors is exploited for the
one-shot learning. (3) To solve the long-term object tracking problem, an
object detector is designed using the principle of cluster majority voting. The
detection scheme is then combined with the tracker to result in a high
intersection-over-union score with augmented ground truth annotations on the
publicly available event camera dataset. (4) Finally, the event context encoded
by DART greatly simplifies the feature correspondence problem, especially for
spatio-temporal slices far apart in time, which has not been explicitly tackled
in the event-based vision domain.
Comment: 12 pages, revision submitted to TPAMI in Nov 201
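As a rough sketch of the log-polar encoding idea (not the full distribution-aware transform), event coordinates can be histogrammed on a log-polar grid around a centre point. All parameter values below are illustrative assumptions:

```python
import numpy as np

def log_polar_descriptor(events_xy, center, n_rings=4, n_wedges=8, r_max=32.0):
    """Histogram event coordinates on a log-polar grid around `center`:
    a simplified sketch of DART-style structural encoding."""
    d = events_xy - np.asarray(center, float)
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2 * np.pi)
    keep = (r > 0) & (r <= r_max)
    # Logarithmic radial bins give finer resolution near the centre.
    ring = np.minimum((np.log1p(r[keep]) / np.log1p(r_max) * n_rings).astype(int),
                      n_rings - 1)
    wedge = (theta[keep] / (2 * np.pi) * n_wedges).astype(int) % n_wedges
    hist = np.zeros((n_rings, n_wedges))
    np.add.at(hist, (ring, wedge), 1)
    return hist / max(hist.sum(), 1)

rng = np.random.default_rng(1)
events = rng.uniform(0, 64, size=(500, 2))  # toy (x, y) event coordinates
desc = log_polar_descriptor(events, center=(32, 32))
print(desc.shape)  # (4, 8)
```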
Recognition of human interactions with vehicles using 3-D models and dynamic context
This dissertation describes two distinctive methods for human-vehicle interaction recognition: one for ground-level videos and the other for aerial videos. For ground-level videos, this dissertation presents a novel methodology that is able to estimate a detailed status of a scene involving multiple humans and vehicles. The system tracks their configuration even when they are performing complex interactions with severe occlusion, such as when four persons are exiting a car together. The motivation is to identify the 3-D states of vehicles (e.g. the status of doors) and their relations with persons, which are necessary to analyze complex human-vehicle interactions (e.g. breaking into or stealing a vehicle), and the motion of humans and car doors, which is needed to detect atomic human-vehicle interactions. A probabilistic algorithm has been designed to track humans and analyze their dynamic relationships with vehicles using dynamic context. We have focused on two ideas. One is that many simple events can be detected by low-level analysis, and these detected events must be contextually consistent with the human/vehicle status tracking results. The other is that motion cues constrain states in the current and future frames, so analyzing motion is critical to detecting such simple events. Our approach updates the probability of a person (or a vehicle) having a particular state based on these basic observed events. The probabilistic inference is made for the tracking process to match event-based evidence and motion-based evidence. For aerial videos, the object resolution is low, the visual cues are vague, and the detection and tracking of objects is consequently less reliable. Any method that requires accurate tracking of objects or exact matching of event definitions is better avoided. To address these issues, we present a temporal-logic-based approach that does not require training from event examples.
At the low level, we employ dynamic programming to perform fast model fitting between the tracked vehicle and the rendered 3-D vehicle models. At the semantic level, given the localized event region of interest (ROI), we verify the time series of human-vehicle relationships against the pre-specified event definitions in a piecewise fashion. With special interest in recognizing a person getting into and out of a vehicle, we have tested our method on a subset of the VIRAT Aerial Video dataset and achieved superior results.
Electrical and Computer Engineering
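The piecewise semantic-level check can be sketched as a subsequence test of observed human-vehicle relationship predicates against an ordered event definition. The predicate names below are hypothetical, not the dissertation's vocabulary:

```python
# An event is defined as an ordered sequence of human-vehicle relationship
# predicates; a time series of observed relationships matches the event if
# those predicates occur in order (other observations may appear in between).
def matches_event(observed, definition):
    """True if the predicates in `definition` occur in order in `observed`."""
    it = iter(observed)                 # consuming iterator enforces ordering
    return all(pred in it for pred in definition)

getting_in = ["approach_vehicle", "beside_door", "door_open", "inside_vehicle"]
trace = ["far_from_vehicle", "approach_vehicle", "beside_door",
         "door_open", "inside_vehicle", "door_closed"]
print(matches_event(trace, getting_in))  # True
```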
Free-hand sketch recognition by multi-kernel feature learning
Free-hand sketch recognition has become increasingly popular due to the recent expansion of portable touchscreen devices. However, the problem is non-trivial due to the complexity of internal structures, which leads to intra-class variations, coupled with the sparsity of visual cues, which results in inter-class ambiguities. To address the structural complexity, a novel structured representation for sketches is proposed to capture the holistic structure of a sketch. Moreover, to overcome the visual-cue sparsity problem and thereby achieve state-of-the-art recognition performance, we propose a Multiple Kernel Learning (MKL) framework for sketch recognition, fusing several features common to sketches. We evaluate the performance of all the proposed techniques on the most diverse sketch dataset to date (Mathias et al., 2012), and offer detailed and systematic analyses of the performance of different features and representations, including a breakdown by sketch super-category. Finally, we investigate the use of attributes as a high-level feature for sketches and show how this complements low-level features in improving recognition performance under the MKL framework, and consequently explore novel applications such as attribute-based retrieval.
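The kernel-fusion step of MKL can be illustrated with a fixed-weight combination of base kernel matrices. In actual MKL the weights are learned jointly with the classifier, so this is only a sketch with toy data:

```python
import numpy as np

def combined_kernel(kernels, weights):
    """Weighted sum of base kernel matrices. Full MKL learns the weights
    jointly with the classifier; here they are fixed for illustration."""
    weights = np.asarray(weights, float)
    weights = weights / weights.sum()      # convex combination
    return sum(w * K for w, K in zip(weights, kernels))

# Two toy base kernels (e.g. from two different sketch features), 3 samples.
X1 = np.array([[0.0], [1.0], [2.0]])
X2 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K1, K2 = X1 @ X1.T, X2 @ X2.T              # linear kernel per feature type
K = combined_kernel([K1, K2], [0.5, 0.5])
print(K.shape)  # (3, 3)
```

The combined matrix K can then be handed to any kernel classifier (e.g. an SVM) exactly as a single kernel would be.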
Identity verification using computer vision for automatic garage door opening
We present a novel system for automatic identification of vehicles as part of an intelligent access control system for a garage entrance. Using a camera in the door, cars are detected and matched to a database of authenticated cars. Once a car is detected, License Plate Recognition (LPR) is applied using character detection and recognition. The found license plate number is matched against the database of authenticated plates. If the car is allowed access, the door opens automatically. The recognition of both cars and characters (LPR) is performed using state-of-the-art shape descriptors and a linear classifier. Experiments have revealed that 90% of all cars are correctly authenticated from a single image. Analysis of the computational complexity shows that an embedded implementation allows user authentication within approximately 300 ms, which is well within the application constraints.
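The character-recognition step, a shape descriptor scored by a linear classifier, amounts to an argmax over per-class scores. The descriptor and weight values below are random stand-ins, not trained values from the paper:

```python
import numpy as np

# Linear classification over shape descriptors, as in the LPR stage: each
# character image is reduced to a descriptor vector and scored against one
# weight vector per class; the highest-scoring class wins.
rng = np.random.default_rng(2)
classes = list("0123456789ABCDEFGHJKLMNPRSTUVWXYZ")
W = rng.normal(size=(len(classes), 64))   # stand-in weights, one row per class
descriptor = rng.normal(size=64)          # stand-in shape descriptor of a glyph
scores = W @ descriptor
predicted = classes[int(np.argmax(scores))]
print(predicted)                          # predicted character
```

The recognised characters are concatenated into a plate string and matched against the database of authenticated plates to make the open/close decision.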
Development of Vision-based Response of Autonomous Vehicles Towards Emergency Vehicles Using Infrastructure Enabled Autonomy
The effectiveness of law enforcement and public safety depends directly on the time taken by first responders to arrive at the scene of an emergency. The primary objective of this thesis is to develop techniques and response actions for an autonomous vehicle in emergency scenarios. This work discusses the methods developed to identify Emergency Vehicles (EVs) and use their localized information to develop response actions for autonomous vehicles in emergency scenarios using an Infrastructure-Enabled Autonomy (IEA) setup. IEA is a new paradigm in autonomous vehicles research that aims at a distributed intelligence architecture by transferring the core functionalities of sensing and localization to a roadside infrastructure setup. In this work, two independent frameworks were developed to identify emergency vehicles in a video feed using computer vision techniques: (1) a one-stage framework in which an object detection algorithm is trained on a custom dataset to detect EVs; (2) a two-stage framework in which an object classifier is implemented in series with an object detection pipeline to classify vehicles into EVs and non-EVs.
The performance of many popular classification models was compared on combinations of multi-spectral feature vectors of an image to identify the ideal combination for the EV identification rule. Localized position coordinates of an EV are obtained by deploying the classification routine on the IEA setup. This position information is used as an input to an autonomous vehicle, and an ideal response action is developed.
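The two-stage framework can be sketched as a detector whose outputs are filtered by an EV/non-EV classifier. Both components below are hypothetical stand-ins for the trained models, returning canned results purely to show the pipeline shape:

```python
# Stage one proposes vehicle detections; stage two classifies each detection
# as EV or non-EV. Both functions are hypothetical stand-ins.
def detect_vehicles(frame):
    """Stand-in object detector: returns bounding boxes with a toy label."""
    return [{"box": (10, 20, 50, 60), "kind": "ev"},
            {"box": (80, 20, 120, 60), "kind": "car"}]

def is_emergency_vehicle(frame, detection):
    """Stand-in EV/non-EV classifier applied to one detection."""
    return detection["kind"] == "ev"

def find_evs(frame):
    """Two-stage pipeline: detect, then keep only EV-classified boxes."""
    return [d["box"] for d in detect_vehicles(frame)
            if is_emergency_vehicle(frame, d)]

print(find_evs(frame=None))  # [(10, 20, 50, 60)]
```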