261 research outputs found

    Organising and structuring a visual diary using visual interest point detectors

    As wearable cameras become more popular, researchers are increasingly focusing on novel applications to manage the large volume of data these devices produce. One such application is the construction of a Visual Diary from an individual’s photographs. Microsoft’s SenseCam, designed to passively record a Visual Diary covering a typical day of the wearer, is one example of such a device. The vast quantity of images these devices generate means that managing and organising the resulting collections is not a trivial matter, and we believe this will become a key issue as wearable cameras such as SenseCam grow in popularity. Although there is a significant body of work in the literature on object detection and recognition and on scene classification, there is little work in the area of setting detection, and few authors have examined the issues involved in analysing extremely large image collections (such as a Visual Diary) gathered over a long period of time. An algorithm for setting detection should be capable of clustering images captured at the same real-world locations (e.g. in the dining room at home, in front of the computer in the office, in the park). This requires the selection and implementation of suitable methods to identify visually similar backgrounds in images using their visual features. We present a number of approaches to setting detection based on the extraction of visual interest points from the images, and we analyse the performance of two of the most popular descriptors: the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF). We present an implementation of a Visual Diary application and evaluate its performance via a series of user experiments.
    Finally, we outline techniques that allow the Visual Diary to automatically detect new settings, to scale as the image collection continues to grow over time, and to let the user generate a personalised summary of their data.
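    The interest-point matching at the heart of such setting detection can be sketched as follows: local descriptors (e.g. SIFT or SURF vectors) from two images are matched with Lowe's nearest-neighbour ratio test, and the fraction of matched features scores how likely the two images share a setting. Function names and the 0.75 ratio threshold below are illustrative choices, not taken from the thesis.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Match two sets of local feature descriptors (e.g. SIFT vectors)
    using Lowe's nearest-neighbour ratio test."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        # accept only if the best match is clearly better than the second best
        if dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches

def setting_similarity(desc_a, desc_b):
    """Fraction of descriptors in image A matched in image B; images of
    the same setting share many background features."""
    if len(desc_a) == 0:
        return 0.0
    return len(match_descriptors(desc_a, desc_b)) / len(desc_a)
```

    Images of the same setting would then be clustered by thresholding or grouping on this pairwise similarity.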

    Finger Vein Template Protection with Directional Bloom Filter

    Biometrics has become a widely accepted solution for secure user authentication. However, the use of biometric traits raises serious concerns about the protection of personal data and privacy. Traditional biometric systems are vulnerable to attacks because they store the original biometric data, and since biometric data cannot be changed once compromised, the security of a biometric system is limited by the security of its template. To protect biometric templates, this paper proposes directional Bloom filters as a cancellable biometric approach that transforms the biometric data into a non-invertible template for user authentication. Bloom filters have recently been used for template protection because of their efficiency with small template sizes, alignment invariance, and irreversibility. The directional Bloom filter improves on the original by generating hash vectors from directional sub-blocks rather than from a single-column sub-block. In addition, we use multiple fingers to generate a biometric template, termed multi-instance biometrics, which improves performance by providing more information. The proposed method is tested on three public datasets and achieves an equal error rate (EER) as low as 5.28% in the stolen or constant key scenario. Analysis shows that the proposed method meets the four properties of biometric template protection. DOI: 10.28991/HIJ-2023-04-02-013
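    A minimal sketch of the Bloom-filter template idea: word-sized column reads of a binary feature map index bits of a per-block filter, giving a compact template that cannot be uniquely inverted because the mapping is many-to-one. The block and word sizes here are hypothetical, and the paper's directional variant reads sub-blocks along several orientations rather than the plain columns used below.

```python
import numpy as np

def bloom_filter_template(binary_map, block_w=8, word_h=4):
    """Cancellable-template sketch: split a binary feature map into
    blocks, read each word_h-bit column as an integer, and set that bit
    in the block's Bloom filter (many-to-one, hence non-invertible)."""
    h, w = binary_map.shape
    filters = []
    for bx in range(0, w, block_w):
        bf = np.zeros(2 ** word_h, dtype=bool)
        block = binary_map[:word_h, bx:bx + block_w]
        for col in block.T:
            idx = int("".join(str(int(b)) for b in col), 2)  # column word -> bit index
            bf[idx] = True
        filters.append(bf)
    return np.concatenate(filters)

def hamming_score(t1, t2):
    """Dissimilarity between two templates (lower = more similar)."""
    return np.count_nonzero(t1 ^ t2) / t1.size
```

    Matching then compares the stored and query templates with the Hamming-style score, never touching the original biometric data.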

    Pedestrian Segmentation from Complex Background Based on Predefined Pose Fields and Probabilistic Relaxation

    The wide use of cameras makes available a large number of image frames that can be used for people counting or for monitoring crowds or single individuals for security purposes. These applications require both object detection and tracking. The task has proven challenging due to problems such as occlusion, deformation, motion blur, and scale variation. One approach to tracking is based on comparing features extracted from the individual objects in the image. For this purpose, it is necessary to separate the object of interest, in this case a human figure, from the rest of the scene. This paper introduces a method for separating human bodies from images with changing backgrounds. The method is based on image segmentation, analysis of the possible pose, and a final refinement step based on probabilistic relaxation. To our knowledge, this is the first work in which probabilistic fields computed from human pose figures are combined with a relaxation-based refinement step for pedestrian segmentation. The proposed method is evaluated on different image series, and the results show that it works efficiently, although some parameters must be set according to the image contrast and scale. Tests show accuracies above 71%. The method also performs well on other datasets, where it achieves results comparable to state-of-the-art approaches.
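    The refinement step can be illustrated with a classic Rosenfeld-style probabilistic relaxation update, in which each site's label probabilities are reinforced by compatible labels at neighbouring sites. This is a generic sketch of the technique, not the paper's exact formulation.

```python
import numpy as np

def relaxation_step(P, R, nbrs):
    """One iteration of Rosenfeld-style probabilistic relaxation.
    P:    (n_sites, n_labels) current label probabilities
    R:    (n_labels, n_labels) compatibility matrix r(l, l')
    nbrs: list of neighbour index lists, one per site
    """
    n, L = P.shape
    new_P = np.empty_like(P)
    for i in range(n):
        # support for each label: average compatibility-weighted
        # probability over the site's neighbours
        q = np.mean([R @ P[j] for j in nbrs[i]], axis=0)
        s = P[i] * q
        new_P[i] = s / s.sum()   # renormalise to a distribution
    return new_P
```

    Iterating this update lets an ambiguous pixel (e.g. 50/50 between pedestrian and background) drift toward the label its neighbours confidently agree on.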

    Plant Seed Identification

    Plant seed identification is routinely performed for seed certification in seed trade, phytosanitary certification for the import and export of agricultural commodities, and regulatory monitoring, surveillance, and enforcement. Current identification is performed manually by seed analysts with limited aiding tools. Extensive expertise and time are required, especially for small, morphologically similar seeds. Computers, however, are especially good at recognizing subtle differences that humans find difficult to perceive. In this thesis, a 2D, image-based computer-assisted approach is proposed. Plant seeds are extremely small compared with everyday objects, and their microscopic images are usually degraded by defocus blur due to the high magnification of the imaging equipment. It is necessary and beneficial to differentiate the in-focus and blurred regions, given that only sharp regions carry the distinctive information used for identification. If the object of interest, here the plant seed, is in focus within a single image frame, the amount of defocus blur can be used as a cue to separate the object from the cluttered background. If the defocus blur is too strong and obscures the object itself, sharp regions from multiple image frames acquired at different focal distances can be merged into an all-in-focus image. This thesis describes a novel no-reference sharpness metric that exploits the difference in the distribution of uniform LBP patterns between blurred and non-blurred image regions. It runs in real time on a single CPU core and responds much better to low-contrast sharp regions than competing metrics. Its benefits are demonstrated in both defocus segmentation and focal stacking. With the resulting all-in-focus seed image, a scale-wise pooling method is proposed to construct its feature representation.
    Since the imaging settings in lab testing are well constrained, the seed objects in the acquired image can be assumed to have measurable scale and controllable scale variance. The proposed method utilizes real pixel-scale information and allows accurate comparison of seeds across scales. By cross-validation on our high-quality seed image dataset, a better identification rate (95%) was achieved than with pre-trained convolutional-neural-network-based models (93.6%). It offers an alternative method for image-based identification with all-in-focus object images of limited scale variance. The first digital seed identification tool of its kind was built and deployed for testing in the seed laboratory of the Canadian Food Inspection Agency (CFIA). The proposed focal stacking algorithm was employed to create all-in-focus images, while the scale-wise pooling feature representation was used as the image signature. Throughput, workload, and identification rate were evaluated, and seed analysts reported significantly lower mental demand (p = 0.00245) when using the tool compared with manual identification. Although the identification rate in the practical test is only around 50%, I have identified common mistakes made in the imaging process and possible ways to deploy the tool so as to improve the recognition rate.
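    The focal stacking step can be sketched as a per-pixel selection of the sharpest frame across the focal stack. Note that the thesis ranks regions with its uniform-LBP sharpness metric; the short sketch below substitutes a simple Laplacian-magnitude proxy to keep the example self-contained.

```python
import numpy as np

def focal_stack(frames):
    """All-in-focus merge: for each pixel, keep the value from the frame
    with the strongest local second-derivative response (a sharpness
    proxy; the thesis instead ranks regions with a uniform-LBP metric)."""
    stack = np.stack([np.asarray(f, dtype=float) for f in frames])  # (n, H, W)
    # absolute second differences along rows and columns (Laplacian-like)
    lap = (np.abs(np.diff(stack, 2, axis=1, prepend=0, append=0))
           + np.abs(np.diff(stack, 2, axis=2, prepend=0, append=0)))
    best = lap.argmax(axis=0)                 # index of sharpest frame per pixel
    h, w = best.shape
    return stack[best, np.arange(h)[:, None], np.arange(w)]
```

    In practice the selection would be smoothed over neighbourhoods rather than applied per pixel, but the selection principle is the same.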

    Transferrable learning from synthetic data: novel texture synthesis using Domain Randomization for visual scene understanding

    Modern supervised deep learning-based approaches typically rely on vast quantities of annotated data for training computer vision and robotics tasks. A key challenge is acquiring data that encompasses the diversity encountered in the real world. The use of synthetic or computer-generated data for solving these tasks has recently garnered attention for several reasons. The first is the efficiency of producing large amounts of annotated data in a fraction of the time required in reality, addressing the expense of manual annotation. The second is avoiding the inaccuracies and mistakes that arise from the laborious task of manual annotation. The third is meeting the need for the vast amounts of data typically required by data-driven state-of-the-art computer vision and robotics systems. Due to domain shift, models trained on synthetic data typically underperform those trained on real-world data when deployed in the real world. Domain Randomization is an approach for synthesizing artificial data: it generates diverse synthetic images by randomizing rendering parameters in a simulator, such as the objects, their visual appearance, the lighting, and where they appear in the picture. This synthetic data can be used to train systems capable of performing well in reality. However, it is unclear how best to select Domain Randomization parameters such as the types of textures, object poses, or types of backgrounds. Furthermore, it is unclear how Domain Randomization generalizes across vision tasks or whether the technique can be improved. This thesis explores novel Domain Randomization techniques to solve object localization, detection, and semantic segmentation in cluttered and occluded real-world scenarios.
    In particular, the four main contributions of this dissertation are: (i) The first contribution proposes a novel method for quantifying the differences between Domain Randomized and realistic data distributions using a small number of samples. The approach ranks all commonly applied Domain Randomization texture techniques in the existing literature and finds that the ranking is reflected in the task-based performance of an object localization task. (ii) The second contribution introduces the SRDR dataset, a large domain randomized dataset containing 291K frames of household objects widely used in robotics and vision benchmarking [23]. SRDR builds on the YCB-M [67] dataset by generating synthetic versions of the YCB-M images using a variety of domain randomized texture types in 5 unique environments of varying scene complexity. The SRDR dataset is highly beneficial for cross-domain training, evaluation, and comparison investigations. (iii) The third contribution presents a study evaluating Domain Randomization's generalizability and robustness in sim-to-real transfer in complex scenes for object detection and semantic segmentation. We find that the performance ranking is largely similar across the two tasks when training models on Domain Randomized synthetic data and evaluating on real-world data, indicating that Domain Randomization performs similarly across multiple tasks. (iv) Finally, we present a fast, easy-to-execute, novel approach for conditionally generating domain randomized textures. The textures are generated by randomly sampling patches from real-world images and applying them to objects of interest. This approach improves on the most commonly used Domain Randomization texture method from 13.157 AP to 21.287 AP in object detection and from 8.950 AP to 19.481 AP in semantic segmentation, and it eliminates the need to manually define texture distributions from which Domain Randomized textures are sampled.
    To address low texture diversity when only a small number of real-world images is available, we further propose a conditional GAN-based texture generator trained on a few real-world image patches; it increases texture diversity and improves on the most commonly applied Domain Randomization texture method from 13.157 AP to 20.287 AP in object detection and from 8.950 AP to 17.636 AP in semantic segmentation.
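    The patch-sampling texture idea from contribution (iv) can be sketched as follows: a texture image is tiled from patches cropped at random positions from a set of real-world images. This is an illustrative re-creation of the idea, with made-up parameter defaults, not the thesis's exact pipeline.

```python
import numpy as np

def random_patch_texture(real_images, patch=16, out=(64, 64), rng=None):
    """Conditional DR texture sketch: tile an out-sized texture from
    patch x patch crops taken at random locations in random real images."""
    if rng is None:
        rng = np.random.default_rng()
    rows = []
    for _ in range(out[0] // patch):
        row = []
        for _ in range(out[1] // patch):
            img = real_images[rng.integers(len(real_images))]
            y = rng.integers(img.shape[0] - patch + 1)
            x = rng.integers(img.shape[1] - patch + 1)
            row.append(img[y:y + patch, x:x + patch])
        rows.append(np.hstack(row))
    return np.vstack(rows)
```

    In a simulator pipeline, such textures would then be mapped onto the objects of interest in place of hand-defined texture distributions.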

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    This paper proposes a method to enhance video object detection in indoor environments in robotics. Concretely, it exploits knowledge of the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in planar homography, used to propose regions of interest in which to find objects, and recursive Bayesian filtering, used to integrate observations over time. The proposal is evaluated on six virtual indoor environments, covering the detection of nine object classes over a total of ∼ 7k frames. Results show that our proposal improves recall and F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) compared to a two-stage video object detection method used as a baseline, at the cost of a small time overhead (120 ms) and a slight precision loss (0.92).
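    The two ingredients of the proposal, homography-based propagation of detections and recursive Bayesian integration of class evidence, can be sketched as follows; the function names and box representation are our own.

```python
import numpy as np

def propagate_box(box, H):
    """Warp a detection's corners into the next frame with the
    inter-frame planar homography H, then re-fit an axis-aligned box."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1, 1], [x2, y1, 1],
                        [x1, y2, 1], [x2, y2, 1]], dtype=float).T
    warped = H @ corners
    warped = warped[:2] / warped[2]          # perspective divide
    return (warped[0].min(), warped[1].min(),
            warped[0].max(), warped[1].max())

def bayes_update(prior, likelihood):
    """Recursive Bayesian update of per-class probabilities given a new
    detector observation; repeated agreement sharpens the posterior and
    so lowers the categorization entropy."""
    post = prior * likelihood
    return post / post.sum()
```

    The propagated box serves as a region-of-interest proposal in the next frame, and each new detection there refines the object's class posterior.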

    Textures, Patterns and Surfaces in Color Films


    Soft Biometric Analysis: MultiPerson and RealTime Pedestrian Attribute Recognition in Crowded Urban Environments

    Traditionally, recognition systems were based only on human hard biometrics. However, the ubiquity of CCTV cameras has raised the desire to analyze human biometrics from far distances, without requiring people's cooperation in the acquisition process. High-resolution face close-shots are rarely available at far distances, so face-based systems cannot provide reliable results in surveillance applications. Human soft biometrics such as body and clothing attributes are believed to be more effective for analyzing human data collected by security cameras. This thesis contributes to human soft biometric analysis in uncontrolled environments and mainly focuses on two tasks: Pedestrian Attribute Recognition (PAR) and person re-identification (re-id). We first review the literature of both tasks and highlight the history of advancements, recent developments, and the existing benchmarks. The difficulties of PAR and person re-id stem from large intra-class distances, which originate from variations in factors such as body pose, illumination, background, occlusion, and data resolution. Recent state-of-the-art approaches present end-to-end models that extract discriminative and comprehensive feature representations of people. The correlation between different body regions and coping with limited learning data are also the objectives of many recent works. Moreover, class imbalance and correlation between human attributes are challenges specific to the PAR problem. We collect a large surveillance dataset to train a novel gender recognition model suitable for uncontrolled environments. We propose a deep residual network that extracts several pose-wise patches from samples and obtains a comprehensive feature representation. In the next step, we develop a model for recognizing multiple attributes at once.
    Considering the correlation between human semantic attributes and class imbalance, we respectively use a multi-task model and a weighted loss function. We also propose a multiplication layer on top of the backbone feature extraction layers to exclude background features from the final representation of samples and draw the model's attention to the foreground area. We address the person re-id problem by implicitly defining the receptive fields of deep learning classification frameworks. The receptive fields of deep learning models determine the regions of the input data most significant for producing correct decisions. We therefore synthesize a set of learning data in which the destructive regions (e.g., background) in each pair of instances are interchanged. A segmentation module determines the destructive and useful regions in each sample, and the label of a synthesized instance is inherited from the sample that contributed the useful regions. The synthesized learning data are then used in the learning phase and help the model rapidly learn that identity and background regions are not correlated. The proposed solution can also be seen as a data augmentation approach that fully preserves the label information and is compatible with other data augmentation techniques. When re-id methods are learned in scenarios where the target person appears with identical garments in the gallery, the visual appearance of clothes is given the most importance in the final feature representation. Cloth-based representations are not reliable in long-term re-id settings, as people may change their clothes; solutions that ignore clothing cues and focus on identity-relevant features are therefore in demand. We transform the original data such that the identity-relevant information (e.g., face and body shape) is removed, while the identity-unrelated cues (i.e., the color and texture of clothes) remain unchanged.
    A model learned on the synthesized dataset predicts the identity-unrelated cues (short-term features). We then train a second model, coupled with the first, that learns embeddings of the original data such that the similarity between the embeddings of the original and synthesized data is minimized. In this way, the second model predicts based on the identity-related (long-term) representation of people. To evaluate the proposed models, we use PAR and person re-id datasets, namely BIODI, PETA, RAP, Market-1501, MSMT-V2, PRCC, LTCC, and MIT, and compare our experimental results with state-of-the-art methods in the field. In conclusion, the data collected from surveillance cameras have low resolution, such that the extraction of hard biometric features is not possible and face-based approaches produce poor results. In contrast, soft biometrics are robust to variations in data quality, so we propose approaches for both PAR and person re-id that learn discriminative features from each instance, and we evaluate our proposed solutions on several publicly available benchmarks. This thesis was prepared at the University of Beira Interior, IT Instituto de Telecomunicações, Soft Computing and Image Analysis Laboratory (SOCIA Lab), Covilhã Delegation, and was submitted to the University of Beira Interior for defense in a public examination session.
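    The weighted loss mentioned for handling attribute imbalance can be sketched as a weighted binary cross-entropy in which rare positive attributes receive larger weights. The exponential weighting below is a common choice in PAR work and is an assumption here, not necessarily the thesis's exact formula.

```python
import numpy as np

def weighted_bce(y_true, y_pred, pos_ratio, eps=1e-7):
    """Weighted binary cross-entropy sketch for one attribute.
    pos_ratio is the fraction of positive samples in the training set;
    rare positives are weighted up (exp-of-imbalance scheme, assumed)."""
    w_pos = np.exp(1.0 - pos_ratio)    # larger weight for rare positives
    w_neg = np.exp(pos_ratio)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    loss = -(w_pos * y_true * np.log(y_pred)
             + w_neg * (1 - y_true) * np.log(1 - y_pred))
    return loss.mean()
```

    In a multi-task PAR model, one such weighted term per attribute would be summed into the total loss, so that frequent attributes do not dominate training.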

    Features for matching people in different views

    There have been significant advances in the computer vision field during the last decade. During this period, many methods have been developed that successfully solve challenging problems including Face Detection, Object Recognition and 3D Scene Reconstruction. The solutions developed by computer vision researchers have been widely adopted in many real-life applications, such as those in the medical and security industries. Among the different branches of computer vision, Object Recognition has advanced rapidly in recent years, driven in large part by the successful introduction of approaches such as feature extraction and description. Researchers have since applied these approaches to other problems, such as Content Based Image Retrieval and Tracking. In this work, we present a novel system that finds correspondences between people seen in different images. Unlike other approaches, which rely on a video stream to track the movement of people between images, we present a feature-based approach that locates a target’s new position in an image based only on its visual appearance. Our proposed system comprises three steps. In the first step, a set of features is extracted from the target’s appearance; a novel algorithm allows the extraction of features that are particularly suitable for the modelling task. In the second step, each feature is characterised using a combined colour and texture descriptor; including both colour and texture information adds to the descriptor’s distinctiveness. Finally, the target’s appearance and pose are modelled as a collection of such features and descriptors. This collection is then used as a template to search other images for a similar combination of features corresponding to the target’s new location.
    We have demonstrated the effectiveness of our system in locating a target’s new position in an image despite differences in viewpoint, scale, or elapsed time between the images. Characterising a target as a collection of features also allows our system to deal robustly with partial occlusion of the target.
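    A combined colour-and-texture descriptor in the spirit of the second step can be sketched by concatenating per-channel colour histograms with a simple gradient-magnitude texture histogram. The bin count and the gradient-based texture measure are illustrative choices, not the thesis's exact descriptor.

```python
import numpy as np

def colour_texture_descriptor(patch_rgb, bins=8):
    """Sketch of a joint descriptor: three colour histograms (one per
    RGB channel) concatenated with a texture histogram built from
    absolute intensity gradients, L1-normalised as a whole."""
    hists = []
    for c in range(3):
        h, _ = np.histogram(patch_rgb[..., c], bins=bins, range=(0, 256))
        hists.append(h)
    grey = patch_rgb.mean(axis=2)
    gx = np.abs(np.diff(grey, axis=1)).ravel()   # horizontal gradients
    gy = np.abs(np.diff(grey, axis=0)).ravel()   # vertical gradients
    th, _ = np.histogram(np.concatenate([gx, gy]), bins=bins, range=(0, 256))
    hists.append(th)
    d = np.concatenate(hists).astype(float)
    return d / d.sum()
```

    Two patches can then be compared by any histogram distance (e.g. L1), and a person template becomes a collection of such descriptors.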