
    Use of multiple low level features to find interesting regions

    Vehicle-based mobile mapping systems capture co-registered imagery and 3D point cloud information over hundreds of kilometres of transport corridor. Methods for extracting information from these large datasets are labour intensive, so automatic methods are desired. In addition, such methods need to be easily configured by non-expert users to detect and measure many classes of objects. This paper describes a workflow that takes a large number of image and depth features and uses machine learning to generate an object detection system that is fast to configure and run. The output achieves high detection of the objects of interest with an acceptable number of false alarms. This is acceptable because the output is fed into a more complex, and hence more computationally expensive, analysis system that rejects the false alarms and measures the remaining objects. Image and depth features from bounding boxes around objects of interest and from random background are used for training with several popular learning algorithms. The interface allows a non-expert user to observe the performance and make modifications to improve it. Copyright © 2014 SCITEPRESS
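
    As a rough illustration of such a pipeline, the sketch below trains a recall-oriented classifier on features pooled from object bounding boxes and random background patches. The feature extractor, the synthetic data, and the choice of a random forest are assumptions made for illustration, not the paper's actual feature bank or learning algorithms.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        # synthetic stand-ins for labelled patches: (rgb, depth) pairs plus 0/1 labels
        training_patches = [(rng.integers(0, 256, (32, 32, 3)), rng.random((32, 32)))
                            for _ in range(200)]
        labels = rng.integers(0, 2, 200)

        def extract_features(patch_rgb, patch_depth):
            """Stand-in for the paper's bank of low-level image and depth features."""
            hist = np.histogram(patch_rgb, bins=16, range=(0, 255))[0]
            depth_stats = [patch_depth.mean(), patch_depth.std(),
                           patch_depth.min(), patch_depth.max()]
            return np.concatenate([hist / hist.sum(), depth_stats])

        # X: one feature row per object bounding box or random background patch
        X = np.vstack([extract_features(rgb, d) for rgb, d in training_patches])
        y = np.array(labels)

        # recall-oriented training: the downstream analysis rejects remaining false alarms
        clf = RandomForestClassifier(n_estimators=200, class_weight="balanced")
        print("cross-validated recall:", cross_val_score(clf, X, y, cv=5, scoring="recall").mean())
        clf.fit(X, y)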

    Learning with Free Object Segments for Long-Tailed Instance Segmentation

    One fundamental challenge in building an instance segmentation model for a large number of classes in complex scenes is the lack of training examples, especially for rare objects. In this paper, we explore the possibility of increasing the training examples without laborious data collection and annotation. We find that an abundance of instance segments can potentially be obtained freely from object-centric images, according to two insights: (i) an object-centric image usually contains one salient object in a simple background; (ii) objects from the same class often share similar appearances or similar contrasts to the background. Motivated by these insights, we propose FreeSeg, a simple and scalable framework for extracting and leveraging these "free" object foreground segments to facilitate model training in long-tailed instance segmentation. Concretely, we investigate the similarity among object-centric images of the same class to propose candidate segments of foreground instances, followed by a novel ranking of segment quality. The resulting high-quality object segments can then be used to augment the existing long-tailed datasets, e.g., by copying and pasting the segments onto the original training images. Extensive experiments show that FreeSeg yields substantial improvements on top of strong baselines and achieves state-of-the-art accuracy for segmenting rare object categories.
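
    The copy-and-paste augmentation mentioned above can be illustrated with a short sketch. The arrays below are synthetic placeholders for a harvested segment, its binary mask, and a training image; the snippet is not the FreeSeg implementation.

        import numpy as np

        rng = np.random.default_rng(0)
        train_img = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)   # training image
        segment = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)       # mined segment crop
        segment_mask = np.zeros((64, 64), dtype=bool)
        segment_mask[8:56, 8:56] = True                                    # pretend foreground region

        def paste_segment(img, seg, mask, rng):
            """Paste a masked foreground segment at a random location of img."""
            h, w = seg.shape[:2]
            H, W = img.shape[:2]
            y = rng.integers(0, H - h + 1)
            x = rng.integers(0, W - w + 1)
            out = img.copy()
            region = out[y:y + h, x:x + w]
            region[mask] = seg[mask]
            # the same (y, x, mask) would also be used to update the instance annotations
            return out

        augmented = paste_segment(train_img, segment, segment_mask, rng)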

    Defect Detection for Patterned Fabric Images Based on GHOG and Low-Rank Decomposition

    In contrast to defect-free fabric images, with their macro-homogeneous textures and regular patterns, fabric images with a defect are characterized by defect regions that are salient and sparse against the redundant background. As an effective tool for separating an image into a redundant part (the background) and a sparse part (the defect), the low-rank decomposition model therefore provides an ideal solution for patterned fabric defect detection. In this paper, a novel method for patterned fabric defect detection is proposed based on a new texture descriptor and the low-rank decomposition model. First, an efficient second-order orientation-aware descriptor, denoted GHOG, is designed by combining Gabor filters with the histogram of oriented gradients (HOG). In addition, a spatial pooling strategy based on the human vision mechanism is utilized to further improve the discrimination ability of the proposed descriptor. The proposed texture descriptor makes defect-free image blocks lie in a low-rank subspace, while defective image blocks deviate from this subspace. A low-rank decomposition model then divides the feature matrix generated from all the image blocks into a low-rank part, which represents the defect-free background, and a sparse part, which represents the sparse defects. In addition, a non-convex log-det term is utilized as a smooth surrogate function to improve the efficiency of the constructed low-rank model. Finally, the defects are localized by segmenting the saliency map generated from the sparse matrix. The qualitative and quantitative evaluation results demonstrate that the proposed method improves detection accuracy and self-adaptivity compared with state-of-the-art methods.
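
    The low-rank/sparse split at the heart of the method can be illustrated with a standard robust PCA solver. The sketch below uses the conventional nuclear-norm principal component pursuit solved by inexact ALM rather than the paper's non-convex log-det surrogate, and the feature matrix is synthetic; each column stands in for the descriptor of one image block.

        import numpy as np

        def soft_shrink(X, tau):
            return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

        def svd_shrink(X, tau):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            return U @ np.diag(soft_shrink(s, tau)) @ Vt

        def rpca(D, lam=None, mu=None, tol=1e-7, max_iter=500):
            """Split D into a low-rank background part L and a sparse defect part S."""
            m, n = D.shape
            lam = lam or 1.0 / np.sqrt(max(m, n))
            mu = mu or (m * n) / (4.0 * np.abs(D).sum())
            S = np.zeros_like(D)
            Y = np.zeros_like(D)
            for _ in range(max_iter):
                L = svd_shrink(D - S + Y / mu, 1.0 / mu)          # nuclear-norm step
                S = soft_shrink(D - L + Y / mu, lam / mu)         # sparsity step
                Y += mu * (D - L - S)                             # dual update
                if np.linalg.norm(D - L - S) <= tol * np.linalg.norm(D):
                    break
            return L, S

        rng = np.random.default_rng(0)
        D = rng.standard_normal((128, 5)) @ rng.standard_normal((5, 200)) * 0.1
        D[40:44, 100:104] += 5.0            # a "defect" makes a few entries stand out
        L, S = rpca(D)                      # S highlights the sparse deviations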

    Visual sentiment prediction based on automatic discovery of affective regions

    Automatic assessment of sentiment from visual content has gained considerable attention with the increasing tendency to express opinions via images and videos online. This paper investigates the problem of visual sentiment analysis, which involves a high level of abstraction in the recognition process. While most current methods focus on improving holistic representations, we aim to utilize local information, inspired by the observation that both the whole image and local regions convey significant sentiment information. We propose a framework to leverage affective regions, where we first use an off-the-shelf objectness tool to generate candidates and employ a candidate selection method to remove redundant and noisy proposals. Then a convolutional neural network (CNN) is applied to each candidate to compute sentiment scores, and the affective regions are automatically discovered, taking both the objectness score and the sentiment score into consideration. Finally, the CNN outputs from local regions are aggregated with those from the whole image to produce the final predictions. Our framework only requires image-level labels, thereby significantly reducing the annotation burden otherwise required for training. This is especially important for sentiment analysis, as sentiment can be abstract, and labeling affective regions is subjective and labor-intensive. Extensive experiments show that the proposed algorithm outperforms state-of-the-art approaches on eight popular benchmark datasets.
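
    A minimal sketch of the region scoring and fusion step follows. The proposal list, the placeholder sentiment_cnn function, and the equal fusion weight are illustrative assumptions, not the authors' trained model.

        import numpy as np

        rng = np.random.default_rng(0)

        def sentiment_cnn(image_or_crop):
            """Placeholder for a CNN returning P(positive sentiment)."""
            return float(rng.random())

        image = rng.random((224, 224, 3))
        proposals = [((10, 10, 120, 120), 0.9),    # (x0, y0, x1, y1), objectness score
                     ((50, 60, 200, 210), 0.7),
                     ((0, 0, 30, 30), 0.2)]

        scored = []
        for (x0, y0, x1, y1), objectness in proposals:
            crop = image[y0:y1, x0:x1]
            sentiment = sentiment_cnn(crop)
            # regions that are both object-like and strongly affective rank highest
            scored.append((objectness * max(sentiment, 1 - sentiment), sentiment))

        top_regions = sorted(scored, reverse=True)[:2]                 # affective regions
        region_pred = np.mean([s for _, s in top_regions])
        final_pred = 0.5 * sentiment_cnn(image) + 0.5 * region_pred   # fuse with whole image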

    Text Extraction From Natural Scene: Methodology And Application

    With the popularity of the Internet and smart mobile devices, there is an increasing demand for techniques and applications for image/video-based analytics and information retrieval. Most of these applications can benefit from extracting the text information present in natural scenes. However, scene text extraction remains a challenging problem, due to the cluttered backgrounds of natural scenes and the varied patterns of scene text itself. To address these problems, this dissertation proposes a framework for scene text extraction. Scene text extraction in our framework is divided into two components: detection and recognition. Scene text detection finds the regions containing text in camera-captured images/videos. Text layout analysis based on gradient and color analysis is performed to extract candidate text strings from the cluttered background of a natural scene. Text structural analysis is then performed to design effective structural features for distinguishing text from non-text outliers among the candidate text strings. Scene text recognition transforms image-based text in the detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, which is a multi-class classification over a set of text character categories. We design robust and discriminative feature representations for STC structure by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results on benchmark datasets demonstrate the effectiveness and robustness of our proposed framework, which obtains better performance than previously published methods. Our proposed scene text extraction framework is applied to four scenarios: 1) reading printed labels on grocery packages for hand-held object recognition; 2) combining with car detection to localize license plates in camera-captured natural scene images; 3) reading indicative signage for assistive navigation in indoor environments; and 4) combining with object tracking to perform scene text extraction in video of natural scenes. The proposed prototype systems and the associated evaluation results show that our framework is able to solve the challenges in real applications.
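
    To make the detection stage concrete, the sketch below extracts text-like candidate regions with OpenCV's MSER and applies crude geometric filtering. This is a common stand-in for the candidate step, not the dissertation's gradient- and color-based layout analysis; the synthetic test image and thresholds are assumptions.

        import cv2
        import numpy as np

        # synthetic scene with some text so the example is self-contained
        gray = np.full((240, 320), 255, np.uint8)
        cv2.putText(gray, "EXIT", (40, 140), cv2.FONT_HERSHEY_SIMPLEX, 2, 0, 5)

        mser = cv2.MSER_create()
        regions, bboxes = mser.detectRegions(gray)

        candidates = []
        for (x, y, w, h) in bboxes:
            aspect = w / float(h)
            # crude geometric filtering before a text/non-text classifier would be applied
            if 0.1 < aspect < 10 and 8 < h < gray.shape[0] // 2:
                candidates.append((x, y, w, h))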

    Multispectral Image Road Extraction Based Upon Automated Map Conflation

    Road network extraction from remotely sensed imagery enables many important and diverse applications such as vehicle tracking, drone navigation, and intelligent transportation studies. There are, however, a number of challenges to detecting roads in an image. Road pavement material, width, direction, and topology vary across a scene. Complete or partial occlusions caused by nearby buildings and trees, and the shadows they cast, make maintaining road connectivity difficult. The problems posed by occlusions are exacerbated by the increasing use of oblique imagery from aerial and satellite platforms. Further, common objects such as rooftops and parking lots are made of materials similar or identical to road pavements. This problem of common materials is a classic case of a single land cover material appearing in different land use scenarios. This work addresses these problems in road extraction from geo-referenced imagery by leveraging the OpenStreetMap digital road map to guide image-based road extraction. The crowd-sourced cartography has the advantages of worldwide coverage and constant updating. The derived road vectors follow only roads and so can guide image-based road extraction with minimal confusion from occlusions and changes in road material. On the other hand, the vector road map has no information on road widths, and misalignments between the vector map and the geo-referenced image are small but nonsystematic. Properly correcting misalignment between two geospatial datasets, also known as map conflation, is an essential step. A generic framework requiring minimal human intervention is described for multispectral image road extraction and automatic road map conflation. The approach relies on generating two road features: a binary mask and a corresponding curvilinear image. A method for generating the binary road mask from the image by applying a spectral measure is presented. The spectral measure, called the anisotropy-tunable distance (ATD), differs from conventional measures in accounting for both changes of spectral direction and changes of spectral magnitude in a unified fashion. The ATD measure is particularly suitable for differentiating urban targets such as roads and building rooftops. The curvilinear image provides estimates of the width and orientation of potential road segments. Road vectors derived from OpenStreetMap are then conflated to the image road features by applying junction matching and intermediate point matching, followed by refinement with mean-shift clustering and morphological processing to produce a road mask with piecewise width estimates. The proposed approach is tested on a set of challenging, large, and diverse image datasets and its accuracy is assessed. The method is effective for road detection and width estimation, even in challenging scenarios with extensive occlusion.
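
    The idea of thresholding a spectral distance to obtain a binary road mask can be sketched as follows. The blended angle/magnitude distance and the synthetic six-band image below are illustrative stand-ins only, not the paper's anisotropy-tunable distance, whose exact form is defined in the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        image = rng.random((100, 100, 6))             # synthetic 6-band image
        road_ref = image[50, 50]                      # reference road spectrum (assumed known)

        def spectral_distance(pixels, ref, alpha=0.5):
            """Blend of angular (direction) and Euclidean (magnitude) spectral distance."""
            dot = pixels @ ref
            norms = np.linalg.norm(pixels, axis=-1) * np.linalg.norm(ref)
            angle = np.arccos(np.clip(dot / np.maximum(norms, 1e-12), -1.0, 1.0))
            magnitude = np.linalg.norm(pixels - ref, axis=-1)
            return alpha * angle + (1 - alpha) * magnitude

        flat = image.reshape(-1, image.shape[-1])
        dist = spectral_distance(flat, road_ref).reshape(image.shape[:2])
        road_mask = dist < np.percentile(dist, 20)    # threshold chosen for illustration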

    Advanced Biometrics with Deep Learning

    Biometrics such as fingerprint, iris, face, hand print, hand vein, speech, and gait recognition have become a commonplace means of identity management in various applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction, and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to conventional data-agnostic, handcrafted preprocessing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm that unifies preprocessing, feature extraction, and recognition, based solely on the biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into 4 categories according to biometric modality: face biometrics, medical electronic signals (EEG and ECG), voice print, and others.

    Motion-based vision methods and their applications

    Motion detection is a basic video analytic operation on which many high-level computer vision tasks are built, e.g., pedestrian detection, anomaly detection, scene understanding, and object tracking. Even though a large number of motion detection methods have been proposed over the last decades, some important questions remain unanswered, including: (1) how to separate the foreground from the background accurately, even under extremely challenging circumstances such as strong background motion and illumination changes? (2) how to evaluate different motion detection methods? And (3) how to use the motion information extracted by motion detection to improve high-level computer vision tasks? In this thesis, we address four problems related to motion detection: 1. How can we benchmark motion detection methods, and on which videos? Current datasets are either too small, with a limited number of scenarios, or only provide bounding-box ground truth that indicates the rough location of foreground objects. As a solution, we built the largest fully annotated motion detection dataset in the world, with pixel-accurate ground truth, to evaluate and compare motion detection methods, and organized an international competition around it (CVPR 2014). We also explored various evaluation metrics as well as strategies for combining motion detection methods. 2. Providing pixel-accurate ground truth is a huge challenge when building a motion detection dataset. While automatic labeling methods suffer from a false detection rate too high to be used as ground truth, manual labeling of hundreds of thousands of frames is extremely time consuming. To solve this problem, we proposed an interactive deep learning method for segmenting moving objects from videos. The proposed method reaches human-level accuracy while lowering the labeling time by a factor of 40. 3. Pedestrian detectors suffer from either false positive or false negative detections depending on the parameter tuning, and manually adjusting parameters for a large number of videos is not feasible in practice. In order to make pedestrian detectors more robust on a large variety of videos, we combined motion detection with various state-of-the-art pedestrian detectors. This is done by a novel motion-based nonlinear filtering process which improves the detectors by a significant margin. 4. Scene background initialization is the process by which a method tries to recover the RGB background image of a video without the foreground objects. One of the reasons background modeling remains challenging is that there is no good dataset or benchmarking framework to estimate the performance of background modeling methods. To fix this problem, we proposed an extensive survey as well as a novel benchmarking framework for scene background initialization, built the largest dataset to date for this task, and organized an international competition around it (ICPR 2016).
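
    The motion-based filtering of pedestrian detections (problem 3) can be illustrated with a much simplified sketch: keep a detection only if enough of its bounding box overlaps the foreground mask. The mask, boxes, and coverage threshold below are assumptions for illustration, not the thesis's actual nonlinear filter.

        import numpy as np

        rng = np.random.default_rng(0)
        fg_mask = rng.random((480, 640)) > 0.95                  # stand-in motion mask
        detections = [(100, 200, 160, 320, 0.8),                 # (x0, y0, x1, y1, score)
                      (400, 100, 440, 180, 0.6)]

        def motion_filter(dets, mask, min_coverage=0.05):
            kept = []
            for x0, y0, x1, y1, score in dets:
                box = mask[y0:y1, x0:x1]
                coverage = box.mean() if box.size else 0.0
                if coverage >= min_coverage:                     # enough moving pixels inside
                    kept.append((x0, y0, x1, y1, score))
            return kept

        filtered = motion_filter(detections, fg_mask)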

    Multimodal Adversarial Learning

    Deep convolutional neural networks (DCNNs) have proven to be exceptional tools for object recognition, generative modelling, and multi-modal learning in various computer vision applications. However, recent findings have shown that such state-of-the-art models can be easily deceived by inserting slight, imperceptible perturbations at key pixels in the input. A good target detection system can accurately identify targets by localizing their coordinates in the input image of interest. This is ideally achieved by labeling each pixel in an image as background or a potential target pixel. However, prior research confirms that such state-of-the-art target detection models are susceptible to adversarial attacks. In the case of generative models, the facial sketches drawn by artists and used by law enforcement agencies depend on the artist's ability to faithfully replicate the key facial features that capture the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. However, synthesizing photo-realistic images from sketches is an even more challenging task, especially for sensitive applications such as suspect identification. The incorporation of hybrid discriminators, which perform classification over multiple target attributes, together with a quality-guided encoder, which minimizes the perceptual dissimilarity between the latent-space embeddings of the synthesized and real images at different layers of the network, has been shown to be a powerful tool for better multi-modal learning. In general, our overall approach aims to improve target detection systems and the visual appeal of synthesized images while incorporating multiple attribute assignments into the generator without compromising the identity of the synthesized image. We synthesized sketches using the XDoG filter for the CelebA, Multi-modal, and CelebA-HQ datasets, and from an auxiliary generator trained on sketches from the CUHK, IIT-D, and FERET datasets. Overall, our results across the different model applications compare favorably with the current state of the art.
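
    Since the abstract mentions generating training sketches with an XDoG filter, a minimal version of that filter is sketched below using a difference of Gaussians followed by soft thresholding. The parameter values and the synthetic input are illustrative assumptions, not those used to build the sketch datasets.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def xdog(gray, sigma=0.8, k=1.6, gamma=0.98, eps=0.02, phi=15.0):
            """gray: float image in [0, 1]; returns a sketch-like image in [0, 1]."""
            g1 = gaussian_filter(gray, sigma)
            g2 = gaussian_filter(gray, sigma * k)
            d = g1 - gamma * g2                               # extended difference of Gaussians
            out = np.where(d >= eps, 1.0, 1.0 + np.tanh(phi * (d - eps)))
            return np.clip(out, 0.0, 1.0)

        rng = np.random.default_rng(0)
        face_like = gaussian_filter(rng.random((128, 128)), 3)  # stand-in for a face photo
        sketch = xdog(face_like)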
