
    Object Detection in 20 Years: A Survey

    Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). A number of topics are covered, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. The paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as technical improvements in recent years. Comment: This work has been submitted to the IEEE TPAMI for possible publication.

    Rice seed image classification based on HOG descriptor with missing values imputation

    Rice is a primary source of food consumed by almost half of the world's population. Rice quality mainly depends on the purity of the rice seed, so a recognition process is an essential stage in ensuring the purity of a rice variety. In this paper, we first propose to use the histogram of oriented gradients (HOG) descriptor to characterize rice seed images. Since image sizes vary, the features extracted by HOG have different dimensions and cannot be fed directly to a classifier; we therefore apply several imputation methods to fill in the missing values of the HOG descriptor. The proposed approach is evaluated experimentally on the VNRICE benchmark dataset.
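    The abstract does not give implementation details; a minimal sketch of the general idea, using scikit-image's HOG and scikit-learn's SimpleImputer (library choices and parameters are assumptions, not the authors' setup), might look like this:

    ```python
    # Sketch only: variable-length HOG features are padded with NaNs and
    # imputed before classification. All parameters are illustrative.
    import numpy as np
    from skimage.feature import hog
    from sklearn.impute import SimpleImputer
    from sklearn.neighbors import KNeighborsClassifier

    def hog_features(images):
        """Extract HOG descriptors; lengths differ when image sizes differ."""
        return [hog(img, orientations=9, pixels_per_cell=(8, 8),
                    cells_per_block=(2, 2)) for img in images]

    def pad_with_nan(feats):
        """Stack variable-length vectors into a matrix, marking gaps as NaN."""
        width = max(len(f) for f in feats)
        out = np.full((len(feats), width), np.nan)
        for i, f in enumerate(feats):
            out[i, :len(f)] = f
        return out

    # Two dummy grayscale "seed images" of different sizes stand in for data.
    imgs = [np.random.rand(64, 64), np.random.rand(96, 64)]
    labels = [0, 1]
    X = pad_with_nan(hog_features(imgs))
    X = SimpleImputer(strategy="mean").fit_transform(X)  # fill missing values
    clf = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
    ```

    Mean imputation is only one of the imputation methods the paper mentions; any scikit-learn imputer could be swapped in at the same step.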

    A Review of Object Visual Detection for Intelligent Vehicles

    This paper details different object detection (OD) techniques and the relationship of object detection with video analysis and image understanding, which have attracted much research attention recently. Traditional object detection methods are built on handcrafted features and shallow trainable models. This survey presents one such method, the optical flow method (OFM), which is found to be more robust and effective for moving object detection, as shown by a study in this review. Applying optical flow to an image gives flow vectors of the points corresponding to the moving objects; marking the required moving object of interest is then left to post-processing, which is the main contribution of this review for moving object detection problems. The performance of such traditional methods easily stagnates when complex ensembles are built that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development of deep learning, more powerful tools, which can learn semantic, high-level, deeper features, have been introduced to address the problems of traditional architectures. These models differ in network architecture, training strategy, optimization function, and so on. In this review paper, we provide a survey of deep learning-based object detection frameworks, beginning with a brief introduction to the history of deep learning and its representative tools, namely the convolutional neural network (CNN) and region-based convolutional neural networks (R-CNN).
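    The review names optical flow plus post-processing as the core moving-object technique but gives no implementation; a minimal sketch using OpenCV's dense Farnebäck optical flow, with a magnitude threshold and morphological opening standing in for the post-processing step (all library choices and parameters are assumptions), could look like this:

    ```python
    # Sketch: dense optical flow + flow-magnitude threshold as a crude
    # moving-object mask, followed by simple morphological post-processing.
    import cv2
    import numpy as np

    def moving_object_mask(prev_gray, curr_gray, mag_thresh=2.0):
        """Return a binary mask of pixels whose flow magnitude exceeds a threshold."""
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, curr_gray, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        mag, _ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        mask = (mag > mag_thresh).astype(np.uint8) * 255
        # Post-processing: remove speckle noise with a morphological opening.
        kernel = np.ones((5, 5), np.uint8)
        return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

    cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input video
    ok, prev = cap.read()
    if not ok:
        raise SystemExit("could not read video")
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        mask = moving_object_mask(prev_gray, gray)
        prev_gray = gray
    ```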

    Methods for Detecting Floodwater on Roadways from Ground Level Images

    Recent research and statistics show that the frequency of flooding in the world has been increasing and severely impacting flood-prone communities. This natural disaster causes significant damage to human life and property, inundates roads, overwhelms drainage systems, and disrupts essential services and economic activities. The focus of this dissertation is to use machine learning methods to automatically detect floodwater in ground-level images in support of the frequently impacted communities. The ground-level images can be retrieved from multiple sources, including photos taken by mobile phone cameras as communities record the state of their flooded streets. The model developed in this research processes these images at multiple levels. The first detection model investigates the presence of flood in images by developing and comparing image classifiers with various feature extractors. Local binary patterns (LBP), histograms of oriented gradients (HOG), and pretrained convolutional neural networks are used as feature extractors. Then, decision trees, logistic regression, and K-nearest neighbors (K-NN) models are trained and tested to predict floodwater presence in an image.

    Once the model detects flood in an image, it moves to the second layer, which detects floodwater at the pixel level. This pixel-level identification is achieved by semantic segmentation using a super-pixel-based prediction method and fully convolutional neural networks (FCNs). First, the SLIC super-pixel method is used to create the super-pixels; then the same types of classifiers as in the initial classification stage are trained to predict the class of each super-pixel. Alternatively, the FCN is trained end-to-end without any additional classifiers. Once these processes are done, images are segmented into regions of floodwater at the pixel level. In both the classification and semantic segmentation tasks, deep learning-based methods showed the best results.

    Once the model receives confirmation of flood detection at the image and pixel levels, it moves to the final task of estimating floodwater depth. This third and final layer of the model is critical, as it can help officials deduce the severity of the flood in a given area. To estimate the water depth and the severity of the flooding, the model processes the cars standing in water on streets and calculates the percentage of their tires that is under water. This calculation is achieved with a mixture of deep learning and classical computer vision techniques, in four main steps: (i) semantic segmentation of the image into pixels belonging to the background, floodwater, and vehicle wheels, performed by multiple FCN models trained with various base models; (ii) object detection for tires, identified by a You Only Look Once (YOLO) detector; (iii) improvement of the initial segmentation results by a proposed U-Net-like semantic segmentation network, which takes the tire patches from the object detector together with the corresponding initial segmentation results and learns to fix their errors; and (iv) calculation of the water depth as the ratio of the tire wheel under the water.

    This final step uses the improved segmentation results to identify the ellipses corresponding to the wheels of vehicles and combines two approaches in a hybrid method: (i) using the improved segmentation results directly, since they return the pixels belonging to the wheels, from which the wheel boundaries are found; and (ii) finding arcs that belong to elliptical objects by applying a series of image processing methods, then connecting the arcs into larger structures such as two-piece (half-ellipse), three-piece, or four-piece (full) ellipses. Once the ellipse boundary is computed with both methods, the fraction of the ellipse under floodwater can be calculated. This multi-model design allows potential prediction errors to be attributed to the different parts of the model, such as the semantic segmentation of the image or the calculation of the elliptical boundary. To verify the applicability of the proposed methods and to train the models, extensive hand-labeled datasets were created as part of this dissertation. The initial images were collected from the web, and the datasets were then enriched with images from virtual environments, simulations of neighborhoods under flood, built with the Unity software. In conclusion, the proposed methods, as validated on the labeled datasets, can successfully classify images as flood scenes, semantically segment the regions of flood, and predict the depth of water to indicate severity.
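    The abstract defines the depth estimate as the fraction of a wheel ellipse below the waterline. A minimal sketch under assumed inputs (binary masks for a wheel and for floodwater, OpenCV's ellipse fit, and boundary sampling as a stand-in for the dissertation's exact ratio computation) might look like this:

    ```python
    # Sketch: fraction of a fitted wheel ellipse that lies under floodwater.
    # Mask formats, OpenCV calls, and the boundary-sampling approach are
    # illustrative assumptions, not the dissertation's exact procedure.
    import cv2
    import numpy as np

    def submerged_wheel_ratio(wheel_mask, water_mask):
        """wheel_mask / water_mask: same-shape uint8 binary images (0 or 255)."""
        contours, _ = cv2.findContours(wheel_mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_NONE)
        if not contours:
            return None
        largest = max(contours, key=cv2.contourArea)
        if len(largest) < 5:          # cv2.fitEllipse needs at least 5 points
            return None
        (cx, cy), (w, h), angle = cv2.fitEllipse(largest)
        # Sample points along the ellipse boundary and count those in water.
        pts = cv2.ellipse2Poly((int(cx), int(cy)), (int(w / 2), int(h / 2)),
                               int(angle), 0, 360, 1)
        inside = [water_mask[y, x] > 0 for x, y in pts
                  if 0 <= y < water_mask.shape[0] and 0 <= x < water_mask.shape[1]]
        return sum(inside) / len(inside) if inside else None
    ```

    Sampling the boundary only approximates the submerged fraction; an area-based ratio over the filled ellipse would be a natural alternative at the same step.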

    Arabic cursive text recognition from natural scene images

    This paper presents a comprehensive survey on Arabic cursive scene text recognition. Publications in recent years have witnessed a shift of document image analysis researchers' interest from the recognition of optical characters to the recognition of characters appearing in natural images. Scene text recognition is a challenging problem because the text varies in font style, size, alignment, orientation, reflection, illumination, blurriness, and background complexity. Among cursive scripts, Arabic scene text recognition is regarded as an even more challenging problem due to joined writing, shape variations of the same character, a large number of ligatures, the number of baselines, etc. Surveys of Latin and Chinese script-based scene text recognition systems can be found, but the Arabic-like scene text recognition problem has yet to be addressed in detail. In this manuscript, a description is provided to highlight some of the latest techniques presented for text classification. The presented techniques, which follow deep learning architectures, are equally suitable for the development of Arabic cursive scene text recognition systems. The issues pertaining to text localization and feature extraction are also presented. Moreover, this article emphasizes the importance of having a benchmark cursive scene text dataset. Based on the discussion, future directions are outlined, some of which may provide researchers with insight into cursive scene text.

    Development of Semantic Scene Conversion Model for Image-based Localization at Night

    Developing an autonomous vehicle navigation system invariant to illumination change is one of the biggest challenges in the vision-based localization field, because the appearance of an image becomes inconsistent under different light conditions even at the same location. In particular, night scene images show the greatest change in appearance compared to the corresponding day scenes. Moreover, night images do not carry enough information for image-based localization. To deal with illumination change, image conversion methods have been researched; however, these methods can lose the detail of objects and add fake objects to the output images. In this thesis, we propose a semantic object conversion model that exploits the change of local semantic objects by category at night. This enables the proposed model to preserve the detail of local semantic objects during image conversion, and as a result it is expected to perform better in image-based localization. Our model uses local semantic objects (i.e., traffic signs and street lamps) as categories and is composed of two phases: (1) instance segmentation and (2) semantic object conversion. Instance segmentation is utilized as a detector for local semantic objects. In the conversion phase, the detected local semantic objects are translated from their night-time appearance to a daytime appearance. In the evaluation, we show that models trained on paired images achieve higher accuracy than models trained on unpaired images. Our proposed method is compared with pix2pix and ToDayGAN. Moreover, the results are quantitatively evaluated by the matching score between a query image and the converted images using the ORB matching descriptor.
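    The thesis scores conversions by ORB matching between a query image and a converted image; a minimal sketch of such a score using OpenCV's ORB with brute-force Hamming matching and a Lowe ratio test (an assumed formulation, since the exact metric is not given here) might look like this:

    ```python
    # Sketch: ORB keypoint matching score between a query image and a
    # converted image. OpenCV usage is standard; the scoring formula itself
    # is an illustrative assumption, not the thesis's exact metric.
    import cv2

    def orb_matching_score(query_img, converted_img, ratio=0.75):
        """Return the number of ratio-test-filtered ORB matches."""
        orb = cv2.ORB_create(nfeatures=1000)
        _kp1, des1 = orb.detectAndCompute(query_img, None)
        _kp2, des2 = orb.detectAndCompute(converted_img, None)
        if des1 is None or des2 is None:
            return 0
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
        pairs = matcher.knnMatch(des1, des2, k=2)
        good = [p[0] for p in pairs
                if len(p) == 2 and p[0].distance < ratio * p[1].distance]
        return len(good)

    # Hypothetical file names for a daytime query and a night-to-day conversion.
    query = cv2.imread("query_day.png", cv2.IMREAD_GRAYSCALE)
    converted = cv2.imread("night_to_day.png", cv2.IMREAD_GRAYSCALE)
    if query is not None and converted is not None:
        print(orb_matching_score(query, converted))
    ```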