305 research outputs found

    Convolutional Networks for Object Category and 3D Pose Estimation from 2D Images

    Full text link
    Current CNN-based algorithms for recovering the 3D pose of an object in an image assume knowledge of both the object category and its 2D localization in the image. In this paper, we relax one of these constraints and propose to solve the task of joint object category and 3D pose estimation from an image, assuming known 2D localization. We design a new architecture for this task composed of a feature network that is shared between subtasks, an object categorization network built on top of the feature network, and a collection of category-dependent pose regression networks. We also introduce suitable loss functions and a training method for the new architecture. Experiments on the challenging PASCAL3D+ dataset show state-of-the-art performance on the joint categorization and pose estimation task. Moreover, our performance on the joint task is comparable to that of state-of-the-art methods on the simpler task of 3D pose estimation with known object category.
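
The architecture described above — a shared feature network, a categorization head, and a set of category-dependent pose regressors — can be sketched roughly as follows. This is a minimal NumPy stand-in with made-up layer sizes and random weights, not the paper's actual convolutional network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
IN_DIM, FEAT_DIM, N_CATEGORIES, POSE_DIM = 128, 64, 12, 3

# Shared feature extractor (stand-in for the convolutional feature network).
W_feat = 0.1 * rng.normal(size=(IN_DIM, FEAT_DIM))
# Object categorization head on top of the shared features.
W_cat = 0.1 * rng.normal(size=(FEAT_DIM, N_CATEGORIES))
# One pose-regression head per category (category-dependent regressors).
W_pose = 0.1 * rng.normal(size=(N_CATEGORIES, FEAT_DIM, POSE_DIM))

def forward(x):
    """Joint categorization and pose estimation: classify first, then
    route the shared features to that category's pose regressor."""
    f = np.maximum(x @ W_feat, 0.0)   # shared features (ReLU)
    cat = int(np.argmax(f @ W_cat))   # predicted object category
    pose = f @ W_pose[cat]            # category-specific pose output
    return cat, pose

cat, pose = forward(rng.normal(size=IN_DIM))
print(cat, pose.shape)
```

The key design point is the routing: all categories share one feature trunk, while each category owns its own pose head.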

    Learning Dilation Factors for Semantic Segmentation of Street Scenes

    Full text link
    Contextual information is crucial for semantic segmentation. However, finding the optimal trade-off between keeping desired fine details and providing sufficiently large receptive fields is non-trivial, all the more so when the objects or classes present in an image vary significantly in size. Dilated convolutions have proven valuable for semantic segmentation because they allow the size of the receptive field to be increased without sacrificing image resolution. However, in current state-of-the-art methods, dilation parameters are hand-tuned and fixed. In this paper, we present an approach for learning dilation parameters adaptively per channel, consistently improving semantic segmentation results on street-scene datasets such as Cityscapes and CamVid. Comment: GCPR201
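
The idea of a real-valued, learnable dilation factor can be illustrated with a 1-D sketch: blend the responses of the two nearest integer dilations. This is a simplified stand-in (the paper's actual mechanism operates per channel inside 2-D convolutions), with all names and shapes chosen for illustration:

```python
import numpy as np

def dilated_conv1d(x, w, d):
    """Valid 1-D convolution with integer dilation d: taps are spaced
    d samples apart, enlarging the receptive field."""
    k = len(w)
    out = np.zeros(len(x) - (k - 1) * d)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * d] for j in range(k))
    return out

def fractional_dilation_conv1d(x, w, d):
    """Response at a real-valued dilation d, as a linear blend of the
    two neighbouring integer dilations. A learnable d could then be
    optimized by gradient descent through this blend."""
    lo, hi = int(np.floor(d)), int(np.ceil(d))
    if lo == hi:
        return dilated_conv1d(x, w, lo)
    a = d - lo                          # interpolation weight
    y_lo = dilated_conv1d(x, w, lo)
    y_hi = dilated_conv1d(x, w, hi)
    n = min(len(y_lo), len(y_hi))       # align the valid-output lengths
    return (1 - a) * y_lo[:n] + a * y_hi[:n]

print(fractional_dilation_conv1d(np.arange(10.0), np.array([1.0, 0.0, -1.0]), 1.5))
# → [-3. -3. -3. -3. -3. -3.]
```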

    What is Holding Back Convnets for Detection?

    Full text link
    Convolutional neural networks have recently shown excellent results in general object detection and many other tasks. Albeit very effective, they involve many user-defined design choices. In this paper, we seek to better understand these choices by inspecting two key aspects: "what did the network learn?" and "what can the network learn?". We exploit new annotations (PASCAL3D+) to enable a new empirical analysis of the R-CNN detector. Contrary to common belief, our results indicate that existing state-of-the-art convnet architectures are not invariant to various appearance factors. In fact, all considered networks have similar weak points which cannot be mitigated by simply increasing the training data; architectural changes are needed. We show that overall performance can improve when using image renderings for data augmentation. We report the best known results on the PASCAL3D+ detection and viewpoint estimation tasks.

    Deep Learning Based Vehicle Make-Model Classification

    Full text link
    This paper studies the problem of vehicle make and model classification. Two of the main challenges are reaching high classification accuracy and reducing the time needed to annotate images. To address these problems, we have created a fine-grained database using online vehicle marketplaces in Turkey. We propose a pipeline that combines an SSD (Single Shot Multibox Detector) model with a CNN (Convolutional Neural Network) model trained on this database. In the pipeline, we first detect the vehicles using an algorithm that reduces annotation time, and then feed the detections into the CNN model. This yields approximately 4% higher classification accuracy than a conventional CNN model alone. Next, we propose to use the detected vehicles as ground-truth bounding boxes (GTBB) for the images and feed them into an SSD model in a second pipeline. At this stage, reasonable classification accuracy is reached without perfectly shaped GTBBs. Lastly, we implement an application that uses our proposed pipelines in a concrete use case: it detects unauthorized vehicles by comparing their license plate numbers with their makes and models, under the assumption that license plates are readable. Comment: 10 pages, ICANN 2018: Artificial Neural Networks and Machine Learning
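
The unauthorized-vehicle check at the end of the abstract can be sketched as a simple lookup: compare the make/model predicted from the image against the make/model registered for the recognized plate. The registry contents and names below are hypothetical stand-ins, not the paper's data:

```python
# Hypothetical plate registry: license plate -> registered make/model.
REGISTRY = {
    "34ABC123": "ford_focus_2015",
    "06XYZ789": "fiat_egea_2017",
}

def is_unauthorized(plate, predicted_make_model):
    """Flag a vehicle whose registered make/model disagrees with the
    make/model predicted by the classifier (e.g. a cloned plate), or
    whose plate is unknown. Assumes the plate was read correctly."""
    registered = REGISTRY.get(plate)
    if registered is None:
        return True  # unknown plate: flag for inspection
    return registered != predicted_make_model

print(is_unauthorized("34ABC123", "ford_focus_2015"))  # False
print(is_unauthorized("34ABC123", "fiat_egea_2017"))   # True
```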

    Geostatistical merging of weather radar data with a sparse rain gauge network in Queensland

    Get PDF
    Many parts of Australia, including much of Queensland and Northern Australia, tend to have sparse rain gauge coverage. To provide rainfall information across Australia, several gridded daily rainfall datasets such as those available through the Australian Water Availability Project and Scientific Information for Land Owners services have been developed. These daily grids are produced by interpolation of rain gauge data and therefore can provide unrealistic rainfall estimates in areas that have few rain gauges. To obtain rainfall data at a higher spatial resolution, weather radars and satellites can provide coverage over a large area although their measurements come with considerable uncertainty. Various approaches have been developed to adjust radar and satellite data and statistically merge them with rain gauge measurements in interpolation schemes, the goal being to retain the information on the spatial distribution of rainfall provided by remote sensing while also taking advantage of the greater accuracy of the rain gauges, but many of these techniques have been applied primarily on shorter time scales of an hour or less. This paper applies some existing methods for geostatistical merging of radar data with sparse rain gauge networks and evaluates the performance of the approaches using the Mt Stapylton radar in Brisbane and 15 surrounding rain gauges. Summer and winter data from 01/12/2013 to 28/02/2018 are considered. The radar data is corrected for mean field bias using quantile mapping and is used to develop the variogram models for use in Kriging. The performance of Kriging the gauge data using the radar variogram is compared with conditional merging and Kriging with radar values introduced as a drift variable. Leave-one-out cross-validation is used to evaluate the performance of the methods. 
We find some disagreement between all radar-based approaches and the validation gauge measurements, with typical daily root-mean-square errors between 10 mm and 20 mm for all approaches. Some outliers with substantially higher RMSE are noted on some days in the unadjusted radar data as well as in the corrected and interpolated data. For the winter data, the bias-correction and interpolation steps increased the agreement between the radar data and the validation gauges, but this improvement was not observed in the summer data. In addition, due to the low number of gauges, the performance of the interpolation is extremely sensitive to the rain gauge values, with certain combinations of rain gauge values and choice of validation gauge leading to extremely large cross-validation errors. The results indicate that while incorporating the radar data makes it possible to perform Kriging with few gauges on a single day's data, this is not an ideal approach for quantitative precipitation estimation, and further steps should be taken to improve the radar-gauge correlation.
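
The quantile-mapping bias-correction step can be sketched as an empirical quantile-quantile transform: map each radar value to the gauge value at the same empirical quantile. This is a minimal illustration with toy numbers, not the exact procedure used in the study:

```python
import numpy as np

def quantile_map(radar_hist, gauge_hist, radar_new):
    """Empirical quantile mapping: sort the historical radar and gauge
    samples, then map each new radar value through the resulting
    quantile-quantile relationship with linear interpolation."""
    radar_q = np.sort(radar_hist)
    gauge_q = np.sort(gauge_hist)
    return np.interp(radar_new, radar_q, gauge_q)

# Toy example: the radar systematically doubles the gauge rainfall.
radar = [2.0, 4.0, 6.0, 8.0]   # historical radar estimates (mm)
gauge = [1.0, 2.0, 3.0, 4.0]   # co-located gauge measurements (mm)
print(quantile_map(radar, gauge, [5.0]))  # → [2.5]
```

Because the mapping is built from sorted marginals, it corrects the rainfall distribution (mean-field bias) rather than individual radar-gauge pairs.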

    Comparing Phase Based Seasonal Climate Forecasting Methods for Sugarcane Growing Regions

    Get PDF
    EXTENDED ABSTRACT Climate forecasting systems that group years on the basis of a climate forecasting index like the Southern Oscillation Index (SOI) or sea surface temperatures (SSTs) are quite simple to explain to industry personnel. Phase systems identify a subset of years (analogues) that have the same phase for a particular month. Industries can then investigate how the response of interest varied historically by the SOI or SST phase and self-validate the system. This is possible because industry members will remember the big wet and big dry years. Phase systems also allow industry personnel to visualise distributional shifts in rainfall and other responses (e.g. yield) between the different phases. These components spark a great deal of interest and enthusiasm at case study meetings. The simplicity of phase systems contributes to increased understanding of the forecasting approach, and highlights both the strengths and limitations associated with seasonal climate forecasting. Given that climate forecasts are not a perfect science, it is important that industries understand the risks and probability concepts so they can better integrate forecasts into a decision-making framework. The Australian sugar industry has predominantly used the five-phase SOI climate forecasting system as its benchmark in recent years. The purpose of this paper is to compare the performance of the benchmark system with other phase-based climate forecasting systems. Three-phase and nine-phase SST forecasting systems and a three-phase SOI system formed part of the investigation. An assessment is made across the sugarcane growing regions and across the calendar year, simultaneously. This is done for seven sugar growing regions that collectively produce approximately 90% of Australia's sugar. A methodology that enables a fair comparison of the systems is presented. This methodology caters for the different number of phases with each forecasting system. 
We consider three performance measures: P-values of (i) the Kruskal-Wallis (KW) test statistic, (ii) a linear error in probability space (LEPS) skill score and (iii) a relative operating characteristic (ROC) skill score for above- and below-median rainfall. P-values are used to overcome obstacles associated with the different numbers of phases. This is important since, by chance alone, it is easier to get a higher or better categorical LEPS score for systems that have more phases. Results can vary with the performance measure. If ROC- and LEPS-based performance measures are preferred, then the three-phase SST system produced a higher number of significant results across the regions and three-month rolling periods. If performance measures that reflect the degree of distributional shifts or discriminatory ability between phases are preferred, then the five-phase SOI system produced the highest number of significant fields. Taking into consideration dependencies and auto-correlations associated with the response measurements across the calendar year and across coastal regions, which essentially differ in latitudinal positioning, it is important to assess the likelihood that the number of significant fields could have occurred purely by chance. Whilst a methodology is presented for comparing phase systems whose number of phases varies from system to system, the dilemma of which performance measures to base decisions on remains. Users must carefully consider which performance measures are most appropriate for their investigation.
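
The Kruskal-Wallis statistic used as the first performance measure can be computed directly from joint ranks: rank all observations together and measure how far each group's mean rank departs from the overall mean rank. A minimal sketch (no tie correction) on toy rainfall data grouped by three hypothetical phases:

```python
import numpy as np

def kruskal_wallis_H(groups):
    """Kruskal-Wallis H statistic for k groups, assuming no ties."""
    data = np.concatenate(groups)
    N = len(data)
    ranks = np.empty(N)
    ranks[np.argsort(data)] = np.arange(1, N + 1)  # joint ranks 1..N
    H, start = 0.0, 0
    for g in groups:
        n = len(g)
        r_mean = ranks[start:start + n].mean()      # group mean rank
        H += n * (r_mean - (N + 1) / 2.0) ** 2      # departure from overall mean
        start += n
    return 12.0 / (N * (N + 1)) * H

# Toy daily rainfall (mm) grouped by three hypothetical SOI/SST phases.
phases = [np.array([10.0, 12.0, 14.0]),
          np.array([30.0, 32.0, 34.0]),
          np.array([20.0, 22.0, 24.0])]
print(round(kruskal_wallis_H(phases), 2))  # → 7.2
```

A large H (small P-value) indicates that rainfall distributions genuinely shift between phases, which is exactly what a useful phase system should produce.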

    Efficient On-the-fly Category Retrieval using ConvNets and GPUs

    Full text link
    We investigate the gains in precision and speed that can be obtained by using Convolutional Networks (ConvNets) for on-the-fly retrieval, where classifiers are learnt at run time for a textual query from downloaded images and used to rank large image or video datasets. We make three contributions: (i) we present an evaluation of state-of-the-art image representations for object category retrieval over standard benchmark datasets containing 1M+ images; (ii) we show that ConvNets can be used to obtain features that are highly performant yet much lower-dimensional than previous state-of-the-art image representations, and that their dimensionality can be reduced further, without loss in performance, by compression using product quantization or binarization. Consequently, features with state-of-the-art performance on large-scale datasets of millions of images can fit in the memory of even a commodity GPU card; (iii) we show that an SVM classifier can be learnt within a ConvNet framework on a GPU in parallel with downloading the new training images, allowing for continuous refinement of the model as more images become available, and simultaneous training and ranking. The outcome is an on-the-fly system that significantly outperforms its predecessors in terms of precision of retrieval, memory requirements, and speed, facilitating accurate on-the-fly learning and ranking in under a second on a single GPU. Comment: Published in proceedings of ACCV 201
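
Product quantization, mentioned in contribution (ii), compresses a feature vector by splitting it into subvectors and storing only the index of the nearest k-means centroid in each subspace. A naive NumPy sketch of the idea (toy sizes, simple k-means; not an optimized implementation):

```python
import numpy as np

def train_pq(X, m, k, iters=10, seed=0):
    """Split the D dimensions into m subspaces and run k-means with k
    centroids independently in each; return one codebook per subspace."""
    rng = np.random.default_rng(seed)
    d = X.shape[1] // m
    codebooks = []
    for s in range(m):
        sub = X[:, s * d:(s + 1) * d]
        C = sub[rng.choice(len(sub), size=k, replace=False)].copy()
        for _ in range(iters):
            # assign each subvector to its nearest centroid, then refit
            idx = np.argmin(((sub[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
            for j in range(k):
                if np.any(idx == j):
                    C[j] = sub[idx == j].mean(axis=0)
        codebooks.append(C)
    return codebooks

def encode(x, codebooks):
    """Compress one vector to m small integer codes (one per subspace)."""
    d = len(x) // len(codebooks)
    return [int(np.argmin(((x[s * d:(s + 1) * d] - C) ** 2).sum(-1)))
            for s, C in enumerate(codebooks)]

def decode(codes, codebooks):
    """Approximate reconstruction by concatenating the chosen centroids."""
    return np.concatenate([C[c] for c, C in zip(codes, codebooks)])

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
books = train_pq(X, m=2, k=16)
codes = encode(X[0], books)
print(codes, round(float(np.linalg.norm(X[0] - decode(codes, books))), 3))
```

Storing m small integers instead of D floats is what lets million-image feature sets fit in a single GPU's memory.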

    Deep Bilevel Learning

    Full text link
    We present a novel regularization approach for training neural networks that enjoys better generalization and lower test error than standard stochastic gradient descent. Our approach is based on the principles of cross-validation, where a validation set is used to limit model overfitting. We formulate these principles as a bilevel optimization problem: the optimization of a cost on the validation set is subject to another optimization on the training set. Overfitting is controlled by introducing weights on each mini-batch in the training set and by choosing their values so that they minimize the error on the validation set. In practice, these weights define mini-batch learning rates in a gradient descent update equation that favor gradients with better generalization capabilities. Because of its simplicity, this approach can be integrated with other regularization methods and training schemes. We evaluate our proposed algorithm extensively on several neural network architectures and datasets, and find that it consistently improves the generalization of the model, especially when labels are noisy. Comment: ECCV 201
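
A much-simplified reading of the mini-batch weighting idea can be illustrated on a toy linear model: scale each mini-batch's learning rate by how well its gradient agrees with the validation gradient. This illustrates the principle only, not the paper's exact bilevel update:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noiseless toy regression y = X @ w_true, with a held-out validation set.
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(256, 2)); y = X @ w_true
Xv = rng.normal(size=(64, 2)); yv = Xv @ w_true

def grad(w, Xb, yb):
    # Gradient of mean squared error for the linear model Xb @ w.
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(2)
for step in range(200):
    i = rng.choice(256, size=32, replace=False)
    g_tr = grad(w, X[i], y[i])          # mini-batch gradient
    g_val = grad(w, Xv, yv)             # validation gradient
    # Mini-batch weight: cosine alignment with the validation gradient,
    # clipped at zero so misaligned (overfitting) batches are skipped.
    align = g_tr @ g_val / (np.linalg.norm(g_tr) * np.linalg.norm(g_val) + 1e-12)
    w -= 0.05 * max(align, 0.0) * g_tr

print(np.round(w, 2))
```

Batches whose gradients point away from what helps the validation set get a small (or zero) effective learning rate, which is the regularizing effect described above.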

    A Review of Object Detection Models based on Convolutional Neural Network

    Full text link
    Convolutional Neural Networks (CNNs) have become the state-of-the-art for object detection in images. In this chapter, we explain different state-of-the-art CNN-based object detection models, categorizing them according to two approaches: the two-stage approach and the one-stage approach. The chapter traces advancements in object detection models from R-CNN to the latest RefineDet, discusses the model description and training details of each model, and draws a comparison among them. Comment: 17 pages, 11 figures, 1 table

    ICNet for Real-Time Semantic Segmentation on High-Resolution Images

    Full text link
    We focus on the challenging task of real-time semantic segmentation in this paper. It has many practical applications, yet poses the fundamental difficulty of reducing a large portion of the computation needed for pixel-wise label inference. We propose an image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address this challenge. We provide an in-depth analysis of our framework and introduce the cascade feature fusion unit to quickly achieve high-quality segmentation. Our system yields real-time inference on a single GPU card with decent-quality results on challenging datasets such as Cityscapes, CamVid and COCO-Stuff. Comment: ECCV 201
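
The cascade feature fusion unit's core operation — upsampling the coarse branch and combining it with the finer branch — can be sketched as follows. The real CFF unit also applies a dilated convolution, a projection and batch normalization before the sum, all omitted in this NumPy stand-in:

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def cascade_fuse(f_low, f_high):
    """Simplified cascade feature fusion: bring the coarse branch up to
    the finer branch's resolution, sum, and apply a ReLU."""
    return np.maximum(upsample2x(f_low) + f_high, 0.0)

f_low = np.ones((8, 16, 16))    # coarse, low-resolution branch
f_high = np.ones((8, 32, 32))   # finer branch
fused = cascade_fuse(f_low, f_high)
print(fused.shape)  # (8, 32, 32)
```

Running most computation on the coarse branch and fusing it upward is what keeps inference real-time at high output resolution.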