1,838 research outputs found

    Crop conditional Convolutional Neural Networks for massive multi-crop plant disease classification over cell phone acquired images taken on real field conditions

    Get PDF
    Convolutional Neural Networks (CNN) have demonstrated their capabilities on the agronomical field, especially for plant visual symptoms assessment. As these models grow both in the number of training images and in the number of supported crops and diseases, there exist the dichotomy of (1) generating smaller models for specific crop or, (2) to generate a unique multi-crop model in a much more complex task (especially at early disease stages) but with the benefit of the entire multiple crop image dataset variability to enrich image feature description learning. In this work we first introduce a challenging dataset of more than one hundred-thousand images taken by cell phone in real field wild conditions. This dataset contains almost equally distributed disease stages of seventeen diseases and five crops (wheat, barley, corn, rice and rape-seed) where several diseases can be present on the same picture. When applying existing state of the art deep neural network methods to validate the two hypothesised approaches, we obtained a balanced accuracy (BAC=0.92) when generating the smaller crop specific models and a balanced accuracy (BAC=0.93) when generating a single multi-crop model. In this work, we propose three different CNN architectures that incorporate contextual non-image meta-data such as crop information onto an image based Convolutional Neural Network. This combines the advantages of simultaneously learning from the entire multi-crop dataset while reducing the complexity of the disease classification tasks. The crop-conditional plant disease classification network that incorporates the contextual information by concatenation at the embedding vector level obtains a balanced accuracy of 0.98 improving all previous methods and removing 71% of the miss-classifications of the former methods

    What-and-Where to Match: Deep Spatially Multiplicative Integration Networks for Person Re-identification

    Full text link
    Matching pedestrians across disjoint camera views, known as person re-identification (re-id), is a challenging problem that is of importance to visual recognition and surveillance. Most existing methods exploit local regions within spatial manipulation to perform matching in local correspondence. However, they essentially extract \emph{fixed} representations from pre-divided regions for each image and perform matching based on the extracted representation subsequently. For models in this pipeline, local finer patterns that are crucial to distinguish positive pairs from negative ones cannot be captured, and thus making them underperformed. In this paper, we propose a novel deep multiplicative integration gating function, which answers the question of \emph{what-and-where to match} for effective person re-id. To address \emph{what} to match, our deep network emphasizes common local patterns by learning joint representations in a multiplicative way. The network comprises two Convolutional Neural Networks (CNNs) to extract convolutional activations, and generates relevant descriptors for pedestrian matching. This thus, leads to flexible representations for pair-wise images. To address \emph{where} to match, we combat the spatial misalignment by performing spatially recurrent pooling via a four-directional recurrent neural network to impose spatial dependency over all positions with respect to the entire image. The proposed network is designed to be end-to-end trainable to characterize local pairwise feature interactions in a spatially aligned manner. To demonstrate the superiority of our method, extensive experiments are conducted over three benchmark data sets: VIPeR, CUHK03 and Market-1501.Comment: Published at Pattern Recognition, Elsevie

    Hierarchical Attention Network for Action Segmentation

    Full text link
    The temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in the video. Several attempts have been made to capture frame-level salient aspects through attention but they lack the capacity to effectively map the temporal relationships in between the frames as they only capture a limited span of temporal dependencies. To this end we propose a complete end-to-end supervised learning approach that can better learn relationships between actions over time, thus improving the overall segmentation performance. The proposed hierarchical recurrent attention framework analyses the input video at multiple temporal scales, to form embeddings at frame level and segment level, and perform fine-grained action segmentation. This generates a simple, lightweight, yet extremely effective architecture for segmenting continuous video streams and has multiple application domains. We evaluate our system on multiple challenging public benchmark datasets, including MERL Shopping, 50 salads, and Georgia Tech Egocentric datasets, and achieves state-of-the-art performance. The evaluated datasets encompass numerous video capture settings which are inclusive of static overhead camera views and dynamic, ego-centric head-mounted camera views, demonstrating the direct applicability of the proposed framework in a variety of settings.Comment: Published in Pattern Recognition Letter
    corecore