639 research outputs found

    MDLatLRR: A novel decomposition method for infrared and visible image fusion

    Image decomposition is crucial for many image processing tasks, as it allows salient features to be extracted from source images. A good image decomposition method can lead to better performance, especially in image fusion tasks. We propose a multi-level image decomposition method based on latent low-rank representation (LatLRR), which is called MDLatLRR. This decomposition method is applicable to many image processing fields. In this paper, we focus on the image fusion task. We develop a novel image fusion framework based on MDLatLRR, which is used to decompose source images into detail parts (salient features) and base parts. A nuclear-norm based fusion strategy is used to fuse the detail parts, and the base parts are fused by an averaging strategy. Compared with other state-of-the-art fusion methods, the proposed algorithm exhibits better fusion performance in both subjective and objective evaluation. Comment: IEEE Trans. Image Processing 2020, 14 pages, 17 figures, 3 tables
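
    The two fusion rules named in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting rule, so the code below assumes that each detail part is weighted by its share of the total nuclear norm and that the weights are computed over whole arrays rather than patches. The function names `nuclear_norm`, `fuse_detail` and `fuse_base` are hypothetical and are not taken from the paper.

```python
import numpy as np

def nuclear_norm(part):
    # Nuclear norm of a 2-D array: the sum of its singular values.
    return np.linalg.svd(part, compute_uv=False).sum()

def fuse_detail(d1, d2, eps=1e-12):
    # Assumed rule: weight each detail part by its share of the total
    # nuclear norm. eps guards against division by zero.
    w1, w2 = nuclear_norm(d1), nuclear_norm(d2)
    total = w1 + w2 + eps
    return (w1 / total) * d1 + (w2 / total) * d2

def fuse_base(b1, b2):
    # Averaging strategy for the base parts, as stated in the abstract.
    return 0.5 * (b1 + b2)
```

    Under this assumed rule, the detail part with the larger nuclear norm contributes more to the fused result, while the base parts always contribute equally.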

    Deep Learning Based Multi-Modal Fusion Architectures for Maritime Vessel Detection

    Object detection is a fundamental computer vision task for many real-world applications. In the maritime environment, this task is challenging due to varying light, view distances, weather conditions, and sea waves. In addition, light reflection, camera motion and illumination changes may cause false detections. To address this challenge, we present three fusion architectures to fuse two imaging modalities: visible and infrared. These architectures can provide complementary information from the two modalities at different levels: pixel-level, feature-level, and decision-level. They employ deep learning to perform fusion and detection. We evaluate the proposed architectures on a real marine image dataset, which was captured by color and infrared cameras on board a vessel in the Finnish archipelago. The cameras are employed for developing autonomous ships, and collect data in a range of operational and climatic conditions. Experiments show that the feature-level fusion architecture outperforms the other state-of-the-art fusion-level architectures.

    Just-in-time Pastureland Trait Estimation for Silage Optimization, under Limited Data Constraints

    To ensure that pasture-based farming meets production and environmental targets for a growing population under increasing resource constraints, producers need to know pastureland traits. Current proximal pastureland trait prediction methods largely rely on vegetation indices to determine biomass and moisture content. The development of new techniques relies on the challenging task of collecting labelled pastureland data, leading to small datasets. Classical computer vision has already been applied to weed identification and recognition of fruit blemishes using morphological features, but machine learning algorithms can parameterise models without the provision of explicit features, and deep learning can extract even more abstract knowledge, although typically this is assumed to be based around very large datasets. This work hypothesises that through the advantages of state-of-the-art deep learning systems, pastureland crop traits can be accurately assessed in a just-in-time fashion, based on data retrieved from an inexpensive sensor platform, under the constraint of limited amounts of labelled data. However, the challenges in achieving this overall goal are great, and for applications such as just-in-time yield and moisture estimation for farm machinery, this work must bring together systems development, knowledge of good pastureland practice, and also techniques for handling low-volume datasets in a machine learning context. Given these challenges, this thesis makes a number of contributions. The first of these is a comprehensive literature review, relating pastureland traits to ruminant nutrient requirements and exploring trait estimation methods, from contact to remote sensing methods, including details of vegetation indices and the sensors and techniques required to use them. The second major contribution is a high-level specification of a platform for collecting and labelling pastureland data.
This includes the collection of four-channel Blue, Green, Red and NIR (VISNIR) images, narrowband data, height and temperature differential, using inexpensive proximal sensors and provides a basis for holistic data analysis. Physical data platforms built around this specification were created to collect and label pastureland data, involving computer scientists, agricultural, mechanical and electronic engineers, and biologists from academia and industry, working with farmers. Using the developed platform and a set of protocols for data collection, a further contribution of this work was the collection of a multi-sensor multimodal dataset for pastureland properties. This was made up of four-channel image data, height data, thermal data, Global Positioning System (GPS) and hyperspectral data, and is available and labelled with biomass (kg/ha) and percentage dry matter, ready for use in deep learning. However, the most notable contribution of this work was a systematic investigation of various machine learning methods applied to the collected data in order to maximise model performance under the constraints indicated above. The initial set of models focused on collected hyperspectral datasets. However, due to their relative complexity in real-time deployment, the focus was instead on models that could best leverage image data. The main body of these models centred on image processing methods and, in particular, the use of the so-called Inception ResNet and MobileNet models to predict fresh biomass and percentage dry matter, enhancing performance using data fusion, transfer learning and multi-task learning. Images were subdivided to augment the dataset, using two different patch sizes, resulting in around 10,000 small patches of size 156 x 156 pixels and around 5,000 large patches of size 240 x 240 pixels. Five-fold cross validation was used in all analyses.
Prediction accuracy was compared to older mechanisms, albeit using the hyperspectral data collected, with no provision made for lighting, humidity or temperature. Hyperspectral labelled data did not produce accurate results when used to calculate Normalized Difference Vegetation Index (NDVI), or to train a neural network (NN), a 1D Convolutional Neural Network (CNN) or Long Short-Term Memory (LSTM) models. Potential reasons for this are discussed, including issues around the use of highly sensitive devices in uncontrolled environments. The most accurate prediction came from a multi-modal hybrid model that concatenated output from an Inception ResNet based model, run on RGB data with ImageNet pre-trained RGB weights, output from a residual network trained on NIR data, and LiDAR height data, before fully connected layers, using the small patch dataset with a minimum validation MAPE of 28.23% for fresh biomass and 11.43% for dryness. However, a very similar prediction accuracy resulted from a model that omitted NIR data, thus requiring fewer sensors and training resources, making it more sustainable. Although NIR and temperature differential data were collected and used for analysis, neither improved prediction accuracy, with the Inception ResNet model’s minimum validation MAPE rising to 39.42% when NIR data was added. When both NIR data and temperature differential were added to a multi-task learning Inception ResNet model, it yielded a minimum validation MAPE of 33.32%. As more labelled data are collected, the models can be further trained, enabling sensors on mowers to collect data and give timely trait information to farmers. This technology is also transferable to other crops. Overall, this work should provide a valuable contribution to the smart agriculture research space.
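
    For readers unfamiliar with the metric and the patch-based augmentation mentioned in this abstract, the following is a minimal sketch. It uses the standard definition of mean absolute percentage error (MAPE). The patch layout is an assumption: the abstract gives the patch sizes but not the source image size or whether patches overlap, so non-overlapping tiling is used here. The names `mape` and `patch_origins` are hypothetical.

```python
def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent.
    # Assumes equal-length, non-empty inputs with no true value equal to zero.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)

def patch_origins(height, width, size):
    # Top-left (row, col) corners of non-overlapping size x size patches.
    # Border pixels that do not fill a whole patch are dropped (an assumption).
    return [
        (r, c)
        for r in range(0, height - size + 1, size)
        for c in range(0, width - size + 1, size)
    ]
```

    For example, `patch_origins(480, 480, 240)` yields four patch corners, and a smaller patch size over the same image yields more patches, which is consistent with the abstract reporting roughly twice as many small patches as large ones.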

    Multimodal Data Augmentation for Visual-Infrared Person ReID with Corrupted Data

    The re-identification (ReID) of individuals over a complex network of cameras is a challenging task, especially under real-world surveillance conditions. Several deep learning models have been proposed for visible-infrared (V-I) person ReID to recognize individuals from images captured using RGB and IR cameras. However, performance may decline considerably if RGB and IR images captured at test time are corrupted (e.g., noise, blur, and weather conditions). Although various data augmentation (DA) methods have been explored to improve the generalization capacity, these are not adapted for V-I person ReID. In this paper, a specialized DA strategy is proposed to address this multimodal setting. Given both the V and I modalities, this strategy diminishes the impact of corruption on the accuracy of deep person ReID models. Corruption may be modality-specific, and an additional modality often provides complementary information. Our multimodal DA strategy is designed specifically to encourage modality collaboration and reinforce generalization capability. For instance, punctual masking of modalities forces the model to select the informative modality. Local DA is also explored for advanced selection of features within and among modalities. The impact of training baseline fusion models for V-I person ReID using the proposed multimodal DA strategy is assessed on corrupted versions of the SYSU-MM01, RegDB, and ThermalWORLD datasets in terms of complexity and efficiency. Results indicate that using our strategy provides V-I ReID models the ability to exploit both shared and individual modality knowledge so they can outperform models trained with no or unimodal DA. GitHub code: https://github.com/art2611/ML-MDA. Comment: 8 pages of main content, 2 pages of references, 2 pages of supplementary material, 3 figures, WACV 2023 RWS workshop
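
    The idea of masking a modality during training can be sketched as follows. This is not the authors' implementation: it assumes that "punctual masking" means occasionally replacing one whole modality with zeros, and the function name `mask_one_modality` and the parameter `p` are hypothetical.

```python
import numpy as np

def mask_one_modality(visible, infrared, rng, p=0.5):
    # With probability p, zero out one modality chosen at random
    # (assumed interpretation of the masking described in the abstract).
    if rng.random() >= p:
        return visible, infrared
    if rng.random() < 0.5:
        return np.zeros_like(visible), infrared
    return visible, np.zeros_like(infrared)
```

    Applied per training sample, for example with `rng = np.random.default_rng(0)`, this leaves the model with only one informative input on masked samples, which is the behaviour the abstract describes as forcing the model to select the informative modality.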

    Deep Learning Based Face Detection and Recognition in MWIR and Visible Bands

    In unfavorable conditions for visible imaging, such as extreme illumination or nighttime, there is a need to collect images in other spectra, specifically infrared. Mid-wave infrared (MWIR, 3-5 µm) images can be collected without giving away the location of the sensor in varying illumination conditions. Many algorithms for face detection, face alignment, face recognition, etc. have been proposed in the visible band to date, while research using MWIR images is highly limited. Face detection is an important pre-processing step for face recognition, which in turn is an important biometric modality. This thesis works towards bridging the gap between the MWIR and visible spectra through three contributions. First, a dual-band deep face detection model that works well in the visible and MWIR spectra is proposed using transfer learning. Different models are trained and tested extensively using visible and MWIR images, and the model that works best for this data is determined. For this model, experiments are conducted to learn the speed/accuracy trade-off. Following this, the available MWIR dataset is extended through augmentation using traditional methods and generative adversarial networks (GANs). The traditional methods used to augment the data are brightness adjustment, contrast enhancement, and applying noise to and de-noising the images. A deep learning based GAN architecture is developed and is used to generate new face identities. The generated images are added to the original dataset, and the face detection model developed earlier is once again trained and tested. The third contribution is the proposal of another GAN that converts given thermal face images into their visible counterparts. A pre-trained model is used as the discriminator for this purpose and is trained to classify the images as real or fake, and an identity network is used to provide further feedback to the generator.
The generated visible images are used as probe images and the original visible images are used as gallery images to perform face recognition experiments using a state-of-the-art visible-to-visible face recognition algorithm.
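
    The traditional augmentations listed in this abstract (brightness adjustment, contrast enhancement, added noise) can be sketched as below. The abstract gives no parameters or formulas, so this sketch assumes images stored as floating-point arrays in [0, 1], a linear contrast stretch about the image mean, and Gaussian noise. All function names are hypothetical.

```python
import numpy as np

def adjust_brightness(img, delta):
    # Shift all intensities by delta and clip back to [0, 1].
    return np.clip(img + delta, 0.0, 1.0)

def adjust_contrast(img, factor):
    # Linear stretch about the image mean (an assumed form of
    # contrast enhancement); factor > 1 increases contrast.
    mean = img.mean()
    return np.clip((img - mean) * factor + mean, 0.0, 1.0)

def add_gaussian_noise(img, sigma, rng):
    # Add zero-mean Gaussian noise with standard deviation sigma.
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)
```

    Each function returns a new array of the same shape, so the outputs can be appended to the training set alongside the original image.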