Search CORE

232 research outputs found

Low-Rank Hypergraph Hashing for Large-Scale Remote Sensing Image Retrieval

Author: Kong Jie
Lloret Jaime
Mukherjee Mithun
Sun Quansen
Publication venue: 'MDPI AG'
Publication date: 04/04/2020
Field of study

[EN] As remote sensing (RS) images increase dramatically, the demand for remote sensing image retrieval (RSIR) is growing, and has received more and more attention. The characteristics of RS images, e.g., large volume, diversity and high complexity, make RSIR more challenging in terms of speed and accuracy. To reduce the retrieval complexity of RSIR, a hashing technique has been widely used for RSIR, mapping high-dimensional data into a low-dimensional Hamming space while preserving the similarity structure of data. In order to improve hashing performance, we propose a new hash learning method, named low-rank hypergraph hashing (LHH), to accomplish for the large-scale RSIR task. First, LHH employs a l(2-1) norm to constrain the projection matrix to reduce the noise and redundancy among features. In addition, low-rankness is also imposed on the projection matrix to exploit its global structure. Second, LHH uses hypergraphs to capture the high-order relationship among data, and is very suitable to explore the complex structure of RS images. Finally, an iterative algorithm is developed to generate high-quality hash codes and efficiently solve the proposed optimization problem with a theoretical convergence guarantee. Extensive experiments are conducted on three RS image datasets and one natural image dataset that are publicly available. The experimental results demonstrate that the proposed LHH outperforms the existing hashing learning in RSIR tasks.This research was supported in part by the Natural Science Foundation of China under Grant 61673220.Kong, J.; Sun, Q.; Mukherjee, M.; Lloret, J. (2020). Low-Rank Hypergraph Hashing for Large-Scale Remote Sensing Image Retrieval. Remote Sensing. 12(7):1-19. https://doi.org/10.3390/rs1207116411912

RiuNet

Recommended from our members

Fast embedding for image classification & retrieval and its application to the hostel industry

Author: Ammatmanee Chanattra
Publication venue: Brunel University London
Publication date: 01/01/2022
Field of study

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonContent-based image classification and retrieval are the automatic processes of taking an unseen image input and extracting its features representing the input image. Then, for the classification task, this mathematically measured input is categorized according to established criteria in the server and consequently shows the output as a result. On the other hand, for the retrieval task, the extracted features of an unseen query image are sent to the server to search for the most visually similar images to a given image and retrieve these images as a result. Despite image features could be represented by classical features, artificial intelligence-based features, Convolutional Neural Networks (CNN) to be precise, have become powerful tools in the field. Nonetheless, the high dimensional CNN features have been a challenge in particular for applications on mobile or Internet of Things devices. Therefore, in this thesis, several fast embeddings are explored and proposed to overcome the constraints of low memory, bandwidth, and power. Furthermore, the first hostel image database is created with three datasets, hostel image dataset containing 13,908 interior and exterior images of hostels across the world, and Hostels-900 dataset and Hostels-2K dataset containing 972 images and 2,380 images, respectively, of 20 London hostel buildings. The results demonstrate that the proposed fast embeddings such as the application of GHM-Rand operator, GHM-Fix operator, and binary feature vectors are able to outperform or give competitive results to those state-of-the-art methods with a lot less computational resource. Additionally, the findings from a ten-year literature review of CBIR study in the tourism industry could picturize the relevant research activities in the past decade which are not only beneficial to the hostel industry or tourism sector but also to the computer science and engineering research communities for the potential real-life applications of the existing and developing technologies in the field

Brunel University Research Archive

A Survey on Evolutionary Computation for Computer Vision and Image Analysis: Past, Present, and Future Trends

Author: Bi Ying
Bing Xue
Cagnoni Stefano
Mesejo Santiago Pablo
Zhang Mengjie
Publication venue: Journals & Magazines
Publication date: 14/09/2022
Field of study

Computer vision (CV) is a big and important field in artificial intelligence covering a wide range of applications. Image analysis is a major task in CV aiming to extract, analyse and understand the visual content of images. However, imagerelated tasks are very challenging due to many factors, e.g., high variations across images, high dimensionality, domain expertise requirement, and image distortions. Evolutionary computation (EC) approaches have been widely used for image analysis with significant achievement. However, there is no comprehensive survey of existing EC approaches to image analysis. To fill this gap, this paper provides a comprehensive survey covering all essential EC approaches to important image analysis tasks including edge detection, image segmentation, image feature analysis, image classification, object detection, and others. This survey aims to provide a better understanding of evolutionary computer vision (ECV) by discussing the contributions of different approaches and exploring how and why EC is used for CV and image analysis. The applications, challenges, issues, and trends associated to this research field are also discussed and summarised to provide further guidelines and opportunities for future research

Repositorio Institucional Universidad de Granada

Image-set, Temporal and Spatiotemporal Representations of Videos for Recognizing, Localizing and Quantifying Actions

Author: Xiang Xiang
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 07/03/2019
Field of study

This dissertation addresses the problem of learning video representations, which is defined here as transforming the video so that its essential structure is made more visible or accessible for action recognition and quantification. In the literature, a video can be represented by a set of images, by modeling motion or temporal dynamics, and by a 3D graph with pixels as nodes. This dissertation contributes in proposing a set of models to localize, track, segment, recognize and assess actions such as (1) image-set models via aggregating subset features given by regularizing normalized CNNs, (2) image-set models via inter-frame principal recovery and sparsely coding residual actions, (3) temporally local models with spatially global motion estimated by robust feature matching and local motion estimated by action detection with motion model added, (4) spatiotemporal models 3D graph and 3D CNN to model time as a space dimension, (5) supervised hashing by jointly learning embedding and quantization, respectively. State-of-the-art performances are achieved for tasks such as quantifying facial pain and human diving. Primary conclusions of this dissertation are categorized as follows: (i) Image set can capture facial actions that are about collective representation; (ii) Sparse and low-rank representations can have the expression, identity and pose cues untangled and can be learned via an image-set model and also a linear model; (iii) Norm is related with recognizability; similarity metrics and loss functions matter; (v) Combining the MIL based boosting tracker with the Particle Filter motion model induces a good trade-off between the appearance similarity and motion consistence; (iv) Segmenting object locally makes it amenable to assign shape priors; it is feasible to learn knowledge such as shape priors online from Web data with weak supervision; (v) It works locally in both space and time to represent videos as 3D graphs; 3D CNNs work effectively when inputted with temporally meaningful clips; (vi) the rich labeled images or videos help to learn better hash functions after learning binary embedded codes than the random projections. In addition, models proposed for videos can be adapted to other sequential images such as volumetric medical images which are not included in this dissertation

JScholarship

Machine Learning in Sensors and Imaging

Author
Publication venue: 'MDPI AG'
Publication date: 06/05/2022
Field of study

Machine learning is extending its applications in various fields, such as image processing, the Internet of Things, user interface, big data, manufacturing, management, etc. As data are required to build machine learning networks, sensors are one of the most important technologies. In addition, machine learning networks can contribute to the improvement in sensor performance and the creation of new sensor applications. This Special Issue addresses all types of machine learning applications related to sensors and imaging. It covers computer vision-based control, activity recognition, fuzzy label classification, failure classification, motor temperature estimation, the camera calibration of intelligent vehicles, error detection, color prior model, compressive sensing, wildfire risk assessment, shelf auditing, forest-growing stem volume estimation, road management, image denoising, and touchscreens

Directory of Open Access Books (DOAB)

Individual and combined AI models for complicated predictive forest type mapping using multisource GIS data

Author: Huang Zhi
Publication venue
Publication date: 21/09/2018
Field of study

The Australian National University

FusionPlanner: A Multi-task Motion Planner for Mining Trucks using Multi-sensor Fusion Method

Author: Ai Yunfeng
Chen Long
Hu Xuemin
Li Lingxi
Li Luxi
Li Yuchen
Teng Siyu
Publication venue
Publication date: 14/08/2023
Field of study

In recent years, significant achievements have been made in motion planning for intelligent vehicles. However, as a typical unstructured environment, open-pit mining attracts limited attention due to its complex operational conditions and adverse environmental factors. A comprehensive paradigm for unmanned transportation in open-pit mines is proposed in this research, including a simulation platform, a testing benchmark, and a trustworthy and robust motion planner. \textcolor{red}{Firstly, we propose a multi-task motion planning algorithm, called FusionPlanner, for autonomous mining trucks by the Multi-sensor fusion method to adapt both lateral and longitudinal control tasks for unmanned transportation. Then, we develop a novel benchmark called MiningNav, which offers three validation approaches to evaluate the trustworthiness and robustness of well-trained algorithms in transportation roads of open-pit mines. Finally, we introduce the Parallel Mining Simulator (PMS), a new high-fidelity simulator specifically designed for open-pit mining scenarios. PMS enables the users to manage and control open-pit mine transportation from both the single-truck control and multi-truck scheduling perspectives.} \textcolor{red}{The performance of FusionPlanner is tested by MiningNav in PMS, and the empirical results demonstrate a significant reduction in the number of collisions and takeovers of our planner. We anticipate our unmanned transportation paradigm will bring mining trucks one step closer to trustworthiness and robustness in continuous round-the-clock unmanned transportation.Comment: 2Pages, 10 figure

arXiv.org e-Print Archive

Image-based Authentication

Author: Azimpourkivi Mozhgan
Publication venue: FIU Digital Commons
Publication date: 01/01/2019
Field of study

Mobile and wearable devices are popular platforms for accessing online services. However, the small form factor of such devices, makes a secure and practical experience for user authentication, challenging. Further, online fraud that includes phishing attacks, has revealed the importance of conversely providing solutions for usable authentication of remote services to online users. In this thesis, we introduce image-based solutions for mutual authentication between a user and a remote service provider. First, we propose and develop Pixie, a two-factor, object-based authentication solution for camera-equipped mobile and wearable devices. We further design ai.lock, a system that reliably extracts from images, authentication credentials similar to biometrics. Second, we introduce CEAL, a system to generate visual key fingerprint representations of arbitrary binary strings, to be used to visually authenticate online entities and their cryptographic keys. CEAL leverages deep learning to capture the target style and domain of training images, into a generator model from a large collection of sample images rather than hand curated as a collection of rules, hence provides a unique capacity for easy customizability. CEAL integrates a model of the visual discriminative ability of human perception, hence the resulting fingerprint image generator avoids mapping distinct keys to images which are not distinguishable by humans. Further, CEAL deterministically generates visually pleasing fingerprint images from an input vector where the vector components are designated to represent visual properties which are either readily perceptible to human eye, or imperceptible yet are necessary for accurately modeling the target image domain. We show that image-based authentication using Pixie is usable and fast, while ai.lock extracts authentication credentials that exceed the entropy of biometrics. Further, we show that CEAL outperforms state-of-the-art solution in terms of efficiency, usability, and resilience to powerful adversarial attacks

DigitalCommons@Florida International University

Learning Pose Invariant and Covariant Classifiers from Image Sequences

Author: Özuysal Mustafa
Publication venue: Lausanne, EPFL
Publication date: 06/05/2010
Field of study

Object tracking and detection over a wide range of viewpoints is a long-standing problem in Computer Vision. Despite significant advance in wide-baseline sparse interest point matching and development of robust dense feature models, it remains a largely open problem. Moreover, abundance of low cost mobile platforms and novel application areas, such as real-time Augmented Reality, constantly push the performance limits of existing methods. There is a need to modify and adapt these to meet more stringent speed and capacity requirements. In this thesis, we aim to overcome the difficulties due to the multi-view nature of the object detection task. We significantly improve upon existing statistical keypoint matching algorithms to perform fast and robust recognition of image patches independently of object pose. We demonstrate this on various 2D and 3D datasets. The statistical keypoint matching approaches require massive amounts of training data covering a wide range of viewpoints. We have developed a weakly supervised algorithm to greatly simplify their training for 3D objects. We also integrate this algorithm in a 3D tracking-by-detection system to perform real-time Augmented Reality. Finally, we extend the use of a large training set with smooth viewpoint variation to category-level object detection. We introduce a new dataset with continuous pose annotations which we use to train pose estimators for objects of a single category. By using these estimators' output to select pose specific classifiers, our framework can simultaneously localize objects in an image and recover their pose. These decoupled pose estimation and classification steps yield improved detection rates. Overall, we rely on image and video sequences to train classifiers that can either operate independently of the object pose or recover the pose parameters explicitly. We show that in both cases our approaches mitigate the effects of viewpoint changes and improve the recognition performance

Infoscience - École polytechnique fédérale de Lausanne