5,207 research outputs found

    A multisensor SLAM for dense maps of large scale environments under poor lighting conditions

    This thesis describes the development and implementation of a multisensor large-scale autonomous mapping system for surveying tasks in underground mines. The hazardous nature of the underground mining industry has resulted in a push towards autonomous solutions to the most dangerous operations, including surveying tasks. Many existing autonomous mapping techniques rely on approaches to the Simultaneous Localization and Mapping (SLAM) problem which are not suited to the extreme characteristics of active underground mining environments. Our proposed multisensor system has been designed from the outset to address the unique challenges associated with underground SLAM. The robustness, self-containment and portability of the system maximize its potential applications. The multisensor mapping solution proposed as a result of this work is based on a fusion of omnidirectional bearing-only vision-based localization and 3D laser point cloud registration. By combining these two SLAM techniques it is possible to achieve some of the advantages of both approaches: the real-time attributes of vision-based SLAM and the dense, high-precision maps obtained through 3D lasers. The result is a viable autonomous mapping solution suitable for application in challenging underground mining environments. A further improvement to the robustness of the proposed multisensor SLAM system comes from incorporating colour information into vision-based localization. Underground mining environments are often dominated by dynamic sources of illumination which can cause inconsistent feature motion during localization. Colour information is utilized to identify and remove features resulting from illumination artefacts and to improve monochrome-based feature matching between frames. Finally, the proposed multisensor mapping system is implemented and evaluated in both above-ground and underground scenarios. The resulting large-scale maps contained a maximum offset error of ±30 mm for mapping tasks over 100 m in length.
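
    The thesis itself gives no code; as a rough illustration of the 3D laser point cloud registration component, the following sketch aligns two scans with point-to-point ICP, using nearest-neighbour association and a Kabsch/SVD rigid fit (all function names are illustrative, not from the thesis):

```python
# Minimal point-to-point ICP sketch for 3D laser scan registration
# (illustrative only, not the thesis implementation).
import numpy as np
from scipy.spatial import cKDTree

def best_fit_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst (Kabsch)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

def icp(src, dst, iters=50, tol=1e-6):
    """Iteratively align the source scan to the destination scan."""
    tree = cKDTree(dst)
    prev_err = np.inf
    for _ in range(iters):
        dists, idx = tree.query(src)           # closest-point association
        R, t = best_fit_transform(src, dst[idx])
        src = src @ R.T + t                    # apply incremental transform
        err = dists.mean()
        if abs(prev_err - err) < tol:          # converged
            break
        prev_err = err
    return src
```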

    Galaxy Image Classification Based on Citizen Science Data: A Comparative Study

    Many research fields are now faced with huge volumes of data automatically generated by specialised equipment. Astronomy is a discipline that deals with large collections of images that are difficult for experts to handle alone. As a consequence, astronomers have been relying on the power of the crowds, as a form of citizen science, to have galaxy images classified by amateurs. However, the new generation of telescopes that will produce images at a higher rate highlights the limitations of this approach, and the use of machine learning methods for automatic classification is considered essential. The goal of this paper is to shed light on the automated classification of galaxy images by exploring two distinct machine learning strategies. First, following the classical approach of feature extraction followed by a classifier, we compare the state-of-the-art feature extractor for this problem, WND-CHARM, with our proposal based on autoencoders for feature extraction from galaxy images. We then compare these results with end-to-end classification using convolutional neural networks. To better leverage the available citizen science data, we also investigate a pre-training scheme that exploits both amateur- and expert-labelled data. Our experiments reveal that autoencoders greatly speed up feature extraction in comparison with WND-CHARM and that both classification strategies, whether using convolutional neural networks or feature extraction, reach comparable accuracy. The use of pre-training in convolutional neural networks, however, yielded even better results.
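
    As a hedged illustration of the feature-extraction strategy compared in the paper (not the authors' exact architecture, whose dimensions are not given here), a small convolutional autoencoder can be trained on unlabelled galaxy images with a reconstruction loss, and its bottleneck reused as input to a separate classifier:

```python
# Illustrative sketch: a convolutional autoencoder whose bottleneck
# serves as a feature extractor for a downstream galaxy classifier.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

# Train on unlabelled images with a reconstruction loss, then fit any
# classifier (e.g. an SVM or a small MLP) on the latent codes z.
ae = ConvAutoencoder()
x = torch.randn(8, 3, 64, 64)          # a batch of 64x64 RGB galaxy images
recon, z = ae(x)
loss = nn.functional.mse_loss(recon, x)
```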

    Fast, Dense Feature SDM on an iPhone

    In this paper, we present our method for enabling dense SDM to run at over 90 FPS on a mobile device. Our contributions are two-fold. First, drawing inspiration from the FFT, we propose a Sparse Compositional Regression (SCR) framework, which enables a significant speed-up over classical dense regressors. Second, we propose Binary Approximated SIFT (BASIFT) features, a computationally efficient approximation to SIFT, a feature commonly used with SDM. We demonstrate the performance of our algorithm on an iPhone 7, and show that we achieve accuracy similar to dense SDM.
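
    The abstract does not spell out BASIFT's construction, but the speed argument for any binary approximation to SIFT is that descriptor matching collapses to XOR-and-popcount instead of floating-point L2 distances. A minimal sketch, with an assumed thresholding step standing in for the paper's actual binarisation:

```python
# Why binary descriptors are fast to match: XOR + popcount instead of
# floating-point distances. Illustrative only; BASIFT's exact
# binarisation is not reproduced here.
import numpy as np

def binarise(desc, thresholds):
    """Pack real-valued descriptors into bits via per-dimension
    thresholding (a stand-in for the paper's binary approximation)."""
    return np.packbits(desc > thresholds, axis=-1)

def hamming(a, b):
    """Hamming distance between packed binary descriptors."""
    return np.unpackbits(a ^ b, axis=-1).sum(axis=-1)

rng = np.random.default_rng(0)
sift_a = rng.random((100, 128))       # 100 SIFT-like descriptors
sift_b = rng.random((100, 128))
thr = np.full(128, 0.5)
ba, bb = binarise(sift_a, thr), binarise(sift_b, thr)
d = hamming(ba[:, None, :], bb[None, :, :])   # 100x100 distance matrix
matches = d.argmin(axis=1)                    # nearest neighbour per row
```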

    DeepVoxels: Learning Persistent 3D Feature Embeddings

    In this work, we address the lack of 3D understanding of generative neural networks by introducing a persistent 3D feature embedding for view synthesis. To this end, we propose DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D scene without having to explicitly model its geometry. At its core, our approach is based on a Cartesian 3D grid of persistent embedded features that learn to make use of the underlying 3D scene structure. Our approach combines insights from 3D geometric computer vision with recent advances in learning image-to-image mappings based on adversarial loss functions. DeepVoxels is supervised, without requiring a 3D reconstruction of the scene, using a 2D re-rendering loss and enforces perspective and multi-view geometry in a principled manner. We apply our persistent 3D scene representation to the problem of novel view synthesis, demonstrating high-quality results for a variety of challenging scenes.
    Comment: Video: https://www.youtube.com/watch?v=HM_WsZhoGXw Supplemental material: https://drive.google.com/file/d/1BnZRyNcVUty6-LxAstN83H79ktUq8Cjp/view?usp=sharing Code: https://github.com/vsitzmann/deepvoxels Project page: https://vsitzmann.github.io/deepvoxels
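
    A heavily simplified sketch of the core data structure, assuming PyTorch and illustrative shapes: a persistent, learnable Cartesian grid of features that can be trilinearly resampled at arbitrary 3D points, e.g. along camera rays. The full method additionally includes occlusion reasoning and rendering networks trained with the 2D re-rendering loss, none of which is shown here:

```python
# Gist of a persistent 3D feature volume (in the spirit of DeepVoxels,
# heavily simplified): a learned voxel grid of embedded features
# resampled at query points via trilinear interpolation.
import torch
import torch.nn.functional as F

class FeatureVolume(torch.nn.Module):
    def __init__(self, channels=16, res=32):
        super().__init__()
        # Persistent, scene-specific embedding: one learnable voxel grid.
        self.voxels = torch.nn.Parameter(
            torch.randn(1, channels, res, res, res) * 0.01)

    def forward(self, pts):
        # pts: (N, 3) query points in [-1, 1]^3 (normalised scene coords).
        grid = pts.view(1, -1, 1, 1, 3)                  # grid_sample layout
        feats = F.grid_sample(self.voxels, grid, align_corners=True)
        return feats.view(self.voxels.shape[1], -1).t()  # (N, channels)

vol = FeatureVolume()
rays = torch.rand(1024, 3) * 2 - 1      # sample points along camera rays
features = vol(rays)                    # (1024, 16) interpolated features
```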

    Natural image processing and synthesis using deep learning

    In the present thesis, we study how deep neural networks can be applied to various tasks in computer vision. Computer vision is an interdisciplinary field that deals with the understanding of digital images and video. Traditionally, the problems arising in this domain were tackled using heavily hand-engineered ad hoc methods. A typical computer vision system until recently consisted of a sequence of independent modules which barely talked to each other. Such an approach is quite reasonable in the case of limited data, as it takes maximal advantage of the researcher's domain expertise. This strength turns into a weakness if some of the input scenarios are overlooked in the algorithm design process. With the rapidly increasing volumes and varieties of data, and the advent of cheaper and faster computational resources, end-to-end deep neural networks have become an appealing alternative to traditional computer vision pipelines. We demonstrate this in a series of research articles, each of which considers a particular task of either image analysis or synthesis and presents a solution based on a "deep" backbone. In the first article, we deal with the classic low-level vision problem of edge detection. Inspired by a top-performing non-neural approach, we take a step towards building an end-to-end system by combining feature extraction and description in a single convolutional network. The resulting fully data-driven method matches or surpasses the detection quality of existing conventional approaches in the settings for which they were designed, while being significantly more usable in out-of-domain situations. In our second article, we introduce a custom architecture for image manipulation based on the idea that most of the pixels in the output image can be directly copied from the input. This technique bears several significant advantages over the naive black-box neural approach: it retains the level of detail of the original images, does not introduce artifacts due to insufficient capacity of the underlying network, and simplifies the training process, to name a few. We demonstrate the efficiency of the proposed architecture on the challenging gaze correction task, where our system achieves excellent results. In the third article, we slightly diverge from pure computer vision and study the more general problem of domain adaptation. There, we introduce a novel training-time algorithm (i.e., adaptation is attained by using an auxiliary objective in addition to the main one). We seek to extract features that maximally confuse a dedicated network called the domain classifier while remaining useful for the task at hand. The domain classifier is learned simultaneously with the features and attempts to tell whether those features come from the source or the target domain. The proposed technique is easy to implement, yet results in superior performance on all the standard benchmarks. Finally, the fourth article presents a new kind of generative model for image data. Unlike conventional neural-network-based approaches, our system, dubbed SPIRAL, describes images in terms of concise low-level programs executed by the off-the-shelf rendering software used by humans to create visual content. Among other things, this allows SPIRAL not to waste its capacity on the minutiae of datasets and to focus more on the global structure. The latent space of our model is easily interpretable by design and provides means for predictable image manipulation. We test our approach on several popular datasets and demonstrate its power and flexibility.
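
    The adversarial feature-learning scheme of the third article is commonly implemented with a gradient reversal layer; a minimal PyTorch sketch (illustrative, not the thesis code):

```python
# Gradient reversal layer: identity on the forward pass, negated
# gradient on the backward pass. Features fed through it are pushed
# to *maximise* the domain classifier's loss, while the classifier
# itself minimises it.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient flowing into the features.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage: features -> task head (ordinary loss), and
#        grad_reverse(features) -> domain classifier head.
```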

    Automated damage diagnosis of concrete jack arch beam using optimized deep stacked autoencoders and multi-sensor fusion

    A novel hybrid framework of optimized deep learning models combined with multi-sensor fusion is developed for condition diagnosis of a concrete jack arch beam. The vibration responses of the structure are first processed by principal component analysis for dimensionality reduction and noise elimination. Then, a deep network based on stacked autoencoders (SAE) is established at each sensor for initial condition diagnosis, where the extracted principal components and the corresponding condition categories serve as inputs and outputs, respectively. To enhance the diagnostic accuracy of the proposed deep SAE, an enhanced whale optimization algorithm is proposed to optimize the network meta-parameters. Finally, the Dempster-Shafer fusion algorithm is employed to combine the initial diagnosis results from each sensor into a final diagnosis. A miniature structural component of the Sydney Harbour Bridge with multiple artificial progressive damage states is tested in the laboratory. The results demonstrate that the proposed method can detect structural damage accurately, even with limited sensors and high levels of uncertainty.
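
    For the fusion step, a simplified view of Dempster's rule of combination over singleton condition categories looks as follows (variable names and the three-class example are illustrative, not from the paper):

```python
# Dempster's rule of combination for two sensors' basic probability
# assignments over mutually exclusive condition categories.
import numpy as np

def dempster_combine(m1, m2):
    """Fuse two mass vectors over singleton hypotheses."""
    joint = np.outer(m1, m2)
    agreement = np.trace(joint)            # mass where both sensors agree
    conflict = 1.0 - agreement             # mass assigned to conflicts
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return np.diag(joint) / agreement      # renormalised fused masses

sensor_a = np.array([0.7, 0.2, 0.1])       # e.g. healthy / minor / severe
sensor_b = np.array([0.6, 0.3, 0.1])
fused = dempster_combine(sensor_a, sensor_b)
print(fused, fused.argmax())               # fused belief favours 'healthy'
```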

    Lung_PAYNet: a pyramidal attention based deep learning network for lung nodule segmentation

    Accurate and reliable lung nodule segmentation in computed tomography (CT) images is required for the early diagnosis of lung cancer. Difficulties in detecting lung nodules include the variety of nodule types and shapes, nodules lying near other lung structures, and their visual similarity to surrounding tissue. This study proposes a new model named Lung_PAYNet, a pyramidal attention-based architecture, for improved lung nodule segmentation in low-dose CT images. In this architecture, the encoder and decoder are designed using inverted residual blocks and the swish activation function. It also employs a feature pyramid attention network between the encoder and decoder to extract exact dense features for pixel classification. The proposed architecture was compared to the existing UNet architecture and yielded significantly better results. The model was comprehensively trained and validated on the publicly available LIDC-IDRI dataset. The experimental results revealed that Lung_PAYNet delivered remarkable segmentation, with a Dice similarity coefficient of 95.7%, mIoU of 91.75%, sensitivity of 92.57%, and precision of 96.75%.
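
    For reference, the reported Dice similarity coefficient and mIoU follow the standard definitions for binary masks; a short sketch (not code from the paper):

```python
# Standard segmentation metrics for binary nodule masks.
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice = 2 * |A intersect B| / (|A| + |B|)."""
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou(pred, target, eps=1e-7):
    """IoU = |A intersect B| / |A union B|."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

pred = np.zeros((64, 64), bool); pred[10:30, 10:30] = True
gt = np.zeros((64, 64), bool);   gt[12:32, 12:32] = True
print(f"Dice={dice(pred, gt):.3f}  IoU={iou(pred, gt):.3f}")
```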
