Robotic Burst Imaging for Light-Constrained 3D Reconstruction
This thesis proposes a novel input scheme, robotic burst, to improve vision-based 3D reconstruction for robots operating in low-light conditions, where existing state-of-the-art robotic vision algorithms struggle due to the low signal-to-noise ratio of low-light images. We aim to improve the correspondence search stage of feature-based reconstruction using robotic burst imaging, including burst-merged images, a burst feature finder, and an end-to-end learning-based feature extractor. Firstly, we establish the use of robotic burst imaging to compute burst-merged images for feature-based reconstruction. We then develop a burst feature finder that locates features with well-defined scale and apparent motion on a burst, to deal with limitations of burst-merged images such as misalignment under strong noise. To improve feature matches in burst-based reconstruction, we also present an end-to-end learning-based feature extractor that finds well-defined scale features directly on light-constrained bursts.
We evaluate our methods against state-of-the-art reconstruction methods for conventional imaging that use both classical and learning-based feature extractors. We validate our novel input scheme using burst imagery captured on a robotic arm and on drones. We demonstrate progressive improvements in low-light reconstruction using our burst-based methods over conventional approaches; overall, 90% of all scenes captured in millilux conditions converge with our methods, compared with a 10% success rate using conventional methods. This work opens up new avenues for applications including autonomous driving and drone delivery at night, mining, and behavioral studies of nocturnal animals.
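To illustrate why burst imaging helps under light constraint, here is a minimal sketch (not the thesis's actual pipeline, which must also estimate and handle apparent motion between frames): merging a burst of aligned noisy frames by averaging reduces zero-mean noise roughly by the square root of the burst size.

```python
import random

def merge_burst(frames, shifts):
    """Align frames by integer (dy, dx) offsets relative to frame 0, then
    average them. frames: list of H x W grids (lists of lists)."""
    h, w = len(frames[0]), len(frames[0][0])
    acc = [[0.0] * w for _ in range(h)]
    for frame, (dy, dx) in zip(frames, shifts):
        for y in range(h):
            for x in range(w):
                acc[y][x] += frame[(y + dy) % h][(x + dx) % w]
    return [[v / len(frames) for v in row] for row in acc]

# 16 noisy captures of a flat scene of true brightness 1.0, static camera:
rng = random.Random(0)
burst = [[[1.0 + rng.gauss(0.0, 0.2) for _ in range(16)] for _ in range(16)]
         for _ in range(16)]
merged = merge_burst(burst, [(0, 0)] * 16)
```

Averaging the 16 frames cuts the per-pixel noise standard deviation from 0.2 to roughly 0.05; this SNR gain is what burst-based correspondence search builds on.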
Perception Systems for Autonomous Forest Machines
A prerequisite for increasing the autonomy of forest machinery is to provide robots with digital situational awareness, including a representation of the surrounding environment and the robot's own state in it. Therefore, this article-based dissertation proposes perception systems for autonomous or semi-autonomous forest machinery as a summary of seven publications. The work consists of several perception methods using machine vision, lidar, inertial sensors, and positioning sensors. The sensors are used together by means of probabilistic sensor fusion. Semi-autonomy is interpreted as a useful intermediary step, situated between current mechanized solutions and full autonomy, to assist the operator.
In this work, the perception of the robot's self is achieved through estimation of its orientation and position in the world, the posture of its crane, and the pose of the attached tool. The view around the forest machine is produced with a rotating lidar, which provides approximately equal-density 3D measurements in all directions. Furthermore, a machine vision camera is used for detecting young trees among other vegetation, and sensor fusion of an actuated lidar and machine vision camera is utilized for detection and classification of tree species. In addition, in an operator-controlled semi-autonomous system, the operator requires a functional view of the data around the robot. To achieve this, the thesis proposes the use of an augmented reality interface, which requires measuring the pose of the operator's head-mounted display in the forest machine cabin. Here, this work adopts a sensor fusion solution for a head-mounted camera and inertial sensors.
In order to increase the level of automation and productivity of forest machines, the work focuses on scientifically novel solutions that are also adaptable for industrial use in forest machinery. Therefore, all the proposed perception methods seek to address a real, existing problem within current forest machinery. All the proposed solutions are implemented in a prototype forest machine and field tested in a forest. The proposed methods include posture measurement of a forestry crane, positioning of a freely hanging forestry crane attachment, attitude estimation of an all-terrain vehicle, positioning of a head-mounted camera in a forest machine cabin, detection of young trees for point cleaning, classification of tree species, and measurement of surrounding tree stems and the ground surface underneath.
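As a minimal illustration of the probabilistic sensor fusion the dissertation relies on (a hypothetical toy example, far simpler than the estimators in the publications), a one-dimensional complementary filter fuses a drifting gyroscope with a noisy but drift-free accelerometer-derived angle:

```python
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98):
    """Fuse gyro angular rates (rad/s) with accelerometer-derived angles
    (rad). The integrated gyro tracks fast motion but drifts; the
    accelerometer angle slowly pulls the estimate back toward an absolute
    reference. alpha weights the gyro path."""
    angle = accel_angles[0]
    estimates = []
    for rate, acc_angle in zip(gyro_rates, accel_angles):
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * acc_angle
        estimates.append(angle)
    return estimates

# Static machine at a true tilt of 0.1 rad; the gyro reports a pure bias
# of 0.01 rad/s, so integrating it alone would drift by 0.02 rad here.
est = complementary_filter([0.01] * 200, [0.1] * 200, dt=0.01)
```

The fused estimate stays within about 0.005 rad of the true tilt, while pure gyro integration would have drifted four times as far; a Kalman filter generalizes this weighting with explicit noise models.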
Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors
As the physical size of recent CMOS image sensors (CIS) gets smaller, the
latest mobile cameras are adopting unique non-Bayer color filter array (CFA)
patterns (e.g., Quad, Nona, QxQ), which consist of homogeneous color units with
adjacent pixels. These non-Bayer sensors are superior to conventional Bayer CFA
thanks to their changeable pixel-bin sizes for different light conditions but
may introduce visual artifacts during demosaicing due to their inherent pixel
pattern structures and sensor hardware characteristics. Previous demosaicing
methods have primarily focused on Bayer CFA, necessitating distinct
reconstruction methods for non-Bayer patterned CIS with various CFA modes under
different lighting conditions. In this work, we propose an efficient unified
demosaicing method that can be applied to both conventional Bayer RAW and
various non-Bayer CFAs' RAW data in different operation modes. Our Knowledge
Learning-based demosaicing model for Adaptive Patterns, namely KLAP, uses
CFA-adaptive filters for only 1% of the filters (the key filters) in the
network for each CFA, yet still effectively demosaics all the CFAs, yielding
performance comparable to large-scale models. Furthermore, by employing
meta-learning
during inference (KLAP-M), our model is able to eliminate unknown
sensor-generic artifacts in real RAW data, effectively bridging the gap between
synthetic images and real sensor RAW. Our KLAP and KLAP-M methods achieved
state-of-the-art demosaicing performance in both synthetic and real RAW data of
Bayer and non-Bayer CFAs.
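The changeable pixel-bin sizes of non-Bayer CFAs can be illustrated with a toy sketch (illustrative only, unrelated to the KLAP network): in a Quad CFA, each 2x2 unit holds a single color, so its four pixels can be binned into one, trading resolution for SNR in low light.

```python
def bin_quad_cfa(raw):
    """Average each 2x2 same-color unit of a Quad CFA frame into one binned
    pixel, halving resolution in each dimension while improving low-light
    SNR. raw: H x W grid with H, W even."""
    binned = []
    for y in range(0, len(raw), 2):
        row = []
        for x in range(0, len(raw[0]), 2):
            row.append((raw[y][x] + raw[y][x + 1]
                        + raw[y + 1][x] + raw[y + 1][x + 1]) / 4.0)
        binned.append(row)
    return binned

# A 4x4 Quad mosaic: each 2x2 block is one homogeneous color unit.
binned = bin_quad_cfa([[4, 4, 8, 8],
                       [4, 4, 8, 8],
                       [2, 2, 6, 6],
                       [2, 2, 6, 6]])
# binned == [[4.0, 8.0], [2.0, 6.0]] -- one pixel per former color unit
```

In bright light the same sensor instead reads the full mosaic, which is where demosaicing for the non-Bayer pattern itself is needed.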
Learning-based Wavelet-like Transforms For Fully Scalable and Accessible Image Compression
The goal of this thesis is to improve the existing wavelet transform with the aid of machine learning techniques, so as to enhance coding efficiency of wavelet-based image compression frameworks, such as JPEG 2000.
In this thesis, we first propose to augment the conventional base wavelet transform with two additional learned lifting steps -- a high-to-low step followed by a low-to-high step. The high-to-low step suppresses aliasing in the low-pass band by using the detail bands at the same resolution, while the low-to-high step aims to further remove redundancy from detail bands by using the corresponding low-pass band. These two additional steps reduce redundancy (notably aliasing information) amongst the wavelet subbands, and also improve the visual quality of reconstructed images at reduced resolutions.
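For context, the lifting structure that the proposed learned steps augment can be sketched with the simplest case, the Haar transform (a simplification; the thesis builds on larger base wavelets such as those of JPEG 2000). Each lifting step is inverted by subtracting what was added, so the transform stays perfectly invertible no matter what predict/update operators, learned or not, are used:

```python
def haar_lifting_forward(x):
    """One level of the Haar wavelet via lifting: a predict step forms the
    detail band, an update step forms the approximation band."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for o, e in zip(odd, even)]          # predict
    approx = [e + d / 2 for e, d in zip(even, detail)]   # update
    return approx, detail

def haar_lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order; inversion holds for any
    choice of predict/update operators, including learned ones."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    return [v for pair in zip(even, odd) for v in pair]

a, d = haar_lifting_forward([2, 4, 6, 8])
# a == [3.0, 7.0] (pairwise means), d == [2, 2] (pairwise differences)
```

The learned high-to-low and low-to-high steps of the thesis are additional lifting steps of exactly this form, which is why coding remains lossless-capable and scalable.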
To train these two networks in an end-to-end fashion, we develop a backward annealing approach to overcome the non-differentiability of the quantization and cost functions during back-propagation. Importantly, the two additional networks share a common architecture, named a proposal-opacity topology, which is inspired and guided by a specific theoretical argument related to geometric flow. This particular network topology is compact, with limited non-linearities, allowing a fully scalable system; one pair of trained network parameters is applied across all levels of decomposition and all bit-rates of interest. By employing the additional lifting networks within the JPEG 2000 image coding standard, we can achieve up to 17.4% average BD bit-rate saving over a wide range of bit-rates, while retaining the quality and resolution scalability features of JPEG 2000.
Built upon the success of the high-to-low and low-to-high steps, we then study more broadly the extension of neural networks to all lifting steps that correspond to the base wavelet transform. The purpose of this comprehensive study is to understand the most effective way to develop learned wavelet-like transforms for highly scalable and accessible image compression. Specifically, we examine the impact of the number of learned lifting steps, the number of layers and channels in each learned lifting network, and the kernel support in each layer. To facilitate the study, we develop a generic training methodology that is simultaneously appropriate to all lifting structures considered. Experimental results ultimately suggest that, to improve the existing wavelet transform, it is more profitable to augment a larger wavelet transform with more diverse high-to-low and low-to-high steps than to develop deep, fully learned lifting structures.
Engineering a Low-Cost Remote Sensing Capability for Deep-Space Applications
Systems engineering (SE) has been a useful tool for providing objective processes for breaking down complex technical problems into simpler tasks, while concurrently generating metrics to provide assurance that the solution is fit for purpose. Tailored forms of SE have also been used by cubesat mission designers to reduce risk by providing iterative feedback and key artifacts that give managers the evidence to adjust resources and tasking for success. Cubesat-sized spacecraft are being planned, built and, in some cases, flown to provide a lower-cost entry point for deep-space exploration. This is particularly important for agencies and countries with lower space exploration budgets, where specific mission objectives can be used to develop tailored payloads within tighter constraints, while also returning useful scientific results or engineering data.
In this work, a tailored SE tradespace approach was used to help determine how a 6-unit (6U) cubesat could be built from commercial-off-the-shelf (COTS)-based components and undertake remote sensing missions near Mars or near-Earth asteroids. The primary purpose of these missions is to carry a hyperspectral sensor sensitive to 600-800 nm wavelengths (hereafter defined as “red-edge”) to investigate mineralogical characteristics commonly associated with oxidizing and hydrating environments. Minerals of this type remain of high interest as indicators of present or past habitability for life, or of active geologic processes. Implications of operating in a deep-space environment were considered as part of the engineering constraints of the design, including the potential reduction of available solar energy, changes in thermal environment and background radiation, and vastly increased communications distances.
The engineering tradespace analysis identified realistic COTS options that could satisfy mission objectives for the 6U cubesat bus while also accommodating a reasonable degree of risk. The exception was the communication subsystem, in which case suitable capability was restricted to one particular option. This analysis was used to support an additional trade investigation into the type of sensors that would be most suitable for building the red-edge hyperspectral payload. This was in part constrained by ensuring not only that readily available COTS sensors were used, but that affordability, particularly during a geopolitical environment that was affecting component supply surety and access to manufacturing facilities, was optimized. It was found that a number of sensor options were available for designing a useful instrument, although the rapid development and life-of-type issues with COTS sensors restricted the ability to obtain useful metrics on their performance in the space environment.
Additional engineering testing was conducted by constructing hyperspectral sensors using imaging sensors popular in science, technology, engineering and mathematics (STEM) contexts. Engineering and performance metrics were gathered for the payload containing these sensors, and their performance was assessed in relevant analogous environments. A selection of materials exhibiting spectral phenomenology in the red-edge portion of the spectrum was used to produce metrics on the performance of the sensors. It was found that low-cost cameras were able to distinguish between most minerals, although they required a wider spectral range to do so. Additionally, while Raspberry Pi cameras have been popular in scientific applications, a low-cost camera without a Bayer filter markedly improved spectral sensitivity. Space-environment testing was also trialed in additional experiments using high-altitude balloons to reach the near-space environment. The sensor payloads experienced conditions approximating the surface of Mars, and results were compared with Landsat 7, a heritage Earth-sensing satellite, using a popular vegetation index. The selected Raspberry Pi cameras were able to provide useful results from near-space that could be compared with space imagery.
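The popular vegetation index referred to above is presumably NDVI, the normalized difference of near-infrared and red reflectance (an assumption; the abstract does not name the index). Its computation is simple:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from near-infrared and red
    reflectances; healthy vegetation reflects NIR strongly and absorbs red,
    so values near +1 indicate dense vegetation."""
    if nir + red == 0:
        return 0.0
    return (nir - red) / (nir + red)

veg = ndvi(nir=0.50, red=0.08)   # dense vegetation, roughly 0.72
soil = ndvi(nir=0.30, red=0.25)  # bare ground, roughly 0.09
```

Because the index only needs a red and a near-infrared band, even a low-cost camera with its Bayer filter removed can produce values comparable with satellite products.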
Further testing incorporated comparative analysis of custom-built sensors using readily available Raspberry Pi and astronomy cameras, and results from the Mastcam and Mastcam-Z instruments currently on the surface of Mars. Two sensor designs were trialed in field settings possessing Mars-analogue materials, and a subset of these materials was analysed using a laboratory-grade spectroradiometer. Results showed the Raspberry Pi multispectral camera would be best suited for broad-scale indications of mineralogy that could then be targeted by the pushbroom sensor. This sensor was found to possess a narrower spectral range than Mastcam and Mastcam-Z but was sensitive to a greater number of bands within this range. The pushbroom sensor returned data on spectral phenomenology associated with minerals of the type found on Mars. The actual performance of the payload in appropriate conditions provided critical information for reducing risk in future designs. Additionally, the successful outcomes of the trials reduced risk for their application in a deep-space environment.
The SE and practical performance testing conducted in this thesis could be developed further to design, build and fly a hyperspectral sensor, sensitive to red-edge wavelengths, on a deep-space cubesat mission. Such a mission could be flown at reasonable cost yet return useful scientific and engineering data.
Fast and Interpretable Nonlocal Neural Networks for Image Denoising via Group-Sparse Convolutional Dictionary Learning
Nonlocal self-similarity within natural images has become an increasingly
popular prior in deep-learning models. Despite their successful image
restoration performance, such models remain largely uninterpretable due to
their black-box construction. Our previous studies have shown that
interpretable construction of a fully convolutional denoiser (CDLNet), with
performance on par with state-of-the-art black-box counterparts, is achievable
by unrolling a dictionary learning algorithm. In this manuscript, we seek an
interpretable construction of a convolutional network with a nonlocal
self-similarity prior that performs on par with black-box nonlocal models. We
show that such an architecture can be effectively achieved by upgrading the
sparsity prior of CDLNet to a weighted group-sparsity prior. From this
formulation, we propose a novel sliding-window nonlocal operation, enabled by
sparse array arithmetic. In addition to competitive performance with black-box
nonlocal DNNs, we demonstrate that the proposed sliding-window sparse
attention enables inference more than an order of magnitude faster than its
competitors.
Comment: 11 pages, 8 figures, 6 tables
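For intuition, the proximal operator of a weighted group-sparsity penalty can be sketched in scalar form (illustrative only; in the paper it acts on convolutional sparse codes over nonlocal groups of similar patches). Each group is shrunk jointly, so weak groups vanish together while strong groups survive:

```python
import math

def group_soft_threshold(x, groups, lam, weights):
    """Proximal operator of lam * sum_g w_g * ||x_g||_2: every coefficient
    in group g is scaled by max(0, 1 - lam * w_g / ||x_g||_2), so whole
    groups are kept or discarded together."""
    out = list(x)
    for g, w in zip(groups, weights):
        norm = math.sqrt(sum(x[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam * w / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = x[i] * scale
    return out

# The strong group (0, 1) is shrunk slightly; the weak group (2, 3) is
# zeroed entirely, mimicking joint denoising of grouped similar patches.
y = group_soft_threshold([3.0, 4.0, 0.1, 0.2],
                         groups=[(0, 1), (2, 3)], lam=1.0, weights=[1.0, 1.0])
# y is approximately [2.4, 3.2, 0.0, 0.0]
```

Unrolling iterations that apply this operator, with the weights predicted per group, is what yields an interpretable nonlocal network.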
Visually Adversarial Attacks and Defenses in the Physical World: A Survey
Although Deep Neural Networks (DNNs) have been widely applied in various
real-world scenarios, they are vulnerable to adversarial examples. The current
adversarial attacks in computer vision can be divided into digital attacks and
physical attacks according to their different attack forms. Compared with
digital attacks, which generate perturbations in the digital pixels, physical
attacks are more practical in the real world. Owing to the serious security
problem caused by physically adversarial examples, many works have been
proposed to evaluate the physically adversarial robustness of DNNs in the past
years. In this paper, we present a survey of the current physically
adversarial attacks and physically adversarial defenses in computer vision. To
establish a taxonomy, we organize the current physical attacks by attack task,
attack form, and attack method, so that readers can gain systematic knowledge
of this topic from different aspects. For the physical defenses, we establish
the taxonomy from pre-processing, in-processing, and post-processing of the
DNN models to achieve full coverage of the adversarial defenses. Based on this
survey, we finally discuss the challenges of this research field and offer an
outlook on future directions.
Deep Structured Layers for Instance-Level Optimization in 2D and 3D Vision
The approach we present in this thesis is that of integrating optimization problems
as layers in deep neural networks. Optimization-based modeling provides an additional set of tools enabling the design of powerful neural networks for a wide
battery of computer vision tasks. This thesis shows formulations and experiments
for vision tasks ranging from image reconstruction to 3D reconstruction.
We first propose an unrolled optimization method with implicit regularization
properties for reconstructing images from noisy camera readings. The method resembles an unrolled majorization-minimization framework with convolutional neural networks acting as regularizers. We report state-of-the-art performance in image
reconstruction on both noisy and noise-free evaluation setups across many datasets.
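The unrolling idea can be sketched with a classical stand-in (not the thesis's method): proximal-gradient iterations for an l1-regularized denoiser, where each iteration becomes one network layer and the soft-threshold is what a learned CNN regularizer would replace.

```python
def soft(x, t):
    """Soft-thresholding: the proximal operator of t * |x|."""
    return max(0.0, abs(x) - t) * (1.0 if x >= 0 else -1.0)

def unrolled_ista(y, n_iters=50, step=0.5, lam=0.1):
    """Unrolled proximal-gradient iterations for denoising y with an l1
    prior: each iteration is a gradient step on 0.5 * ||x - y||^2 followed
    by soft-thresholding. In an unrolled network, one iteration becomes one
    layer, and the threshold is replaced by a learned regularizer."""
    x = [0.0] * len(y)
    for _ in range(n_iters):
        x = [soft(xi - step * (xi - yi), step * lam) for xi, yi in zip(x, y)]
    return x

x_hat = unrolled_ista([1.0, -2.0, 0.05, 0.0])
# converges to the elementwise soft-threshold of y at lam: [0.9, -1.9, 0.0, 0.0]
```

Because every layer is an interpretable optimization step, the unrolled network inherits the implicit regularization of the underlying scheme.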
We further focus on the task of monocular 3D reconstruction of articulated objects using video self-supervision. The proposed method uses a structured layer for
accurate object deformation that controls a 3D surface by displacing a small number
of learnable handles. While relying on a small set of training data per category for
self-supervision, the method obtains state-of-the-art reconstruction accuracy with
diverse shapes and viewpoints for multiple articulated objects.
We finally address the shortcomings of the previous method that revolve
around regressing the camera pose using multiple hypotheses. We propose a method
that recovers a 3D shape from a 2D image by relying solely on 3D-2D correspondences regressed from a convolutional neural network. These correspondences are
used in conjunction with an optimization problem to estimate per sample the camera pose and deformation. We quantitatively show the effectiveness of the proposed
method on self-supervised 3D reconstruction across multiple categories without the need for multiple hypotheses.
Instance Segmentation in the Dark
Existing instance segmentation techniques are primarily tailored for
high-visibility inputs, but their performance significantly deteriorates in
extremely low-light environments. In this work, we take a deep look at instance
segmentation in the dark and introduce several techniques that substantially
boost the low-light inference accuracy. The proposed method is motivated by the
observation that noise in low-light images introduces high-frequency
disturbances to the feature maps of neural networks, thereby significantly
degrading performance. To suppress this "feature noise", we propose a novel
learning method that relies on an adaptive weighted downsampling layer, a
smooth-oriented convolutional block, and disturbance suppression learning.
These components effectively reduce feature noise during downsampling and
convolution operations, enabling the model to learn disturbance-invariant
features. Furthermore, we discover that high-bit-depth RAW images can better
preserve richer scene information in low-light conditions compared to typical
camera sRGB outputs, thus supporting the use of RAW-input algorithms. Our
analysis indicates that high bit-depth can be critical for low-light instance
segmentation. To mitigate the scarcity of annotated RAW datasets, we leverage a
low-light RAW synthetic pipeline to generate realistic low-light data. In
addition, to facilitate further research in this direction, we capture a
real-world low-light instance segmentation dataset comprising over two thousand
paired low/normal-light images with instance-level pixel-wise annotations.
Remarkably, without any image preprocessing, we achieve satisfactory
performance on instance segmentation in very low light (4% AP higher than
state-of-the-art competitors), meanwhile opening new opportunities for future
research.
Comment: Accepted by International Journal of Computer Vision (IJCV) 202
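A common physics-based noise model behind low-light RAW synthetic pipelines of this kind combines Poisson shot noise on the photon counts with Gaussian read noise (an assumption for illustration; the paper's pipeline may model further sensor effects):

```python
import math
import random

def synthesize_low_light_raw(clean, photon_scale=30.0, read_std=2.0, seed=0):
    """Toy low-light RAW synthesis: Poisson shot noise on photon counts plus
    Gaussian read noise. clean: list of linear intensities in [0, 1];
    photon_scale sets how few photons a full-scale pixel collects."""
    rng = random.Random(seed)
    noisy = []
    for v in clean:
        mean_photons = v * photon_scale
        # Poisson sample via Knuth's method (adequate for small means)
        limit, k, p = math.exp(-mean_photons), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                break
            k += 1
        noisy.append((k + rng.gauss(0.0, read_std)) / photon_scale)
    return noisy

noisy = synthesize_low_light_raw([0.5] * 2000)
```

Lowering photon_scale mimics darker scenes, letting clean RAW images be turned into realistic low-light training data without new captures.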