Mapping individual trees from airborne multi-sensor imagery
Airborne multi-sensor imaging is increasingly used to examine vegetation properties. The
advantage of using multiple types of sensor is that each detects a different feature of the
vegetation, so that collectively they provide a detailed understanding of the ecological
pattern. Specifically, Light Detection And Ranging (LiDAR) devices produce detailed point
clouds of where laser pulses have been backscattered from surfaces, giving information on
vegetation structure; hyperspectral sensors measure reflectances within narrow wavebands,
providing spectrally detailed information about the optical properties of targets; and aerial
photographs provide high-spatial-resolution imagery, revealing fine features
that cannot be identified in hyperspectral or LiDAR intensity images. Using a
combination of these sensors, effective techniques can be developed for mapping species and
inferring leaf physiological processes at the individual tree crown (ITC) level.
Although multi-sensor approaches have revolutionised ecological research, their application
in mapping individual tree crowns is limited by two major technical issues: (a)
Multi-sensor imaging requires all images taken from different sensors to be co-aligned, but
different sensor characteristics result in scale, rotation or translation mismatches between
the images, making correction a pre-requisite of individual tree crown mapping; (b) reconstructing
individual tree crowns from unstructured raw data space requires an accurate
tree delineation algorithm. This thesis develops a systematic approach to resolving these technical
issues using state-of-the-art computer vision algorithms. A variational method, called
NGF-Curv, was developed to co-align hyperspectral imagery, LiDAR and aerial photographs. The NGF-Curv algorithm handles very complex topographic and lens distortions efficiently,
thus improving the accuracy of co-alignment compared to established image registration
methods for airborne data. A graph cut method, named MCNCP-RNC, was developed to
reconstruct individual tree crowns from fully integrated multi-sensor imagery. MCNCP-RNC
is not influenced by interpolation artefacts because it detects trees in 3D, and it detects
individual tree crowns using both hyperspectral imagery and LiDAR.
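As a rough illustration of the registration side, a normalised-gradient-fields (NGF) data term of the kind used in variational multimodal registration can be sketched as follows. This is a generic NGF similarity measure with an illustrative edge parameter `eps`, not the thesis's NGF-Curv method:

```python
import numpy as np

def ngf(image, eps=1e-2):
    """Normalised gradient field: gradients scaled to (near-)unit length."""
    gy, gx = np.gradient(image.astype(float))
    norm = np.sqrt(gx**2 + gy**2 + eps**2)
    return gx / norm, gy / norm

def ngf_distance(fixed, moving, eps=1e-2):
    """NGF distance: near 0 where edges align (or anti-align), larger otherwise.

    d(x) = 1 - (n_f(x) . n_m(x))^2, averaged over the image. The squared dot
    product makes the measure insensitive to contrast reversals between
    modalities (e.g. a LiDAR intensity image vs. a hyperspectral band).
    """
    fx, fy = ngf(fixed, eps)
    mx, my = ngf(moving, eps)
    dot = fx * mx + fy * my
    return float(np.mean(1.0 - dot**2))

# An image compared with itself has perfectly aligned gradients:
img = np.add.outer(np.arange(32.0), np.arange(32.0))
assert ngf_distance(img, img) < 0.05
# Inverting contrast flips gradient direction but not the squared dot product:
assert abs(ngf_distance(img, -img) - ngf_distance(img, img)) < 1e-9
```

In a full registration scheme, this distance would be minimised over a parameterised deformation of the moving image, with a regulariser (curvature, in NGF-Curv's case) penalising non-smooth deformations.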
Based on these algorithms, we developed a new workflow to detect species at pixel and
ITC levels in a temperate deciduous forest in the UK. In addition, we modified the workflow
to monitor physiological responses of two oak species with respect to environmental gradients
in a Mediterranean woodland in Spain. The results show that our scheme can detect individual
tree crowns, identify species and monitor the physiological responses of canopy leaves.
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations
Recent text-to-speech models have reached the level of generating natural
speech similar to human speech, but they still have limitations in terms of
expressiveness. Existing emotional speech synthesis models have shown
controllability using interpolated features with scaling parameters in an
emotional latent space. However, it is difficult to control continuous
emotional intensity in the latent space generated by existing models because
features such as emotion and speaker identity are entangled. In this
paper, we propose a novel method to control the continuous intensity of
emotions using semi-supervised learning. The model learns emotions of
intermediate intensity using pseudo-labels generated from phoneme-level
sequences of speech information. An embedding space built from the proposed
model satisfies the uniform grid geometry with an emotional basis. The
experimental results showed that the proposed method was superior in
controllability and naturalness.

Comment: Accepted by Interspeech 202
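The interpolation baseline this work improves on can be sketched in a few lines: intensity is controlled by linearly blending a neutral embedding with a full-intensity emotion embedding. The embeddings and the scaling parameter `alpha` here are illustrative, not the paper's model:

```python
import numpy as np

def interpolate_emotion(neutral_emb, emotion_emb, alpha):
    """Linear interpolation between a neutral and a full-intensity emotion
    embedding; alpha in [0, 1] acts as the continuous intensity knob."""
    alpha = float(np.clip(alpha, 0.0, 1.0))
    return (1.0 - alpha) * neutral_emb + alpha * emotion_emb

neutral = np.zeros(8)          # illustrative 8-d embeddings
happy = np.ones(8)
half = interpolate_emotion(neutral, happy, 0.5)
assert np.allclose(half, 0.5)
```

The paper's point is that this knob only behaves well if the latent space is disentangled; when speaker and emotion factors are mixed into the same axes, moving along the interpolation path changes more than intensity.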
Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference
In this paper, we present Mondrian, an edge system that enables
high-performance object detection on high-resolution video streams. Many
lightweight models and system optimization techniques have been proposed for
resource-constrained devices, but they do not fully utilize the potential of
the accelerators over dynamic, high-resolution videos. To enable such
capability, we devise a novel Compressive Packed Inference to minimize
per-pixel processing costs by selectively determining the necessary pixels to
process and combining them to maximize processing parallelism. In particular,
our system quickly extracts ROIs and dynamically shrinks them, reflecting the
effect of the fast-changing characteristics of objects and scenes. It then
intelligently combines such scaled ROIs into large canvases to maximize the
utilization of inference accelerators such as GPU. Evaluation across various
datasets, models, and devices shows that Mondrian outperforms state-of-the-art
baselines (e.g., input rescaling, ROI extraction, ROI extraction + batching) by
15.0-19.7% higher accuracy, leading to 6.65× higher throughput than
frame-wise inference when processing various 1080p video streams. We will
release the code after the paper review.
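The "packed inference" idea — combining scaled ROIs into one large canvas so the accelerator runs once per canvas rather than once per ROI — can be sketched with a simple shelf-packing heuristic. The function name and the heuristic are illustrative; Mondrian's actual packing and scaling policies are more sophisticated:

```python
def pack_rois(roi_sizes, canvas_w):
    """Place (w, h) ROIs left-to-right onto shelves of width canvas_w.

    Returns a list of (x, y) offsets, one per ROI, plus the canvas height used.
    """
    placements, x, y, shelf_h = [], 0, 0, 0
    for w, h in roi_sizes:
        if x + w > canvas_w:            # current shelf full: open a new one
            x, y = 0, y + shelf_h
            shelf_h = 0
        placements.append((x, y))
        x += w
        shelf_h = max(shelf_h, h)       # shelf height = tallest ROI on it
    return placements, y + shelf_h

# Three ROIs that each overflow a 640-px-wide canvas shelf:
offsets, height = pack_rois([(300, 200), (400, 250), (500, 180)], canvas_w=640)
assert offsets == [(0, 0), (0, 200), (0, 450)]
assert height == 630
```

A real system would then copy each scaled ROI to its offset in the canvas, run one detector pass, and map detections back to frame coordinates via the stored offsets.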
Three-dimensional Segmentation of Trees Through a Flexible Multi-Class Graph Cut Algorithm (MCGC)
Developing a robust algorithm for automatic individual tree crown (ITC) detection from airborne laser scanning datasets is important for tracking the responses of trees to anthropogenic change. Such approaches allow the size, growth and mortality of individual trees to be measured, enabling forest carbon stocks and dynamics to be tracked and understood. Many algorithms exist for structurally simple forests including coniferous forests and plantations. Finding a robust solution for structurally complex, species-rich tropical forests remains a challenge; existing segmentation algorithms often perform less well than simple area-based approaches when estimating plot-level biomass. Here we describe a Multi-Class Graph Cut (MCGC) approach to tree crown delineation. This uses local three-dimensional geometry and density information, alongside knowledge of crown allometries, to segment individual tree crowns from airborne LiDAR point clouds. Our approach robustly identifies trees in the top and intermediate layers of the canopy, but cannot recognise small trees. From these three-dimensional crowns, we are able to measure individual tree biomass. Comparing these estimates to those from permanent inventory plots, our algorithm is able to produce robust estimates of hectare-scale carbon density, demonstrating the power of ITC approaches in monitoring forests. The flexibility of our method to add additional dimensions of information, such as spectral reflectance, makes this approach an obvious avenue for future development and extension to other sources of three-dimensional data, such as structure-from-motion datasets.

Jonathan Williams holds a NERC studentship [NE/N008952/1], a CASE partnership with support from the Royal Society for the Protection of Birds (RSPB). David Coomes was supported by an International Academic Fellowship from the Leverhulme Trust. Carola-Bibiane Schoenlieb was supported by the RISE projects CHiPS and NoMADS, the Cantab Capital Institute for the Mathematics of Information and the Alan Turing Institute. We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Quadro P6000 GPU used for this research.
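For orientation, the simple canopy-height-model (CHM) local-maxima detector that 3D methods like MCGC improve upon can be sketched as below. This is not the MCGC graph cut itself; the window size and minimum-height threshold are illustrative parameters:

```python
import numpy as np

def tree_tops(chm, window=1, min_height=2.0):
    """Return (row, col) cells that are the unique maximum of their local
    neighbourhood and taller than min_height (a crude tree-apex detector)."""
    rows, cols = chm.shape
    tops = []
    for r in range(rows):
        for c in range(cols):
            if chm[r, c] < min_height:
                continue
            r0, r1 = max(0, r - window), min(rows, r + window + 1)
            c0, c1 = max(0, c - window), min(cols, c + window + 1)
            patch = chm[r0:r1, c0:c1]
            if chm[r, c] == patch.max() and np.count_nonzero(patch == patch.max()) == 1:
                tops.append((r, c))
    return tops

chm = np.zeros((7, 7))
chm[2, 2] = 10.0   # one tall crown apex
chm[5, 5] = 8.0    # a second, shorter apex
assert tree_tops(chm) == [(2, 2), (5, 5)]
```

Rasterising the point cloud into a CHM is exactly where interpolation artefacts enter; working directly on the 3D points with geometry, density and allometry cues, as MCGC does, avoids that step.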
LIVE - Life-Immersive Virtual Environment with Physical Interaction-aware Adaptive Blending
We present LIVE, a system enabling a life-immersive Mixed Reality (MR) experience. Daily MR usage is challenging in that the user's state of interaction with physical objects changes continuously over time, while immersion and utility must be supported simultaneously throughout. Because most approaches to blending the virtual and physical worlds are designed for a single interaction state, they are not sufficient to support life-immersive MR. We propose an initial design of LIVE that (i) identifies the user's current context among three states of interaction with a physical object and (ii) applies the most suitable blending method to balance immersion and utility.
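At its core, this design is a dispatch from interaction state to blending method, which can be sketched as below. The state names and blending methods are purely illustrative placeholders; the paper does not name its three states or policies:

```python
# Hypothetical state -> blending-method table; all names are illustrative.
BLENDING = {
    "no_interaction": "fully_virtual",       # maximise immersion
    "approaching":    "outline_overlay",     # hint at the physical object
    "interacting":    "passthrough_window",  # maximise utility and safety
}

def select_blending(state):
    # Fail safe toward utility if the state classifier is uncertain.
    return BLENDING.get(state, "passthrough_window")

assert select_blending("approaching") == "outline_overlay"
```

The interesting system problems sit around this table: classifying the interaction state robustly in real time, and switching blending methods without breaking immersion.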