3D Shape Understanding and Generation
In recent years, machine learning techniques have revolutionized solutions to longstanding image-based problems such as image classification, generation, semantic segmentation, and object detection. However, if we want to build agents that can successfully interact with the real world, those techniques need to be capable of reasoning about the world as it truly is: a three-dimensional space. There are two main challenges in handling 3D information in machine learning models. First, it is not clear which 3D representation is best. For images, convolutional neural networks (CNNs) operating on raster images yield the best results in virtually all image-based benchmarks. For 3D data, the best combination of model and representation is still an open question. Second, 3D data is not available at the same scale as images -- taking pictures is a common part of our daily lives, whereas capturing 3D content is an activity usually restricted to specialized professionals. This thesis addresses both of these issues. Which model and representation should we use for generating and recognizing 3D data? What are efficient ways of learning 3D representations from a few examples? Is it possible to leverage image data to build models capable of reasoning about the world in 3D?
Our research findings show that it is possible to build models that efficiently generate 3D shapes as irregularly structured representations. These models require significantly less memory while generating higher-quality shapes than those based on voxels and multi-view representations. We start by developing techniques to generate shapes represented as point clouds. This class of models leads to high-quality reconstructions and better unsupervised feature learning. However, since point clouds are not amenable to editing and human manipulation, we also present models capable of generating shapes as sets of shape handles -- simpler primitives that summarize complex 3D shapes and are specifically designed for high-level tasks and user interaction. Despite their effectiveness, these approaches require some form of 3D supervision, which is scarce. We present multiple alternatives to this problem. First, we investigate how approximate convex decomposition techniques can be used as self-supervision to improve recognition models when only a limited number of labels is available. Second, we study how neural network architectures induce shape priors that can be used in multiple reconstruction tasks, using both volumetric and manifold representations. In this regime, reconstruction is performed from a single example -- either a sparse point cloud or multiple silhouettes. Finally, we demonstrate how to train generative models of 3D shapes without any 3D supervision by combining differentiable rendering techniques and Generative Adversarial Networks.
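The abstract does not name the reconstruction loss used to train the point-cloud generators; a common choice for set-to-set comparison, shown here purely as an assumption, is the symmetric Chamfer distance:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N,3) and q (M,3):
    each point is matched to its nearest neighbour in the other set."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

Identical clouds score 0, and the measure is differentiable almost everywhere, which is what gradient-based training of generative models requires.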
Tracking and Mapping in Medical Computer Vision: A Review
As computer vision algorithms become more capable, their applications in clinical systems will become more pervasive. These applications include diagnostics such as colonoscopy and bronchoscopy, guiding biopsies, minimally invasive interventions and surgery, automating instrument motion, and providing image guidance using pre-operative scans. Many of these applications depend on the specific visual nature of medical scenes and require designing and applying algorithms that perform well in this environment.
In this review, we provide an update on the field of camera-based tracking and scene mapping in surgery and diagnostics within medical computer vision. We begin by describing our review process, which results in a final list of 515 papers that we cover. We then give a high-level summary of the state of the art and provide relevant background for those who need tracking and mapping for their clinical applications. We then review the datasets available in the field and the clinical needs they address. Next, we delve into the algorithmic side and summarize recent developments, which should be especially useful for algorithm designers and for those looking to understand the capability of off-the-shelf methods. We focus on algorithms for deformable environments while also reviewing the essential building blocks of rigid tracking and mapping, since there is a large amount of crossover between the methods. Finally, we discuss the current state of tracking and mapping methods along with the needs for future algorithms, the needs for quantification, and the viability of clinical applications in the field. We conclude that new methods need to be designed or combined to support clinical applications in deformable environments, and that more focus needs to be put into collecting datasets for training and evaluation.

Comment: 31 pages, 17 figures
Multimodal Three Dimensional Scene Reconstruction, The Gaussian Fields Framework
The focus of this research is on building 3D representations of real-world scenes and objects using different imaging sensors: primarily range acquisition devices (such as laser scanners and stereo systems) that allow the recovery of 3D geometry, and multi-spectral image sequences, including visual and thermal IR images, that provide additional scene characteristics. The crucial technical challenge that we addressed is the automatic point-set registration task. In this context, our main contribution is the development of an optimization-based method at the core of which lies a unified criterion that solves simultaneously for the dense point correspondence and transformation recovery problems. The new criterion has a straightforward expression in terms of the datasets and the alignment parameters and was used primarily for 3D rigid registration of point-sets. However, it also proved useful for feature-based multimodal image alignment. We derived our method from simple Boolean matching principles by approximation and relaxation. One of the main advantages of the proposed approach, as compared to the widely used class of Iterative Closest Point (ICP) algorithms, is convexity in the neighborhood of the registration parameters and continuous differentiability, allowing for the use of standard gradient-based optimization techniques. Physically, the criterion is interpreted in terms of a Gaussian force field exerted by one point-set on the other. This formulation proved useful for controlling and increasing the region of convergence, and hence allowed for more autonomy in correspondence tasks. Furthermore, the criterion can be computed with linear complexity using recently developed Fast Gauss Transform numerical techniques. In addition, we introduced a new local feature descriptor that was derived from visual saliency principles and significantly enhanced the performance of the registration algorithm.
The resulting technique was subjected to a thorough experimental analysis that highlighted its strengths and showed its limitations. Our current applications are in the field of 3D modeling for inspection, surveillance, and biometrics. However, since this matching framework can be applied to any type of data that can be represented as N-dimensional point-sets, the scope of the method is shown to reach many more pattern analysis applications.
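A minimal sketch of the registration criterion described above, in its naive quadratic-complexity form (the Fast Gauss Transform mentioned in the abstract reduces this to linear complexity; the function name and 2D example are illustrative, not the thesis's actual code):

```python
import numpy as np

def gaussian_fields_energy(moving, fixed, sigma=1.0):
    """Sum of Gaussian affinities over all point pairs -- the 'force field'
    criterion, maximised when the two point-sets are aligned."""
    d2 = ((moving[:, None, :] - fixed[None, :, :]) ** 2).sum(axis=-1)  # squared pairwise distances
    return np.exp(-d2 / sigma**2).sum()
```

Because the energy is smooth in the transformation parameters, a standard gradient-based optimiser can drive the alignment instead of ICP's discrete nearest-neighbour matching, and sigma controls the size of the convergence region.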
3D Reconstruction of 'In-the-Wild' Faces in Images and Videos
This is the author accepted manuscript; the final version is available from IEEE via the DOI in this record. 3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and are among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected containing both neutral and expressive faces. However, all such datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models sufficient to reconstruct faces captured in unconstrained conditions ('in-the-wild'). In this paper, we propose the first 'in-the-wild' 3DMM by combining a statistical model of facial identity and expression shape with an 'in-the-wild' texture model. We show that such an approach allows for a greatly simplified fitting procedure for images and videos, as there is no need to optimise over the illumination parameters. We have collected three new benchmarks that combine 'in-the-wild' images and video with ground truth 3D facial geometry, the first of their kind, and report extensive quantitative evaluations using them that demonstrate our method is state-of-the-art. Funded by the Engineering and Physical Sciences Research Council (EPSRC).
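The combined identity-and-expression shape model described above is, in standard 3DMM form, a linear basis model; the sketch below uses random placeholder bases and illustrative dimensions, not the paper's actual statistics:

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices, k_id, k_exp = 500, 80, 29                 # illustrative sizes only
mean_shape = rng.standard_normal(3 * n_vertices)      # mean face, flattened (x,y,z per vertex)
U_id = rng.standard_normal((3 * n_vertices, k_id))    # identity (PCA) basis -- placeholder
U_exp = rng.standard_normal((3 * n_vertices, k_exp))  # expression basis -- placeholder

def reconstruct(alpha, beta):
    """Linear 3DMM: mean shape plus identity and expression offsets."""
    return mean_shape + U_id @ alpha + U_exp @ beta
```

With zero coefficients the model returns the mean face; fitting an image then reduces to solving for the alpha and beta (plus camera and texture parameters) that best explain the observation.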
A Work Flow and Evaluation of Using Unmanned Aerial Systems for Deriving Forest Stand Characteristics in Mixed Hardwoods of West Virginia
Forest inventory information is a principal driver of forest management decisions. Information gathered through these inventories provides a summary of the condition of forested stands. The way remote sensing aids land managers is changing rapidly. Imagery produced from unmanned aerial systems (UAS) offers high temporal and spatial resolutions and has added another approach to small-scale forest management. UAS imagery is less expensive and easier to coordinate to meet project needs than traditional manned aerial imagery. This study focused on producing an efficient and approachable work flow for deriving forest stand board foot volume estimates from UAS imagery in mixed hardwood stands of West Virginia. A supplementary aim of this project was to evaluate which season was best for collecting imagery for forest inventory. True color imagery was collected with a DJI Phantom 3 Professional UAS and was processed in Agisoft Photoscan Professional. Automated segmentation was performed with Trimble eCognition Developer's multi-resolution segmentation function, with manual optimization of parameters through an iterative process. Individual tree volume metrics were derived from field data relationships, and volume estimates were processed in EZ CRUZ forest inventory software. The software, at best, correctly segmented 43% of the individual tree crowns. No correlation between season of imagery acquisition and quality of segmentation was shown. Volume and other stand characteristics were not accurately estimated, undermined by poor segmentation. However, the imagery was able to capture gaps consistently, and the high resolution imagery provided a visualization of forest health. Difficulties, successes, and the time required for these procedures were thoroughly noted.
Photogrammetric techniques for across-scale soil erosion assessment: Developing methods to integrate multi-temporal high resolution topography data at field plots
Soil erosion is a complex geomorphological process influenced by different impacts at different spatio-temporal scales. To date, measurement of soil erosion is predominantly realisable at specific scales, thereby detecting separate processes, e.g. interrill erosion as opposed to rill erosion. It is difficult to survey soil surface changes over larger areas, such as the field scale, with high spatial resolution. Usually, either net changes at the system outlet or traces remaining after the erosional event are measured. Thus, either quasi-point measurements are extrapolated to the corresponding area without knowing the actual sediment sources or the sediment storage behaviour on the plot, or erosion rates are estimated by methods that disturb the area of investigation during data acquisition, impeding multi-temporal assessment. Furthermore, established methods of soil erosion detection and quantification are typically only reliable for large event magnitudes, are labour- and time-intensive, or are inflexible.
To better observe soil erosion processes at the field scale and under natural conditions, a method is needed that identifies and quantifies sediment sources and sinks at the hillslope with high spatial resolution, captures single precipitation events, and allows for longer observation periods. Therefore, an approach is introduced that measures soil surface changes across multiple spatio-temporal scales without disturbing the area of interest. Recent advances in techniques to capture high resolution topography (HiRT) data have led to several promising tools for soil erosion measurement, each with its own advantages and disadvantages. These methods need to be evaluated because they have rarely been utilised in soil surface studies.
On the one hand, there is terrestrial laser scanning (TLS), which offers high error reliability and retrieves 3D information directly. On the other hand, there is unmanned aerial vehicle (UAV) technology in combination with structure from motion (SfM) algorithms, resulting in UAV photogrammetry, which is very flexible in the field and offers an advantageous perspective. Evaluation of TLS feasibility reveals that the method carries a systematic error that is distance-related and temporally constant for the investigated device, and that can be corrected by transferring calibration values retrieved from an estimated lookup table. However, TLS still reaches its application limits quickly due to an unfavourable (almost horizontal) scanning view of the soil surface, resulting in a fast decrease in point density and an increase in noise with increasing distance from the device. UAV photogrammetry allows for a better (bird's-eye) perspective onto the area of interest, but exhibits more complex error behaviour, especially in regard to the systematic error of a DEM dome, which depends on the method for 3D reconstruction from 2D images (i.e. options for additional implementation of observations) and on the image network configuration (i.e. parallel-axes and control point configuration). Therefore, a procedure is developed that enables flexible usage of different cameras and software tools without the need for additional information or specific camera orientations, while still avoiding this dome error. Furthermore, the accuracy potential of UAV photogrammetry for describing rough soil surfaces is assessed, because corresponding data has so far been missing.
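The distance-related, temporally constant TLS bias described above lends itself to a lookup-table correction; a sketch assuming linear interpolation between calibration distances (the table values below are invented for illustration, not the thesis's calibration data):

```python
import numpy as np

def correct_tls_range(ranges, lut_distances, lut_bias):
    """Subtract the distance-dependent systematic bias, linearly
    interpolated from a calibration lookup table, from measured
    TLS ranges (all values in metres)."""
    return ranges - np.interp(ranges, lut_distances, lut_bias)
```

For example, with calibration points at 0 m (no bias) and 10 m (2 cm bias), a measured range of 5 m would be corrected by the interpolated 1 cm.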
Both HiRT methods are used for multi-temporal measurement of soil erosion processes that result in surface changes of low magnitude, i.e. rill and especially interrill erosion. Thus, a reference with high accuracy and stability is required. A local reference system with sub-cm and, at best, 1 mm accuracy is set up and confirmed by control surveys. Registering the TLS and UAV photogrammetry data to these targets ensures that referencing errors have minimal impact. Analysis of the multi-temporal performance of both HiRT methods confirms that TLS is suitable for detecting erosion forms of larger magnitudes, given its level of detection (LoD) of 1.5 cm. UAV photogrammetry enables the quantification of even lower-magnitude changes (LoD of 1 cm) and, due to its high spatial resolution (1 cm²), a reliable observation at field plots of changes in surface roughness, which is important for runoff processes. Synergetic data fusion as a subsequent post-processing step is necessary to exploit the advantages of both HiRT methods and potentially further improve the LoD.
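Multi-temporal change detection against an LoD threshold can be sketched as a DEM-of-difference computation; a minimal version assuming gridded DEMs and a single global LoD value (real workflows often use spatially varying LoDs):

```python
import numpy as np

def dem_of_difference(dem_t0, dem_t1, lod):
    """Elevation change between two survey epochs; changes smaller than
    the level of detection (LoD) are masked to zero as measurement noise."""
    diff = dem_t1 - dem_t0
    return np.where(np.abs(diff) >= lod, diff, 0.0)
```

With the TLS LoD of 0.015 m quoted above, only elevation changes of at least 1.5 cm would survive the mask; the UAV photogrammetry LoD of 0.01 m retains smaller changes.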
The unprecedentedly high level of information entails the need for automatic geomorphic feature extraction due to the large amount of novel content. Therefore, a method is developed that allows for accurate rill extraction and rill parameter calculation at high resolution, enabling new perspectives on rill erosion that were not possible before due to labour and area-access limits. Erosion volume and cross sections are calculated for each rill, revealing dominant rill deepening. Furthermore, rill shifting depending on the rill orientation relative to the dominant wind direction is revealed.
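Per-rill erosion volume from the extracted cross sections can be approximated by integrating cross-sectional area along the rill; a sketch using the trapezoidal rule (the actual calculation in the thesis may differ):

```python
def rill_volume(cross_section_areas, spacing):
    """Approximate rill volume (m^3) by trapezoidal integration of
    cross-sectional areas (m^2) sampled at a fixed spacing (m) along the rill."""
    a = cross_section_areas
    return sum(0.5 * (a[i] + a[i + 1]) for i in range(len(a) - 1)) * spacing
```

Comparing such volumes and the cross-section geometry between epochs is what makes trends like rill deepening quantifiable per rill.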
Two field plots are installed at erosion-prone positions in the Mediterranean (1,000 m²) and in the European loess belt (600 m²) to ensure the detection of surface changes, permitting the evaluation of the feasibility, potential and limits of TLS and UAV photogrammetry in soil erosion studies. Observations are made regarding sediment connectivity at the hillslope scale. Both HiRT methods enable the identification of local sediment sources and sinks, but still exhibit some uncertainty regarding laminar accumulation and interrill erosion processes due to the comparably high LoD. At both field sites, wheel tracks and erosion rills increase hydrological and sedimentological connectivity. However, at the Mediterranean field plot, dis-connectivity in particular is evident. At the European loess belt site, a triggering event could be captured that led to high erosion rates due to high soil moisture contents, with a further increase in erosion due to rill amplification after rill incision. Estimated soil erosion rates range between 2.6 t ha⁻¹ and 121.5 t ha⁻¹ for single precipitation events and illustrate a large variability due to very different site characteristics, although both case studies are located in fragile landscapes. However, the susceptibility to soil erosion has different primary causes, i.e. torrential precipitation at the Mediterranean site and high soil erodibility at the European loess belt site.
The future promise of the HiRT methods lies in their potential to be applied at yet larger scales. This makes it possible to investigate the importance of gullies for sediment connectivity between hillslopes and channels, and to explain the different erosion rates observed at hillslope and catchment scales, because local sediment sinks and sources can be quantified. In addition, HiRT data can be a great tool for calibrating, validating and enhancing soil erosion models due to the unprecedented level of detail and the flexible multi-spatio-temporal application.