Dense Vision in Image-guided Surgery
Image-guided surgery needs an efficient and effective camera tracking system in order to perform augmented reality for overlaying preoperative models or labelling cancerous tissues on the 2D video images of the surgical scene. Tracking in endoscopic/laparoscopic scenes, however, is an extremely difficult task, primarily due to tissue deformation, instrument invasion into the surgical scene and the presence of specular highlights. State-of-the-art feature-based SLAM systems such as PTAM fail to track such scenes since the number of good features to track is very limited. Smoke in the scene and instrument motion cause feature-based tracking to fail immediately.
The work of this thesis provides a systematic approach to this problem using dense vision. We initially attempted to register a 3D preoperative model with multiple 2D endoscopic/laparoscopic images using a dense method but this approach did not perform well. We subsequently proposed stereo reconstruction to directly obtain the 3D structure of the scene. By using the dense reconstructed model together with robust estimation, we demonstrate that dense stereo tracking can be incredibly robust even within extremely challenging endoscopic/laparoscopic scenes.
Several validation experiments have been conducted in this thesis. The proposed stereo reconstruction algorithm has turned out to be the state-of-the-art method on several publicly available ground truth datasets. Furthermore, the proposed robust dense stereo tracking algorithm has proved highly accurate in a synthetic environment (< 0.1 mm RMSE) and qualitatively extremely robust when applied to real scenes in RALP prostatectomy surgery. This is an important step toward achieving accurate image-guided laparoscopic surgery.
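As a rough illustration of how stereo reconstruction yields 3D structure directly, the sketch below back-projects a pixel from a rectified stereo pair using the standard pinhole relation Z = f * B / d. It is not the thesis's reconstruction algorithm; the function names, parameter names and the rectified-camera assumption are all hypothetical choices made for this example.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_mm):
    """Depth (mm) of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_mm / disparity_px


def back_project(u, v, disparity_px, focal_px, baseline_mm, cx, cy):
    """3D point (X, Y, Z) in the left-camera frame for left-image pixel (u, v).

    (cx, cy) is the principal point in pixels (an assumed calibration input).
    """
    z = depth_from_disparity(disparity_px, focal_px, baseline_mm)
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)
```

Applying `back_project` to every pixel of a dense disparity map gives a dense point cloud of the scene, which is the kind of model a dense tracker can then register new frames against.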
Modelling and analysis of plant image data for crop growth monitoring in horticulture
Plants can be characterised by a range of attributes, and measuring these attributes accurately and reliably is a major challenge for the horticulture industry. The measurement of those plant characteristics that are most relevant to a grower has previously been tackled almost exclusively by a combination of manual measurement and visual inspection. The purpose of this work is to propose an automated image analysis approach in order to provide an objective measure of plant attributes to remove subjective factors from assessment and to reduce labour requirements in the glasshouse. This thesis describes a stereopsis approach for estimating plant height, since height information cannot be easily determined from a single image. The stereopsis algorithm proposed in this thesis is efficient in terms of the running time, and is more accurate when compared with other algorithms.
The estimated geometry, together with colour information from the image, is then used to build a statistical plant surface model, which represents all the information from the visible spectrum. A self-organising map approach can be adopted to model plant surface attributes, but the model can be improved by using a probabilistic model such as a mixture model formulated in a Bayesian framework. Details of both methods are discussed in this thesis.
A Kalman filter is developed to track the plant model over time, extending the model to the time dimension, which enables smoothing of the noisy measurements to produce a development trend for a crop. The outcome of this work could lead to a number of potentially important applications in horticulture.
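To make the idea of filtering noisy measurements into a growth trend concrete, here is a minimal scalar Kalman filter, assuming a random-walk state (for example a single plant height). The thesis tracks a full plant model, so this is only a simplified sketch; `kalman_1d` and its parameters are hypothetical names.

```python
def kalman_1d(measurements, process_var, meas_var, x0, p0):
    """Scalar Kalman filter with a random-walk state model.

    measurements: noisy observations, one per time step
    process_var:  variance added to the state between steps
    meas_var:     variance of the measurement noise
    x0, p0:       initial state estimate and its variance
    Returns the filtered estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var        # predict: state unchanged, uncertainty grows
        k = p / (p + meas_var)     # Kalman gain: how much to trust the measurement
        x = x + k * (z - x)        # update the estimate with the innovation
        p = (1.0 - k) * p          # updated uncertainty
        estimates.append(x)
    return estimates
```

A small `process_var` relative to `meas_var` gives a smoother trend that reacts slowly to each new measurement; a larger value follows the measurements more closely.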
Single View 3D Reconstruction using Deep Learning
One of the major challenges in the field of Computer Vision has been the reconstruction of a 3D object or scene from a single 2D image. While there are many notable examples, traditional methods for single view reconstruction often fail to generalise due to the presence of many brittle hand-crafted engineering solutions, limiting their applicability to real world problems. Recently, deep learning has taken over the field of Computer Vision and “learning to reconstruct” has become the dominant technique for addressing the limitations of traditional methods when performing single view 3D reconstruction. Deep learning allows our reconstruction methods to learn generalisable image features and monocular cues that would otherwise be difficult to engineer through ad-hoc hand-crafted approaches. However, it can often be difficult to efficiently integrate the various 3D shape representations within the deep learning framework. In particular, 3D volumetric representations can be adapted to work with Convolutional Neural Networks, but they are computationally expensive and memory inefficient when using local convolutional layers. Also, the successful learning of generalisable feature representations for 3D reconstruction requires large amounts of diverse training data. In practice, this is challenging for 3D training data, as it entails a costly and time consuming manual data collection and annotation process. Researchers have attempted to address these issues by utilising self-supervised learning and generative modelling techniques; however, these approaches often produce suboptimal results when compared with models trained on larger datasets. This thesis addresses several key challenges incurred when using deep learning for “learning to reconstruct” 3D shapes from single view images.
We observe that it is possible to learn a compressed representation for multiple categories of the 3D ShapeNet dataset, improving the computational and memory efficiency when working with 3D volumetric representations. To address the challenge of data acquisition, we leverage deep generative models to “hallucinate” hidden or latent novel viewpoints for a given input image. Combining these images with depths estimated by a self-supervised depth estimator and the known camera properties allowed us to reconstruct textured 3D point clouds without any ground truth 3D training data. Furthermore, we show that it is possible to improve upon the previous self-supervised monocular depth estimator by adding self-attention and a discrete volumetric representation, significantly improving accuracy on the KITTI 2015 dataset and enabling the estimation of uncertainty in depth predictions.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
Modelling and analysis of plant image data for crop growth monitoring in horticulture
Plants can be characterised by a range of attributes, and measuring these attributes accurately and reliably is a major challenge for the horticulture industry. The measurement of those plant characteristics that are most relevant to a grower has previously been tackled almost exclusively by a combination of manual measurement and visual inspection. The purpose of this work is to propose an automated image analysis approach in order to provide an objective measure of plant attributes to remove subjective factors from assessment and to reduce labour requirements in the glasshouse. This thesis describes a stereopsis approach for estimating plant height, since height information cannot be easily determined from a single image. The stereopsis algorithm proposed in this thesis is efficient in terms of the running time, and is more accurate when compared with other algorithms. The estimated geometry, together with colour information from the image, is then used to build a statistical plant surface model, which represents all the information from the visible spectrum. A self-organising map approach can be adopted to model plant surface attributes, but the model can be improved by using a probabilistic model such as a mixture model formulated in a Bayesian framework. Details of both methods are discussed in this thesis. A Kalman filter is developed to track the plant model over time, extending the model to the time dimension, which enables smoothing of the noisy measurements to produce a development trend for a crop. The outcome of this work could lead to a number of potentially important applications in horticulture.
EThOS - Electronic Theses Online Service; Horticultural Development Council (Great Britain) (HDC) (CP 37); GB; United Kingdom
Data-driven robotic manipulation of cloth-like deformable objects : the present, challenges and future prospects
Manipulating cloth-like deformable objects (CDOs) is a long-standing problem in the robotics community. CDOs are flexible (non-rigid) objects that do not show a detectable level of compression strength when two points on the object are pushed towards each other; they include objects such as ropes (1D), fabrics (2D) and bags (3D). In general, CDOs’ many degrees of freedom (DoF) introduce severe self-occlusion and complex state–action dynamics as significant obstacles to perception and manipulation systems. These challenges exacerbate existing issues of modern robotic control methods such as imitation learning (IL) and reinforcement learning (RL). This review focuses on the application details of data-driven control methods on four major task families in this domain: cloth shaping, knot tying/untying, dressing and bag manipulation. Furthermore, we identify specific inductive biases in these four domains that present challenges for more general IL and RL algorithms.
Intent prediction of vulnerable road users for trusted autonomous vehicles
This study investigated how future autonomous vehicles could be further trusted by the vulnerable road users (such as pedestrians and cyclists) with whom they would be interacting in urban traffic environments. It focused on understanding the behaviours of such road users on a deeper level by predicting their future intentions based solely on vehicle-based sensors and AI techniques. The findings showed that personal/body language attributes of vulnerable road users, in addition to their past motion trajectories and physics attributes in the environment, led to more accurate predictions about their intended actions.
Computational Methods for the Alignment and Score-Informed Transcription of Piano Music
PhD
This thesis is concerned with computational methods for alignment and score-informed
transcription of piano music. Firstly, several methods are proposed to improve the alignment
robustness and accuracy when various versions of one piece of music show complex
differences with respect to acoustic conditions or musical interpretation. Secondly, score
to performance alignment is applied to enable score-informed transcription.
Although music alignment methods have considerably improved in accuracy in recent
years, the task remains challenging. The research in this thesis aims to improve the
robustness for some cases where there are substantial differences between versions and
state-of-the-art methods may fail in identifying a correct alignment. This thesis first exploits
the availability of multiple versions of the piece to be aligned. By processing these
jointly, the alignment process can be stabilised by exploiting additional examples of how
a section might be interpreted or which acoustic conditions may arise. Two methods are
proposed, progressive alignment and profile HMM, both adapted from the multiple biological
sequence alignment task. Experiments demonstrate that these methods can indeed
improve the alignment accuracy and robustness over comparable pairwise methods.
Secondly, this thesis presents a score to performance alignment method that can improve
the robustness in cases where some musical voices, such as the melody, are played asynchronously
to others – a stylistic device used in musical expression. The asynchronies between
the melody and the accompaniment are handled by treating the voices as separate
timelines in a multi-dimensional variant of dynamic time warping (DTW). The method
measurably improves the alignment accuracy for pieces with asynchronous voices and
preserves the accuracy otherwise.
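For readers unfamiliar with dynamic time warping, the following is a minimal sketch of the classic pairwise, single-timeline DTW on two numeric sequences. It is not the multi-dimensional variant proposed in the thesis; the function name `dtw` and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw(a, b):
    """Classic DTW between two non-empty numeric sequences.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    aligning a[i] with b[j], from the first elements to the last.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = cost of the best alignment of a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local distance (assumed)
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(d[i - 1][j - 1], i - 1, j - 1),
                 (d[i - 1][j], i - 1, j),
                 (d[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return d[n][m], path
```

In a score-to-performance setting the sequences would be feature frames (for example chroma vectors) rather than scalars, and the local cost would be a distance between frames.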
Once an accurate alignment between a score and an audio recording is available, the
score information can be exploited as prior knowledge in automatic music transcription
(AMT), for scenarios where a score is available, such as music tutoring. Score-informed dictionary
learning is used to learn the spectral pattern of each pitch that describes the energy
distribution of the associated notes in the recording. More precisely, the dictionary learning
process in non-negative matrix factorization (NMF) is constrained using the aligned
score. This way, by adapting the dictionary to a given recording, the proposed method
improves the accuracy over the state of the art.
China Scholarship Council
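One common way to constrain NMF with an aligned score is to zero the activation of every pitch at frames where the score says it is not sounding; multiplicative updates then keep those entries at zero. The sketch below illustrates that idea with the standard Euclidean multiplicative updates in plain Python. It is an assumed, simplified formulation and not necessarily the constraint used in the thesis; all function names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def reconstruction_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def score_informed_nmf(v, w, score_mask, iters=200, eps=1e-9):
    """Factorise V (freq x time) as W (freq x pitch) times H (pitch x time).

    score_mask[k][t] is 1 where the aligned score allows pitch k at frame t,
    else 0. H starts from the mask, so disallowed activations stay zero.
    """
    w = [row[:] for row in w]
    h = [[float(m) for m in row] for row in score_mask]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][t] * num[k][t] / (den[k][t] + eps)
              for t in range(len(h[0]))] for k in range(len(h))]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[f][k] * num[f][k] / (den[f][k] + eps)
              for k in range(len(w[0]))] for f in range(len(w))]
    return w, h
```

Each column of the learned `w` plays the role of a spectral pattern for one pitch, adapted to the given recording.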
Pose-invariant, model-based object recognition, using linear combination of views and Bayesian statistics
This thesis presents an in-depth study on the problem of object recognition, and in particular the detection
of 3-D objects in 2-D intensity images which may be viewed from a variety of angles. A solution to this
problem remains elusive to this day, since it involves dealing with variations in geometry, photometry
and viewing angle, noise, occlusions and incomplete data. This work restricts its scope to a particular
kind of extrinsic variation: variation of the image due to changes in the viewpoint from which the object
is seen.
A technique is proposed and developed to address this problem, which falls into the category of
view-based approaches, that is, a method in which an object is represented as a collection of a small
number of 2-D views, as opposed to a generation of a full 3-D model. This technique is based on the
theoretical observation that the geometry of the set of possible images of an object undergoing 3-D rigid
transformations and scaling may, under most imaging conditions, be represented by a linear combination
of a small number of 2-D views of that object. It is therefore possible to synthesise a novel image of an
object given at least two existing and dissimilar views of the object, and a set of linear coefficients that
determine how these views are to be combined in order to synthesise the new image.
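The geometric part of this observation can be sketched on landmark coordinates. Assuming orthographic projection and two basis views, each coordinate of a novel view is written as a constant plus a weighted sum of x and y from the first view and x from the second. The thesis works with pixel values and recovers the coefficients by optimisation; the code below only synthesises landmark positions from given coefficients, and every name in it is hypothetical.

```python
import math


def project(points3d, angle):
    """Orthographic view of 3D points after rotation by `angle` about the vertical axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c + z * s, y) for x, y, z in points3d]


def synthesise_view(view1, view2, a, b):
    """Landmarks of a novel view as a linear combination of two basis views.

    view1, view2: lists of (x, y) landmarks in corresponding order
    a, b:         four coefficients each, for the novel x and y coordinates
    """
    novel = []
    for (x1, y1), (x2, _y2) in zip(view1, view2):
        x = a[0] + a[1] * x1 + a[2] * y1 + a[3] * x2
        y = b[0] + b[1] * x1 + b[2] * y1 + b[3] * x2
        novel.append((x, y))
    return novel
```

For a rotation about the vertical axis, with basis views at angles 0 and theta and a target at angle phi, the coefficients have a closed form: `a = (0, cos(phi) - cos(theta) * sin(phi) / sin(theta), 0, sin(phi) / sin(theta))` and `b = (0, 0, 1, 0)`. In a recognition setting these coefficients are unknown and are what the search recovers.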
The method works in conjunction with a powerful optimization algorithm, to search and recover the
optimal linear combination coefficients that will synthesize a novel image, which is as similar as possible
to the target, scene view. If the similarity between the synthesized and the target images is above some
threshold, then an object is determined to be present in the scene and its location and pose are defined,
in part, by the coefficients. The key benefit of using this technique is that, because it works directly
with pixel values, it avoids the need for problematic, low-level feature extraction and solution of the
correspondence problem. As a result, a linear combination of views (LCV) model is easy to construct
and use, since it only requires a small number of stored, 2-D views of the object in question, and the
selection of a few landmark points on the object, a process which is easily carried out during the offline,
model building stage. In addition, this method is general enough to be applied across a variety of
recognition problems and different types of objects.
The development and application of this method are initially explored for two-dimensional
problems, and the same principles are then extended to 3-D. Additionally, the method is evaluated across
synthetic and real-image datasets, containing variations in the objects’ identity and pose. Possible
extensions to incorporate a foreground/background model and lighting variations of the pixels
are examined as future work.