71 research outputs found
Machine learning methods for discriminating natural targets in seabed imagery
The research in this thesis concerns feature-based machine learning processes and methods for discriminating qualitative natural targets in seabed imagery. The applications considered typically involve time-consuming manual processing stages in an industrial setting. An aim of the research is to assist human analysts by using machine methods to expedite the tedious interpretative tasks. Some novel approaches are devised and investigated for solving the application problems.
These investigations are compartmentalised in four coherent case studies linked by common underlying technical themes and methods. The first study addresses pockmark discrimination in a digital bathymetry model. Manual identification and mapping of even a relatively small number of these landform objects is an expensive process. A novel, supervised machine learning approach to automating the task is presented. The process maps the boundaries of ≈ 2000 pockmarks in seconds - a task that would take days for a human analyst to complete. The second case study investigates different feature creation methods for automatically discriminating sidescan sonar image textures characteristic of Sabellaria spinulosa colonisation.
Results from a comparison of several textural feature creation methods on sonar waterfall imagery show that Gabor filter banks yield some of the best results. A further empirical investigation into the filter bank features created on sonar mosaic imagery leads to the identification of a useful configuration and filter parameter ranges for discriminating the target textures in the imagery. Feature saliency estimation is a vital stage in the machine process. Case study three concerns distance measures for the evaluation and ranking of features on sonar imagery. Two novel consensus methods for creating a more robust ranking are proposed. Experimental results show that the consensus methods can improve robustness over a range of feature parameterisations and various seabed texture classification tasks.
The final case study is more qualitative in nature and brings together a number of ideas, applied to the classification of target regions in real-world sonar mosaic imagery. A number of technical challenges arose and these were surmounted by devising a novel, hybrid unsupervised method. This fully automated machine approach was compared with a supervised approach in an application to the problem of image-based sediment type discrimination. The hybrid unsupervised method produces a plausible class map in a few minutes of processing time. It is concluded that the versatile, novel process should be generalisable to the discrimination of other subjective natural targets in real-world seabed imagery, such as Sabellaria textures and pockmarks (with appropriate features and feature tuning). Further, the full automation of pockmark and Sabellaria discrimination is feasible within this framework.
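As a rough illustration of the Gabor filter bank features mentioned in this abstract, the sketch below builds a small bank from the standard real-valued Gabor function and computes one response energy per filter. It is not the thesis's configuration: the kernel size, wavelengths, number of orientations, the choice of tying sigma to the wavelength, and all function names are assumptions made for this example.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter as a size x size list of lists (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the pixel coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope multiplied by a cosine carrier.
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def gabor_bank(size, wavelengths, n_orientations):
    """One kernel per (wavelength, orientation) pair, orientations spread over [0, pi)."""
    bank = []
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * math.pi / n_orientations
            # sigma = 0.5 * wavelength is an illustrative choice, not a tuned value.
            bank.append(gabor_kernel(size, wavelength, theta, sigma=0.5 * wavelength))
    return bank

def response_energy(patch, kernel):
    """Squared response of one kernel centred on an image patch of the same size."""
    response = sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))
    return response * response

# Hypothetical parameters: 9x9 kernels, two wavelengths, four orientations.
bank = gabor_bank(size=9, wavelengths=[4.0, 8.0], n_orientations=4)

# A synthetic patch of vertical stripes with a 4-pixel period.
stripes = [[math.cos(2.0 * math.pi * x / 4.0) for x in range(-4, 5)] for _ in range(9)]
features = [response_energy(stripes, kernel) for kernel in bank]
```

Each entry of `features` is one texture feature for the patch; a filter whose wavelength and orientation match the stripes responds far more strongly than one rotated by 90 degrees, which is what makes such features useful for separating textures.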
Automatic Fracture Orientation Extraction from SfM Point Clouds
Geology seeks to understand the history of the Earth and its surface processes through characterisation of surface formations and rock units. Chief among the geologists’ tools are rock unit orientation measurements, such as Strike, Dip and Dip Direction. These allow an understanding of both surface and sub-structure on both the local and macro scale.
Although the way these techniques can be used to characterise geology is well understood, the need to collect these measurements by hand adds time and expense to the work of the geologist, precludes spontaneity in field work, and limits coverage to where the geologist can physically reach.
In robotics and computer vision, multi-view geometry techniques such as Structure from Motion (SfM) allow reconstructions of objects and scenes using multiple camera views. SfM-based techniques provide advantages over Lidar-type techniques in areas such as cost and flexibility of use in more varied environmental conditions, while sacrificing extreme levels of fidelity. Regardless of this, camera-based techniques such as SfM have developed to the point where accuracy is possible in the decimetre range.
A system is presented here to automate the measurement of Strike, Dip and Dip Direction using multi-view geometry from video. Rather than deriving measurements using a method applied to the images, such as the Hough Transform, this method takes measurements directly from the software-generated point cloud.
Point cloud noise is mitigated using a Mahalanobis distance implementation. Significant structure is characterised using a k-nearest neighbour region growing algorithm, and final surface orientations are quantified using the plane and normal direction cosines.
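The last step, turning a fitted plane's normal into Strike, Dip and Dip Direction, can be sketched as follows. This is a minimal illustration and not the thesis's implementation: it assumes an East-North-Up coordinate frame (x = east, y = north, z = up), azimuths measured clockwise from north, and the right-hand rule for strike (strike = dip direction minus 90 degrees). The function name is hypothetical.

```python
import math

def orientation_from_normal(normal):
    """Return (strike, dip, dip_direction) in degrees for a plane normal.

    Assumes an East-North-Up frame: x = east, y = north, z = up.
    """
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("normal must be non-zero")
    # Direction cosines of the normal with respect to the east, north and up axes.
    l, m, n = nx / length, ny / length, nz / length
    # Work with the upward-pointing normal so the result does not depend on its sign.
    if n < 0.0:
        l, m, n = -l, -m, -n
    # Dip is the angle between the plane and the horizontal,
    # which equals the angle between the normal and the vertical.
    dip = math.degrees(math.acos(min(1.0, n)))
    # The horizontal part of the upward normal points down-dip.
    # For a horizontal plane (l = m = 0) the dip direction is undefined; 0 is returned.
    dip_direction = math.degrees(math.atan2(l, m)) % 360.0
    strike = (dip_direction - 90.0) % 360.0
    return strike, dip, dip_direction

# Hypothetical normal of a plane dipping 30 degrees towards the south.
strike, dip, dip_direction = orientation_from_normal(
    (0.0, -math.sin(math.radians(30.0)), math.cos(math.radians(30.0)))
)
```

Flipping the sign of the normal yields the same orientation, which matters because normals estimated from point clouds have an arbitrary sign.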
Feature based dynamic intra-video indexing
A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy. With the advent of digital imagery and its widespread application in all areas of life, it has become an important component in the world of communication. Video content ranging from broadcast news, sports, personal videos, surveillance, movies and entertainment and similar domains is increasing exponentially in quantity, and it is becoming a challenge to retrieve content of interest from the corpora. This has led to an increased interest among researchers in investigating concepts of video structure analysis, feature extraction, content annotation, tagging, video indexing, querying and retrieval to fulfil the requirements. However, most of the previous work is confined to specific domains and constrained by quality, processing and storage capabilities. This thesis presents a novel framework agglomerating the established approaches from feature extraction to browsing in one system of content-based video retrieval. The proposed framework fills the gap identified while satisfying the imposed constraints of processing, storage, quality and retrieval times. The output entails a framework, methodology and prototype application to allow the user to efficiently and effectively retrieve content of interest such as age, gender and activity by specifying the relevant query. Experiments have shown plausible results with an average precision and recall of 0.91 and 0.92 respectively for face detection using a Haar wavelet-based approach. Precision of age ranges from 0.82 to 0.91 and recall from 0.78 to 0.84. Gender recognition gives better precision for males (0.89) than for females, while recall is higher for females (0.92). Activity of the subject has been detected using the Hough transform and classified using a Hidden Markov Model. A comprehensive dataset to support similar studies has also been developed as part of the research process.
A Graphical User Interface (GUI) providing a friendly and intuitive interface has been integrated into the developed system to facilitate the retrieval process. A comparison using the intraclass correlation coefficient (ICC) shows that the performance of the system closely resembles that of the human annotator. The performance has been optimised for time and error rate.
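For readers unfamiliar with the precision and recall figures quoted in this abstract, the sketch below shows how the two measures are computed from detection counts. The function name and the counts are made up for illustration and are not taken from the thesis.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Return 0.0 when a denominator is empty rather than dividing by zero.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall

# Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed faces.
precision, recall = precision_recall(8, 2, 2)
```

Precision penalises false alarms and recall penalises misses, so a detector can score well on one and poorly on the other, as the gender results above suggest.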
Foetal echocardiographic segmentation
Congenital heart disease affects just under one percent of all live births [1].
Those defects that manifest themselves as changes to the cardiac chamber volumes
are the motivation for the research presented in this thesis.
Blood volume measurements in vivo require delineation of the cardiac chambers, and
manual tracing of foetal cardiac chambers is very time consuming and operator
dependent. This thesis presents a multi-region, level-set snake deformable
model, applied in both 2D and 3D, which can adapt automatically, to some extent,
to ultrasound noise such as attenuation, speckle and partial occlusion artefacts.
The algorithm presented is named Mumford Shah Sarti Collision Detection (MSSCD).
The level set methods presented in this thesis have an optional shape prior term for
constraining the segmentation by a template registered to the image in the presence
of shadowing and heavy noise.
When applied to real data in the absence of the template the MSSCD algorithm is
initialised from seed primitives placed at the centre of each cardiac chamber. The
voxel statistics inside the chamber are determined before evolution. The MSSCD stops
at open boundaries between two chambers as the two approaching level set fronts
meet. This has significance when determining volumes for all cardiac compartments
since cardiac indices assume that each chamber is treated in isolation. Comparison
of the segmentation results from the implemented snakes including a previous level
set method in the foetal cardiac literature shows that in both 2D and 3D on both real
and synthetic data, the MSSCD formulation is better suited to these types of data.
All the algorithms tested in this thesis are within 2 mm of the manually traced
segmentations of the foetal cardiac datasets. This corresponds to less than 10% of
the length of a foetal heart. In addition to comparison with manual tracings all the
amorphous deformable model segmentations in this thesis are validated using a
physical phantom. The volume estimation of the phantom by the MSSCD
segmentation is within 13% of the physically determined volume.
Pose-invariant, model-based object recognition, using linear combination of views and Bayesian statistics
This thesis presents an in-depth study on the problem of object recognition, and in particular the detection
of 3-D objects in 2-D intensity images which may be viewed from a variety of angles. A solution to this
problem remains elusive to this day, since it involves dealing with variations in geometry, photometry
and viewing angle, noise, occlusions and incomplete data. This work restricts its scope to a particular
kind of extrinsic variation: variation of the image due to changes in the viewpoint from which the object
is seen.
A technique is proposed and developed to address this problem, which falls into the category of
view-based approaches, that is, a method in which an object is represented as a collection of a small
number of 2-D views, as opposed to a generation of a full 3-D model. This technique is based on the
theoretical observation that the geometry of the set of possible images of an object undergoing 3-D rigid
transformations and scaling may, under most imaging conditions, be represented by a linear combination
of a small number of 2-D views of that object. It is therefore possible to synthesise a novel image of an
object given at least two existing and dissimilar views of the object, and a set of linear coefficients that
determine how these views are to be combined in order to synthesise the new image.
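The geometric part of this idea can be sketched on landmark coordinates. The abstract does not give the equations, so the form below is an assumption: each coordinate in the novel view is written as an offset plus a weighted sum of the landmark's coordinates in two basis views, with five coefficients per axis. The function name and the example landmarks are hypothetical.

```python
def synthesise_view(view1, view2, a, b):
    """Landmark positions in a novel view as a linear combination of two basis views.

    view1, view2: lists of (x, y) for the same landmarks in each basis view.
    a, b: five coefficients each (assumed form), giving
        x = a0 + a1*x1 + a2*y1 + a3*x2 + a4*y2
        y = b0 + b1*x1 + b2*y1 + b3*x2 + b4*y2
    """
    if len(view1) != len(view2):
        raise ValueError("both views must contain the same landmarks")
    if len(a) != 5 or len(b) != 5:
        raise ValueError("a and b must each hold five coefficients")
    novel = []
    for (x1, y1), (x2, y2) in zip(view1, view2):
        x = a[0] + a[1] * x1 + a[2] * y1 + a[3] * x2 + a[4] * y2
        y = b[0] + b[1] * x1 + b[2] * y1 + b[3] * x2 + b[4] * y2
        novel.append((x, y))
    return novel

# Hypothetical landmarks of one object seen in two basis views.
view1 = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
view2 = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0)]

# Coefficients that blend the two views equally.
halfway = synthesise_view(view1, view2,
                          a=(0.0, 0.5, 0.0, 0.5, 0.0),
                          b=(0.0, 0.0, 0.5, 0.0, 0.5))
```

In a recognition setting the coefficients are not known in advance; they are the quantities searched for, and this function only shows how a candidate set of coefficients maps the stored views to a new one.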
The method works in conjunction with a powerful optimisation algorithm to search for and recover the
optimal linear combination coefficients that will synthesise a novel image which is as similar as possible
to the target scene view. If the similarity between the synthesised and the target images is above some
threshold, then an object is determined to be present in the scene and its location and pose are defined,
in part, by the coefficients. The key benefit of using this technique is that, because it works directly
with pixel values, it avoids the need for problematic, low-level feature extraction and solution of the
correspondence problem. As a result, a linear combination of views (LCV) model is easy to construct
and use, since it only requires a small number of stored, 2-D views of the object in question, and the
selection of a few landmark points on the object, a process which is easily carried out during the offline,
model building stage. In addition, this method is general enough to be applied across a variety of
recognition problems and different types of objects.
The development and application of this method are initially explored by looking at two-dimensional
problems, and then by extending the same principles to 3-D. Additionally, the method is evaluated across
synthetic and real-image datasets, containing variations in the objects’ identity and pose. Possible
extensions to incorporate a foreground/background model and lighting variations of the pixels
are examined as future work.