
    Harmonic Analysis Inspired Data Fusion for Applications in Remote Sensing

    This thesis addresses the fusion of multiple data sources arising in remote sensing, such as hyperspectral imagery and LIDAR. Fusing multiple data sources provides better data representation and classification results than any of the independent data sources alone. We begin our investigation with the well-studied Laplacian Eigenmap (LE) algorithm, which offers a rich template to which fusion concepts can be added. For each phase of the LE algorithm (graph, operator, and feature space) we develop and test different data fusion techniques. We also investigate how partially labeled data and approximate LE preimages can be used to achieve data fusion. Lastly, we study several numerical acceleration techniques that can be used to augment the developed algorithms, namely the Nystrom extension, Random Projections, and Approximate Neighborhood constructions. The Nystrom extension is studied in detail, and the application of Frame Theory and Sigma-Delta Quantization is proposed to enrich it.
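    As a point of reference for the pipeline described above, here is a minimal single-modality Laplacian Eigenmap sketch in Python (not the thesis code). It makes the three phases the abstract refers to — graph, operator, and feature space — explicit; the `laplacian_eigenmap` helper and its parameters are illustrative assumptions.

```python
# Minimal Laplacian Eigenmaps sketch (hypothetical helper, not the thesis implementation):
# graph phase -> operator phase -> feature-space phase.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def laplacian_eigenmap(X, n_neighbors=10, n_components=2, sigma=1.0):
    # Graph phase: k-nearest-neighbour graph with Gaussian (heat-kernel) weights.
    D = cdist(X, X)
    W = np.exp(-D**2 / (2 * sigma**2))
    far = np.argsort(D, axis=1)[:, n_neighbors + 1:]   # indices beyond the k nearest
    for i, cols in enumerate(far):
        W[i, cols] = 0.0
    W = np.maximum(W, W.T)                              # symmetrise the graph

    # Operator phase: graph Laplacian L = D_deg - W.
    deg = W.sum(axis=1)
    L = np.diag(deg) - W

    # Feature-space phase: smallest nontrivial generalised eigenvectors of L v = lambda D_deg v.
    vals, vecs = eigh(L, np.diag(deg))
    return vecs[:, 1:n_components + 1]                  # skip the constant eigenvector

X = np.random.default_rng(0).normal(size=(300, 20))
Y = laplacian_eigenmap(X, n_neighbors=15, n_components=2)   # 2-D embedding
# A naive "graph-level" fusion of two modalities could combine their weight
# matrices (e.g. a product or weighted sum) before the operator phase.
```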

    Robust hyperspectral image reconstruction for scene simulation applications

    This thesis presents the development of a spectral reconstruction method for multispectral (MSI) and hyperspectral (HSI) applications through enhanced dictionary learning and spectral unmixing methodologies. Earth observation and surveillance are largely undertaken by MSI sensing, such as that provided by Landsat, WorldView, Sentinel, etc.; however, the practical usefulness of MSI data is very limited, mainly because of the small number of wave bands the imagery provides. One means to remedy this shortcoming is to extend the MSI into HSI without expensive hardware investment. Spectral reconstruction is one of the most critical elements in applications such as hyperspectral scene simulation, an important technique particularly for defence applications. Scene simulation creates a virtual scene in which the modelling of materials can be tailored freely so that specific parameters of the model can be studied. In the defence sector this is the most cost-effective way to evaluate the vulnerability of soldiers and vehicles before they are deployed abroad. Simulating a hyperspectral scene requires detailed knowledge of the materials in the scene, which is normally not available. Current state-of-the-art approaches make use of MSI satellite data and transform it into HSI for the hyperspectral scene simulation. One way to achieve this is through a reconstruction algorithm, commonly known as spectral reconstruction, which turns the MSI into HSI using an optimisation approach. The methodology adopted in this thesis is the development of robust dictionary learning to estimate the endmembers (EMs); once the EMs are found, the abundances of materials in the scene are subsequently estimated through a linear unmixing approach. Conventional approaches to material allocation in most hyperspectral scene simulators use the Texture Material Mapper (TMM) algorithm, which allocates materials from a spectral library (a database of pre-compiled endmember materials) according to the minimum spectral Euclidean distance to a candidate pixel of the scene. This approach is shown (in this work) to be highly inaccurate, with large scene reconstruction error. This research instead uses a dictionary learning technique for material allocation, posed as an optimisation problem with the objectives of: (i) reconstructing the scene as closely as possible to the ground truth, with a fraction of the error given by the TMM method, and (ii) learning trace-material clusters (2-3 times the number of species, i.e. the intrinsic dimension, in the scene) to ensure that all material species in the scene are included in the reconstruction. Furthermore, two approaches complementing the goals of the learned dictionary are proposed in this work: a rapid orthogonal matching pursuit (r-OMP), which enhances the performance of the orthogonal matching pursuit algorithm, and a semi-blind approximation of the irradiance of all pixels in the scene, including those in shaded regions. The main result of this research is the demonstration of the effectiveness of the proposed algorithms on real data sets. The SCD-SOMP method is shown to be capable of learning both background and trace materials even for a dictionary with a small number of atoms (≈10). The KMSCD method is found to be the more versatile, with an overcomplete (non-orthogonal) dictionary capable of learning trace materials with high scene reconstruction accuracy (a 2x accuracy enhancement over the TMM-based simulation). Although this work achieves an incremental improvement in spectral reconstruction, the need to train the dictionary on a hyperspectral data set is identified as a limitation to be removed in future research.
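    To make the dictionary-plus-unmixing idea concrete, the sketch below uses plain orthogonal matching pursuit to estimate sparse abundances in the MSI domain and then reconstructs the hyperspectral pixel with the full-resolution dictionary. It is not the thesis's r-OMP, SCD-SOMP, or KMSCD algorithms; the dictionaries, spectral response matrix `R`, and band counts are synthetic assumptions.

```python
# Hedged sketch: sparse abundance estimation with plain OMP, then HSI reconstruction.
import numpy as np

def omp(y, D, n_nonzero):
    """Orthogonal matching pursuit: y ~= D @ x with at most n_nonzero coefficients."""
    norms = np.linalg.norm(D, axis=0)
    residual, support = y.astype(float).copy(), []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(D.T @ residual) / norms))   # atom most correlated with residual
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(0)
D_hsi = np.abs(rng.normal(size=(200, 12)))     # endmembers at hyperspectral resolution (200 bands)
R = np.abs(rng.normal(size=(8, 200)))          # hypothetical spectral response of an 8-band MSI sensor
D_msi = R @ D_hsi                              # the same endmembers seen by the MSI sensor

y_msi = D_msi @ np.array([0.7, 0.3] + [0] * 10)        # synthetic 2-material MSI pixel
abund = omp(y_msi, D_msi, n_nonzero=2)                 # sparse abundances from the MSI bands
hsi_pixel = D_hsi @ abund                              # reconstructed hyperspectral pixel
```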

    A computationally efficient symmetric diagonally dominant matrix projection-based Gaussian process approach

    Although kernel approximation methods have been widely applied to mitigate the O(n³) cost of inverting the n × n kernel matrix in Gaussian process methods, they still face computational challenges. The ‘residual’ matrix between the covariance and its approximating component is often discarded because it prevents the reduction in computational cost. In this paper, we propose a computationally efficient Gaussian process approach that achieves better computational efficiency, O(mn²), than standard Gaussian process methods when using m ≪ n data. The proposed approach incorporates the ‘residual’ matrix in its symmetric diagonally dominant form, which can be further approximated by a Neumann series. We validate and compare the approach with full Gaussian process approaches and kernel-approximation-based Gaussian process variants on both synthetic and real air quality data.
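    The Neumann-series device mentioned above can be illustrated in isolation: for M = D - N with D the diagonal part of M, and spectral radius of D^{-1} N below one (guaranteed when M is strictly diagonally dominant), truncating M^{-1} = sum_k (D^{-1} N)^k D^{-1} gives a cheap approximate solve. The toy below is an assumption-laden sketch of that tool alone, not the proposed Gaussian process algorithm.

```python
# Toy Neumann-series solve for a diagonally dominant system M x = b.
import numpy as np

def neumann_solve(M, b, n_terms=8):
    d = np.diag(M)                       # D = diag(M), so M = D - N
    N = np.diag(d) - M
    term = b / d                         # first term D^{-1} b
    x = term.copy()
    for _ in range(1, n_terms):
        term = (N @ term) / d            # apply D^{-1} N once more
        x += term
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 200))
M = A @ A.T / 200 + 20.0 * np.eye(200)   # large ridge makes M strictly diagonally dominant
b = rng.normal(size=200)
err = np.linalg.norm(neumann_solve(M, b) - np.linalg.solve(M, b))
print(err)                               # small: a few terms already approximate the exact solve
```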

    Scalable Data Mining via Constrained Low Rank Approximation

    Matrix and tensor approximation methods are recognised as foundational tools for modern data analytics. Their strength lies in a long history of rigorous and principled theoretical foundations, judicious formulations via various constraints, and the availability of fast computer programs. Multiple Constrained Low Rank Approximation (CLRA) formulations exist for commonly encountered tasks such as clustering, dimensionality reduction, and anomaly detection, amongst others. The primary challenge in modern data analytics is the sheer volume of data to be analysed, often requiring multiple machines just to hold the dataset in memory. This dissertation presents CLRA as a key enabler of scalable data mining on distributed-memory parallel machines. Nonnegative Matrix Factorisation (NMF) is the primary CLRA method studied in this dissertation. NMF imposes nonnegativity constraints on the factor matrices and is a well-studied formulation known for its simplicity, interpretability, and clustering prowess. The major bottleneck in most NMF algorithms is a distributed matrix-multiplication kernel. We develop the Parallel Low rank Approximation with Nonnegativity Constraints (PLANC) software package, building on the earlier MPI-FAUN library, which includes an efficient matrix-multiplication kernel tailored to the CLRA case. It employs carefully designed parallel algorithms and data distributions to avoid unnecessary computation and communication. We extend PLANC to include several optimised Nonnegative Least-Squares (NLS) solvers and symmetric constraints, effectively employing the optimised matrix-multiplication kernel. We develop a parallel inexact Gauss-Newton algorithm for Symmetric Nonnegative Matrix Factorisation (SymNMF); in particular, PLANC is able to efficiently utilise second-order information when imposing symmetry constraints without incurring the prohibitive memory and computational costs usually associated with these methods, and we observe 70% efficiency while scaling up these methods. We develop new parallel algorithms for fusing and analysing data with multiple modalities in the Joint Nonnegative Matrix Factorisation (JointNMF) context. JointNMF is capable of knowledge discovery when both feature-data and data-data information are present in a data source. We extend PLANC to handle this case of simultaneously approximating two different large input matrices and study the trade-offs encountered in the bottleneck matrix-multiplication kernel. We show that these ideas translate naturally to the multilinear setting when data is presented in the form of a tensor, implementing the bottleneck computation analogous to the matrix multiply, the Matricised-Tensor Times Khatri-Rao Product (MTTKRP) kernel. We conclude by describing avenues for future research that extend the work and ideas in this dissertation. In particular, we consider the notion of structured sparsity, where the user has some control over the nonzero pattern, which appears in computations for tasks such as cross-validation, working with missing values, robust CLRA models, and the semi-supervised setting.
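    For readers unfamiliar with the core computation, a minimal single-node NMF with multiplicative updates is sketched below; the `W.T @ A` and `A @ H.T` products are exactly the kind of matrix-multiplication kernels that dominate the cost and that the dissertation parallelises. This is an illustrative sketch, not PLANC's API.

```python
# Minimal NMF via Lee-Seung multiplicative updates (illustrative only).
import numpy as np

def nmf(A, rank, n_iter=200, eps=1e-9):
    m, n = A.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Multiplicative updates keep W and H elementwise nonnegative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

A = np.abs(np.random.default_rng(1).normal(size=(500, 300)))   # synthetic nonnegative data
W, H = nmf(A, rank=10)
print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))           # relative reconstruction error
```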

    Uncertainty Quantification in Machine Learning for Engineering Design and Health Prognostics: A Tutorial

    On top of machine learning models, uncertainty quantification (UQ) functions as an essential layer of safety assurance that can lead to more principled decision making by enabling sound risk assessment and management. The safety and reliability improvements of ML models empowered by UQ have the potential to significantly facilitate the broad adoption of ML solutions in high-stakes decision settings, such as healthcare, manufacturing, and aviation, to name a few. In this tutorial, we aim to provide a holistic lens on emerging UQ methods for ML models, with a particular focus on neural networks, and on the applications of these UQ methods in tackling engineering design as well as prognostics and health management problems. Toward this goal, we start with a comprehensive classification of uncertainty types, sources, and causes pertaining to UQ of ML models. Next, we provide a tutorial-style description of several state-of-the-art UQ methods: Gaussian process regression, Bayesian neural networks, neural network ensembles, and deterministic UQ methods focusing on spectral-normalized neural Gaussian processes. Building on the mathematical formulations, we subsequently examine the soundness of these UQ methods quantitatively and qualitatively (via a toy regression example) to highlight their strengths and shortcomings along different dimensions. We then review quantitative metrics commonly used to assess the quality of predictive uncertainty in classification and regression problems. Afterward, we discuss the increasingly important role of UQ of ML models in solving challenging problems in engineering design and health prognostics. Two case studies, with source code available on GitHub, are used to demonstrate these UQ methods and compare their performance in the early-stage life prediction of lithium-ion batteries and the remaining useful life prediction of turbofan engines.
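    In the spirit of the tutorial's toy regression example, the sketch below shows one of the surveyed UQ methods, Gaussian process regression, producing a predictive mean and standard deviation on a synthetic 1-D problem; the dataset and kernel choice here are assumptions, not the paper's case studies.

```python
# Toy GP regression with predictive uncertainty (scikit-learn).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=40)     # noisy observations of sin(x)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X, y)

X_test = np.linspace(-5, 5, 200).reshape(-1, 1)
mean, std = gpr.predict(X_test, return_std=True)      # predictive mean and standard deviation
# std grows outside [-3, 3]: epistemic uncertainty rises away from the training data.
```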

    Modélisation tridimensionnelle précise de l'environnement à l’aide des systèmes de photogrammétrie embarqués sur drones

    Images acquired from unmanned aerial vehicles (UAVs) can provide data with unprecedented spatial and temporal resolution for three-dimensional (3D) modeling. Solutions developed for this purpose mainly operate on photogrammetry concepts, namely UAV-Photogrammetry Systems (UAV-PS). Such systems are used in applications where both geospatial and visual information about the environment is required. These applications include, but are not limited to, natural resource management such as precision agriculture, military and police services such as traffic-law enforcement, precision engineering such as infrastructure inspection, and health services such as epidemic emergency management. UAV-photogrammetry systems can be differentiated by their spatial characteristics in terms of accuracy and resolution. That is, some applications, such as precision engineering, require high-resolution, high-accuracy information about the environment (e.g. 3D modeling with better than one-centimeter accuracy and resolution), whereas in other applications lower levels of accuracy may be sufficient (e.g. wildlife management, needing only a few decimeters of resolution). However, even in those applications, the specific characteristics of UAV-PSs should be considered carefully during both system development and application in order to yield satisfactory results. In this regard, this thesis presents a comprehensive review of the applications of unmanned aerial imagery, with the objective of determining the challenges that remote-sensing applications of UAV systems currently face. This review also allowed the specific characteristics and requirements of UAV-PSs, which are mostly ignored or not thoroughly assessed in recent studies, to be identified. Accordingly, the first part of this thesis explores the methodological and experimental aspects of implementing a UAV-PS. The developed system was extensively evaluated for precise modeling of an open-pit gravel mine and for performing volumetric-change measurements. This application was selected for two main reasons. Firstly, the case study provided a challenging environment for 3D modeling in terms of scale changes, terrain relief variations, and the diversity of structures and textures. Secondly, open-pit-mine monitoring demands high levels of accuracy, which justified the effort to push the developed UAV-PS to its maximum capacity. The hardware of the system consisted of an electric-powered helicopter, a high-resolution digital camera, and an inertial navigation system. The software of the system included in-house programs specifically designed for camera calibration, platform calibration, system integration, onboard data acquisition, flight planning, and ground control point (GCP) detection. The detailed features of the system are discussed in the thesis, and solutions are proposed to enhance the system and its photogrammetric outputs. The accuracy of the results was evaluated under various mapping conditions, including direct georeferencing and indirect georeferencing with different numbers, distributions, and types of ground control points. Additionally, the effects of imaging configuration and network stability on modeling accuracy were assessed. The second part of this thesis concentrates on improving the techniques of sparse and dense reconstruction.
The proposed solutions are alternatives to traditional aerial photogrammetry techniques, adapted to the specific characteristics of unmanned, low-altitude imagery. Firstly, a method was developed for robust sparse matching and epipolar-geometry estimation. The main achievement of this method is its capacity to handle a very high percentage of outliers (errors among corresponding points) with remarkable computational efficiency compared to state-of-the-art techniques. Secondly, a block bundle adjustment (BBA) strategy was proposed based on the integration of intrinsic camera calibration parameters as pseudo-observations in the Gauss-Helmert model. The principal advantage of this strategy is that it controls the adverse effect of unstable imaging networks and noisy image observations on the accuracy of self-calibration. A sparse implementation of this strategy was also developed, allowing its application to data sets containing millions of tie points. Finally, the concept of intrinsic curves was revisited for dense stereo matching. The proposed technique achieves a high level of accuracy and efficiency by searching only a small fraction of the whole disparity search space and by internally handling occlusions and matching ambiguities. These photogrammetric solutions were extensively tested using synthetic data, close-range images, and the images acquired over the gravel-pit mine. Achieving an absolute 3D mapping accuracy of 11±7 mm illustrates the success of this system for high-precision modeling of the environment.
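    As a conventional baseline for the robust sparse matching and epipolar-geometry step described above, the following OpenCV sketch detects ORB features and estimates the fundamental matrix with RANSAC. The thesis proposes its own, more outlier-tolerant estimator, so this is only a reference pipeline, and the image file names are placeholders.

```python
# Baseline sparse matching + RANSAC epipolar-geometry estimation with OpenCV.
import cv2
import numpy as np

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)   # hypothetical image pair
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=5000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# RANSAC rejects outlying correspondences while estimating the fundamental matrix.
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
print("inlier ratio:", inlier_mask.mean())
```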

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    This paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for efficiency improvement are made for each of the hotels studied.
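    To illustrate how stochastic frontier analysis separates random noise from systematic inefficiency, the sketch below fits a normal/half-normal stochastic frontier by maximum likelihood on synthetic data; the specification and data are assumptions, since the paper's hotel data and exact model are not reproduced here.

```python
# Hedged SFA sketch: normal/half-normal composed-error frontier fitted with SciPy.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # inputs (log form)
beta_true = np.array([1.0, 0.6, 0.3])
v = 0.2 * rng.normal(size=n)                                  # symmetric measurement error
u = np.abs(0.4 * rng.normal(size=n))                          # one-sided inefficiency, u >= 0
y = X @ beta_true + v - u                                     # observed output below the frontier

def neg_loglik(params):
    beta, log_sv, log_su = params[:3], params[3], params[4]
    sv, su = np.exp(log_sv), np.exp(log_su)
    sigma = np.hypot(sv, su)
    lam = su / sv
    eps = y - X @ beta
    # Aigner-Lovell-Schmidt composed-error density for eps = v - u.
    return -np.sum(np.log(2) - np.log(sigma)
                   + norm.logpdf(eps / sigma)
                   + norm.logcdf(-lam * eps / sigma))

res = minimize(neg_loglik, x0=np.zeros(5), method="BFGS")
print(res.x[:3])   # frontier coefficients; exp(res.x[3:]) recover sigma_v and sigma_u
```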

    Distance measures and whitening procedures for high dimensional data

    The need to effectively analyse high dimensional data is increasingly crucial to many fields as data collection and storage capabilities continue to grow. Working with high dimensional data is fraught with difficulties, making many data analysis methods inadvisable, unstable or entirely unavailable. The Mahalanobis distance and data whitening are two methods that are integral to multivariate data analysis. These methods rely on the inverse of the covariance matrix, which is often non-existent or unstable in high dimensions. The methods currently used to circumvent singularity of the covariance matrix often impose structural assumptions on the data, which are not always appropriate or known. In this thesis, three novel methods are proposed. Two of these are distance measures which quantify the proximity of a point x to a set of points X. The simplicial distances take the average volume of all k-dimensional simplices formed by x and vertices of X. The minimal-variance distances aim to minimize the variance of the distances produced, while adhering to a constraint that ensures behaviour similar to the Mahalanobis distance. Finally, the minimal-variance whitening method is detailed; this data whitening method is constructed by minimizing the total variation of the transformed data subject to a constraint. All of these novel methods are shown to behave similarly to the Mahalanobis distance and data whitening methods used for full-rank data. Furthermore, unlike methods that rely on the inverse covariance matrix, these new methods are well-defined for degenerate data and do not impose structural assumptions. This thesis explores the aims, constructions and limitations of these new methods, and offers many empirical examples and comparisons of their performance when used with high dimensional data.
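    For background, the sketch below shows the classical Mahalanobis distance and ZCA whitening computed with a pseudo-inverse when the sample covariance is singular (fewer points than dimensions); this is the baseline that the thesis's proposed distances and whitening generalise, not an implementation of those new methods.

```python
# Classical Mahalanobis distance and ZCA whitening with a degenerate covariance.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))          # n = 50 points in d = 200 dimensions
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)             # rank <= 49, so S is singular

Sinv = np.linalg.pinv(S)                # pseudo-inverse as the usual stop-gap

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ Sinv @ d))

# ZCA whitening W = S^{-1/2}, built from the nonzero eigenvalues only.
vals, vecs = np.linalg.eigh(S)
keep = vals > 1e-10
W = vecs[:, keep] @ np.diag(vals[keep] ** -0.5) @ vecs[:, keep].T
X_white = (X - mu) @ W                  # whitened data on the retained subspace

print(mahalanobis(X[0]))
```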