90 research outputs found

    Automated reliability assessment for spectroscopic redshift measurements

    We present a new approach to automate the spectroscopic redshift reliability assessment based on machine learning (ML) and characteristics of the redshift probability density function (PDF). We propose to rephrase the spectroscopic redshift estimation into a Bayesian framework, in order to incorporate all sources of information and uncertainties related to the redshift estimation process, and produce a redshift posterior PDF that will be the starting point for ML algorithms to provide an automated assessment of redshift reliability. As a use case, public data from the VIMOS VLT Deep Survey is exploited to present and test this new methodology. We first tried to reproduce the existing reliability flags using supervised classification to describe different types of redshift PDFs, but due to the subjective definition of these flags, soon opted for a new homogeneous partitioning of the data into distinct clusters via unsupervised classification. After assessing the accuracy of the new clusters via resubstitution and test predictions, unlabelled data from preliminary mock simulations for the Euclid space mission are projected into this mapping to predict their redshift reliability labels. Comment: Submitted on 02 June 2017 (v1). Revised on 08 September 2017 (v2). Latest version 28 September 2017 (this version, v3).

    Anomaly and Change Detection in Remote Sensing Images

    Earth observation through satellite sensors, models and in situ measurements provides a way to monitor our planet with unprecedented spatial and temporal resolution. The amount and diversity of the data which is recorded and made available is ever-increasing. This data allows us to perform crop yield prediction, track land-use change such as deforestation, monitor and respond to natural disasters and predict and mitigate climate change. The last two decades have seen a large increase in the application of machine learning algorithms in Earth observation in order to make efficient use of the growing data stream. Machine learning algorithms, however, are typically model agnostic and too flexible, and so end up not respecting fundamental laws of physics. On the other hand, there has been an increase in recent years in research attempting to embed physics knowledge in machine learning algorithms in order to obtain interpretable and physically meaningful solutions. The main objective of this thesis is to explore different ways of encoding physical knowledge to provide machine learning methods tailored for specific problems in remote sensing. Ways of expressing expert knowledge about the relevant physical systems in remote sensing abound, ranging from simple relations between reflectance indices and biophysical parameters to complex models that compute the radiative transfer of electromagnetic radiation through our atmosphere, and differential equations that explain the dynamics of key parameters. This thesis focuses on inversion problems, emulation of radiative transfer models, and incorporation of the above-mentioned domain knowledge in machine learning algorithms for remote sensing applications.
We explore new methods that can optimally model simulated and in situ data jointly, incorporate differential equations in machine learning algorithms, handle more complex inversion problems and large-scale data, obtain accurate and computationally efficient emulators that are consistent with physical models, and efficiently perform approximate Bayesian inversion over radiative transfer models.

    Structural and seismic monitoring of historical and contemporary buildings: general principles and applications

    Structural Health Monitoring (SHM) denotes the continuous or periodic assessment of the condition of a structure or a set of structures using information from sensor systems, integrated or autonomous, and from any further operation that is aimed at preserving structural integrity. SHM is a broad and multidisciplinary field, both for the spectrum of sciences and technologies involved and for the variety of applications. The technological developments that have made the advancement of this discipline possible come from many fields, including physics, chemistry, materials science, biology, but above all aerospace, civil, electronic and mechanical engineering. The first applications, at the turn of the sixties and seventies, concerned the integrity control of remote structural elements, such as foundation piles and submerged parts of off-shore platforms, but nowadays this type of monitoring is practiced on airplanes, vehicles, spacecraft, ships, helicopters, automobiles, bridges, buildings, civil infrastructure, power plants, pipelines, electronic systems, manufacturing and processing facilities, and biological systems. This paper carries out an extensive examination of the theoretical and applicative foundations of structural and seismic monitoring, focusing in particular on methods that exploit natural vibrations and their use both in the diagnosis and in the prediction of the seismic response of civil structures, infrastructure networks, and traditional and modern architectural heritage.

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from Teixeira Duarte Group - a renowned Portuguese hotel chain. An efficiency ranking is established for these four hotel units located in Portugal using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiencies in the estimation process, and thus to investigate the main causes of inefficiency. Several suggestions for improving efficiency are offered for each hotel studied.

    Clustering multivariate and functional data using spatial rank functions

    In this work, we consider the problem of determining the number of clusters in multivariate and functional data, where the data are represented by a mixture model in which each component corresponds to a different cluster, without any prior knowledge of the number of clusters. For the multivariate case, we propose a new forward search methodology based on spatial ranks. We also propose a modified algorithm based on the volume of central rank regions. Our numerical examples show that it produces the best results under elliptic symmetry and that it outperforms the traditional forward search based on Mahalanobis distances. In addition, a new nonparametric multivariate clustering method based on different weighted spatial ranks (WSR) functions is proposed. The WSR are completely data-driven and easy to compute without any need for parameter estimates of the underlying distributions, which makes them robust against distributional assumptions. We have considered parametric and nonparametric weights for comparison. We give some numerical examples based on both simulated and real datasets to illustrate the performance of the proposed method. Moreover, we propose two different clustering methods for functional data. The first method is an extension of the forward search based on functional spatial ranks (FSR) that we proposed for the multivariate case. In the second method, we extend the WSR method to functional data analysis. The proposed weighted functional spatial ranks (WFSR) method is a filtering method based on FPCA. A comparison with existing methods has been carried out. The results show that the two proposed methods give a competitive and quite reasonable clustering analysis.
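
    For readers unfamiliar with the term, the following is a minimal sketch of the standard sample spatial rank of a point x with respect to a data cloud, R(x) = (1/n) Σ (x - X_i) / ||x - X_i||. It is not the authors' forward search or WSR method; the function name `spatial_rank` and the tolerance `eps` are hypothetical choices made for illustration only.

```python
import numpy as np

def spatial_rank(x, data, eps=1e-12):
    """Sample spatial rank of point x with respect to the rows of data.

    Averages the unit vectors pointing from each observation to x.
    Observations that coincide with x (distance <= eps) contribute zero.
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    diffs = x - data                          # shape (n, d)
    norms = np.linalg.norm(diffs, axis=1)     # shape (n,)
    keep = norms > eps                        # avoid division by zero
    units = diffs[keep] / norms[keep, None]   # unit direction vectors
    return units.sum(axis=0) / len(data)

# Points arranged symmetrically around the origin (illustrative data).
cloud = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
central = spatial_rank([0.0, 0.0], cloud)     # rank of a central point
outlying = spatial_rank([100.0, 0.0], cloud)  # rank of a remote point
```

    The norm of the rank vector is small for central points and approaches 1 for outlying ones, which is why it can play a role similar to a distance from the centre without requiring a covariance estimate, unlike the Mahalanobis distance.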

    A perceptual learning model to discover the hierarchical latent structure of image collections

    Biology has been an unparalleled source of inspiration for the work of researchers in several scientific and engineering fields including computer vision. The starting point of this thesis is the neurophysiological properties of the human early visual system, in particular, the cortical mechanism that mediates learning by exploiting information about stimuli repetition. Repetition has long been considered a fundamental correlate of skill acquisition and memory formation in biological as well as computational learning models. However, recent studies have shown that biological neural networks have different ways of exploiting repetition in forming memory maps. The thesis focuses on a perceptual learning mechanism called repetition suppression, which exploits the temporal distribution of neural activations to drive an efficient neural allocation for a set of stimuli. The thesis explores the neurophysiological hypothesis that repetition suppression serves as an unsupervised perceptual learning mechanism that can drive efficient memory formation by reducing the overall size of stimuli representation while strengthening the responses of the most selective neurons. This interpretation of repetition differs from its traditional role in computational learning models, where it serves mainly to induce convergence and reach training stability, without using this information to focus the neural representations of the data. The first part of the thesis introduces a novel computational model with repetition suppression, which forms an unsupervised competitive system termed CoRe, for Competitive Repetition-suppression learning. The model is applied to general problems in the fields of computational intelligence and machine learning. Particular emphasis is placed on validating the model as an effective tool for the unsupervised exploration of bio-medical data.
In particular, it is shown that the repetition suppression mechanism efficiently addresses the issues of automatically estimating the number of clusters within the data, as well as filtering noise and irrelevant input components in highly dimensional data, e.g. gene expression levels from DNA microarrays. The CoRe model produces relevance estimates for each covariate, which is useful, for instance, to discover the best discriminating bio-markers. The description of the model includes a theoretical analysis using Huber’s robust statistics to show that the model is robust to outliers and noise in the data. The convergence properties of the model are also studied. It is shown that, besides its biological underpinning, the CoRe model has useful properties in terms of asymptotic behavior. By exploiting a kernel-based formulation for the CoRe learning error, a theoretically sound motivation is provided for the model’s ability to avoid local minima of its loss function. To do this, a necessary and sufficient condition for global error minimization in vector quantization is generalized by extending it to distance metrics in generic Hilbert spaces. This leads to the derivation of a family of kernel-based algorithms that address the local minima issue of unsupervised vector quantization in a principled way. The experimental results show that the algorithm can achieve a consistent performance gain compared with state-of-the-art learning vector quantizers, while retaining a lower computational complexity (linear with respect to the dataset size). Bridging the gap between the low-level representation of the visual content and the underlying high-level semantics is a major research issue of current interest. The second part of the thesis focuses on this problem by introducing a hierarchical and multi-resolution approach to visual content understanding.
On a spatial level, CoRe learning is used to pool together the local visual patches by organizing them into perceptually meaningful intermediate structures. On the semantic level, it provides an extension of the probabilistic Latent Semantic Analysis (pLSA) model that allows discovery and organization of the visual topics into a hierarchy of aspects. The proposed hierarchical pLSA model is shown to effectively address the unsupervised discovery of relevant visual classes from pictorial collections, at the same time learning to segment the image regions containing the discovered classes. Furthermore, by drawing on a recent pLSA-based image annotation system, the hierarchical pLSA model is extended to process and represent multi-modal collections comprising textual and visual data. The results of the experimental evaluation show that the proposed model learns to attach textual labels (available only at the level of the whole image) to the discovered image regions, while increasing the precision/recall performance with respect to a flat pLSA annotation model.

    On-line learning and anomaly detection methods : applications to fault assessment

    [Abstract] This work lies at the intersection of two disciplines, Machine Learning (ML) research and predictive maintenance of machinery. On the one hand, Machine Learning aims at detecting patterns in data gathered from phenomena which can be very different in nature. On the other hand, predictive maintenance of industrial machinery is the discipline which, based on the measurement of physical conditions of its internal components, assesses its present and near-future condition in order to prevent fatal failures. This work highlights that these two disciplines can benefit from their synergy. Predictive maintenance is a challenge for Machine Learning algorithms due to the nature of the data generated by rotating machinery: (a) each machine constitutes a new individual case, so fault data is not available for model construction, and (b) the working conditions of the machine change in many situations and affect the captured data. Machine Learning can help predictive maintenance to: (a) cut plant costs through the automation of tedious periodic tasks which are carried out by experts and (b) reduce the probability of fatal damage to machinery, thanks to the possibility of monitoring it more frequently at a modest cost increase. General-purpose ML techniques able to deal with the aforementioned conditions are proposed. Their application to the specific field of predictive maintenance of rotating machinery based on vibration signature analysis is also thoroughly treated. Since only normal-state data is available to model the vibration captures of a machine, we are restricted to the use of anomaly detection algorithms, which form one of the main blocks of this work. In addition, predictive maintenance also aims at assessing the machine's state in the near future. The second main block of this work, on-line learning algorithms, will help us in this task. A novel on-line learning algorithm for a single-layer neural network with a non-linear output function is proposed.
In addition to the application to predictive maintenance, the proposed algorithm is able to continuously train a network one pattern at a time. If certain conditions hold, it is analytically guaranteed to reach a globally optimal model. Beyond predictive maintenance, the proposed on-line learning algorithm can be applied to stream data learning scenarios such as big data sets, changing contexts and distributed data. Some of the principles described in this work were introduced in a commercial software prototype, GIDAS®. This software was developed and installed in real plants as part of the work of this thesis. The experiences in applying ML to fault detection with this software are also described and show that the proposed methodology can be very effective. Fault detection experiments with simulated and real vibration data are also carried out and demonstrate the performance of the proposed techniques when applied to the problem of predictive maintenance of rotating machinery. [Resumen] This doctoral thesis lies within the scope of two disciplines, research in Machine Learning (ML) and Predictive Maintenance (PM) of rotating machinery. On the one hand, ML studies the problem of detecting and classifying patterns in data sets drawn from phenomena of interest of the most varied nature. PM, for its part, is the discipline that, based on monitoring physical variables of the internal components of industrial machinery, assesses their condition both at present and in the near future, with the ultimate aim of preventing breakdowns that could have fatal consequences. This work highlights that both disciplines can benefit from their synergy.
PM poses a challenge for ML because of the nature of the data generated by the machinery: (a) the properties of the physical measurements collected vary from machine to machine and, because monitoring must begin under correct operating conditions, no fault data are available to build a behavioural model, and (b) the operating conditions of the machines can vary and affect the data they generate. ML can help PM to: (a) reduce costs by automating tedious periodic tasks that have to be carried out by experts in the field and (b) reduce the probability of major damage to machinery, thanks to the possibility of monitoring it more frequently without raising costs substantially. This work proposes general-purpose ML algorithms capable of working under the above conditions. In addition, their specific application to the field of predictive maintenance of rotating machinery based on vibration analysis is studied in detail, with results provided for real cases. Having only data recorded under normal machine conditions restricts us to anomaly detection techniques, which form one of the main blocks of this work. PM also tries to assess whether the machinery will be in an unacceptable state in the near future. The second block presents a new real-time (on-line) learning algorithm that will be of great help in this task. A new on-line learning algorithm is proposed for a single-layer neural network with a non-linear transfer function. Besides its application to predictive maintenance, the proposed algorithm can be used in other on-line learning scenarios such as large data sets, context changes or distributed data.
Some of the ideas described in this work were implemented in a commercial software prototype, GIDAS®. This software was developed and deployed in real plants by the author of this work, and the experience gained from its application is also described in this volume. [Resumo] This work lies within the scope of two disciplines, research in Machine Learning (ML) and Predictive Maintenance (PM) of rotating machinery. On the one hand, ML studies the problem of detecting and classifying patterns in data sets drawn from phenomena of interest of the most varied nature. PM, for its part, is the discipline that, based on monitoring physical variables of the machinery's internal components, assesses their condition both at present and in the near future, with the ultimate aim of preventing breakdowns that could have fatal consequences. This work highlights that both disciplines can benefit from their synergy. PM poses a challenge for ML because of the nature of the data generated by the machinery: (a) the properties of the physical measurements collected vary from machine to machine and, because monitoring must begin under correct operating conditions, no fault data are available to build a behavioural model, and (b) the operating conditions of the machines can vary and affect the data they generate. ML can help PM to: (a) reduce costs by automating tedious periodic tasks that have to be carried out by experts in the field and (b) reduce the probability of major damage to machinery, thanks to the possibility of monitoring it more frequently without raising costs substantially. This work proposes general-purpose ML algorithms capable of working under the above conditions.
In addition, their specific application to the field of predictive maintenance of rotating machinery based on vibration analysis is studied in detail, with results provided for real cases. Because only data recorded under normal machine conditions are available, we are restricted to anomaly detection techniques, which form one of the main blocks of this work. PM also tries to assess whether the machinery will be in an unacceptable state in the near future. The second block of this work presents a new real-time (on-line) learning algorithm that will be of great help in this task. A new on-line learning algorithm is proposed for a single-layer neural network with a non-linear transfer function. Besides its application to predictive maintenance, the proposed algorithm can be used in on-line learning scenarios such as large data sets, context changes or distributed data. Some of the ideas described in this work were implemented in a commercial software prototype, GIDAS®. This software was developed and deployed in real plants by the author of this work, and the experience gained from its application is also described in this volume.

    Learning by correlation for computer vision applications: from Kernel methods to deep learning

    Learning to spot analogies and differences within and across visual categories is an arguably powerful approach in machine learning and pattern recognition which is directly inspired by human cognition. In this thesis, we investigate a variety of approaches which are primarily driven by correlation and tackle several computer vision applications.

    A machine learning approach to the unsupervised segmentation of mitochondria in subcellular electron microscopy data

    Recent advances in cellular and subcellular microscopy have demonstrated its potential for unravelling the mechanisms of various diseases at the molecular level. The biggest challenge in both human- and computer-based visual analysis of micrographs is the variety of nanostructures and mitochondrial morphologies. The state of the art is, however, dominated by manual data annotation, and early attempts to automate the segmentation process were based on supervised machine learning techniques which require large datasets for training. Given a minimal number of training sequences or none at all, unsupervised machine learning formulations, such as spectral dimensionality reduction, are known to be superior in detecting salient image structures. This thesis presents three major contributions developed around the spectral clustering framework, which is proven to capture perceptual organization features. Firstly, we approach the problem of mitochondria localization. We propose a novel grouping method for the extracted line segments which describes the normal mitochondrial morphology. Experimental findings show that the clusters obtained successfully model the inner mitochondrial membrane folding and therefore can be used as markers for the subsequent segmentation approaches. Secondly, we developed an unsupervised mitochondria segmentation framework. This method follows the evolved ability of human vision to extrapolate salient membrane structures in a micrograph. Furthermore, we designed robust non-parametric similarity models according to Gestalt laws of visual segregation. Experiments demonstrate that such models automatically adapt to the statistical structure of the biological domain and return optimal performance in pixel classification tasks under a wide variety of distributional assumptions. The last major contribution addresses the computational complexity of spectral clustering.
Here, we introduced a new anticorrelation-based spectral clustering formulation with the objective of improving both the speed and the quality of segmentation. The experimental findings showed the applicability of our dimensionality reduction algorithm to very large scale problems as well as asymmetric, dense and non-Euclidean datasets.