127 research outputs found

    Statistical signal processing of nonstationary tensor-valued data

    Real-world signals, such as the evolution of three-dimensional vector fields over time, can exhibit highly structured probabilistic interactions across their multiple constitutive dimensions. This calls for analysis tools capable of directly capturing the inherent multi-way couplings present in such data. Yet, current analyses typically employ multivariate matrix models and their associated linear algebras which are agnostic to the global data structure and can only describe local linear pairwise relationships between data entries. To address this issue, this thesis uses the property of linear separability -- a notion intrinsic to multi-dimensional data structures called tensors -- as a linchpin to consider the probabilistic, statistical and spectral separability under one umbrella. This helps to both enhance physical meaning in the analysis and reduce the dimensionality of tensor-valued problems. We first introduce a new identifiable probability distribution which appropriately models the interactions between random tensors, whereby linear relationships are considered between tensor fibres as opposed to between individual entries as in standard matrix analysis. Unlike existing models, the proposed tensor probability distribution formulation is shown to yield a unique maximum likelihood estimator which is demonstrated to be statistically efficient. Both matrices and vectors are lower-order tensors, and this gives us a unique opportunity to consider some matrix signal processing models under the more powerful framework of multilinear tensor algebra. By introducing a model for the joint distribution of multiple random tensors, it is also possible to treat random tensor regression analyses and subspace methods within a unified separability framework. 
Practical utility of the proposed analysis is demonstrated through case studies over synthetic and real-world tensor-valued data, including the evolution over time of global atmospheric temperatures and international interest rates. Another overarching theme in this thesis is the nonstationarity inherent to real-world signals, which typically consist of both deterministic and stochastic components. This thesis aims to help bridge the gap between the formal probabilistic theory of stochastic processes and empirical signal processing methods for deterministic signals by providing a spectral model for a class of nonstationary signals, whereby the deterministic and stochastic time-domain signal properties are designated respectively by the first- and second-order moments of the signal in the frequency domain. By virtue of the assumed probabilistic model, novel tests for nonstationarity detection are devised and demonstrated to be effective in low-SNR environments. The proposed spectral analysis framework, which is intrinsically complex-valued, is facilitated by augmented complex algebra in order to fully capture the joint distribution of the real and imaginary parts of complex random variables, using a compact formulation. Finally, motivated by the need for signal processing algorithms which naturally cater for the nonstationarity inherent to real-world tensors, the above contributions are employed simultaneously to derive a general statistical signal processing framework for nonstationary tensors. This is achieved by introducing a new augmented complex multilinear algebra which allows for a concise description of the multilinear interactions between the real and imaginary parts of complex tensors. These contributions are further supported by new physically meaningful empirical results on the statistical analysis of nonstationary global atmospheric temperatures. Open Access
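
    As an illustration of why augmented complex statistics are needed to capture the joint distribution of the real and imaginary parts, the following minimal sketch (scalar case only, not the thesis's tensor framework) computes the covariance E[|z|^2] and the pseudo-covariance E[z^2] of complex samples and recovers from them the variances and the cross-covariance of the real and imaginary parts. The function names and the sample values are assumptions made for this example.

```python
def augmented_second_order_stats(samples):
    """Return (covariance, pseudo_covariance) of a list of complex samples.

    covariance        = E[|z - m|^2]   (real-valued)
    pseudo_covariance = E[(z - m)^2]   (complex-valued)
    """
    n = len(samples)
    mean = sum(samples) / n
    centred = [z - mean for z in samples]
    cov = sum(abs(z) ** 2 for z in centred) / n
    pcov = sum(z * z for z in centred) / n
    return cov, pcov


def real_imag_moments(cov, pcov):
    """Recover var(Re z), var(Im z) and cov(Re z, Im z) from the two statistics."""
    var_re = (cov + pcov.real) / 2
    var_im = (cov - pcov.real) / 2
    cov_re_im = pcov.imag / 2
    return var_re, var_im, cov_re_im


# Hypothetical zero-mean samples with correlated real and imaginary parts.
samples = [2 + 1j, -2 - 1j, 1 - 0.5j, -1 + 0.5j]
cov, pcov = augmented_second_order_stats(samples)
var_re, var_im, cov_re_im = real_imag_moments(cov, pcov)
print(var_re, var_im, cov_re_im)
```

    The covariance alone only gives the sum of the two variances; the pseudo-covariance supplies their difference and the cross term, which is why both are carried in the augmented representation.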

    An Examination of Some Significant Approaches to Statistical Deconvolution

    We examine statistical approaches to two significant areas of deconvolution, Blind Deconvolution (BD) and Robust Deconvolution (RD), for stochastic stationary signals. For BD, we review some major classical and new methods in a unified framework of non-Gaussian signals. The first class of algorithms we look at falls into the class of Minimum Entropy Deconvolution (MED) algorithms. We discuss the similarities between them despite differences in origins and motivations. We give new theoretical results concerning the behaviour and generality of these algorithms and give evidence of scenarios where they may fail. In some cases, we present new modifications to the algorithms to overcome these shortfalls. Following our discussion of the MED algorithms, we next look at a recently proposed BD algorithm based on the correntropy function, a function defined as a combination of the autocorrelation and the entropy functions. We examine its BD performance when compared with MED algorithms. We find that BD carried out via correntropy-matching cannot be straightforwardly interpreted as simultaneous moment-matching, due to the breakdown of the correntropy expansion in terms of moments. Other issues, such as maximum/minimum phase ambiguity and computational complexity, suggest that careful attention is required before establishing the correntropy algorithm as a superior alternative to the existing BD techniques. For the problem of RD, we give a categorisation of the different kinds of uncertainties encountered in estimation and discuss the techniques required to solve each individual case. Primarily, we tackle the overlooked cases of robustification of deconvolution filters based on an estimated blurring response or an estimated signal spectrum. We do this by utilising existing methods derived from criteria such as minimax MSE with imposed uncertainty bands and penalised MSE. 
In particular, we revisit the Modified Wiener Filter (MWF), which offers simplicity and flexibility in giving improved RD over the standard plug-in Wiener Filter (WF).
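
    For readers unfamiliar with the correntropy function mentioned above, here is a minimal sketch of its usual sample estimate with a Gaussian kernel, V(lag) = mean of k_sigma(x[n] - x[n - lag]). The kernel choice, the default bandwidth `sigma` and the function names are assumptions for illustration; this is not the deconvolution algorithm examined in the work.

```python
import math


def gaussian_kernel(u, sigma):
    """Gaussian kernel with bandwidth sigma, evaluated at the difference u."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def autocorrentropy(x, lag, sigma=1.0):
    """Sample autocorrentropy of the sequence x at a non-negative lag."""
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    total = sum(gaussian_kernel(x[i] - x[i - lag], sigma) for i in range(lag, n))
    return total / (n - lag)


# Hypothetical alternating signal: samples two steps apart are identical,
# so the estimate at lag 2 reaches the kernel's maximum value.
signal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
v0 = autocorrentropy(signal, 0)
v1 = autocorrentropy(signal, 1)
v2 = autocorrentropy(signal, 2)
print(v0, v1, v2)
```

    Because the Gaussian kernel is a nonlinear function of the sample difference, the estimate depends on more than the second-order statistics captured by the autocorrelation, and on the bandwidth `sigma`.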

    Sparse feature learning for image analysis in segmentation, classification, and disease diagnosis.

    The success of machine learning algorithms generally depends on the intermediate data representation, called features, which disentangles the hidden factors of variation in data. Moreover, machine learning models are required to generalize, in order to reduce the specificity or bias toward the training dataset. Unsupervised feature learning is useful for taking advantage of the large amounts of unlabeled data that are available to capture these variations. However, learned features are required to capture variational patterns in data space. In this dissertation, unsupervised feature learning with sparsity is investigated for sparse and local feature extraction, with application to lung segmentation, interpretable deep models, and Alzheimer's disease classification. Nonnegative Matrix Factorization, Autoencoder and 3D Convolutional Autoencoder are used as architectures or models for unsupervised feature learning. They are investigated along with nonnegativity, sparsity and part-based representation constraints for generalized and transferable feature extraction.

    Many-objectives optimization: a machine learning approach for reducing the number of objectives

    Solving real-world multi-objective optimization problems using Multi-Objective Optimization Algorithms becomes difficult when the number of objectives is high, since the types of algorithms generally used to solve these problems are based on the concept of non-dominance, which ceases to work as the number of objectives grows. This problem is known as the curse of dimensionality. At the same time, the existence of many objectives, a characteristic of practical optimization problems, makes choosing a solution to the problem very difficult. Different approaches are used in the literature to reduce the number of objectives required for optimization. This work proposes a machine learning methodology, designated FS-OPA, to tackle this problem. The proposed methodology was assessed using DTLZ benchmark problems suggested in the literature and compared with similar algorithms, showing good performance. Finally, the methodology was applied to a difficult real problem in polymer processing, showing its effectiveness. The proposed algorithm has some advantages over a similar machine-learning-based algorithm in the literature (NL-MVU-PCA), namely the possibility of establishing variable–variable and objective–variable relations (not only objective–objective), and the elimination of the need to define or choose a kernel or to optimize algorithm parameters. The collaboration with the DM(s) allows explainable solutions to be obtained. This research was funded by POR Norte under the PhD Grant PRT/BD/152192/2021. 
The authors also acknowledge the funding by FEDER funds through the COMPETE 2020 Programme and National Funds through FCT (Portuguese Foundation for Science and Technology) under the projects UIDB/05256/2020 and UIDP/05256/2020, the Center for Mathematical Sciences Applied to Industry (CeMEAI) and the support from the São Paulo Research Foundation (FAPESP grant No 2013/07375-0), the Center for Artificial Intelligence (C4AI-USP), the support from the São Paulo Research Foundation (FAPESP grant No 2019/07665-4) and the IBM Corporation.
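
    The non-dominance concept referred to in the abstract above can be sketched as follows, assuming all objectives are minimised. This is a generic illustration of Pareto dominance, not the FS-OPA methodology; the function names and the sample points are made up for the example. It shows how adding a single objective can leave every candidate non-dominated, which removes the selection pressure these algorithms rely on.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in
    at least one (all objectives are minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates with two objectives: one of them is dominated.
two_objectives = [(1, 5), (2, 3), (3, 4), (4, 1)]
front_2 = non_dominated(two_objectives)

# The same candidates with a third objective: none is dominated any more.
three_objectives = [(1, 5, 9), (2, 3, 8), (3, 4, 1), (4, 1, 7)]
front_3 = non_dominated(three_objectives)

print(len(front_2), len(front_3))
```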

    Relevant data representation by a Kernel-based framework

    Nowadays, the analysis of large amounts of data has emerged as an issue of great and growing interest in the scientific community, especially in automation, signal processing, pattern recognition, and machine learning. In this sense, the identification, description, classification, visualization, and clustering of events or patterns are important problems for engineering developments and scientific fields such as biology, medicine, economics, artificial vision, artificial intelligence, and industrial production. Nonetheless, it is difficult to interpret the available information because of its complexity and the large number of features obtained. In addition, the analysis of the input data requires the development of methodologies that reveal the relevant behaviors of the studied process, particularly when such signals contain hidden structures varying over a given domain, e.g., space and/or time. When the analyzed signal has such properties, directly applying signal processing and machine learning procedures without considering a suitable model that deals with both the statistical distribution and the data structure can lead to unstable performance. In this regard, kernel functions appear as an alternative approach to address the aforementioned issues by providing flexible mathematical tools that enhance data representation in support of signal processing and machine learning systems. Moreover, kernel-based methods are powerful tools for developing better-performing solutions by adapting the kernel to a given problem, instead of learning data relationships from explicit raw vector representations. However, building suitable kernels requires some prior user knowledge about the input data, which is not available in most practical cases. 
Furthermore, using the definitions of traditional kernel methods directly poses a challenging estimation problem that often leads to strong simplifications that restrict the kind of representation that can be used on the data. In this study, we propose a data representation framework based on kernel methods to automatically learn relevant sample relationships in learning systems. Namely, the proposed framework is divided into five kernel-based approaches, which aim to compute relevant data representations by adapting them according to both the imposed sample relationship constraints and the learning scenario (unsupervised or supervised task). First, we develop a kernel-based representation approach that reveals the main input sample relations by including relevant data structures using graph-based sparse constraints. Thus, salient data structures are highlighted in order to favor further unsupervised clustering stages. This approach can be viewed as a graph pruning strategy within a spectral clustering framework which enhances both the local and global data consistencies for a given input similarity matrix. Second, we introduce a kernel-based representation methodology that captures meaningful data relations in terms of their statistical distribution. To this end, an information theoretic learning (ITL) based penalty function is introduced to estimate a kernel-based similarity that maximizes the whole information potential variability. So, we seek a reproducing kernel Hilbert space (RKHS) that spans the widest information force magnitudes among data points to support further clustering stages. Third, an entropy-like functional on positive definite matrices based on Renyi's definition is adapted to develop a kernel-based representation approach which considers the statistical distribution and the salient data structures. Thereby, relevant input patterns are highlighted in unsupervised learning tasks. 
Particularly, the introduced approach is tested as a tool to encode relevant local and global input data relationships in dimensionality reduction applications. Fourth, a supervised kernel-based representation is introduced via a metric learning procedure in RKHS that takes advantage of the user's prior knowledge, when available, regarding the studied learning task. Such an approach incorporates the proposed ITL-based kernel functional estimation strategy to automatically adapt the relevant representation using both the supervised information and the statistical distribution of the input data. As a result, relevant sample dependencies are highlighted by weighting the input features that mostly encode the supervised learning task. Finally, a new generalized kernel-based measure is proposed by taking advantage of different RKHSs. In this way, relevant dependencies are highlighted automatically by considering the domain-varying behavior of the input data and the user's prior knowledge (supervised information) when available. The proposed measure is an extension of the well-known cross-correntropy function based on Hilbert space embeddings. Throughout the study, the proposed kernel-based framework is applied to biosignal and image data as an alternative to support aided diagnosis systems and image-based object analysis. Indeed, the introduced kernel-based framework improves, in most cases, unsupervised and supervised learning performance, aiding researchers in their quest to process and better understand complex data. Abstract (translated from Spanish): Nowadays, data analysis has become a topic of great interest to the scientific community, especially in fields such as automation, signal processing, pattern recognition, and machine learning. 
In this sense, the identification, description, classification, visualization, and clustering of events or patterns are important problems for engineering developments and scientific fields such as biology, medicine, economics, artificial vision, artificial intelligence, and industrial production. Nonetheless, it is difficult to interpret the available information because of its complexity and the large number of features obtained. In addition, the analysis of the input data requires the development of methodologies that reveal the relevant behaviors of the studied process, in particular when such signals contain hidden structures that vary over a given domain, for example space and/or time. When the analyzed signal has this kind of property, performance can be unstable if signal processing and machine learning techniques are applied directly without taking into account the statistical distribution and the data structure. In this regard, kernel functions appear as an alternative approach to address the aforementioned limitations, providing flexible mathematical tools that improve the representation of the input data. Moreover, kernel-based methods are powerful tools for developing better-performing solutions by adapting the kernel to the problem under study. However, building appropriate kernel functions requires prior user knowledge about the input data, which is not available in most practical cases. Furthermore, the estimation of kernel functions often introduces biases into the model, making it necessary to resort to mathematical simplifications that are not always consistent with reality. 
In this study, a representation framework based on kernel methods is proposed to automatically highlight relevant relationships among the data in machine learning systems. Namely, the proposed framework consists of five kernel approaches, which adapt the representation according to the relationships imposed on the samples and the learning scenario (unsupervised or supervised). First, a kernel representation approach is developed that reveals the main relationships among input samples by including relevant structures using constraints based on graph modeling. Thus, the most salient data structures are highlighted in order to favor subsequent unsupervised clustering stages. This approach can be seen as a graph pruning strategy within a spectral clustering framework that improves the local and global consistency of the data. Second, we present a kernel representation methodology that captures meaningful relationships among samples in terms of their statistical distribution. To this end, a cost function based on information theoretic learning is introduced to estimate a similarity that maximizes the variability of the information potential of the input data. Thus, we seek a Hilbert space generated by the kernel that contains high information forces among the points, to favor their clustering. Third, a representation scheme is proposed that includes an entropy functional for positive definite matrices based on Renyi's definition. In this way, the aim is to include both the statistical distribution of the samples and their relevant structures. Consequently, the pertinent input patterns are highlighted in unsupervised learning tasks. 
In particular, the introduced approach is tested as a tool to encode local and global data relationships in dimensionality reduction tasks. Fourth, a supervised kernel representation methodology is introduced through metric learning in the Hilbert space generated by a kernel function, in order to take advantage of the user's prior knowledge about the learning task. This approach incorporates an information-theoretic functional that automatically adapts the representation using both the supervised information and the statistical distribution of the input data. As a result, dependencies among samples are highlighted by weighting the input features that encode the supervised learning task. Finally, a new kernel measure is proposed by taking advantage of different representation spaces. In this way, the most relevant dependencies among samples are highlighted automatically by considering the domain of interest of the input data and the user's prior knowledge (supervised information). The proposed measure is an extension of the cross-correntropy function based on Hilbert space embeddings. Throughout the study, the proposed scheme is validated on biosignal and image data as an alternative to support diagnostic aid systems and image-based objective analysis. Indeed, the introduced framework improves, in most cases, the performance of supervised and unsupervised learning systems, favoring task accuracy and data interpretability. Doctorate
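
    The abstract above does not spell out its entropy-like functional on positive definite matrices. As a hedged illustration only, the sketch below assumes the commonly used matrix-based Renyi entropy of order 2, S_2(A) = -log2 tr(A^2), evaluated on a Gaussian Gram matrix normalised to unit trace. The kernel, the bandwidth `sigma`, the function names and the sample points are all assumptions made for this example.

```python
import math


def gaussian_gram(points, sigma):
    """Gram matrix of a Gaussian kernel over a list of coordinate tuples."""
    def k(p, q):
        sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return [[k(p, q) for q in points] for p in points]


def renyi2_matrix_entropy(gram):
    """Order-2 matrix-based Renyi entropy (in bits) of a Gram matrix.

    The matrix is first normalised to unit trace:
    A[i][j] = K[i][j] / (n * sqrt(K[i][i] * K[j][j])).
    """
    n = len(gram)
    a = [[gram[i][j] / (n * math.sqrt(gram[i][i] * gram[j][j]))
          for j in range(n)] for i in range(n)]
    trace_a2 = sum(a[i][j] * a[j][i] for i in range(n) for j in range(n))
    return -math.log2(trace_a2)


# Identical points carry no diversity; well-separated points give log2(n) bits.
h_same = renyi2_matrix_entropy(gaussian_gram([(0.0,), (0.0,), (0.0,)], 1.0))
h_far = renyi2_matrix_entropy(gaussian_gram([(0.0,), (100.0,), (200.0,)], 1.0))
print(h_same, h_far)
```

    The two extreme cases bracket the values the functional can take for three samples, which is what makes it usable as a measure of how much structure a kernel representation retains.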

    Physically inspired methods and development of data-driven predictive systems.

    Traditionally, the building of predictive models is perceived as a combination of both science and art. Although the designer of a predictive system effectively follows a prescribed procedure, their domain knowledge as well as expertise and intuition in the field of machine learning are often irreplaceable. However, in many practical situations it is possible to build well-performing predictive systems by following a rigorous methodology and offsetting not only the lack of domain knowledge but also a partial lack of expertise and intuition with computational power. The generalised predictive model development cycle discussed in this thesis is an example of such a methodology, which, despite being computationally expensive, has been successfully applied to real-world problems. The proposed predictive system design cycle is a purely data-driven approach. The quality of the data used to build the system is thus of crucial importance. In practice, however, the data is rarely perfect. Common problems include missing values, high dimensionality or a very limited number of labelled exemplars. In order to address these issues, this work investigated and exploited inspirations coming from physics. The novel use of well-established physical models in the form of potential fields has resulted in the derivation of a comprehensive Electrostatic Field Classification Framework for supervised and semi-supervised learning from incomplete data. Although computational power constantly becomes cheaper and more accessible, it is not infinite. Therefore, efficient techniques able to exploit the finite amount of predictive information content of the data and limit the computational requirements of the resource-hungry predictive system design procedure are very desirable. In designing such techniques, this work once again investigated and exploited inspirations coming from physics. 
By using an analogy with a set of interacting particles and the resulting Information Theoretic Learning framework, the Density Preserving Sampling technique has been derived. This technique acts as a computationally efficient alternative to cross-validation, which fits well within the proposed methodology. All methods derived in this thesis have been thoroughly tested on a number of benchmark datasets. The proposed generalised predictive model design cycle has been successfully applied to two real-world environmental problems, in which a comparative study of Density Preserving Sampling and cross-validation has also been performed, confirming the great potential of the proposed methods.