61 research outputs found

    On Experimental Designs for Derivative Random Fields

    Second-order differentiable random fields are studied and proposals are made for the experimental design of observations of derivative fields. From a certain point of view, the following questions are answered: How much information do observations of derivatives provide for the prediction of the underlying stochastic field? How does an a priori choice of the covariance function affect the information ratio between different derivative fields with respect to prediction? The so-called "imse-update" for the best linear predictor is taken as the objective function. The central part is the study of experimental designs with (asymptotically) vanishing correlations; here the influence of the Matérn class and the J-Bessel classes of covariance functions is examined in particular. Furthermore, the effect of simultaneously observing different derivatives is investigated. Finally, some empirical studies are carried out, from which practical recommendations are derived.
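
    The following sketch illustrates the kind of quantity the abstract refers to: an approximate "imse-update" (integrated decrease in the best linear predictor's variance) for a one-dimensional field with a Matérn-3/2 covariance, comparing designs that observe the field, its first derivative, or both. All parameter values, the unit-interval domain, and the helper names (k00, k01, k11, imse_update) are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch (not the paper's code): approximate "imse-update" of the best
# linear predictor, i.e. the integrated reduction of prediction variance gained
# by observing the field and/or its first derivative, for a 1-D Matern-3/2 field.
import numpy as np

rho = 0.5                       # assumed range parameter
a = np.sqrt(3.0) / rho

def k00(h):                     # Cov(Z(x), Z(x')), h = x - x'
    return (1.0 + a * np.abs(h)) * np.exp(-a * np.abs(h))

def k01(h):                     # Cov(Z(x), Z'(x'))
    return a**2 * h * np.exp(-a * np.abs(h))

def k11(h):                     # Cov(Z'(x), Z'(x'))
    return a**2 * (1.0 - a * np.abs(h)) * np.exp(-a * np.abs(h))

def imse_update(design, use_field=True, use_deriv=False,
                grid=np.linspace(0.0, 1.0, 401)):
    """Integrated decrease of the BLP's prediction variance over [0, 1]."""
    H = design[:, None] - design[None, :]      # design-to-design differences
    G = grid[:, None] - design[None, :]        # grid-to-design differences
    rows, cols = [], []
    if use_field:
        rows.append([k00(H)] + ([k01(H)] if use_deriv else []))
        cols.append(k00(G))
    if use_deriv:
        rows.append(([-k01(H)] if use_field else []) + [k11(H)])
        cols.append(k01(G))
    K = np.block(rows)                         # joint covariance of the observations
    c = np.hstack(cols)                        # cross-covariances with Z(grid)
    reduction = np.einsum('ij,jk,ik->i', c, np.linalg.inv(K), c)
    return reduction.mean() * (grid[-1] - grid[0])   # Riemann approximation

design = np.linspace(0.1, 0.9, 5)
print("field only      :", imse_update(design, True, False))
print("derivative only :", imse_update(design, False, True))
print("field + deriv   :", imse_update(design, True, True))
```

    Under these assumptions, comparing the three printed values gives a feel for how much prediction-relevant information derivative observations carry relative to field observations.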

    Sensor scheduling with time, energy and communication constraints

    In this paper, we present new algorithms and analysis for the linear inverse sensor placement and scheduling problems over multiple time instances with power and communication constraints. The proposed algorithms, which deal directly with minimizing the mean squared error (MSE), are based on the convex relaxation approach to the binary optimization scheduling problems that are formulated in sensor network scenarios. We propose to balance the energy and communication demands of operating a network of sensors over time while still guaranteeing a minimum level of estimation accuracy. We measure this accuracy by the MSE, for which we provide average-case and lower-bound analyses that hold in general, irrespective of the scheduling algorithm used. We show experimentally how the proposed algorithms perform against state-of-the-art methods previously described in the literature.
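
    As a rough illustration of the convex-relaxation idea described above (not the authors' algorithm), the sketch below relaxes a single-period sensor-selection problem with binary on/off variables to weights in [0, 1], minimizes the resulting MSE surrogate under a budget constraint, and rounds the solution. The observation model, noise level, budget, and all names are assumptions made for the example.

```python
# Hedged sketch: convex relaxation of a one-shot sensor-selection problem.
# Binary on/off variables are relaxed to [0, 1], the relaxed MSE (trace of the
# estimator's error covariance) is minimized under a budget constraint, and the
# relaxed solution is rounded back to a schedule.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_sensors, n_params, budget = 30, 5, 8
A = rng.standard_normal((n_sensors, n_params))    # assumed linear observation model
noise_var, prior_prec = 0.5, 1e-2

def mse(w):
    """Trace of the error covariance when sensor i contributes with weight w_i."""
    info = prior_prec * np.eye(n_params) + (A.T * (w / noise_var)) @ A
    return np.trace(np.linalg.inv(info))

w0 = np.full(n_sensors, budget / n_sensors)
res = minimize(mse, w0,
               bounds=[(0.0, 1.0)] * n_sensors,
               constraints=[{'type': 'eq', 'fun': lambda w: w.sum() - budget}],
               method='SLSQP')

selected = np.argsort(res.x)[-budget:]            # simple rounding of the relaxation
rounded = np.isin(np.arange(n_sensors), selected).astype(float)
print("relaxed MSE :", mse(res.x))
print("rounded MSE :", mse(rounded))
```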

    Optimal transport representations and functional principal components for distribution-valued processes

    We develop statistical models for samples of distribution-valued stochastic processes through time-varying optimal transport process representations under the Wasserstein metric when the values of the process are univariate distributions. While functional data analysis provides a toolbox for the analysis of samples of real- or vector-valued processes, there is at present no coherent statistical methodology available for samples of distribution-valued processes, which are increasingly encountered in data analysis. To address the need for such methodology, we introduce a transport model for samples of distribution-valued stochastic processes that implements an intrinsic approach whereby distributions are represented by optimal transports. Substituting transports for distributions addresses the challenge of centering distribution-valued processes and leads to a useful and interpretable representation of each realized process by an overall transport and a real-valued trajectory, utilizing a scalar multiplication operation for transports. This representation facilitates a connection to Gaussian processes that proves useful, especially for the case where the distribution-valued processes are only observed on a sparse grid of time points. We study the convergence of the key components of the proposed representation to their population targets and demonstrate the practical utility of the proposed approach through simulations and application examples
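
    For univariate distributions, the Wasserstein-optimal transport map has the closed form T = Q_nu ∘ F_mu (target quantile function composed with the source CDF), which is what makes transport-based representations of distribution-valued data tractable. The sketch below is a minimal illustration rather than the paper's methodology: it builds empirical quantile functions on a probability grid, evaluates such a transport map, and computes a 2-Wasserstein distance; the grid size and the reference and target samples are assumptions.

```python
# Hedged sketch: univariate optimal transport via quantile functions.
import numpy as np

p_grid = np.linspace(0.01, 0.99, 99)              # probability grid

def quantile_fn(sample):
    """Empirical quantile function evaluated on the probability grid."""
    return np.quantile(sample, p_grid)

def transport_map(ref_sample, target_sample):
    """Evaluate T = Q_target o F_ref at the reference sample's own points."""
    F_ref = np.searchsorted(np.sort(ref_sample), ref_sample) / len(ref_sample)
    F_ref = np.clip(F_ref, p_grid[0], p_grid[-1])
    return np.interp(F_ref, p_grid, quantile_fn(target_sample))

def wasserstein2(sample_a, sample_b):
    """2-Wasserstein distance between two univariate empirical distributions."""
    qa, qb = quantile_fn(sample_a), quantile_fn(sample_b)
    return np.sqrt(np.mean((qa - qb) ** 2))

rng = np.random.default_rng(1)
ref = rng.normal(0.0, 1.0, 500)                   # reference distribution
target = rng.gamma(2.0, 1.0, 500)                 # one observed distribution
print("W2(ref, target)  :", wasserstein2(ref, target))
print("transported mean :", transport_map(ref, target).mean())  # close to E[target] = 2
```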

    Variable selection and predictive models in Big Data environments

    International Mention in the doctoral degree. In recent years, advances in data collection technologies have presented a difficult challenge by producing increasingly complex and larger datasets. Traditionally, statistical methodologies dealt with datasets where the number of variables did not exceed the number of observations; however, problems where the number of variables is larger than the number of observations have become more and more common and can be found in areas like economics, genetics, climate data, computer vision, etc. This has required the development of new methodologies suitable for a high-dimensional framework. Most statistical methodologies are limited to the study of averages: least squares regression, principal component analysis, partial least squares... All these techniques provide mean-based estimates and are built around the key idea that the data are normally distributed. But this assumption usually goes unverified in real datasets, where skewness and outliers can easily be found. The estimation of other metrics, like the quantiles, can help provide a more complete picture of the data distribution. This thesis is built around these two core ideas: the development of more robust, quantile-based methodologies suitable for high-dimensional problems. The thesis is structured as a compendium of articles, divided into four chapters, where each chapter has independent content and structure but is nevertheless encompassed within the main objective of the thesis. First, Chapter 1 introduces basic concepts and results, assumed to be known or referenced in the rest of the thesis. A possible solution when dealing with high-dimensional problems in the field of regression is the use of variable selection techniques. In this regard, the sparse group lasso (SGL) has proven to be a very effective alternative. However, the mathematical formulation of this estimator introduces some bias in the model, which means that the variables selected by the model may not be the truly significant ones. Chapter 2 studies the formulation of an adaptive sparse group lasso for quantile regression, a more flexible formulation that makes use of the adaptive idea, that is, the use of adaptive weights in the penalization to help correct the bias, thereby improving variable selection and prediction accuracy. An alternative solution to the high-dimensional problem is the use of a dimension reduction technique like partial least squares. Partial least squares (PLS) is a methodology initially proposed in the field of chemometrics as an alternative to traditional least squares regression when the data are high dimensional or suffer from collinearity. It works by projecting the independent data matrix onto a subspace of uncorrelated variables that maximize the covariance with the response matrix. However, being an iterative process based on least squares makes this methodology extremely sensitive to the presence of outliers or heteroscedasticity. Chapter 3 defines the fast partial quantile regression, a technique that performs a projection onto a subspace where a quantile covariance metric is maximized, effectively extending partial least squares to the quantile regression framework. Another field where high-dimensional data are common is functional data analysis, where the observations are functions measured over time instead of scalars. A key technique in this field is functional principal component analysis (FPCA), a methodology that provides an orthogonal set of basis functions that best explains the variability in the data. However, FPCA fails to capture shifts in the scale of the data that affect the quantiles. Chapter 4 introduces the functional quantile factor model, a methodology that extends the concept of FPCA to quantile regression, obtaining a model that can explain the quantiles of the data conditional on a set of common functions. Chapter 5 introduces asgl, a Python package that solves penalized least squares and quantile regression models in low- and high-dimensional frameworks, filling a gap in the currently available implementations of these models. Finally, Chapter 6 presents the final conclusions of this thesis, including possible lines of research and future work.
    I want to acknowledge the financial support received from the research grants and projects PIPF UC3M, ECO2015-66593-P (Ministerio de Economía y Competitividad, Spain) and PID2020-113961GB-I00 (Agencia Estatal de Investigación, Spain). Programa de Doctorado en Ingeniería Matemática por la Universidad Carlos III de Madrid. Committee: Chair: María Luz Durban Reguera; Secretary: María Ángeles Gil Álvarez; Member: Ying We
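
    To make the Chapter 2 ingredients concrete, the sketch below combines the quantile (pinball) loss with an adaptive sparse group lasso penalty on a toy dataset, using a pilot least-squares fit to build the adaptive weights and a generic derivative-free optimizer. It is a minimal sketch under assumed data, groups, and tuning parameters, and it does not use or reproduce the asgl package's API.

```python
# Hedged sketch: adaptive sparse group lasso penalty + quantile (pinball) loss.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 100
groups = [np.arange(0, 3), np.arange(3, 6)]           # two groups of 3 covariates
X = rng.standard_normal((n, 6))
y = X[:, 0] - 2 * X[:, 3] + rng.standard_normal(n)    # only two active covariates
tau, lam, alpha = 0.5, 0.1, 0.5                       # quantile level, penalty, L1/group mix

def pinball(res):
    """Quantile-regression (pinball) loss at level tau."""
    return np.mean(np.where(res >= 0, tau * res, (tau - 1) * res))

beta_init = np.linalg.lstsq(X, y, rcond=None)[0]      # pilot fit for adaptive weights
w = 1.0 / (np.abs(beta_init) + 1e-6)                  # individual adaptive weights
w_g = [1.0 / (np.linalg.norm(beta_init[g]) + 1e-6) for g in groups]

def objective(beta):
    sgl = (alpha * np.sum(w * np.abs(beta)) +
           (1 - alpha) * sum(wg * np.sqrt(len(g)) * np.linalg.norm(beta[g])
                             for wg, g in zip(w_g, groups)))
    return pinball(y - X @ beta) + lam * sgl

fit = minimize(objective, beta_init, method='Powell')  # derivative-free, handles the non-smooth penalty
print(np.round(fit.x, 2))                              # inactive coefficients should shrink toward 0
```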

    The Meta-Model Approach for Simulation-based Design Optimization.

    The design of products and processes makes increasing use of computer simulations for the prediction of their performance. These computer simulations are considerably cheaper than their physical equivalents, so finding the optimal design has become a real possibility. One approach for finding the optimal design using computer simulations is the meta-model approach, which approximates the behaviour of the simulation outcome using a limited number of time-consuming simulation runs. This thesis contains four main contributions, which are illustrated by industrial cases. First, a method is presented for the construction of an experimental design for computer simulations when the design space is restricted by many (nonlinear) constraints. The second contribution is a new approach for the approximation of the simulation outcome; this approximation method is particularly useful when the simulation outcome reacts highly nonlinearly to its inputs. Third, the meta-model based approach is extended to a robust optimization framework. Using this framework, many uncertainties can be taken into account, including uncertainty in the simulation model outcome. The fourth main contribution is the extension of the approach for use in the integral design of many parts of complex systems.
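
    A minimal sketch of the meta-model idea, assuming a toy one-dimensional "simulator" and standard off-the-shelf tools rather than the methods developed in the thesis: a Gaussian process (Kriging) surrogate is fitted to a handful of expensive runs, and the cheap surrogate is then optimized in place of the simulator.

```python
# Hedged sketch: fit a Kriging/Gaussian-process surrogate to a few expensive
# simulator runs and optimize the surrogate instead of the simulator itself.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_simulation(x):                     # stand-in for a slow simulator
    return np.sin(3 * x) + 0.5 * x**2

X_design = np.linspace(-2, 2, 8).reshape(-1, 1)  # small space-filling design
y_design = expensive_simulation(X_design).ravel()

surrogate = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
surrogate.fit(X_design, y_design)

res = minimize(lambda x: surrogate.predict(np.atleast_2d(x))[0],
               x0=np.array([0.0]), bounds=[(-2, 2)])
print("surrogate optimum :", res.x, "predicted:", res.fun)
print("true value there  :", expensive_simulation(res.x)[0])
```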

    Engineering-Driven Learning Approaches for Bio-Manufacturing and Personalized Medicine

    Healthcare problems have a tremendous impact on human life. The past two decades have witnessed various biomedical research advances and clinically effective therapies, including minimally invasive surgery, regenerative medicine, and immune therapy. However, the development of new treatment methods relies heavily on heuristic approaches and the experience of well-trained healthcare professionals. Therefore, it is often hindered by patient-specific genotypes and phenotypes, operator-dependent post-surgical outcomes, and exorbitant cost. Towards clinically effective and inexpensive treatments, this thesis develops analytics-based methodologies that integrate statistics, machine learning, and advanced manufacturing. Chapter 1 of the thesis introduces a novel function-on-function surrogate model with application to tissue-mimicking of 3D-printed medical prototypes. Using synthetic metamaterials to mimic biological tissue, 3D-printed medical prototypes are becoming increasingly important in improving surgery success rates. Here, the objective is to model mechanical response curves via functional metamaterial structures, and then conduct a tissue-mimicking optimization to find the best metamaterial structure. The proposed function-on-function surrogate model utilizes a Gaussian process for efficient emulation and optimization. For functional inputs, we propose a spectral-distance correlation function, which captures important spectral differences between two functional inputs. Dependencies for functional outputs are then modeled via a co-kriging framework. We further adopt shrinkage priors to learn and incorporate important physics. Finally, we demonstrate the effectiveness of the proposed emulator in a real-world study on heart surgery. Chapter 2 proposes an adaptive design method for experimentation under response censoring, often encountered in biomedical experiments. Censoring results in a significant loss of information and thereby a poor predictive model over the input domain. For such problems, experimental design is paramount for maximizing predictive power with a limited budget for expensive experimental runs. We propose an integrated censored mean-squared error (ICMSE) design method, which first estimates the posterior probability of a new observation being censored and then adaptively chooses design points that minimize predictive uncertainty under censoring. Adopting a Gaussian process model with product correlation functions, our ICMSE criterion has an easy-to-evaluate expression for efficient design optimization. We demonstrate the effectiveness of the ICMSE method in an application to medical device testing. Chapter 3 develops an active image synthesis method for efficient labeling (AISEL) to improve learning performance in healthcare and medicine tasks, where the limited availability of data and the high cost of data collection are the key challenges in applying deep neural networks. AISEL can generate a complementary dataset, with labels actively acquired to incorporate the underlying physical knowledge at hand. The AISEL framework first leverages a bidirectional generative invertible network (GIN) to extract interpretable features from training images and generate physically meaningful virtual ones. It then efficiently samples virtual images to exploit uncertain regions and explore the entire image space. We demonstrate the effectiveness of AISEL on a heart surgery study, where it lowers the labeling cost by 90% while achieving a 15% improvement in prediction accuracy. Chapter 4 presents a calibration-free statistical framework for the promising chimeric antigen receptor T cell therapy in fighting cancers. The objective is to effectively recover critical quality attributes under the intrinsic patient-to-patient variability and thereby lower the cost of cell therapy. Our calibration-free approach models the patient-to-patient variability via a patient-specific calibration parameter. We adopt multiple biosensors to construct a patient-invariance statistic and alleviate the effect of the calibration parameter. Using the patient-invariance statistic, we can then recover the critical quality attribute during cell culture, free from the calibration parameter. In a T cell therapy study, our method effectively recovers viable cell concentration for cell culture monitoring and scale-up.
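
    As an illustration of what a "spectral-distance correlation function" for functional inputs could look like (an assumed construction, not the thesis's exact kernel), the sketch below reduces each input curve to a few Fourier magnitudes and applies a Gaussian-type correlation to a frequency-weighted distance between them.

```python
# Hedged sketch: a spectral-distance correlation between two functional inputs.
import numpy as np

def spectral_coeffs(curve, n_freq=10):
    """Leading Fourier magnitudes of a curve sampled on a regular grid."""
    return np.abs(np.fft.rfft(curve))[:n_freq] / len(curve)

def spectral_distance_corr(curve_a, curve_b, lengthscale=1.0, decay=1.0):
    ca, cb = spectral_coeffs(curve_a), spectral_coeffs(curve_b)
    weights = np.exp(-decay * np.arange(len(ca)))   # emphasize low frequencies
    d2 = np.sum(weights * (ca - cb) ** 2)           # weighted spectral distance
    return np.exp(-d2 / lengthscale**2)             # Gaussian-type correlation

t = np.linspace(0, 1, 200)
f1 = np.sin(2 * np.pi * t)
f2 = np.sin(2 * np.pi * t) + 0.1 * np.sin(10 * np.pi * t)
print(spectral_distance_corr(f1, f2))                    # close to 1: similar spectra
print(spectral_distance_corr(f1, np.cos(6 * np.pi * t))) # smaller correlation
```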

    Bayesian quadrature, energy minimization, and space-filling design

    A standard objective in computer experiments is to approximate the behavior of an unknown function on a compact domain from a few evaluations inside the domain. When little is known about the function, space-filling design is advisable: typically, points of evaluation spread out across the available space are obtained by minimizing a geometrical (for instance, covering radius) or a discrepancy criterion measuring distance to uniformity. The paper investigates connections between design for integration (quadrature design), construction of the (continuous) best linear unbiased estimator (BLUE) for the location model, space-filling design, and minimization of energy (kernel discrepancy) for signed measures. Integrally strictly positive definite kernels define strictly convex energy functionals, with an equivalence between the notions of potential and directional derivative, showing the strong relation between discrepancy minimization and more traditional design of optimal experiments. In particular, kernel herding algorithms, which are special instances of vertex-direction methods used in optimal design, can be applied to the construction of point sequences with suitable space-filling properties
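
    The kernel herding connection mentioned above can be illustrated with a short sketch: points are chosen greedily so that their empirical kernel mean tracks the kernel mean (potential) of the uniform target, which produces a space-filling sequence. The kernel choice, lengthscale, candidate pool, and Monte Carlo approximation of the potential are all assumptions of this example, not the paper's setup.

```python
# Hedged sketch: kernel herding on the unit square as a space-filling design.
import numpy as np

rng = np.random.default_rng(3)
candidates = rng.uniform(size=(2000, 2))          # candidate pool in [0, 1]^2
mc = rng.uniform(size=(4000, 2))                  # Monte Carlo sample of the uniform target

def k(x, Y, lengthscale=0.25):
    """Gaussian kernel between a point x and an array of points Y."""
    return np.exp(-np.sum((x - Y) ** 2, axis=1) / (2 * lengthscale**2))

# Potential of each candidate: approximate integral of k(x, .) over the target.
potential = np.array([k(x, mc).mean() for x in candidates])

design, sum_k = [], np.zeros(len(candidates))
for t in range(20):
    scores = potential - sum_k / (t + 1)          # herding criterion
    best = int(np.argmax(scores))
    design.append(candidates[best])
    sum_k += k(candidates[best], candidates)      # update accumulated kernel sums

print(np.round(np.array(design), 2))              # 20 well-spread design points
```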