
    Seamless Multimodal Biometrics for Continuous Personalised Wellbeing Monitoring

    Artificially intelligent perception is increasingly present in the lives of every one of us. Vehicles are no exception, (...) In the near future, pattern recognition will have an even stronger role in vehicles, as self-driving cars will require automated ways to understand what is happening around (and within) them and act accordingly. (...) This doctoral work focused on advancing in-vehicle sensing through the research of novel computer vision and pattern recognition methodologies for both biometrics and wellbeing monitoring. The main focus has been on electrocardiogram (ECG) biometrics, a trait well known for its potential for seamless driver monitoring. Major efforts were devoted to achieving improved performance in identification and identity verification in off-the-person scenarios, which are known for increased noise and variability. Here, end-to-end deep learning ECG biometric solutions were proposed, and important topics were addressed, such as cross-database and long-term performance, waveform relevance through explainability, and interlead conversion. Face biometrics, a natural complement to the ECG in seamless unconstrained scenarios, was also studied in this work. The open challenges of masked face recognition and interpretability in biometrics were tackled in an effort to evolve towards algorithms that are more transparent, trustworthy, and robust to significant occlusions. Within the topic of wellbeing monitoring, improved solutions to multimodal emotion recognition in groups of people and activity/violence recognition in in-vehicle scenarios were proposed. Finally, we also proposed a novel way to learn template security within end-to-end models, dispensing with separate encryption processes, and a self-supervised learning approach tailored to sequential data, in order to ensure data security and optimal performance. (...)
    Comment: Doctoral thesis presented and approved on the 21st of December 2022 to the University of Port

    Generalised latent variable models for location, scale, and shape parameters

    Latent Variable Models (LVM) are widely used in the social, behavioural, and educational sciences to uncover underlying associations in multivariate data using a smaller number of latent variables. However, the classical LVM framework rests on assumptions that can be restrictive in empirical applications: in particular, the observed variables are assumed to follow exponential-family distributions, and the latent variables are assumed to influence only the conditional mean of the observed variables. This thesis addresses these limitations and contributes to the current literature in two ways. First, we propose a novel class of models called Generalised Latent Variable Models for Location, Scale, and Shape parameters (GLVM-LSS). These models use linear functions of latent factors to model the location, scale, and shape parameters of the items’ conditional distributions. By doing so, we model higher-order moments such as variance, skewness, and kurtosis in terms of the latent variables, providing a more flexible framework compared to classical factor models. The model parameters are estimated by maximum likelihood. Second, we address the challenge of interpreting the GLVM-LSS, which can be complex due to its increased number of parameters. We propose a penalised maximum likelihood estimation approach with automatic selection of tuning parameters. This extends previous work on penalised estimation in the LVM literature to cases without closed-form solutions. Our findings suggest that modelling the entire distribution of the items, not just the conditional mean, leads to improved model fit and deeper insights into how the items reflect the latent constructs they are intended to measure. To assess the performance of the proposed methods, we conduct extensive simulation studies and apply them to real-world data from educational testing and public opinion research. The results highlight the efficacy of the GLVM-LSS framework in capturing complex relationships between observed variables and latent factors, providing valuable insights for researchers in various fields.
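    As a concrete illustration of the model class (a minimal sketch in our own notation, not taken from the thesis), each distributional parameter of an item gets its own linear predictor in the latent factors:

```latex
% Illustrative GLVM-LSS specification (notation ours, not the thesis's).
% Response of respondent i on item j, conditional on latent factors z_i:
%   y_{ij} | z_i  ~  D(mu_{ij}, sigma_{ij}, nu_{ij}, tau_{ij})
% Each distributional parameter is linked linearly to the latent factors:
\begin{align}
  g_{\mu}(\mu_{ij})       &= \alpha_{j,\mu}    + \boldsymbol{\lambda}_{j,\mu}^{\top}    \mathbf{z}_i \\
  g_{\sigma}(\sigma_{ij}) &= \alpha_{j,\sigma} + \boldsymbol{\lambda}_{j,\sigma}^{\top} \mathbf{z}_i \\
  g_{\nu}(\nu_{ij})       &= \alpha_{j,\nu}    + \boldsymbol{\lambda}_{j,\nu}^{\top}    \mathbf{z}_i \\
  g_{\tau}(\tau_{ij})     &= \alpha_{j,\tau}   + \boldsymbol{\lambda}_{j,\tau}^{\top}   \mathbf{z}_i
\end{align}
% A classical factor model is recovered when only the location predictor
% depends on z_i and the scale and shape parameters are held constant.
```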

    Evaluating 3 decades of precipitation in the Upper Colorado River basin from a high-resolution regional climate model

    Convection-permitting regional climate models (RCMs) have recently become tractable for applications at multi-decadal timescales. These types of models have tremendous utility for water resource studies, but better characterization of precipitation biases is needed, particularly for water-resource-critical mountain regions, where precipitation is highly variable in space, observations are sparse, and the societal water need is great. This study examines 34 years (1987–2020) of RCM precipitation from the Weather Research and Forecasting model (WRF; v3.8.1), using Climate Forecast System Reanalysis (CFS; CFSv2) initial and lateral boundary conditions and a 1 km × 1 km innermost grid spacing. The RCM is centered over the Upper Colorado River basin, with a focus on the high-elevation, 750 km² East River watershed (ERW), where a variety of high-impact scientific activities are currently ongoing. Precipitation is compared against point observations (Natural Resources Conservation Service Snow Telemetry, or SNOTEL), gridded climate datasets (Newman, Livneh, and PRISM), and Bayesian reconstructions of watershed-mean precipitation conditioned on streamflow and high-resolution snow remote-sensing products. We find that the cool-season precipitation percent error between WRF and 23 SNOTEL gauges has a low overall bias (x̄ = 0.25 %, s = 13.63 %) and that WRF has a higher percent error during the warm season (x̄ = 10.37 %, s = 12.79 %). The warm-season bias manifests as a high number of low-precipitation days, though the low resolution of the SNOTEL gauges limits some of the conclusions that can be drawn. Regional comparisons between WRF precipitation accumulation and three different gridded datasets show differences on the order of ±20 %, particularly at the highest elevations, in keeping with findings from other studies. We find that WRF agrees slightly better with the Bayesian reconstruction of precipitation in the ERW than with the gridded precipitation datasets, particularly when changing SNOTEL densities are taken into account. We conclude that the RCM reasonably captures orographic precipitation in this region and demonstrate that leveraging additional hydrologic information (streamflow and snow remote-sensing data) improves the ability to characterize biases in RCM precipitation fields. The error characteristics reported in this study are essential for leveraging the RCM outputs for studies of past and future climates and for water resource applications. The methods developed in this study can be applied to other watersheds and model configurations. Hourly 1 km × 1 km precipitation and other meteorological outputs from this dataset are publicly available and suitable for a wide variety of applications.
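    A minimal sketch of the gauge-level comparison described above (the function name, array layout, and month-based warm/cool split are our assumptions, not the study's exact protocol):

```python
import numpy as np

def seasonal_percent_error(wrf_precip, snotel_precip, months, warm=(5, 6, 7, 8, 9)):
    """Percent error of modeled vs. gauge precipitation, split by season.

    wrf_precip, snotel_precip : arrays of accumulated precipitation per
        gauge-period (same shape); months : calendar month of each entry.
    Returns the (mean, std) of the percent error for each season.
    """
    pct_err = 100.0 * (wrf_precip - snotel_precip) / snotel_precip
    warm_mask = np.isin(months, warm)
    out = {}
    for name, mask in [("warm", warm_mask), ("cool", ~warm_mask)]:
        out[name] = (pct_err[mask].mean(), pct_err[mask].std())
    return out
```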

    Haplotype blocks for genomic prediction: a comparative evaluation in multiple crop datasets

    In modern plant breeding, genomic selection is becoming the gold standard for selection of superior genotypes. The basis for genomic prediction models is a set of phenotyped lines along with their genotypic profile. With high marker density and linkage disequilibrium (LD) between markers, genotype data in breeding populations tends to exhibit considerable redundancy. Therefore, interest is growing in the use of haplotype blocks to overcome redundancy by summarizing co-inherited features. Moreover, haplotype blocks can help to capture local epistasis caused by interacting loci. Here, we compared genomic prediction methods that used either single SNPs or haplotype blocks with regard to their prediction accuracy for important traits in crop datasets. We used four published datasets from canola, maize, wheat and soybean. Different approaches to construct haplotype blocks were compared, including blocks based on LD, physical distance, number of adjacent markers and the algorithms implemented in the software “Haploview” and “HaploBlocker”. The tested prediction methods included Genomic Best Linear Unbiased Prediction (GBLUP), Extended GBLUP to account for additive-by-additive epistasis (EGBLUP), Bayesian LASSO and Reproducing Kernel Hilbert Space (RKHS) regression. We found improved prediction accuracy in some traits when using haplotype blocks compared to SNP-based predictions; however, the magnitude of improvement was very trait- and model-specific. Especially in settings with low marker density, haplotype blocks can improve genomic prediction accuracy. In most cases, physically large haplotype blocks yielded a strong decrease in prediction accuracy. In particular, when prediction accuracy varies greatly across different prediction models, prediction based on haplotype blocks can improve the accuracy of underperforming models. However, there is no “best” method to build haplotype blocks, since prediction accuracy varied considerably across methods and traits. Hence, criteria used to define haplotype blocks should not be viewed as fixed biological parameters, but rather as hyperparameters that need to be adjusted for every dataset.
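    To make the compared ingredients concrete, here is a minimal sketch of one way to build haplotype-block predictors from adjacent markers and feed them to a GBLUP-style predictor. The fixed block size, one-hot encoding, and variance ratio are illustrative assumptions; the paper compares several block-building methods, including LD-based ones:

```python
import numpy as np

def haplotype_block_design(haplo, block_size=5):
    """One-hot design matrix from haplotype blocks of adjacent markers.

    haplo : (n_haplotypes, n_markers) array of 0/1 alleles, two rows
        (maternal/paternal) per individual.  Distinct allele strings
        within a block act as the 'alleles' of one multi-allelic
        pseudo-marker.
    """
    n_hap, n_mark = haplo.shape
    cols = []
    for start in range(0, n_mark, block_size):
        block = haplo[:, start:start + block_size]
        keys = [tuple(row) for row in block]
        alleles = sorted(set(keys))
        onehot = np.zeros((n_hap, len(alleles)))
        for i, k in enumerate(keys):
            onehot[i, alleles.index(k)] = 1.0
        cols.append(onehot)
    Z = np.hstack(cols)
    # sum the two haplotypes of each individual -> dosage per block allele
    return Z[0::2] + Z[1::2]

def gblup(Z, y, h2=0.5):
    """Ridge-type GBLUP: predicted values from a VanRaden-style kernel."""
    p = Z.mean(axis=0) / 2.0
    W = Z - 2.0 * p                        # centre allele dosages
    G = W @ W.T / (2.0 * (p * (1 - p)).sum() + 1e-12)
    lam = (1.0 - h2) / h2                  # ratio sigma_e^2 / sigma_g^2
    n = len(y)
    return G @ np.linalg.solve(G + lam * np.eye(n), y - y.mean()) + y.mean()
```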

    Regression-based projection for learning Mori-Zwanzig operators

    We propose to adopt statistical regression as the projection operator to enable data-driven learning of the operators in the Mori-Zwanzig formalism. We present a principled method to extract the Markov and memory operators for any regression model. We show that the choice of linear regression results in a recently proposed data-driven learning algorithm based on Mori's projection operator, which is a higher-order approximate Koopman learning method. We show that more expressive nonlinear regression models naturally fill the gap between the highly idealized and computationally efficient Mori projection operator and the optimal yet computationally infeasible Zwanzig projection operator. We performed numerical experiments and extracted the operators for an array of regression-based projections, including linear, polynomial, spline, and neural-network-based regressions, observing a progressive improvement as the complexity of the regression model increased. Our proposition provides a general framework to extract memory-dependent corrections and can be readily applied to an array of data-driven learning methods for stationary dynamical systems in the literature.
    Comment: 41 pages, 12 figures; major revision of V
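    As a rough illustration of the linear (Mori) special case, the sketch below extracts Markov and memory operators from snapshot data by iterated least squares. The lag-by-lag residual regression is our simplification; the paper's general framework also admits nonlinear regression models in the projection role:

```python
import numpy as np

def mz_operators(X, n_mem=3):
    """Least-squares extraction of Markov and memory operators.

    X : (T, d) snapshot trajectory of a stationary system.  Fits the
    discrete generalized-Langevin form
        x_{t+1} ~ M0 x_t + M1 x_{t-1} + ... + Mk x_{t-k}
    one lag at a time: each memory operator is regressed on the residual
    left by the previous ones (the regression plays the projection role).
    """
    T, d = X.shape
    resid = X[n_mem + 1:]                    # targets: x_{t+1}
    ops = []
    for k in range(n_mem + 1):
        lagged = X[n_mem - k : T - 1 - k]    # regressors: x_{t-k}
        # least-squares fit  resid ~ lagged @ Mk  (row convention)
        Mk, *_ = np.linalg.lstsq(lagged, resid, rcond=None)
        ops.append(Mk.T)
        resid = resid - lagged @ Mk          # orthogonalize for next lag
    return ops, resid                        # operators and noise term
```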

    D-Vine GAM Copula based Quantile Regression with Application to Ensemble Postprocessing

    Temporal, spatial, or spatio-temporal probabilistic models are frequently used for weather forecasting. D-vine (drawable vine) copula quantile regression (DVQR) is a powerful tool for this application field, as it can automatically select important predictor variables from a large set and is able to model complex nonlinear relationships among them. However, the current DVQR does not always allow one to account explicitly and economically for additional covariate effects, e.g. temporal or spatio-temporal information. Consequently, we propose an extension of the current DVQR, in which we parametrize the bivariate copulas in the D-vine copula through Kendall's tau, which can be linked to additional covariates. This parametrization of the correlation parameter allows generalized additive models (GAMs) and spline smoothing to detect potentially hidden covariate effects. The new method is called GAM-DVQR, and its performance is illustrated in a case study for the postprocessing of 2 m surface temperature forecasts. We investigate a constant as well as a time-dependent Kendall's tau. The GAM-DVQR models are compared to the benchmark methods Ensemble Model Output Statistics (EMOS), its gradient-boosted extension (EMOS-GB), and basic DVQR. The results indicate that the GAM-DVQR models are able to identify time-dependent correlations as well as relevant predictor variables, and significantly outperform the state-of-the-art methods EMOS and EMOS-GB. Furthermore, the introduced parameterization allows using a static training period for GAM-DVQR, yielding a more sustainable model estimation in comparison to DVQR using a sliding training window. Finally, we give an outlook on further applications and extensions of the GAM-DVQR model. To complement this article, our method is accompanied by an R package called gamvinereg.
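    A minimal sketch of the covariate-dependent dependence parameter, relying on the known identity that for elliptical copulas Kendall's tau maps to the copula correlation via rho = sin(pi*tau/2). It is in Python rather than the accompanying R package, and the tanh link and polynomial basis are placeholders for the GAM spline machinery; all names are ours:

```python
import numpy as np

def tau_to_rho(tau):
    """Map Kendall's tau to a Gaussian-copula correlation parameter."""
    return np.sin(np.pi * tau / 2.0)

def time_varying_rho(t, coef, basis):
    """Covariate-dependent Kendall's tau via a spline-type linear predictor.

    t : covariate values (e.g., day of year); basis : callable returning
    a design matrix of basis functions; coef : fitted GAM coefficients.
    A link keeps tau inside (-1, 1), here a tanh.
    """
    eta = basis(t) @ coef          # GAM linear predictor
    tau = np.tanh(eta)             # link into (-1, 1)
    return tau_to_rho(tau)

# minimal usage with a cubic polynomial basis standing in for a spline
basis = lambda t: np.vander(np.asarray(t) / 365.0, N=4, increasing=True)
rho = time_varying_rho([15, 100, 200, 330],
                       coef=np.array([0.2, 0.5, -0.3, 0.1]), basis=basis)
```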

    Enhancing Missing Data Imputation of Non-stationary Signals with Harmonic Decomposition

    Dealing with time series with missing values, including those afflicted by low quality or over-saturation, presents a significant signal processing challenge. The task of recovering these missing values, known as imputation, has led to the development of several algorithms. However, we have observed that the efficacy of these algorithms tends to diminish when the time series exhibit non-stationary oscillatory behavior. In this paper, we introduce a novel algorithm, coined Harmonic Level Interpolation (HaLI), which enhances the performance of existing imputation algorithms for oscillatory time series. After running any chosen imputation algorithm, HaLI leverages a harmonic decomposition of the initial imputation, based on the adaptive non-harmonic model, to improve imputation accuracy. Experimental assessments conducted on synthetic and real signals consistently show that HaLI enhances the performance of existing imputation algorithms. The algorithm is made publicly available as readily employable Matlab code for other researchers to use.
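    A minimal sketch of the refinement step, in Python rather than the released Matlab code: the fixed-frequency harmonic fit below is a crude stand-in for HaLI's adaptive non-harmonic decomposition, and all names are ours:

```python
import numpy as np

def harmonic_refine(x, missing, n_harm=3):
    """Refine an initial imputation with a harmonic-model fit.

    x : float signal whose missing entries were already filled by any
    baseline imputer; missing : boolean mask of originally-missing samples.
    """
    n = len(x)
    t = np.arange(n)
    # crude fundamental-frequency estimate from the FFT peak
    spec = np.abs(np.fft.rfft(x - x.mean()))
    f0 = np.argmax(spec[1:]) + 1           # skip the DC bin
    # least-squares fit of n_harm harmonics plus a constant,
    # using only the originally-observed samples
    cols = [np.ones(n)]
    for k in range(1, n_harm + 1):
        cols += [np.cos(2 * np.pi * k * f0 * t / n),
                 np.sin(2 * np.pi * k * f0 * t / n)]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A[~missing], x[~missing], rcond=None)
    x_out = x.copy()
    x_out[missing] = (A @ coef)[missing]   # replace only the gaps
    return x_out
```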

    Statistical Machine Learning Methodology for Individualized Treatment Rule Estimation in Precision Medicine

    Precision medicine aims to deliver optimal, individualized treatments for patients by accounting for their unique characteristics. With a foundation in reinforcement learning, decision theory, and causal inference, the field of precision medicine has seen many advancements in recent years. Significant focus has been placed on creating algorithms to estimate individualized treatment rules (ITRs), which map from patient covariates to the space of available treatments with the goal of maximizing patient outcomes. In Chapter 1, we extend ITR estimation methodology to the scenario where the variance of the outcome is heterogeneous with respect to treatment and covariates. Accordingly, we propose Stabilized Direct Learning (SD-Learning), which utilizes heteroscedasticity in the error term through a residual reweighting framework that models residual variance via flexible machine learning algorithms such as XGBoost and random forests. We also develop an internal cross-validation scheme which determines the best residual model among competing models. Further, we extend this methodology to multi-arm treatment scenarios. In Chapter 2, we develop ITR estimation methodology for situations where clinical decision-making involves balancing multiple outcomes of interest. Our proposed framework estimates an ITR which maximizes a combination of the multiple clinical outcomes, accounting for the fact that patients may ascribe importance to outcomes differently (utility heterogeneity). This approach employs inverse reinforcement learning (IRL) techniques through an expert-augmentation solution, whereby physicians provide input to guide the utility estimation and ITR learning processes. In Chapter 3, we apply an end-to-end precision medicine workflow to novel data from older adults with Type 1 Diabetes in order to understand the heterogeneous treatment effects of continuous glucose monitoring (CGM) and develop an interpretable ITR that reveals the patients for whom CGM confers a major safety benefit. The results from this analysis elucidate the demographic and clinical markers that moderate CGM's success, provide the basis for using diagnostic CGM to inform therapeutic CGM decisions, and serve to augment clinical decision-making. Finally, in Chapter 4, as a future research direction, we propose a deep autoencoder framework which simultaneously performs feature selection and ITR optimization, contributing methodology built for direct consumption of unstructured, high-dimensional data in the precision medicine pipeline.
    Doctor of Philosoph
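    A much-simplified two-stage sketch of the residual-reweighting idea behind SD-Learning, assuming a binary treatment randomized with probability 1/2; the specific estimators, the variance model, and all names are our assumptions, not the dissertation's implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

def sd_learning(X, A, Y):
    """Two-stage stabilized direct-learning sketch.

    X : (n, p) covariates; A : treatments coded in {-1, +1}, assumed
    randomized with probability 1/2; Y : outcomes (larger is better).
    Stage 1 fits an unweighted direct-learning rule; stage 2 reweights
    by an inverse residual-variance estimate (here a random forest on
    squared residuals, standing in for the competing residual models).
    """
    target = 2.0 * A * Y
    stage1 = LinearRegression().fit(X, target)
    resid2 = (target - stage1.predict(X)) ** 2
    var_model = RandomForestRegressor(n_estimators=200, random_state=0)
    var_model.fit(X, resid2)
    w = 1.0 / np.clip(var_model.predict(X), 1e-6, None)
    stage2 = LinearRegression().fit(X, target, sample_weight=w)
    return lambda x_new: np.sign(stage2.predict(x_new))   # ITR: pick sign
```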

    Synergistic exploitation of multispectral and radar data for the estimation of vegetation biophysical variables using remote sensing technologies

    Vegetation biophysical variables (VBV) are direct indicators of crop growth and productivity. Earth observation (EO) systems present unprecedented opportunities for monitoring the biophysical variables of wheat. Sentinel-2 (S2) is a satellite constellation that forms part of the Sentinel missions of the Copernicus EO programme. Its revisit period, together with its spatial and spectral resolution, has made S2 a key EO system for VBV monitoring. Optical EO systems are frequently limited by weather conditions such as cloud cover or precipitation. In this regard, radar technology presents new opportunities for VBV monitoring that should be explored in depth. Sentinel-1 (S1) is the radar constellation of the Sentinel family. Owing to the complexity of the interaction of the radar signal with cultivated surfaces and to the inherent speckle noise, VBV estimation with radar technology remains a challenge. The objective of this doctoral thesis is to develop models for estimating wheat biophysical variables in an irrigated, intensively cultivated area of southeastern Argentina, based on in situ vegetation measurements, from: i) S2 multispectral data; ii) S1 radar data; and iii) the S1 & S2 synergy. To address this problem, models for estimating leaf area index (LAI), canopy chlorophyll content, and wheat water content were first developed using a multitemporal database of VBV measured in situ, machine learning algorithms, a database of bidirectional vegetation reflectance spectra simulated with a radiative transfer model, and S2 multispectral data. The resulting hybrid estimation models fitted the field data with high accuracy, and the phenological curve of the wheat crop was successfully reconstructed. Second, an LAI estimation model was implemented based on S1 radar data acquired under different acquisition geometries. It was shown that the three-dimensional structure of the vegetation, when observed from different local incidence angles, provides highly valuable information that can be used to improve existing models. Finally, an S1 & S2 data fusion strategy was developed to reconstruct time series of vegetation water content (VWC). Several multi-output Gaussian process models were applied to analyse the cross-correlation, in the frequency domain, between the optical and radar channels. The synergistic combination of radar and optical data proved to be a novel approach for monitoring wheat biophysical variables in intensively cultivated regions with frequent cloud cover.
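    A minimal sketch of the hybrid retrieval strategy described above: a machine learning regressor trained on radiative-transfer-model simulations and then applied to real Sentinel-2 reflectance. The Gaussian process choice, the kernel, and the PROSAIL mention are illustrative assumptions, not the thesis's exact configuration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def train_hybrid_lai_model(simulated_reflectance, simulated_lai):
    """Hybrid retrieval sketch: a GP trained on RTM-simulated spectra.

    simulated_reflectance : (n_sim, n_bands) reflectances produced by a
    radiative transfer model (e.g., PROSAIL) for Sentinel-2 band settings;
    simulated_lai : the LAI values used in those simulations.  The trained
    model is then applied to real Sentinel-2 surface reflectance.
    """
    kernel = (RBF(length_scale=np.ones(simulated_reflectance.shape[1]))
              + WhiteKernel(noise_level=1e-3))   # per-band length scales
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    return gp.fit(simulated_reflectance, simulated_lai)

# usage: lai_map = model.predict(s2_band_stack.reshape(-1, n_bands))
```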

    FineMorphs: Affine-diffeomorphic sequences for regression

    A multivariate regression model of affine and diffeomorphic transformation sequences, FineMorphs, is presented. Leveraging concepts from shape analysis, model states are optimally "reshaped" by diffeomorphisms generated by smooth vector fields during learning. Affine transformations and vector fields are optimized within an optimal control setting, and the model can naturally reduce (or increase) dimensionality and adapt to large datasets via suboptimal vector fields. An existence proof of solutions and necessary conditions for optimality are derived for the model. Experimental results on real datasets from the UCI repository are presented, with favorable results in comparison with the state of the art in the literature and densely connected neural networks in TensorFlow.
    Comment: 39 pages, 7 figure
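    A toy forward pass showing the alternation of affine maps and vector-field flows; the forward-Euler integration and all names are our crude simplifications of the paper's optimal-control formulation:

```python
import numpy as np

def finemorphs_like_forward(x, affines, vector_fields, n_steps=10, dt=0.1):
    """Toy forward pass alternating affine maps and vector-field flows.

    x : (n, d) model states; affines : list of (A, b) pairs;
    vector_fields : list of callables v(x) -> dx/dt generating the
    diffeomorphic 'reshaping' of the states.
    """
    for (A, b), v in zip(affines, vector_fields):
        x = x @ A.T + b                 # affine transformation
        for _ in range(n_steps):
            x = x + dt * v(x)           # flow along the smooth field
    return x
```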