Simultaneous Measurement Imputation and Outcome Prediction for Achilles Tendon Rupture Rehabilitation
Achilles Tendon Rupture (ATR) is a common soft tissue injury. Rehabilitation
after such a musculoskeletal injury remains a prolonged process
with highly variable outcomes. Accurately predicting the rehabilitation outcome is
crucial for treatment decision support. However, it is challenging to train an
automatic method for predicting the ATR rehabilitation outcome from treatment
data, due to a massive amount of missing entries in the data recorded from ATR
patients, as well as complex nonlinear relations between measurements and
outcomes. In this work, we design an end-to-end probabilistic framework to
impute missing data entries and predict rehabilitation outcomes simultaneously.
We evaluate our model on a real-life ATR clinical cohort, comparing it against
various baselines. The proposed method demonstrates clear superiority over
traditional methods, which typically perform imputation and prediction in two
separate stages.
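The contrast between the two-stage baseline and a joint approach can be sketched numerically. The toy data, the ridge outcome model, and the alternating update below are all illustrative assumptions, a crude stand-in for the paper's end-to-end probabilistic framework, not its actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-in for ATR data: n patients, d measurements with
# entries missing at random, and a continuous rehabilitation outcome y.
n, d = 200, 5
X_full = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X_full @ w_true + 0.1 * rng.normal(size=n)
missing = rng.random((n, d)) < 0.3          # True where an entry is unobserved
X_obs = np.where(missing, np.nan, X_full)

def ridge_fit(A, b, lam=1e-2):
    """Closed-form ridge regression for the outcome model."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)

# Two-stage baseline: mean-impute first, then fit the predictor.
X_imp = np.where(missing, np.nanmean(X_obs, axis=0), X_obs)
w = ridge_fit(X_imp, y)

# Joint flavour: alternate between refitting the predictor and nudging the
# imputed entries to be consistent with the outcome model, so imputation and
# prediction inform each other instead of running in separate stages.
for _ in range(10):
    w = ridge_fit(X_imp, y)
    resid = X_imp @ w - y                   # prediction residuals
    X_imp = np.where(missing, X_imp - 0.05 * np.outer(resid, w), X_imp)

rmse = float(np.sqrt(np.mean((X_imp[missing] - X_full[missing]) ** 2)))
print(round(rmse, 3))
```

The gradient step uses the fact that for the squared loss, the derivative with respect to the imputed entries of a row is the residual times the weight vector; only the missing positions are updated.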
Scalable Low-Rank Tensor Learning for Spatiotemporal Traffic Data Imputation
The missing value problem in spatiotemporal traffic data has long been a
challenging topic, in particular for large-scale and high-dimensional data with
complex missing mechanisms and diverse degrees of missingness. Recent studies
based on tensor nuclear norm have demonstrated the superiority of tensor
learning in imputation tasks by effectively characterizing the complex
correlations/dependencies in spatiotemporal data. However, despite the
promising results, these approaches do not scale well to large data tensors. In
this paper, we focus on addressing the missing data imputation problem for
large-scale spatiotemporal traffic data. To achieve both high accuracy and
efficiency, we develop a scalable tensor learning model -- Low-Tubal-Rank
Smoothing Tensor Completion (LSTC-Tubal) -- based on the existing framework of
Low-Rank Tensor Completion, which is well suited to spatiotemporal traffic
data characterized by a multidimensional structure of location ×
time of day × day. In particular, the proposed LSTC-Tubal model involves
a scalable tensor nuclear norm minimization scheme by integrating linear
unitary transformation. Therefore, tensor nuclear norm minimization can be
solved by singular value thresholding on the transformed matrix of each day
while the day-to-day correlation can be effectively preserved by the unitary
transform matrix. We compare LSTC-Tubal with state-of-the-art baseline models,
and find that LSTC-Tubal can achieve competitive accuracy with a significantly
lower computational cost. In addition, LSTC-Tubal can also benefit other
tasks in modeling large-scale spatiotemporal traffic data, such as
network-level traffic forecasting.
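The core mechanism, shrinking the singular values of each transformed frontal slice while a transform along the day mode preserves day-to-day correlation, can be sketched on a toy tensor. The FFT below stands in for the paper's learned unitary transform, and hard rank truncation replaces the tuned nuclear-norm soft thresholding; both substitutions are simplifying assumptions:

```python
import numpy as np

def slice_truncate(M, r):
    """Rank-r truncated SVD of one transformed frontal slice. LSTC-Tubal
    instead soft-thresholds the singular values (nuclear-norm SVT) with a
    tuned weight; hard truncation keeps this sketch free of tuning."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def transform_truncate(T, r):
    """Transform along the 'day' mode, shrink each frontal slice, transform
    back. The FFT is a stand-in for the learned unitary transform."""
    That = np.fft.fft(T, axis=2)
    for k in range(T.shape[2]):
        That[:, :, k] = slice_truncate(That[:, :, k], r)
    return np.real(np.fft.ifft(That, axis=2))

# Toy location x time-of-day x day tensor with low tubal rank (illustrative).
rng = np.random.default_rng(1)
U, V = rng.normal(size=(30, 2)), rng.normal(size=(2, 24))
day_scale = 1.0 + 0.1 * np.arange(7.0)
T_true = (U @ V)[:, :, None] * day_scale
observed = rng.random(T_true.shape) < 0.7   # 70% of entries observed

# Hard-impute style completion: shrink, then restore the observed entries.
X = np.where(observed, T_true, 0.0)
for _ in range(50):
    X = transform_truncate(X, r=2)
    X = np.where(observed, T_true, X)

err = float(np.linalg.norm((X - T_true)[~observed]) /
            np.linalg.norm(T_true[~observed]))
print(round(err, 4))
```

Because each slice-wise SVD is only of size location × time-of-day, the per-iteration cost scales with the number of days rather than with the full tensor, which is the source of the scalability the abstract claims.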
Data analysis and machine learning approaches for time series pre- and post- processing pipelines
In industrial settings, time series are typically generated continuously by sensors that constantly capture and monitor the operation of machines in real time. It is therefore important that cleaning algorithms support near-real-time operation. Moreover, as the data evolve, the cleaning strategy must change adaptively and incrementally, to avoid having to restart the cleaning process from scratch each time.
The goal of this thesis is to test the feasibility of applying machine learning pipelines to the data preprocessing stages. To that end, this work proposes methods capable of selecting optimal preprocessing strategies, trained on the available historical data by minimizing empirical loss functions.
Specifically, this thesis studies the processes of time series compression, variable joining, observation imputation, and surrogate model generation. In each of these, the aim is the optimal selection and combination of multiple strategies. This approach is defined as a function of the characteristics of the data and of the user-defined properties and constraints of the system.
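The idea of selecting a cleaning strategy by minimizing an empirical loss on historical data can be sketched as follows. The synthetic sensor series, the three candidate imputation strategies, and all names are illustrative assumptions, not the thesis's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical historical sensor series; some entries are hidden so the
# strategies can be scored against known ground truth.
t = np.arange(500)
series = np.sin(t / 20.0) + 0.05 * rng.normal(size=t.size)
hidden = rng.random(t.size) < 0.2
observed = np.where(hidden, np.nan, series)

def impute_mean(x):
    return np.where(np.isnan(x), np.nanmean(x), x)

def impute_ffill(x):
    out = x.copy()
    if np.isnan(out[0]):                    # seed a leading gap
        out[0] = np.nanmean(x)
    for i in range(1, out.size):
        if np.isnan(out[i]):
            out[i] = out[i - 1]             # carry last value forward
    return out

def impute_interp(x):
    idx = np.arange(x.size)
    good = ~np.isnan(x)
    return np.interp(idx, idx[good], x[good])  # linear interpolation

strategies = {'mean': impute_mean, 'ffill': impute_ffill, 'interp': impute_interp}

# Empirical-loss selection: score each strategy on the held-out ground truth
# and keep the minimizer.
losses = {name: float(np.sqrt(np.mean((f(observed)[hidden] - series[hidden]) ** 2)))
          for name, f in strategies.items()}
best = min(losses, key=losses.get)
print(best, round(losses[best], 3))
```

In a streaming setting the same selection would be re-run incrementally as new labeled history accumulates, which is what the adaptive, near-real-time requirement above points to.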
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics,
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity,
however, has received relatively less attention, especially in the setting
where the sample size is fixed and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last applies to exascale data
dimensions. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
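The sample-starved phenomenon the abstract describes is easy to exhibit: with the sample size fixed and the dimension large, even fully independent variables produce many large sample correlations. The dimensions, threshold, and null model below are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Purely high dimensional regime: p variables, n << p samples, and no true
# correlation at all, so every large sample correlation is spurious.
n, p = 30, 2000
X = rng.normal(size=(n, p))

# Sample correlation matrix of the standardized data.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
R = (Xs.T @ Xs) / n

# Count off-diagonal correlations exceeding a fixed threshold. With p*(p-1)/2
# pairs and fluctuations of order 1/sqrt(n), the number of false discoveries
# grows with the dimension even though n never changes.
rho = 0.5
off_diag = R[np.triu_indices(p, k=1)]
hits = int(np.sum(np.abs(off_diag) > rho))
print(hits)
```

This is the motivation for the framework's focus on thresholds and learning rates that scale with dimension at fixed sample size, rather than on asymptotics in n.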
Topics in sparse functional data analysis
This dissertation consists of three research papers that address different problems in modeling sparse functional data. The first paper (Chapter 2) focuses on statistical inference for Analysis of Covariance (ANCOVA) models on sparse functional data. In an analysis of covariance model for sparse functional data, the treatment effects, after adjusting for the effects of subject-specific covariates, are represented by functions of time. We apply the seemingly unrelated kernel estimator, which takes the within-subject correlation into account, to estimate the nonparametric components of the model, and test treatment effects using a generalized quasi-likelihood ratio test. We derive the asymptotic distribution of the test statistics under both the null and some local alternative hypotheses, and show that the proposed test enjoys the Wilks property and is minimax most powerful when the within-subject correlation structure is correctly specified. The second paper (Chapter 3) develops an algorithm to impute missing values in spatiotemporal satellite images based on sparse functional data analysis methods. We model the satellite images as functional data that are sparse in both the temporal and spatial domains, and assume they are repeated measurements of a latent spatiotemporal process. We assume the latent spatiotemporal process is composed of a fixed mean function, a random temporal effect, and a random spatial effect. We propose an algorithm to estimate each component using functional principal component analysis (FPCA) techniques.
The proposed imputation algorithm is validated on real data and performs well in all
aspects compared with its competitors. The third paper (Chapter 4) proposes a hierarchical multiresolution
imputation (HMRI) algorithm for imputing high-resolution spatiotemporal satellite
images, extending the second paper. HMRI is demonstrated on Moderate
Resolution Imaging Spectroradiometer (MODIS) daily land surface temperature (LST) data and
shows satisfactory imputation results. HMRI shows a large improvement in prediction accuracy
compared with other existing methods.
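The mean-plus-principal-components imputation idea underlying Chapters 3 and 4 can be sketched on a toy matrix of partially observed curves. Plain iterative SVD below is a simplified surrogate for full FPCA with covariance smoothing and separate temporal and spatial effects; the data and rank are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-in for sparse functional data: each row is a noisy, partially
# observed curve drawn from a low-rank latent process (mean + 2 components).
t = np.linspace(0.0, 1.0, 50)
mean_fn = np.sin(2 * np.pi * t)
phi = np.vstack([np.cos(2 * np.pi * t), np.sin(4 * np.pi * t)])
n = 100
scores = rng.normal(size=(n, 2)) * np.array([1.0, 0.5])
curves = mean_fn + scores @ phi + 0.05 * rng.normal(size=(n, t.size))
observed = rng.random(curves.shape) < 0.6    # 60% of points observed
obs = np.where(observed, curves, np.nan)

# FPCA-style imputation sketch: estimate the mean curve, then iterate a
# rank-2 SVD reconstruction on the completed matrix.
mu = np.nanmean(obs, axis=0)
X = np.where(observed, obs, mu)
for _ in range(20):
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    low = (U[:, :2] * s[:2]) @ Vt[:2]        # rank-2 reconstruction
    X = np.where(observed, obs, mu + low)

rmse = float(np.sqrt(np.mean((X[~observed] - curves[~observed]) ** 2)))
print(round(rmse, 3))
```

Because the latent process is low rank, borrowing strength across curves recovers the missing points far better than filling with the mean curve alone.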