Functional Factorial K-means Analysis
A new procedure for simultaneously finding the optimal cluster structure of
multivariate functional objects and finding the subspace to represent the
cluster structure is presented. The method is based on the k-means criterion
for projected functional objects on a subspace in which a cluster structure
exists. An efficient alternating least-squares algorithm is described, and the
proposed method is extended to a regularized method for smoothness of weight
functions. To deal with the negative effect of the correlation of the
coefficient matrix of the basis function expansion in the proposed algorithm,
a two-step
approach to the proposed method is also described. Analyses of artificial and
real data demonstrate that the proposed method gives correct and interpretable
results compared with existing methods, the functional principal component
k-means (FPCK) method and the tandem clustering approach. It is also shown that
the proposed method can be considered complementary to FPCK.

Comment: 39 pages, 17 figures
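A minimal sketch of the alternating least-squares idea may help fix ideas. The snippet below implements the criterion in the classical multivariate setting: it alternates a k-means step on the projected objects with a subspace step that takes the eigenvectors of X'(I - P_U)X with smallest eigenvalues. The synthetic data, PCA-based initialisation, and function names are illustrative assumptions of this sketch, not the paper's algorithm, which works on basis-expansion coefficients of functional objects.

```python
import numpy as np

def factorial_kmeans(X, n_clusters, n_dims, n_iter=20):
    """ALS sketch of the factorial k-means criterion: minimise
    ||(I - P_U) X A||^2 over an orthonormal loading matrix A (p x q)
    and cluster memberships U (encoded here as integer labels)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # initialise A with the leading PCA loadings, split on the first score
    A = np.linalg.svd(Xc, full_matrices=False)[2][:n_dims].T
    z0 = Xc @ A[:, 0]
    cuts = np.quantile(z0, np.arange(1, n_clusters) / n_clusters)
    labels = np.searchsorted(cuts, z0)
    for _ in range(n_iter):
        Z = Xc @ A                                    # projected objects
        # k-means step on the projections (one Lloyd update per sweep)
        centers = np.array([Z[labels == k].mean(axis=0) if np.any(labels == k)
                            else Z[k] for k in range(n_clusters)])
        labels = ((Z[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        # subspace step: eigenvectors of X'(I - P_U)X with smallest eigenvalues
        U = np.eye(n_clusters)[labels]
        P = U @ np.linalg.pinv(U)                     # projection onto cluster space
        _, V = np.linalg.eigh(Xc.T @ (np.eye(n) - P) @ Xc)
        A = V[:, :n_dims]
    return labels, A

# Two well-separated groups in the first coordinate, noisier masking dimensions.
rng = np.random.default_rng(1)
g1 = np.hstack([4 + 0.3 * rng.standard_normal((30, 1)), rng.standard_normal((30, 3))])
g2 = np.hstack([-4 + 0.3 * rng.standard_normal((30, 1)), rng.standard_normal((30, 3))])
labels, A = factorial_kmeans(np.vstack([g1, g2]), n_clusters=2, n_dims=1)
```

Note that the subspace step favours directions of small within-cluster variance, which is why the masking dimensions above are given larger noise; this is the kind of situation where, as the abstract notes, the criterion and FPCK behave in complementary ways.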
Development and Application of Chemometric Methods for Modelling Metabolic Spectral Profiles
The interpretation of metabolic information is crucial to understanding the functioning of a biological
system. Latent information about the metabolic state of a sample can be acquired using
analytical chemistry methods, which generate spectroscopic profiles. Thus, nuclear magnetic resonance
spectroscopy and mass spectrometry techniques can be employed to generate vast amounts
of highly complex data on the metabolic content of biofluids and tissue, and this thesis discusses
ways to process, analyse and interpret these data successfully.
The evaluation of J-resolved spectroscopy in magnetic resonance profiling and the statistical
techniques required to extract maximum information from the projections of these spectra are
studied. In particular, data processing is evaluated, and correlation and regression methods are
investigated with respect to enhanced model interpretation and biomarker identification. Additionally,
it is shown that non-linearities in metabonomic data can be effectively modelled with
kernel-based orthogonal partial least squares, for which an automated optimisation of the kernel
parameter with nested cross-validation is implemented. The interpretation of orthogonal variation
and predictive ability enabled by this approach are demonstrated in regression and classification
models for applications in toxicology and parasitology. Finally, the vast amount of data generated
with mass spectrometry imaging is investigated in terms of data processing, and the benefits of
applying multivariate techniques to these data are illustrated, especially in terms of interpretation
and visualisation using colour-coding of images. The advantages of methods such as principal
component analysis, self-organising maps and manifold learning over univariate analysis are highlighted.
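The nested cross-validation scheme used to tune the kernel parameter can be sketched generically. The snippet below substitutes plain kernel ridge regression for kernel-based orthogonal partial least squares, and the data, parameter grid, and function names are illustrative assumptions rather than the thesis's implementation; what it demonstrates is the structure of nested CV, where the inner loop selects the kernel parameter using only the training portion of each outer fold.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    # squared Euclidean distances between all row pairs
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit_predict(X_tr, y_tr, X_te, gamma, alpha=1e-2):
    # kernel ridge regression: solve (K + alpha I) c = y, predict k(x, X_tr) c
    K = gaussian_kernel(X_tr, X_tr, gamma)
    coef = np.linalg.solve(K + alpha * np.eye(len(X_tr)), y_tr)
    return gaussian_kernel(X_te, X_tr, gamma) @ coef

def nested_cv_mse(X, y, gammas, n_outer=5, n_inner=4, seed=0):
    """Nested cross-validation: the inner loop selects gamma on the
    training portion only; the outer loop estimates the error of the
    whole selection-plus-fitting procedure."""
    rng = np.random.default_rng(seed)
    outer = np.array_split(rng.permutation(len(X)), n_outer)
    scores = []
    for k in range(n_outer):
        test = outer[k]
        train = np.concatenate([outer[j] for j in range(n_outer) if j != k])
        inner = np.array_split(train, n_inner)
        # inner loop: cross-validate each candidate gamma on the training part
        best_gamma, best_err = None, np.inf
        for g in gammas:
            err = 0.0
            for i in range(n_inner):
                fit_idx = np.concatenate([inner[j] for j in range(n_inner) if j != i])
                pred = fit_predict(X[fit_idx], y[fit_idx], X[inner[i]], g)
                err += ((pred - y[inner[i]]) ** 2).mean()
            if err < best_err:
                best_err, best_gamma = err, g
        # outer evaluation with the inner-selected parameter
        pred = fit_predict(X[train], y[train], X[test], best_gamma)
        scores.append(((pred - y[test]) ** 2).mean())
    return float(np.mean(scores))

x = np.linspace(0, 3, 60)[:, None]
y = np.sin(2 * x[:, 0])
mse = nested_cv_mse(x, y, gammas=[0.1, 1.0, 10.0])
```

Because the test fold of each outer split never influences the choice of gamma, the averaged score is an honest estimate of generalisation error, which is the property the thesis exploits when automating the kernel-parameter optimisation.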
This body of work therefore demonstrates new means of increasing the amount of biochemical
information that can be obtained from a given set of samples in biological applications using
spectral profiling. Various analytical and statistical methods are investigated and illustrated with
applications drawn from diverse biomedical areas.
Robust classification via MOM minimization
We present an extension of Vapnik's classical empirical risk minimizer (ERM)
in which the empirical risk is replaced by a median-of-means (MOM) estimator;
the new estimators are called MOM minimizers. While ERM is sensitive to
corruption of the dataset for many classical loss functions used in
classification, we show that MOM minimizers behave well in theory, in the
sense that they achieve Vapnik's (slow) rates of convergence under weak
assumptions: the data are only required to have a finite second moment, and
some outliers may have corrupted the dataset.
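The median-of-means estimator underlying this construction can be sketched in a few lines (an illustrative NumPy sketch, not the authors' code):

```python
import numpy as np

def median_of_means(x, n_blocks, seed=0):
    """Median-of-means estimator: shuffle the sample, split it into
    n_blocks equal-sized blocks, average within each block, and
    return the median of the block means."""
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(x, dtype=float))
    usable = (len(x) // n_blocks) * n_blocks    # drop the remainder
    block_means = x[:usable].reshape(n_blocks, -1).mean(axis=1)
    return float(np.median(block_means))

# Two gross outliers ruin the plain mean but leave MOM essentially untouched:
# the outliers can contaminate at most two of the ten blocks, and the median
# of the block means ignores those two.
sample = np.concatenate([np.zeros(100), [1000.0, 1000.0]])
mom = median_of_means(sample, n_blocks=10)
```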
We also propose algorithms inspired by MOM minimizers, which can be analyzed
using arguments quite similar to those used for Stochastic Block Gradient
descent. As a proof of concept, we show how to modify a proof of consistency
for a descent algorithm to prove consistency of its MOM version. Because MOM
algorithms perform a smart subsampling, our procedure can also substantially
reduce computation time and memory requirements when applied to non-linear
algorithms.
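A MOM-style descent of this flavour can be sketched for logistic regression; the data, block count, and step size below are assumptions of this sketch, not the authors' algorithm:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def mom_logistic_gd(X, y, n_blocks=10, lr=0.5, n_iter=300, seed=0):
    """Sketch of a MOM-style descent for logistic regression with
    labels in {-1, +1}: at each step the data are split into blocks,
    and a gradient step is taken on the block whose empirical loss is
    the median, so blocks polluted by outliers (whose loss is extreme)
    are ignored by the update."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        blocks = np.array_split(rng.permutation(n), n_blocks)
        # per-block logistic loss under the current iterate
        losses = [np.logaddexp(0.0, -y[b] * (X[b] @ w)).mean() for b in blocks]
        med = blocks[np.argsort(losses)[n_blocks // 2]]
        # gradient of the logistic loss on the median block only
        grad = -(y[med] * sigmoid(-y[med] * (X[med] @ w))) @ X[med] / len(med)
        w -= lr * grad
    return w

# Separable data plus a few gross, mislabelled outliers.
rng = np.random.default_rng(2)
X_clean = rng.standard_normal((200, 2))
y_clean = np.sign(X_clean @ np.array([1.0, 0.5]))
X = np.vstack([X_clean, np.full((4, 2), 50.0)])
y = np.concatenate([y_clean, -np.ones(4)])
w = mom_logistic_gd(X, y)
```

This also illustrates the subsampling remark above: each update touches only one block of the data, about a tenth of the sample here.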
This empirical performance is illustrated on both simulated and real datasets.