15,741 research outputs found
Support vector machine for functional data classification
In many applications, input data are sampled functions taking their values in
infinite-dimensional spaces rather than standard vectors. This has far-reaching
consequences for data analysis algorithms, which must be modified accordingly.
Indeed, most of the traditional data analysis tools for regression,
classification and clustering have been adapted to functional inputs under the
general name of Functional Data Analysis (FDA). In this paper, we investigate
the use of Support Vector Machines (SVMs) for functional data analysis,
focusing on the problem of curve discrimination. SVMs are large-margin
classifiers based on implicit nonlinear mappings of the data into
high-dimensional spaces via kernels. We show how to define simple kernels that
take into account the functional nature of the data and lead to consistent
classification. Experiments conducted on real-world data emphasize the benefit
of taking into account the functional aspects of the problems.
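As a hedged illustration of the kernel idea (this is not the paper's construction; the derivative-based kernel, the toy sine/cosine curves and all parameters below are invented for the sketch), a kernel on sampled curves can be built from finite-difference derivatives and plugged into a standard SVM:

```python
import numpy as np
from sklearn.svm import SVC

def derivative_kernel(X, Y, gamma=1.0):
    # Gaussian kernel computed on finite-difference derivatives of the
    # sampled curves -- one simple way to make a kernel "functional".
    dX = np.diff(X, axis=1)
    dY = np.diff(Y, axis=1)
    sq = ((dX[:, None, :] - dY[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

# toy curves on a common grid: class 0 = noisy sines, class 1 = noisy cosines
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
curves = [np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(50) for _ in range(20)]
curves += [np.cos(2 * np.pi * t) + 0.05 * rng.standard_normal(50) for _ in range(20)]
X = np.vstack(curves)
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel=lambda A, B: derivative_kernel(A, B, gamma=5.0))
clf.fit(X, y)
print(clf.score(X, y))
```

Passing a callable as `kernel` makes scikit-learn evaluate it on the raw sampled curves, so any functional transform (derivatives, projections on a smooth basis) can be folded into the kernel itself.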
B-Spline Finite Elements and their Efficiency in Solving Relativistic Mean Field Equations
A finite element method using B-splines is presented and compared with a
conventional finite element method of Lagrangian type. The efficiency of both
methods is investigated using the example of a coupled nonlinear system of
Dirac eigenvalue equations and inhomogeneous Klein-Gordon equations, which
describe a nuclear system in the framework of relativistic mean field (RMF)
theory. Although the FEM has recently been applied with great success in
nuclear RMF calculations, a well-known problem is the appearance of spurious
solutions in the spectra of the Dirac equation. The question of whether
B-splines lead to a reduction of spurious solutions is analyzed. Numerical
expense, precision and convergence behavior are compared for both methods in
view of their use in large-scale computations on FEM grids with more
dimensions. A B-spline version of the object-oriented C++ code for spherical
nuclei was used for this investigation.
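The shape functions such a method is built from can be sketched with SciPy (an illustrative 1D basis only, not the RMF solver; the knot count is arbitrary):

```python
import numpy as np
from scipy.interpolate import BSpline

# cubic B-spline shape functions on [0, 1] with a clamped (open) knot vector,
# the kind of basis a B-spline finite element method builds its trial space from
k = 3
interior = np.linspace(0.0, 1.0, 9)
t = np.r_[[0.0] * k, interior, [1.0] * k]
x = np.linspace(0.0, 1.0, 101)
B = BSpline.design_matrix(x, t, k).toarray()  # shape (n_points, n_basis)

n_basis = len(t) - k - 1
print(B.shape, n_basis)
# B-splines are non-negative and form a partition of unity on the interval,
# which yields well-conditioned, banded finite element matrices
print(bool(np.all(B >= 0)), bool(np.allclose(B.sum(axis=1), 1.0)))
```

The banded support (each cubic B-spline overlaps only its nearest neighbours) is what keeps assembly and solution cheap compared with global bases.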
Fast Selection of Spectral Variables with B-Spline Compression
The large number of spectral variables in most data sets encountered in
spectral chemometrics often makes the prediction of a dependent variable
difficult. The number of variables can hopefully be reduced by using either
projection techniques or selection methods; the latter allow for the
interpretation of the selected variables. Since the optimal approach of testing
all possible subsets of variables with the prediction model is intractable, an
incremental selection approach using a nonparametric statistic is a good
option, as it avoids the computationally intensive use of the model itself. It
has two drawbacks, however: the number of groups of variables to test is still
huge, and collinearities can make the results unstable. To overcome these
limitations, this paper presents a method to select groups of spectral
variables. It consists of a forward-backward procedure applied to the
coefficients of a B-spline representation of the spectra. The criterion used in
the forward-backward procedure is the mutual information, which can detect
nonlinear dependencies between variables, unlike the commonly used correlation.
The spline representation makes the results interpretable, as groups of
consecutive spectral variables are selected. Experiments conducted on NIR
spectra from fescue grass and diesel fuels show that the method provides
clearly identified groups of selected variables, making interpretation easy,
while keeping a low computational load. The prediction performances obtained
using the selected coefficients are higher than those obtained by the same
method applied directly to the original variables, and similar to those
obtained using traditional models, while using significantly fewer spectral
variables.
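A simplified sketch of the compression-then-selection idea (plain ranking by mutual information rather than the paper's full forward-backward procedure; the synthetic spectra and all sizes are invented):

```python
import numpy as np
from scipy.interpolate import BSpline
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(1)
wl = np.linspace(0.0, 1.0, 200)                       # wavelength axis
raw = rng.standard_normal((100, 200))
# mildly smooth the noise so the rows look like spectra
X = np.apply_along_axis(lambda r: np.convolve(r, np.ones(10) / 10, "same"), 1, raw)
y = X[:, 95:105].mean(axis=1)                         # target driven by one region

# compress each spectrum to B-spline coefficients: each coefficient stands
# for a group of consecutive spectral variables
k = 3
t = np.r_[[0.0] * k, np.linspace(0.0, 1.0, 20), [1.0] * k]
B = BSpline.design_matrix(wl, t, k).toarray()          # (200, n_coef)
C = np.linalg.lstsq(B, X.T, rcond=None)[0].T           # (100, n_coef)

# rank coefficient (i.e. spectral-region) relevance by mutual information
mi = mutual_info_regression(C, y, random_state=0)
best = int(np.argmax(mi))
peak_wl = wl[np.argmax(B[:, best])]                    # where that group lives
print(best, round(float(peak_wl), 2))
```

Because each B-spline coefficient summarizes a contiguous wavelength band, the selected coefficient points directly at an interpretable spectral region, which is the interpretability argument made above.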
Streaming visualisation of quantitative mass spectrometry data based on a novel raw signal decomposition method
As data rates rise, there is a danger that informatics for high-throughput LC-MS becomes more opaque and inaccessible to practitioners. It is therefore critical that efficient visualisation tools are available to facilitate quality control, verification, validation, interpretation, and sharing of raw MS data and the results of MS analyses. Currently, MS data are stored as contiguous spectra. Recall of individual spectra is quick, but panoramas, zooming and panning across whole datasets necessitate processing/memory overheads impractical for interactive use. Moreover, visualisation is challenging if significant quantification data are missing due to data-dependent acquisition of MS/MS spectra. To tackle these issues, we leverage our seaMass technique for novel signal decomposition. LC-MS data are modelled as a 2D surface through selection of a sparse set of weighted B-spline basis functions from an over-complete dictionary. By ordering and spatially partitioning the weights with an R-tree data model, efficient streaming visualisations are achieved. In this paper, we describe the core MS1 visualisation engine and the overlay of MS/MS annotations. This enables the mass spectrometrist to quickly inspect whole runs for ionisation/chromatographic issues, MS/MS precursors for coverage problems, or putative biomarkers for interferences, for example. The open-source software is available from http://seamass.net/viz/
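A much-reduced sketch of the decomposition step (1D instead of 2D, a generic Lasso in place of seaMass's own sparse solver, and no R-tree indexing; the toy peaks and parameters are invented):

```python
import numpy as np
from scipy.interpolate import BSpline
from sklearn.linear_model import Lasso

# toy 1D "mass spectrum": two narrow peaks on a uniform m/z grid
x = np.linspace(0.0, 1.0, 300)
signal = np.exp(-((x - 0.3) / 0.02) ** 2) + 0.5 * np.exp(-((x - 0.7) / 0.02) ** 2)

def bspline_dict(n_knots, k=3):
    t = np.r_[[0.0] * k, np.linspace(0.0, 1.0, n_knots), [1.0] * k]
    return BSpline.design_matrix(x, t, k).toarray()

# over-complete dictionary: cubic B-splines at two knot densities
D = np.hstack([bspline_dict(40), bspline_dict(80)])    # (300, 124)

# sparse non-negative selection of weighted basis functions
fit = Lasso(alpha=1e-3, positive=True, max_iter=100000).fit(D, signal)
w = fit.coef_
active = np.flatnonzero(w > 1e-8)
recon = D @ w + fit.intercept_
rmse = float(np.sqrt(np.mean((recon - signal) ** 2)))
print(len(active), round(rmse, 3))
```

Only the active weights need to be stored and spatially indexed, which is what makes panning and zooming over a whole run cheap relative to replaying the raw spectra.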
Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. The article closes with a
brief discussion of potential areas of future development in this field.
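A minimal scalar-on-function sketch showing both ingredients highlighted above, basis expansion and regularization (all data and parameter choices below are invented):

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 100)
dt = t[1] - t[0]

# functional predictors: smooth-ish random-walk curves on a common grid
X = np.cumsum(rng.standard_normal((50, 100)), axis=1) * 0.1
beta_true = np.sin(2 * np.pi * t)                       # coefficient function
y = X @ beta_true * dt + 0.01 * rng.standard_normal(50)

# expand beta(t) in a B-spline basis and penalize its roughness,
# reducing functional regression to penalized least squares
k = 3
kn = np.r_[[0.0] * k, np.linspace(0.0, 1.0, 10), [1.0] * k]
B = BSpline.design_matrix(t, kn, k).toarray()           # (100, 12)
Z = X @ B * dt                                          # reduced design matrix
D2 = np.diff(np.eye(B.shape[1]), n=2, axis=0)           # second differences
lam = 1e-5
b = np.linalg.solve(Z.T @ Z + lam * (D2.T @ D2), Z.T @ y)
beta_hat = B @ b

# out-of-sample check on fresh curves from the same process
X_new = np.cumsum(rng.standard_normal((20, 100)), axis=1) * 0.1
y_new = X_new @ beta_true * dt
pred = X_new @ beta_hat * dt
r = float(np.corrcoef(pred, y_new)[0, 1])
print(round(r, 3))
```

Replication enters through the 50 observed curves that share one coefficient function; regularization enters through the roughness penalty on its spline coefficients.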
Representation of Functional Data in Neural Networks
Functional Data Analysis (FDA) is an extension of traditional data analysis
to functional data, for example spectra, temporal series, spatio-temporal
images, gesture recognition data, etc. Functional data are rarely known exactly
in practice; usually only a regular or irregular sampling is available. For
this reason,
some processing is needed in order to benefit from the smooth character of
functional data in the analysis methods. This paper shows how to extend the
Radial-Basis Function Networks (RBFN) and Multi-Layer Perceptron (MLP) models
to functional data inputs, in particular when the latter are known through
lists of input-output pairs. Various possibilities for functional processing
are discussed, including the projection on smooth bases, Functional Principal
Component Analysis, functional centering and reduction, and the use of
differential operators. It is shown how to incorporate this functional
processing into the RBFN and MLP models. The functional approach is illustrated
on a benchmark of spectrometric data analysis.
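One of the processing options discussed, projection on a smooth basis, can be sketched as a preprocessing step in front of a standard MLP (the toy curves and network sizes are invented, not the paper's benchmark):

```python
import numpy as np
from scipy.interpolate import BSpline
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 150)

# toy spectra: a single bump whose location is the quantity to predict
loc = rng.uniform(0.2, 0.8, 120)
X = np.exp(-((t[None, :] - loc[:, None]) / 0.05) ** 2)
y = loc

# functional preprocessing: replace each sampled curve by its coefficients
# on a smooth B-spline basis before feeding the network
k = 3
kn = np.r_[[0.0] * k, np.linspace(0.0, 1.0, 15), [1.0] * k]
B = BSpline.design_matrix(t, kn, k).toarray()          # (150, 17)
C = np.linalg.lstsq(B, X.T, rcond=None)[0].T           # (120, 17)

net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000, random_state=0)
net.fit(C, y)
print(round(net.score(C, y), 3))
```

The projection both smooths the sampled curves and shrinks the input dimension (here 150 samples down to 17 coefficients), which is the practical benefit of treating the inputs as functions.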
Aggregated functional data model for Near-Infrared Spectroscopy calibration and prediction
Calibration and prediction for NIR spectroscopy data are performed based on a
functional interpretation of the Beer-Lambert formula. Considering that, for
each chemical sample, the resulting spectrum is a continuous curve obtained as
the summation of overlapping absorption spectra from each analyte plus a
Gaussian error, we assume that each individual spectrum can be expanded as a
linear combination of B-spline basis functions. Calibration is then performed
using two procedures for estimating the individual analyte curves: basis
smoothing and smoothing splines. Prediction is done by minimizing the squared
error of prediction. To assess the variance of the predicted values, we use a
leave-one-out jackknife technique. Departures from the standard error models
are discussed through a simulation study, in particular how correlated errors
impact the calibration step and, consequently, the prediction of the analytes'
concentrations. Finally, the performance of our methodology is demonstrated
through the analysis of two publicly available datasets.
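A hedged sketch of the Beer-Lambert setup (classical least-squares calibration with a B-spline smoothing step standing in for the paper's two estimation procedures; the two analyte curves and all sizes are invented):

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(4)
wl = np.linspace(0.0, 1.0, 200)
# two "pure analyte" absorption curves (Beer-Lambert components)
S_true = np.vstack([np.exp(-((wl - 0.35) / 0.08) ** 2),
                    np.exp(-((wl - 0.65) / 0.08) ** 2)])
C_cal = rng.uniform(0.1, 1.0, (30, 2))                 # known concentrations
A = C_cal @ S_true + 0.01 * rng.standard_normal((30, 200))  # calibration spectra

# estimate the analyte curves by least squares, then smooth them by
# re-expanding each estimate in a B-spline basis
k = 3
kn = np.r_[[0.0] * k, np.linspace(0.0, 1.0, 25), [1.0] * k]
B = BSpline.design_matrix(wl, kn, k).toarray()
S_raw = np.linalg.lstsq(C_cal, A, rcond=None)[0]       # (2, 200)
coefs = np.linalg.lstsq(B, S_raw.T, rcond=None)[0]     # spline coefficients
S_hat = (B @ coefs).T                                  # smoothed analyte curves

# prediction: recover concentrations of a new spectrum by least squares
c_true = np.array([0.4, 0.7])
a_new = c_true @ S_true + 0.01 * rng.standard_normal(200)
c_hat = np.linalg.lstsq(S_hat.T, a_new, rcond=None)[0]
print(np.round(c_hat, 2))
```

Minimizing the squared prediction error against the smoothed analyte curves recovers the concentrations; correlated (rather than white) errors would bias exactly this last least-squares step, which is what the simulation study above examines.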
Conditional Spectral Analysis of Replicated Multiple Time Series with Application to Nocturnal Physiology
This article considers the problem of analyzing associations between power
spectra of multiple time series and cross-sectional outcomes when data are
observed from multiple subjects. The motivating application comes from sleep
medicine, where researchers are able to non-invasively record physiological
time series signals during sleep. The frequency patterns of these signals,
which can be quantified through the power spectrum, contain interpretable
information about biological processes. An important problem in sleep research
is drawing connections between power spectra of time series signals and
clinical characteristics; these connections are key to understanding biological
pathways through which sleep affects, and can be treated to improve, health.
Such analyses are challenging because they must accommodate the complicated
structure of the power spectrum of multiple time series, which is a complex
positive-definite matrix-valued function of frequency. This article proposes a
new approach to such analyses based on a tensor-product spline model of the
Cholesky components of outcome-dependent power spectra. The approach flexibly
models power spectra as nonparametric functions of frequency and outcome while
preserving geometric constraints. The method is formulated in a fully Bayesian
framework, and a Whittle-likelihood-based Markov chain Monte Carlo (MCMC)
algorithm is developed for automated model fitting and for conducting inference
on associations between outcomes and spectral measures. The method is used to
analyze data from a study of sleep in older adults and uncovers new insights
into how stress and arousal are connected to the amount of time one spends in
bed.
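A drastically simplified, non-Bayesian sketch of the spectral ingredients (a single series, a plain periodogram and unpenalized spline smoothing in frequency only, not the tensor-product Cholesky model; the AR(1) example is invented):

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(5)
n = 512
# AR(1) series: its true power spectrum decays smoothly in frequency
x = np.zeros(n)
for i in range(1, n):
    x[i] = 0.8 * x[i - 1] + rng.standard_normal()

# raw periodogram: the data term entering a Whittle likelihood
freqs = np.fft.rfftfreq(n)[1:-1]
I_f = (np.abs(np.fft.rfft(x)) ** 2 / n)[1:-1]

# smooth the log-periodogram with a B-spline basis in frequency
k = 3
kn = np.r_[[freqs[0]] * k, np.linspace(freqs[0], freqs[-1], 10), [freqs[-1]] * k]
B = BSpline.design_matrix(freqs, kn, k).toarray()
b = np.linalg.lstsq(B, np.log(I_f), rcond=None)[0]
log_spec = B @ b

# an AR(1) with positive coefficient concentrates power at low frequencies
print(bool(log_spec[0] > log_spec[-1]))
```

The article's model extends this one-dimensional smoothing to a tensor-product basis over frequency and the clinical outcome, applied to Cholesky components so the estimated matrix-valued spectrum stays positive definite.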