59 research outputs found
A new formula for fast computation of segmented cross validation residuals in linear regression modelling -- providing efficient regularisation parameter estimation in Ridge Regression and the Tikhonov Regularisation Framework
In the present paper we prove a new theorem, resulting in an exact updating
formula for linear regression model residuals to calculate the segmented
cross-validation residuals for any choice of cross-validation strategy without
model refitting. The required matrix inversions are limited by the
cross-validation segment sizes and can be executed with high efficiency in
parallel. The well-known formula for leave-one-out cross-validation follows as
a special case of our theorem. In situations where the cross-validation
segments consist of small groups of repeated measurements, we suggest a
heuristic strategy for fast serial approximations of the cross-validated
residuals and associated PRESS statistic. We also suggest strategies for quick
estimation of the exact minimum PRESS value and full PRESS function over a
selected interval of regularisation values. The computational effectiveness of
the parameter selection for Ridge-/Tikhonov regression modelling resulting from
our theoretical findings and heuristic arguments is demonstrated for several
practical applications. Comment: 33 pages, 10 figures, 8 tables
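As an illustration of the idea in this abstract, the sketch below computes segmented cross-validation residuals for ridge regression from a single fit. It is not the paper's theorem as published: it assumes a ridge model with no intercept and no per-segment centering, and uses the standard block identity e_cv[S] = (I - H[S,S])^{-1} e[S], where H is the hat matrix and e the ordinary residuals. The function names `segmented_cv_residuals` and `refit_cv_residuals` are hypothetical.

```python
import numpy as np

def segmented_cv_residuals(X, y, lam, segments):
    """Cross-validated residuals for ridge regression without refitting.

    Assumption: no intercept and no per-segment centering.
    """
    n, p = X.shape
    # One p-by-p inverse for the full data set.
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))
    e = y - X @ (A_inv @ (X.T @ y))          # ordinary residuals
    cv = np.empty_like(e)
    for seg in segments:
        Xs = X[seg]
        H_ss = Xs @ A_inv @ Xs.T             # diagonal block of the hat matrix
        # One small solve per segment, sized by the segment only.
        cv[seg] = np.linalg.solve(np.eye(len(seg)) - H_ss, e[seg])
    return cv

def refit_cv_residuals(X, y, lam, segments):
    """Reference implementation: refit the model for every held-out segment."""
    p = X.shape[1]
    cv = np.empty_like(y, dtype=float)
    for seg in segments:
        Xt = np.delete(X, seg, axis=0)
        yt = np.delete(y, seg)
        beta = np.linalg.solve(Xt.T @ Xt + lam * np.eye(p), Xt.T @ yt)
        cv[seg] = y[seg] - X[seg] @ beta
    return cv

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = rng.normal(size=40)
segments = np.array_split(rng.permutation(40), 8)
fast = segmented_cv_residuals(X, y, 0.5, segments)
press = float(np.sum(fast ** 2))             # PRESS statistic for lam = 0.5
```

With segments of size one, the per-segment solve reduces to dividing each residual by 1 - h_ii, the familiar leave-one-out formula. The per-segment solves are independent of each other, which is what makes parallel execution possible.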
Orders of magnitude speed increase in Partial Least Squares feature selection with new simple indexing technique for very tall data sets
Feature selection is a challenging combinatorial optimization problem that
tends to require a large number of candidate feature subsets to be evaluated
before a satisfying solution is obtained. Because of the computational cost associated with estimating the regression coefficients for each subset, feature selection
can be an immensely time-consuming process and is often left inadequately
explored. Here, we propose a simple modification to the conventional sequence
of calculations involved when fitting a number of feature subsets to the same
response data with partial least squares (PLS) model fitting. The modification
consists in establishing the covariance matrix for the full set of features by an
initial calculation and then deriving the covariance of all subsequent feature
subsets solely by indexing into the original covariance matrix. By choosing this
approach, which is primarily suitable for tall design matrices with significantly
more rows than columns, we avoid redundant (identical) recalculations in the
evaluation of different feature subsets. By benchmarking the time required to
solve regression problems of various sizes, we demonstrate that the introduced
technique outperforms traditional approaches by several orders of magnitude
when used in conjunction with PLS modeling. In the supplementary material,
we provide code for implementing the concept with kernel PLS regression.
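The indexing step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' supplementary code: it assumes the data are already centered, the helper names are hypothetical, and it stops at the cross-product matrices that a kernel PLS routine would consume rather than implementing PLS itself.

```python
import numpy as np

def precompute_cross_products(X, y):
    """Compute X'X and X'y once for the full feature set (the costly step for tall X)."""
    return X.T @ X, X.T @ y

def subset_cross_products(XtX, Xty, idx):
    """Obtain the cross-products for a feature subset purely by indexing."""
    idx = np.asarray(idx)
    return XtX[np.ix_(idx, idx)], Xty[idx]

rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 20))            # tall design matrix: many rows, few columns
y = rng.normal(size=10_000)
XtX, Xty = precompute_cross_products(X, y)

subset = [0, 3, 7, 12]
XtX_s, Xty_s = subset_cross_products(XtX, Xty, subset)
```

Each candidate subset then costs only an indexing operation of size proportional to the subset, instead of a new pass over all rows of `X`.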
Selection of principal variables through a modified Gram–Schmidt process with and without supervision
In various situations requiring empirical model building from highly multivariate measurements, modelling based on partial least squares regression (PLSR) may often provide efficient low-dimensional model solutions. In unsupervised situations, the same may be true for principal component analysis (PCA). In both cases, however, it is also of interest to identify subsets of the measured variables useful for obtaining sparser but still comparable models without significant loss of information and performance. In the present paper, we propose a voting approach for sparse overall maximisation of variance analogous to PCA and a similar alternative for deriving sparse regression models closely related to the PLSR method. Both cases yield pivoting strategies for a modified Gram–Schmidt process and its corresponding (partial) QR factorisation of the underlying data matrix to manage the variable selection process. The proposed methods include score and loading plot possibilities that are acknowledged for providing efficient interpretations of the related PCA and PLS models in chemometric applications.
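A pivoted modified Gram–Schmidt selection of this kind might look like the sketch below, shown for the unsupervised case only. The abstract does not spell out the voting criterion, so the one used here is an assumption: at each step the variable that explains the most total variance across all deflated columns is picked. The function name `select_principal_variables` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def select_principal_variables(X, k, tol=1e-12):
    """Pick up to k columns of X by pivoted modified Gram-Schmidt.

    Assumed criterion: for each candidate column j of the deflated matrix R,
    score[j] = sum_m (r_j' r_m)^2 / (r_j' r_j), i.e. the total variance in all
    columns that is explained by regressing them on column j.
    """
    R = X - X.mean(axis=0)                   # centered working copy, deflated below
    selected = []
    for _ in range(k):
        C = R.T @ R
        norms = np.diag(C).copy()            # remaining sum of squares per column
        score = np.where(norms > tol,
                         (C ** 2).sum(axis=0) / np.maximum(norms, tol),
                         0.0)
        j = int(np.argmax(score))
        if score[j] <= tol:                  # nothing left to explain
            break
        q = R[:, j] / np.sqrt(norms[j])      # new orthonormal direction
        R -= np.outer(q, q @ R)              # orthogonalise every column against q
        selected.append(j)
    return selected

rng = np.random.default_rng(2)
base = rng.normal(size=(50, 3))
# Fourth column is a linear combination of the first two, so the rank is 3.
X = np.column_stack([base, base[:, 0] + base[:, 1]])
chosen = select_principal_variables(X, k=4)
```

The selected columns and the directions `q` together form a partial QR factorisation of the centered data; the directions play the role of scores for plotting.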
The Response of Enterococcus faecalis V583 to Chloramphenicol Treatment
Many Enterococcus faecalis strains display tolerance or resistance to many antibiotics, but genes that contribute to the resistance cannot be specified. The multiresistant E. faecalis V583, for which the complete genome sequence is available, survives and grows in media containing relatively high levels of chloramphenicol. No specific genes coding for chloramphenicol resistance have been recognized in V583. We used microarrays to identify genes and mechanisms behind the tolerance to chloramphenicol in V583, by comparison of cells treated with subinhibitory concentrations of chloramphenicol and untreated V583 cells. During a time course experiment, more than 600 genes were significantly differentially transcribed. Since chloramphenicol affects protein synthesis in bacteria, many genes involved in protein synthesis, for example, genes for ribosomal proteins, were induced. Genes involved in amino acid biosynthesis, for example, genes for tRNA synthetases and energy metabolism, were mainly downregulated. Among the upregulated genes were EF1732 and EF1733, which code for potential chloramphenicol transporters. Efflux of the drug out of the cells may be one mechanism used by V583 to overcome the effect of chloramphenicol.
Estimation of Thalamocortical and Intracortical Network Models from Joint Thalamic Single-Electrode and Cortical Laminar-Electrode Recordings in the Rat Barrel System
A new method is presented for extraction of population firing-rate models for
both thalamocortical and intracortical signal transfer based on stimulus-evoked
data from simultaneous thalamic single-electrode and cortical recordings using
linear (laminar) multielectrodes in the rat barrel system. Time-dependent
population firing rates for granular (layer 4), supragranular (layer 2/3), and
infragranular (layer 5) populations in a barrel column and the thalamic
population in the homologous barreloid are extracted from the high-frequency
portion (multi-unit activity; MUA) of the recorded extracellular signals. These
extracted firing rates are in turn used to identify population firing-rate
models formulated as integral equations with exponentially decaying coupling
kernels, allowing for straightforward transformation to the more common
firing-rate formulation in terms of differential equations. Optimal model
structures and model parameters are identified by minimizing the deviation
between model firing rates and the experimentally extracted population firing
rates. For the thalamocortical transfer, the experimental data favor a model
with fast feedforward excitation from thalamus to the layer-4 laminar population
combined with a slower inhibitory process due to feedforward and/or recurrent
connections and mixed linear-parabolic activation functions. The extracted
firing rates of the various cortical laminar populations are found to exhibit
strong temporal correlations for the present experimental paradigm, and simple
feedforward population firing-rate models combined with linear or mixed
linear-parabolic activation function are found to provide excellent fits to the
data. The identified thalamocortical and intracortical network models are thus
found to be qualitatively very different. While the thalamocortical circuit is
optimally stimulated by rapid changes in the thalamic firing rate, the
intracortical circuits are low-pass and respond most strongly to slowly varying
inputs from the cortical layer-4 population.
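The transformation from an integral-equation model with an exponentially decaying kernel to a differential equation can be written out for a single connection. The symbols below (input rate x, output rate r, time constant \tau, activation function F) are illustrative assumptions and are not taken from the paper.

```latex
% Integral (convolution) form with an exponentially decaying coupling kernel
r(t) = F\bigl(u(t)\bigr), \qquad
u(t) = \int_{-\infty}^{t} \frac{1}{\tau}\, e^{-(t-s)/\tau}\, x(s)\, \mathrm{d}s .

% Differentiating u(t) with the Leibniz rule gives
\frac{\mathrm{d}u}{\mathrm{d}t}
  = \frac{1}{\tau}\, x(t) - \frac{1}{\tau}\, u(t)
\quad\Longleftrightarrow\quad
\tau\, \frac{\mathrm{d}u}{\mathrm{d}t} = -u(t) + x(t) .
```

The second line is the familiar first-order firing-rate equation, so each exponential kernel in the integral formulation corresponds to one such differential equation.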
Daylength influences the response of three clover species (Trifolium spp.) to short-term ozone stress
Long photoperiods characteristic of summers at high latitudes can increase ozone-induced foliar injury in subterranean clover (Trifolium subterraneum). This study compared the effects of long photoperiods on ozone injury in red and white clover cultivars adapted to shorter or longer daylengths of southern or northern Fennoscandia. Plants were exposed to 70 ppb ozone for six hours during the daytime for three consecutive days. Simultaneously, the daylength in the growth rooms was altered to long-day (10 h light; 14 h dim light) and short-day (10 h light; 14 h darkness) conditions. Thermal imaging showed that ozone disrupted leaf temperature and stomatal function, particularly in sensitive species, in which leaf temperature deviations persisted for several days after ozone exposure. Long-day conditions increased visible foliar injury (30%–70%), characterized by chlorotic and necrotic areas, relative to short-day conditions in all species and cultivars, independently of the photoperiod in the region they were adapted to.
Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models
Background: Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results: Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. Conclusions: HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems.