59 research outputs found

    A new formula for fast computation of segmented cross validation residuals in linear regression modelling -- providing efficient regularisation parameter estimation in Ridge Regression and the Tikhonov Regularisation Framework

    In the present paper we prove a new theorem, resulting in an exact updating formula for linear regression model residuals to calculate the segmented cross-validation residuals for any choice of cross-validation strategy without model refitting. The required matrix inversions are limited by the cross-validation segment sizes and can be executed with high efficiency in parallel. The well-known formula for leave-one-out cross-validation follows as a special case of our theorem. In situations where the cross-validation segments consist of small groups of repeated measurements, we suggest a heuristic strategy for fast serial approximations of the cross-validated residuals and associated PRESS statistic. We also suggest strategies for quick estimation of the exact minimum PRESS value and full PRESS function over a selected interval of regularisation values. The computational effectiveness of the parameter selection for Ridge-/Tikhonov regression modelling resulting from our theoretical findings and heuristic arguments is demonstrated for several practical applications. Comment: 33 pages, 10 figures, 8 tables
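To make the idea of segment-wise residual updating concrete, here is a minimal sketch based on the standard block hat-matrix identity, in which the cross-validated residuals of a held-out segment equal (I - H_ss)^-1 times the ordinary residuals of that segment. It is not taken from the paper: the function names, the synthetic data and the choice of a ridge model without intercept or centring are all assumptions made for illustration. The result is checked against explicit refitting.

```python
import numpy as np

def segmented_cv_residuals(X, y, segments, lam=0.0):
    """Cross-validated residuals for every segment without refitting."""
    p = X.shape[1]
    # Full-data fit: hat matrix H = X (X'X + lam I)^-1 X' and ordinary residuals.
    G = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # shape (p, n)
    H = X @ G
    r = y - H @ y
    r_cv = np.empty_like(r)
    for idx in segments:
        # Only a |segment| x |segment| linear system is solved per segment.
        H_ss = H[np.ix_(idx, idx)]
        r_cv[idx] = np.linalg.solve(np.eye(len(idx)) - H_ss, r[idx])
    return r_cv

def refit_cv_residuals(X, y, segments, lam=0.0):
    """Reference implementation: refit the model with each segment held out."""
    n, p = X.shape
    r_cv = np.empty(n)
    for idx in segments:
        keep = np.setdiff1d(np.arange(n), idx)
        A = X[keep].T @ X[keep] + lam * np.eye(p)
        b = np.linalg.solve(A, X[keep].T @ y[keep])
        r_cv[idx] = y[idx] - X[idx] @ b
    return r_cv

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=30)
segments = np.array_split(np.arange(30), 6)  # six segments of five samples

fast = segmented_cv_residuals(X, y, segments, lam=0.5)
slow = refit_cv_residuals(X, y, segments, lam=0.5)
press = float(fast @ fast)  # PRESS statistic from the updated residuals
```

With segments of size one, the per-segment solve reduces to dividing each residual by 1 - h_ii, which is the familiar leave-one-out formula.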

    Orders of magnitude speed increase in Partial Least Squares feature selection with new simple indexing technique for very tall data sets

    Feature selection is a challenging combinatorial optimization problem that tends to require a large number of candidate feature subsets to be evaluated before a satisfying solution is obtained. Because of the computational cost associated with estimating the regression coefficients for each subset, feature selection can be an immensely time-consuming process and is often left inadequately explored. Here, we propose a simple modification to the conventional sequence of calculations involved when fitting a number of feature subsets to the same response data with partial least squares (PLS) model fitting. The modification consists in establishing the covariance matrix for the full set of features by an initial calculation and then deriving the covariance of all subsequent feature subsets solely by indexing into the original covariance matrix. By choosing this approach, which is primarily suitable for tall design matrices with significantly more rows than columns, we avoid redundant (identical) recalculations in the evaluation of different feature subsets. By benchmarking the time required to solve regression problems of various sizes, we demonstrate that the introduced technique outperforms traditional approaches by several orders of magnitude when used in conjunction with PLS modeling. In the supplementary material, we provide code for implementing the concept with kernel PLS regression.
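The indexing step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' supplementary code: the data are synthetic, the helper name `subset_cross_products` is hypothetical, and ordinary least squares stands in for kernel PLS purely to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5000, 8  # tall design matrix: many more rows than columns
X = rng.normal(size=(n, p))
y = X[:, [0, 3]] @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=n)

# Centre once on the full feature set; column subsets stay centred.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# One-off calculation of the cross-products for all features.
XtX = Xc.T @ Xc  # shape (p, p)
Xty = Xc.T @ yc  # shape (p,)

def subset_cross_products(XtX, Xty, cols):
    """Cross-products for a feature subset, obtained by indexing only."""
    cols = np.asarray(cols)
    return XtX[np.ix_(cols, cols)], Xty[cols]

cols = [0, 3, 5]
XtX_s, Xty_s = subset_cross_products(XtX, Xty, cols)

# Direct recalculation on the subset, for comparison with the indexed blocks.
direct_XtX = Xc[:, cols].T @ Xc[:, cols]
direct_Xty = Xc[:, cols].T @ yc

# Illustration only: least-squares coefficients from the indexed blocks.
# A kernel PLS routine would consume the same two blocks instead.
coef = np.linalg.solve(XtX_s, Xty_s)
```

The direct recalculation costs on the order of n times the squared subset size for every candidate subset, whereas the indexed version never touches the n rows again after the initial product.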

    Selection of principal variables through a modified Gram–Schmidt process with and without supervision

    In various situations requiring empirical model building from highly multivariate measurements, modelling based on partial least squares regression (PLSR) may often provide efficient low-dimensional model solutions. In unsupervised situations, the same may be true for principal component analysis (PCA). In both cases, however, it is also of interest to identify subsets of the measured variables useful for obtaining sparser but still comparable models without significant loss of information and performance. In the present paper, we propose a voting approach for sparse overall maximisation of variance analogous to PCA and a similar alternative for deriving sparse regression models closely related to the PLSR method. Both cases yield pivoting strategies for a modified Gram–Schmidt process and its corresponding (partial) QR factorisation of the underlying data matrix to manage the variable selection process. The proposed methods include score and loading plot possibilities that are acknowledged for providing efficient interpretations of the related PCA and PLS models in chemometric applications.
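A rough sketch of pivoted Gram–Schmidt variable selection in the unsupervised case is given below. The abstract does not state the voting criterion, so the score used here (the total variance across all columns explained by a candidate column) is an assumption, as are the function name and the toy matrix.

```python
import numpy as np

def select_principal_variables(X, k):
    """Greedy unsupervised selection of up to k columns with deflation."""
    R = X - X.mean(axis=0)  # centred working copy, deflated after each pick
    selected = []
    for _ in range(k):
        norms2 = (R ** 2).sum(axis=0)
        C = R.T @ R  # cross-products between the deflated columns
        # Assumed criterion: variance in all columns explained by column j.
        alive = norms2 > 1e-12
        safe = np.where(alive, norms2, 1.0)
        score = np.where(alive, (C ** 2).sum(axis=0) / safe, 0.0)
        j = int(np.argmax(score))
        if score[j] <= 0.0:
            break  # nothing left to explain
        selected.append(j)
        # Gram-Schmidt step: orthogonalise every column against the pivot.
        q = R[:, j] / np.sqrt(norms2[j])
        R = R - np.outer(q, q @ R)
    return selected

# Toy matrix with mean-zero, mutually orthogonal columns of different size.
X = np.array([[3.0, 0.0, 2.0],
              [-3.0, 0.0, 2.0],
              [0.0, 1.0, -2.0],
              [0.0, -1.0, -2.0]])
order = select_principal_variables(X, 3)
```

The selected columns, taken in order, are the pivot columns of a partial QR factorisation of the centred data. A supervised variant would replace the score with one that involves the response.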

    The Response of Enterococcus faecalis V583 to Chloramphenicol Treatment

    Many Enterococcus faecalis strains display tolerance or resistance to many antibiotics, but genes that contribute to the resistance cannot be specified. The multiresistant E. faecalis V583, for which the complete genome sequence is available, survives and grows in media containing relatively high levels of chloramphenicol. No specific genes coding for chloramphenicol resistance have been recognized in V583. We used microarrays to identify genes and mechanisms behind the tolerance to chloramphenicol in V583, by comparison of cells treated with subinhibitory concentrations of chloramphenicol and untreated V583 cells. During a time course experiment, more than 600 genes were significantly differentially transcribed. Since chloramphenicol affects protein synthesis in bacteria, many genes involved in protein synthesis, for example, genes for ribosomal proteins, were induced. Genes involved in amino acid biosynthesis, for example, genes for tRNA synthetases and energy metabolism, were mainly downregulated. Among the upregulated genes were EF1732 and EF1733, which code for potential chloramphenicol transporters. Efflux of drug out of the cells may be one mechanism used by V583 to overcome the effect of chloramphenicol.

    Estimation of Thalamocortical and Intracortical Network Models from Joint Thalamic Single-Electrode and Cortical Laminar-Electrode Recordings in the Rat Barrel System

    A new method is presented for extraction of population firing-rate models for both thalamocortical and intracortical signal transfer based on stimulus-evoked data from simultaneous thalamic single-electrode and cortical recordings using linear (laminar) multielectrodes in the rat barrel system. Time-dependent population firing rates for granular (layer 4), supragranular (layer 2/3), and infragranular (layer 5) populations in a barrel column and the thalamic population in the homologous barreloid are extracted from the high-frequency portion (multi-unit activity; MUA) of the recorded extracellular signals. These extracted firing rates are in turn used to identify population firing-rate models formulated as integral equations with exponentially decaying coupling kernels, allowing for straightforward transformation to the more common firing-rate formulation in terms of differential equations. Optimal model structures and model parameters are identified by minimizing the deviation between model firing rates and the experimentally extracted population firing rates. For the thalamocortical transfer, the experimental data favor a model with fast feedforward excitation from thalamus to the layer-4 laminar population combined with a slower inhibitory process due to feedforward and/or recurrent connections and mixed linear-parabolic activation functions. The extracted firing rates of the various cortical laminar populations are found to exhibit strong temporal correlations for the present experimental paradigm, and simple feedforward population firing-rate models combined with linear or mixed linear-parabolic activation functions are found to provide excellent fits to the data. The identified thalamocortical and intracortical network models are thus found to be qualitatively very different. While the thalamocortical circuit is optimally stimulated by rapid changes in the thalamic firing rate, the intracortical circuits are low-pass and respond most strongly to slowly varying inputs from the cortical layer-4 population.
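The remark that integral equations with exponentially decaying kernels transform directly into differential equations can be illustrated numerically. The sketch below uses a single linear population with a made-up input rate, and the time constant and step size are arbitrary assumptions. It omits the activation functions and the multi-population structure of the models in the paper.

```python
import numpy as np

tau, dt, n = 5.0, 0.1, 400  # illustrative time constant and step size (ms)
t = np.arange(n) * dt
x = (t > 5.0).astype(float) + 0.5 * np.sin(0.3 * t)  # made-up input rate

a = np.exp(-dt / tau)

# Integral form: r[i] = sum_k w[k] * x[i - k], where w[k] is the integral of
# the kernel exp(-s / tau) / tau over the k-th time bin.
w = (1.0 - a) * a ** np.arange(n)
r_integral = np.convolve(x, w)[:n]

# Differential form: tau * dr/dt = -r + x, stepped with the exact update for
# an input that is held constant within each time bin.
r_ode = np.empty(n)
prev = 0.0
for i in range(n):
    prev = a * prev + (1.0 - a) * x[i]
    r_ode[i] = prev
```

Both arrays describe the same low-pass filtered response. The recursion needs only the previous value, which is why the differential formulation is usually cheaper to simulate than the convolution.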

    Daylength influences the response of three clover species (Trifolium spp.) to short-term ozone stress

    Long photoperiods characteristic of summers at high latitudes can increase ozone-induced foliar injury in subterranean clover (Trifolium subterraneum). This study compared the effects of long photoperiods on ozone injury in red and white clover cultivars adapted to shorter or longer daylengths of southern or northern Fennoscandia. Plants were exposed to 70 ppb ozone for six hours during the daytime for three consecutive days. Simultaneously, the daylength in the growth rooms was altered to long-day (10 h light; 14 h dim light) and short-day (10 h light; 14 h darkness) conditions. Thermal imaging showed that ozone disrupted leaf temperature and stomatal function, particularly in sensitive species, in which leaf temperature deviations persisted for several days after ozone exposure. Long-day conditions increased visible foliar injury (30%–70%), characterized by chlorotic and necrotic areas, relative to short-day conditions in all species and cultivars independently of the photoperiod in the region they were adapted to.

    Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models

    Background: Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function.
    Results: Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops.
    Conclusions: HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems.
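The general idea of clustering followed by local regression can be sketched as below. This is a simplified stand-in rather than HC-PLSR itself: ordinary least squares replaces PLSR, clustering is run directly on the inputs, each sample is assigned to its highest-membership cluster, and all function names and the toy data are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy C-means; returns memberships (n, c) and centres (c, d)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared distances from every sample to every centre.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against zero distances
        U = d2 ** (-1.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)  # memberships sum to one per sample
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, centers

def design(X):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones(len(X)), X])

def fit_local_models(X, y, labels, n_clusters):
    """One least-squares model per cluster (OLS stands in for PLSR here)."""
    models = []
    for c in range(n_clusters):
        mask = labels == c
        coef, *_ = np.linalg.lstsq(design(X[mask]), y[mask], rcond=None)
        models.append(coef)
    return models

def predict_local(X, labels, models):
    """Predict each sample with the model of its own cluster."""
    A = design(X)
    pred = np.empty(len(X))
    for c, coef in enumerate(models):
        mask = labels == c
        pred[mask] = A[mask] @ coef
    return pred

# Toy non-monotone response: y = |x| on two separated input ranges.
x = np.concatenate([np.linspace(-1.0, -0.2, 40), np.linspace(0.2, 1.0, 40)])
X = x[:, None]
y = np.abs(x)

U, centers = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)
models = fit_local_models(X, y, labels, n_clusters=2)

global_coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
mse_global = float(np.mean((y - design(X) @ global_coef) ** 2))
mse_local = float(np.mean((y - predict_local(X, labels, models)) ** 2))
```

On this toy problem a single global line cannot follow the V-shaped response, whereas each local model only has to fit one monotone branch.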