193 research outputs found

    CenetBiplot: a new proposal of sparse and orthogonal biplots methods by means of elastic net CSVD

    [EN] In this work, a new mathematical algorithm for sparse and orthogonal constrained biplots, called CenetBiplots, is proposed. Biplots provide a joint representation of the observations and variables of a multidimensional matrix in the same reference system. In this subspace, the relationships between them can be interpreted in terms of geometric elements. CenetBiplots projects a matrix onto a low-dimensional space generated simultaneously by sparse and orthogonal principal components. Sparsity is desired to select variables automatically, and orthogonality is necessary to preserve the geometric properties that ensure the biplot's graphical interpretation. To this end, the present study pursues two objectives: 1) the extension of constrained singular value decomposition to incorporate an elastic net sparse constraint (CenetSVD), and 2) the implementation of CenetBiplots using CenetSVD. The usefulness of the proposed methodologies for analysing high-dimensional and low-dimensional matrices is shown. Our method is implemented in R software and available for download from https://github.com/ananieto/SparseCenetMA. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was not supported by any grant. Open access publication funded by the Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), under the Operational Programme 2014ES16RFOP009 FEDER 2014-2020 DE CASTILLA Y LEÓN, Action: 20007-CL - Apoyo Consorcio BUCL
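
As a point of reference for the abstract above, the following is a minimal sketch of a classical (dense) SVD biplot in Python with NumPy. It is not the CenetBiplots algorithm: it applies no elastic net penalty and no sparsity constraint. The function name `biplot_coordinates`, the `alpha` scaling parameter, and the random input matrix are illustrative assumptions, not part of the cited work.

```python
import numpy as np

def biplot_coordinates(X, n_components=2, alpha=0.5):
    """Return row and column markers of a classical SVD biplot.

    Hypothetical helper for illustration; plain dense SVD, no sparsity.
    """
    Xc = X - X.mean(axis=0)  # centre each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    U, s, Vt = U[:, :n_components], s[:n_components], Vt[:n_components]
    row_markers = U * s**alpha            # one row per observation
    col_markers = Vt.T * s**(1 - alpha)   # one row per variable
    return row_markers, col_markers

# Synthetic data, used only to exercise the function.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
rows, cols = biplot_coordinates(X, n_components=2)
approx = rows @ cols.T  # rank-2 approximation of the centred matrix
print(rows.shape, cols.shape)
```

With all components kept, the product of the two marker matrices reproduces the centred data; with two components it gives the best rank-2 least-squares approximation, which is what a two-dimensional biplot displays.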

    An exploratory data analysis method to reveal modular latent structures in high-throughput data

    Background: Modular structures are ubiquitous across various types of biological networks. The study of network modularity can help reveal regulatory mechanisms in systems biology, evolutionary biology and developmental biology. Identifying putative modular latent structures from high-throughput data using exploratory analysis can help better interpret the data and generate new hypotheses. Unsupervised learning methods designed for global dimension reduction or clustering fall short of identifying modules with factors acting in linear combinations. Results: We present an exploratory data analysis method named MLSA (Modular Latent Structure Analysis) to estimate modular latent structures, which can find co-regulative modules that involve non-coexpressive genes. Conclusions: Through simulations and real-data analyses, we show that the method can recover modular latent structures effectively. In addition, the method also performed very well on data generated from sparse global latent factor models. The R code is available at http://userwww.service.emory.edu/~tyu8/MLSA/.

    Contributions to Functional Data Analysis with Applications to Modeling Time Series and Panel Data

    In Chapter 1 we propose a new perspective on modeling and forecasting electricity spot prices. Our approach is motivated by the data-generating process of electricity spot prices, which is well described by what is called the merit order model. The merit order model is a microeconomic model based on the assumption that spot prices on electricity exchanges are determined by the marginal generation costs of the last power plant that is required to cover the demand. The resulting merit order curve reflects the increasing generation costs of the installed power plants. Correspondingly, we suggest interpreting hourly electricity spot prices as noisy discretization points of smooth price functions. These price functions are modeled by a functional factor model (FFM) for which we discuss a two-step estimation procedure. The first step is a classical pre-smoothing step in order to estimate the single price functions from the noisy discretization points. The second step then aims for a robust estimation of a finite set of common basis functions from the pre-smoothed price functions. In doing this, we carefully consider the issue of finding an optimal smoothing parameter. The presentation of our functional factor model concludes with an extensive forecast study which compares our FFM with alternative time series models that have been successfully applied in the literature on electricity spot prices. The forecast study clearly confirms the superior power of our functional factor model and the use of price functions as underlying structures of electricity spot prices in general. A slightly modified version of Chapter 1 is forthcoming as a single-authored article in "The Annals of Applied Statistics"; see Liebl (2013). Chapter 2 further discusses the problem of modeling electricity spot prices. On the one hand, we extend the concept of price function introduced in Chapter 1 using covariables.
On the other hand, we focus on a generally deeper theoretical consideration of the involved multivariate nonparametric regression model, which is used as a tool for FPCA. We extend existing theoretical results with respect to FPCA for sparse functional data by considering the asymptotic bias and variance of the multivariate local linear estimator of the mean and the covariance functions. Here, we carefully consider the effects of between-correlations, which are caused by the time series context, and the effects of within-correlations, which are caused by the functional nature of the data. In order to demonstrate the usefulness of our model we analyze the effects of Germany's nuclear moratorium on March 14, 2011. This event constitutes a natural experiment, since eight nuclear power plants were phased out in its course [Nestle (2012)]. The data set analyzed in Chapter 2 covers exactly one year before and one year after Germany's nuclear power phase-out. We apply our model separately to these two time spans in order to contrast the different market situations. In Chapter 3 we pick up the successful application of FDA within the literature on panel data models. Recent panel data models allow us to control for complex unobserved heterogeneity effects by the incorporation of latent factor models. This new kind of panel data model extends the classical concept of individual random (scalar) effects to random processes or random functions [see, e.g., Bai, Kao and Ng (2009), Bai (2009), and Kneip, Sickles and Song (2012)]. Even though this class of panel models is of high relevance for practical problems such as stochastic frontier analysis, they are still rarely applied in the empirical literature. Our implementation of these methods in the statistical software package phtt provides a first step towards facilitating their application.
As the estimation procedure of Kneip, Sickles and Song (2012) involves nonparametric smoothing methods, the choice of a reliable procedure to find an optimal smoothing parameter is most important for implementing the estimation procedure in a statistical software package. We consider this problem and suggest using the technique of "parameter cascading" in order to approximate an upper bound for the optimal smoothing parameter [see also Cao and Ramsay (2010)]. The final optimal smoothing parameter lies somewhere between this approximated upper bound and zero. Knowledge of this interval allows for a robust implementation of the computationally costly cross-validation criterion. A slightly modified version of Chapter 3 is accepted as a co-authored article for the "Journal of Statistical Software"; see Bada and Liebl (2013).

    Multiproduct Pricing in Major League Baseball: A Principal Components Analysis

    The empirical analysis of multiproduct pricing suffers from a lack of clear theoretical guidance and appropriate data, limitations which often render traditional regression-based analyses impractical. This paper analyzes ticket, parking, and concession pricing in Major League Baseball for the period 1991-2003 using a new methodology based on principal components, which allows inferences to be formed about the factors underlying price variation without strong theoretical guidance or abundant information about costs and demand. While general demand shifts are the most important factor, they explain only half of overall price variation. Also important are price interactions that derive from demand interrelationships between goods and the desire to maximize the capture of consumer surplus in the presence of heterogeneous demand.
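
The idea of asking how much price variation a leading component explains can be sketched as follows. This is a generic principal components computation on synthetic data, not the paper's methodology or its Major League Baseball data. The function name `explained_variance_shares` and the simulated "demand" factor are assumptions made for illustration.

```python
import numpy as np

def explained_variance_shares(prices):
    """Share of total variance per principal component, largest first.

    Works on the correlation matrix so that prices on different scales
    (e.g. tickets, parking, concessions) contribute equally.
    """
    Z = (prices - prices.mean(axis=0)) / prices.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)            # correlation matrix of the prices
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs

# Synthetic prices: one common "demand" factor plus item-specific noise.
rng = np.random.default_rng(1)
demand = rng.normal(size=(300, 1))
prices = 10.0 + demand @ np.array([[1.0, 1.0, 1.0]]) + rng.normal(size=(300, 3))
shares, loadings = explained_variance_shares(prices)
print(shares)
```

The first entry of `shares` plays the role of the "general demand shift" share in the abstract; the columns of `loadings` show how each price moves with each component.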

    Sparsity in partial least squares regression models

    Data sets with multiple responses and multiple predictor variables are increasingly common. It is known that such data sets often exhibit near multicollinearity, and the traditional ordinary least squares (OLS) regression method does not perform well in such a setting because the mean square error of the OLS regression coefficients will be large and prediction performance will be poor. This drawback of OLS is often handled by using well-known dimension reduction methods; the focus in this thesis is Partial Least Squares (PLS). The following contributions are made in the thesis: (a) Introduce relevant components (RC) models characterized by restrictions on the joint covariance matrix of the response and predictor variables, and show that the univariate (single-response) version of the RC model can be represented as a Krylov model. These representations will shed more light on the understanding of PLS. Also, PLS algorithms are reviewed and presented as estimators of the RC models. (b) Unify various multiple-response regression models under the framework of the RC models, and review some multiple-response PLS methods. In addition, simulation studies are carried out to compare the prediction performance of multivariate PLS (PLS2) methods. (c) Propose novel sparse multivariate PLS (SPLS2) methods for parameter estimation and variable selection, which offer more flexibility compared to known SPLS2 methods, and compare the novel methods against methods in the literature in terms of prediction performance and accuracy in variable selection. (d) Apply the PLS regression methods to a proteomics data set to predict the severity of systemic sclerosis and identify candidate markers. Furthermore, compare the PLS, SPLS and OLS methods with regard to predictive ability using the proteomics data.
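
To make the single-response case concrete, here is a minimal sketch of ordinary (dense) PLS1 with NIPALS-style deflation, applied to synthetic near-collinear predictors. It is not the thesis's sparse SPLS2 method and includes no variable selection. The function name `pls1_fit` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS (PLS1) with NIPALS-style deflation.

    Returns (beta, intercept) so that predictions are X @ beta + intercept.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)   # weight vector
        t = Xk @ w                  # score vector
        tt = t @ t
        p = Xk.T @ t / tt           # X loading
        c = yk @ t / tt             # y loading
        Xk = Xk - np.outer(t, p)    # deflate X
        yk = yk - c * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta

# Synthetic, nearly collinear predictors driven by two latent factors.
rng = np.random.default_rng(2)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(100, 6))
y = latent[:, 0] - latent[:, 1] + 0.1 * rng.normal(size=100)
beta, intercept = pls1_fit(X, y, n_components=2)
print(np.mean((X @ beta + intercept - y) ** 2))  # in-sample mean squared error
```

Using fewer components than predictors restricts the coefficient vector to a low-dimensional subspace, which is what stabilises the fit under near multicollinearity. When as many components as predictors are kept and the predictors have full rank, the PLS1 coefficients generically coincide with the OLS ones.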

    Factor Models in Finance
