    Extraction of the underlying structure of systematic risk from non-Gaussian multivariate financial time series using independent component analysis: Evidence from the Mexican stock exchange

    Because financial time series are multivariate non-Gaussian, extracting the underlying risk factors via Principal Component Analysis or Factor Analysis can give unreliable results. We therefore use Independent Component Analysis (ICA) to estimate the pervasive risk factors that explain stock returns on the Mexican Stock Exchange. The extracted systematic risk factors are considered within a statistical definition of the Arbitrage Pricing Theory (APT), which is tested by means of a two-stage econometric methodology. Using the extracted factors, we find evidence that ICA yields a suitable estimation, along with some results in favor of the APT. Peer Reviewed. Postprint (published version).
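    The abstract does not spell out the two-stage procedure. A common reading of a two-stage APT test is a two-pass regression: first, each stock's returns are regressed on the factor series to estimate its loading (beta); second, average returns are regressed across stocks on those betas to estimate the factor risk premium. The sketch below assumes a single factor (for example, one ICA component) and noise-free synthetic inputs for brevity; the function names are hypothetical, and a real test would also compute standard errors for the premia.

```python
from statistics import fmean


def ols(x, y):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_pass_one_factor(returns, factor):
    """Hypothetical two-pass test of a one-factor APT (assumption, not the paper's code).

    returns: list of return series, one per stock, each aligned with `factor`.
    Pass 1 (time series): beta_i from regressing stock i's returns on the factor.
    Pass 2 (cross section): regress mean returns on the betas to get
    (lambda_0, lambda_1), where lambda_1 estimates the factor risk premium.
    """
    betas = [ols(factor, r)[1] for r in returns]
    mean_returns = [fmean(r) for r in returns]
    lam0, lam1 = ols(betas, mean_returns)
    return betas, lam0, lam1


# Synthetic check: returns built exactly as a_i + beta_i * f_t,
# with a_i = 0.01 + 0.02 * beta_i and a zero-mean factor series.
factor = [1.0, -1.0, 2.0, 0.0, -2.0]
true_betas = [0.5, 1.0, 1.5]
returns = [[0.01 + 0.02 * b + b * f for f in factor] for b in true_betas]
betas, lam0, lam1 = two_pass_one_factor(returns, factor)
print([round(b, 6) for b in betas], round(lam0, 6), round(lam1, 6))
# prints [0.5, 1.0, 1.5] 0.01 0.02
```

    With several ICA factors, pass 1 becomes a multiple regression per stock and pass 2 a multiple cross-sectional regression; the APT is supported when the estimated premia are jointly significant and the intercept is close to zero.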

    Assessment of bivariate normality

    Three methods are most commonly used to assess the bivariate normality of paired data, two of which are also used to assess multivariate normality. None of them, however, is very efficient or conclusive in assessing bivariate normality. In this thesis we propose a new method to test bivariate normality. It makes use of a set of if-and-only-if conditions inherent in the theory of the bivariate normal distribution. The proposed method is highly efficient, accurate, and very easy to apply using any standard statistical software.
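    The thesis's own if-and-only-if conditions are not listed here. As an illustration of the general idea only, one classical characterization (the Cramér-Wold device) states that a pair (X, Y) is bivariate normal if and only if every linear combination aX + bY is univariate normal. The sketch below, which is not the thesis's method, checks a finite set of projections for near-zero sample skewness and excess kurtosis; the number of angles and both tolerances are arbitrary assumptions, so passing the check is evidence consistent with normality rather than a proof.

```python
import math


def skew_kurt(values):
    """Sample skewness and excess kurtosis (simple moment estimators)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def projection_check(xs, ys, n_angles=8, skew_tol=0.2, kurt_tol=0.4):
    """Hypothetical screen for bivariate normality via projections cos(t)*x + sin(t)*y.

    Under bivariate normality every projection is normal, so its skewness and
    excess kurtosis should both be near 0. Tolerances are assumptions.
    Returns (passed, worst_abs_skew, worst_abs_excess_kurtosis).
    """

    def standardize(v):
        # A linear rescaling of each margin preserves bivariate normality.
        m = sum(v) / len(v)
        s = math.sqrt(sum((a - m) ** 2 for a in v) / len(v))
        return [(a - m) / s for a in v]

    zx, zy = standardize(xs), standardize(ys)
    worst_skew = worst_kurt = 0.0
    for k in range(n_angles):
        t = math.pi * k / n_angles  # angles in [0, pi) cover all directions up to sign
        proj = [math.cos(t) * a + math.sin(t) * b for a, b in zip(zx, zy)]
        s, ku = skew_kurt(proj)
        worst_skew = max(worst_skew, abs(s))
        worst_kurt = max(worst_kurt, abs(ku))
    passed = worst_skew < skew_tol and worst_kurt < kurt_tol
    return passed, worst_skew, worst_kurt
```

    Perfectly collinear pairs make one projection constant (zero variance), which this simple sketch does not handle.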

    Inference for High-Dimensional Doubly Multivariate Data under General Conditions

    With technological, research, and theoretical advances, the amount of data generated for analysis is growing rapidly. In many cases the number of subjects is small, but the number of measurements taken on each subject is very large. Consider, for example, two groups of patients, one diseased and one not. Over 9,000 relative fluorescent unit (RFU) signals, which measure the presence and abundance of proteins, are collected from each subject in a microarray or protoarray. Such data typically show marked skewness (departure from normality), which invalidates standard theory based on the multivariate normal distribution. Moreover, because of the cost involved, only a limited number of subjects can be included in the study, so standard large-sample asymptotic theory cannot be applied. It is of interest to determine whether there are any differences in RFU signals between the two groups and, more importantly, whether there are any RFU signal by group interaction effects. If such an interaction is detected, further research is warranted to identify the biological signals involved, commonly known as biomarkers. To address these problems, we present inferential procedures for two-factor repeated measures multivariate analysis of variance (RM-MANOVA) models in which the covariance structure is unknown and the number of measurements per subject tends to infinity. In both the univariate case, with a single response variable, and the multivariate case, with several response variables, we propose different sums of squares and cross-products (SSCP) matrices that compensate for the unknown covariance structure and for unbalanced group sizes. Based on these matrices, we present several multivariate test statistics and derive their asymptotic distributions under fairly general conditions. We then assess the performance of the tests by simulation and analyze a real data set to demonstrate their applicability.
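    For orientation, classical one-way MANOVA decomposes the total SSCP matrix T into a between-group (hypothesis) matrix H and a within-group (error) matrix E, with T = H + E. The sketch below computes these classical matrices in plain Python; it is not the modified matrices proposed in the paper, which are designed for an unknown covariance structure, unbalanced groups, and a number of measurements that grows without bound. The helper names are hypothetical.

```python
def mean_vec(rows):
    """Componentwise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def outer_add(acc, u, v, w=1.0):
    """In place: acc += w * u v^T."""
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            acc[i][j] += w * ui * vj


def sscp_matrices(groups):
    """Classical one-way MANOVA SSCP matrices (not the paper's modified ones).

    groups: list of groups, each a list of p-dimensional observations.
    H = sum_g n_g (mean_g - grand)(mean_g - grand)^T   (between groups)
    E = sum_g sum_i (x_gi - mean_g)(x_gi - mean_g)^T   (within groups)
    T = sum over all x of (x - grand)(x - grand)^T, and T = H + E.
    """
    all_rows = [x for g in groups for x in g]
    p = len(all_rows[0])
    grand = mean_vec(all_rows)
    H = [[0.0] * p for _ in range(p)]
    E = [[0.0] * p for _ in range(p)]
    T = [[0.0] * p for _ in range(p)]
    for g in groups:
        mg = mean_vec(g)
        d = [a - b for a, b in zip(mg, grand)]
        outer_add(H, d, d, w=len(g))
        for x in g:
            e = [a - b for a, b in zip(x, mg)]
            outer_add(E, e, e)
            t = [a - b for a, b in zip(x, grand)]
            outer_add(T, t, t)
    return H, E, T


H, E, T = sscp_matrices([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(H, E, T)
# prints [[16.0, 16.0], [16.0, 16.0]] [[4.0, 4.0], [4.0, 4.0]] [[20.0, 20.0], [20.0, 20.0]]
```

    Classical statistics such as Wilks' lambda, det(E) / det(H + E), need E to be invertible, which fails once the number of measurements exceeds the number of subjects; this is one reason high-dimensional settings call for modified matrices.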

    Using R-based VOStat as a low resolution spectrum analysis tool

    We describe VOStat, an online software suite written mainly for the Virtual Observatory, a novel infrastructure through which astronomers share terabyte-scale data. Written mostly in R, the free statistical computing language and environment, it performs a variety of statistical analyses on multidimensional, multi-epoch data with associated errors. It includes techniques that let astronomers start from multi-color data in the form of low-resolution spectra and select special kinds of sources in various ways, including as color outliers. Here we describe the tool and demonstrate it with an example from Palomar-QUEST, a synoptic sky survey.
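    VOStat's own outlier-selection routines are not described here. As a hypothetical illustration of what selecting color outliers from multi-band photometry can mean, the sketch below forms adjacent-band colors (magnitude differences) and flags sources whose color lies more than k robust standard deviations from the median, using the median absolute deviation (MAD) scaled by 1.4826. The band order, the threshold k = 3, and the MAD rule are assumptions, not VOStat's algorithm.

```python
from statistics import median


def adjacent_colors(mags):
    """Per-source colors m_k - m_{k+1} from magnitudes listed in band order."""
    return [[m[k] - m[k + 1] for k in range(len(m) - 1)] for m in mags]


def robust_z(values):
    """(v - median) / (1.4826 * MAD); 1.4826 makes MAD estimate sigma for Gaussian data."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0] * len(values)  # degenerate spread: this simple rule flags nothing
    return [(v - med) / (1.4826 * mad) for v in values]


def color_outliers(mags, k=3.0):
    """Indices of sources more than k robust sigmas from the median in any color."""
    cols = adjacent_colors(mags)
    flagged = set()
    for j in range(len(cols[0])):
        z = robust_z([c[j] for c in cols])
        flagged.update(i for i, zi in enumerate(z) if abs(zi) > k)
    return sorted(flagged)


# Magnitudes in three hypothetical bands (u, g, r); the last source has an unusually large u - g.
mags = [
    (19.9, 19.5, 19.2),
    (20.0, 19.5, 19.1),
    (20.1, 19.5, 19.15),
    (19.95, 19.5, 19.25),
    (20.05, 19.5, 19.05),
    (22.5, 19.5, 19.2),
]
print(color_outliers(mags))  # prints [5]
```

    The median and MAD are used instead of the mean and standard deviation so that the outliers being sought do not inflate the spread estimate and mask themselves.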

    A computational framework to emulate the human perspective in flow cytometric data analysis

    Background: In recent years, intense research efforts have focused on developing methods for automated flow cytometric data analysis. However, in designing such applications, little or no attention has been paid to the human perspective that is central to the manual gating process of identifying and characterizing cell populations. In particular, many common techniques assume that cell populations can be modeled reliably with pre-specified distributions, an assumption that may not hold in real-life samples, which can contain populations of arbitrary shape and considerable inter-sample variation.
    Results: To address this, we developed flowScape, a new framework that emulates certain key aspects of the human perspective in analyzing flow data and is implemented in multiple steps. First, flowScape creates a mathematically rigorous map of the high-dimensional flow data landscape based on dense and sparse regions, defined by the relative concentration of events around modes. Second, these modal clusters are connected in a global hierarchical structure. This representation allows flowScape to perform ridgeline analysis, both to traverse the landscape and to isolate cell populations at different levels of resolution. Finally, we extended manual gating with a new capability for constructing templates that identify target populations in terms of their relative parameters, as opposed to the more commonly used absolute or physical parameters. This allows flowScape to apply such templates in batch mode to detect the corresponding populations in a flexible, sample-specific manner. We also demonstrate different applications of the framework to flow data analysis and show its superiority over other analytical methods.
    Conclusions: The human perspective, built on intuition and experience, is a very important component of flow cytometric data analysis. By emulating some of its approaches and extending them with automation and rigor, flowScape provides a flexible and robust framework for computational cytomics.
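    flowScape's own density map and ridgeline analysis are more elaborate than can be shown here. A minimal sketch of the underlying idea of modal clustering, under simplifying assumptions (two dimensions, a fixed-size grid histogram as the density estimate, 8-neighbour hill climbing), lets each occupied cell climb to its densest neighbour until it reaches a local mode, then labels events by the mode they reach. The function name and bin size are hypothetical; this is not the flowScape algorithm.

```python
from collections import Counter


def grid_modal_clusters(points, bin_size):
    """Label 2D events by the density mode their grid cell climbs to (hypothetical helper)."""
    # Bin events on a regular grid; the event count per cell is a crude density estimate.
    cells = [(int(x // bin_size), int(y // bin_size)) for x, y in points]
    counts = Counter(cells)  # missing cells read as 0

    def climb(cell):
        # Step to the densest of the 8 neighbours until none is strictly denser.
        # Counts strictly increase along the path, so the loop always terminates.
        while True:
            cx, cy = cell
            best = cell
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if counts[nb] > counts[best]:
                        best = nb
            if best == cell:
                return cell
            cell = best

    mode_of = {c: climb(c) for c in counts}
    # Number the modes in a deterministic (sorted) order.
    label = {m: i for i, m in enumerate(sorted(set(mode_of.values())))}
    return [label[mode_of[c]] for c in cells]


clump_a = [(1.5, 1.5), (1.4, 1.6), (1.6, 1.4), (1.5, 1.45), (1.55, 1.5)]
stray = [(2.5, 1.5)]  # sparse cell next to clump A
clump_b = [(8.5, 8.5), (8.4, 8.6), (8.6, 8.4), (8.5, 8.55)]
print(grid_modal_clusters(clump_a + stray + clump_b, bin_size=1.0))
# prints [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

    The sparse stray event is absorbed by the dense neighbouring clump rather than forming its own population, which mirrors the dense/sparse distinction described above; connecting such modes into a hierarchy would be a further step.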

    Wind turbine condition monitoring strategy through multiway PCA and multivariate inference

    This article presents a condition monitoring strategy for wind turbines that uses a statistical data-driven modeling approach based on supervisory control and data acquisition (SCADA) data. First, a baseline data-based model of the healthy wind turbine is obtained by means of multiway principal component analysis (MPCA). Then, when the wind turbine is monitored, new data are acquired and projected into the baseline MPCA model space. Given the random nature of turbulent wind, the acquired SCADA data are treated as a random process. The objective is to decide whether the multivariate distribution obtained from the wind turbine under analysis (healthy or not) is consistent with the baseline one. To this end, a test for the equality of population means is performed. The test either rejects the null hypothesis (the wind turbine is faulty) or finds no evidence that the two means differ, in which case the wind turbine can be considered healthy. The methodology is evaluated on a wind turbine fault detection benchmark that uses a 5 MW high-fidelity wind turbine model and a set of eight realistic fault scenarios. Notably, for a wide range of significance levels, α ∈ [1%, 13%], the percentage of correct decisions remains at 100%, which makes the methodology a promising tool for real-time wind turbine condition monitoring. Peer Reviewed. Postprint (published version).
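    The abstract does not give the exact model or test statistic. The sketch below illustrates the general pipeline under stated assumptions: batch-wise unfolding of a batches × variables × time array (one common MPCA unfolding), autoscaling with the baseline (healthy) statistics, a single principal component found by power iteration, projection of new data onto that component, and a two-sided test of equal mean scores using a normal approximation to Welch's t, which is reasonable only for fairly large samples. All function names are hypothetical, and the paper's own test may differ.

```python
import math
from statistics import NormalDist, fmean, stdev


def unfold(batches):
    """Batch-wise unfolding: batches[b][v][t] -> one row of length V*T per batch."""
    return [[x for var in batch for x in var] for batch in batches]


def fit_autoscale(rows):
    """Column means and standard deviations from the baseline (healthy) data."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [stdev(c) or 1.0 for c in cols]  # guard against constant columns
    return means, stds


def scale(rows, means, stds):
    """Autoscale rows with the baseline statistics, so new data share the model's frame."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def first_pc(rows, iters=200):
    """Leading principal direction of centred rows, by power iteration on X^T X."""
    d = len(rows[0])
    v = [1.0] * d  # assumed not orthogonal to the leading direction
    for _ in range(iters):
        xv = [sum(a * b for a, b in zip(r, v)) for r in rows]           # X v
        w = [sum(r[j] * s for r, s in zip(rows, xv)) for j in range(d)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def scores(rows, pc):
    """Projection of each (scaled) row onto the principal direction."""
    return [sum(a * b for a, b in zip(r, pc)) for r in rows]


def welch_z_test(a, b):
    """Two-sided test of equal means; normal approximation to Welch's t."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))
```

    In use, a p-value from `welch_z_test(scores(baseline_scaled, pc), scores(new_scaled, pc))` below the chosen significance level would flag the turbine as faulty; a full MPCA model would retain several components and test them jointly.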