
    High Dimensional Semiparametric Scale-Invariant Principal Component Analysis

    We propose a new high dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA in three aspects: (i) it is robust to modeling assumptions; (ii) it is robust to outliers and data contamination; (iii) it is scale-invariant and yields more interpretable results. We prove that the COCA estimators attain fast estimation rates and are feature selection consistent when the dimension is nearly exponentially large relative to the sample size. Careful experiments confirm that COCA outperforms sparse PCA on both synthetic and real-world datasets. Comment: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
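
    The scale-invariance claim in this abstract can be illustrated with a minimal sketch. It assumes a rank-based correlation estimate (Spearman's rho mapped through the sine identity that holds for Gaussian copulas) followed by an ordinary eigendecomposition. Whether this matches the paper's exact estimator is an assumption, the sparsity step of COCA is omitted, and all function names and the simulated data are hypothetical.

```python
import numpy as np

def rank_columns(x):
    """Return the ranks (1..n) of each column of x; assumes no ties."""
    order = np.argsort(x, axis=0)
    ranks = np.empty_like(order)
    n = x.shape[0]
    for j in range(x.shape[1]):
        ranks[order[:, j], j] = np.arange(1, n + 1)
    return ranks.astype(float)

def rank_based_correlation(x):
    """Spearman's rho mapped to a latent Gaussian correlation estimate."""
    rho = np.corrcoef(rank_columns(x), rowvar=False)
    r = 2.0 * np.sin(np.pi * rho / 6.0)  # identity valid for Gaussian copulas
    np.fill_diagonal(r, 1.0)
    return r

def leading_component(corr):
    """Leading eigenvector of a symmetric matrix, with a fixed sign."""
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    v = eigvecs[:, -1]
    return v if v[np.argmax(np.abs(v))] > 0 else -v

# Simulated latent Gaussian data (illustrative only, not from the paper).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]])
z = rng.multivariate_normal(np.zeros(3), cov, size=500)

# Observed data: each margin passed through a strictly increasing transform.
x = np.column_stack([np.exp(z[:, 0]), z[:, 1] ** 3, 1000.0 * z[:, 2]])

v_latent = leading_component(rank_based_correlation(z))
v_observed = leading_component(rank_based_correlation(x))
```

    Because ranks do not change under strictly increasing marginal transformations, `v_latent` and `v_observed` coincide here, whereas PCA on the raw covariance of `x` would be dominated by the column scaled by 1000.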

    A new mixture copula model for spatially correlated multiple variables with an environmental application

    In environmental monitoring, multiple spatial variables sampled at a geographical location often depend on each other in complex ways, such as through non-linear and non-Gaussian spatial dependence. We propose a new mixture copula model that can capture those complex relationships among spatially correlated variables and predict individual variables while accounting for the multivariate spatial relationship. The proposed method is demonstrated using an environmental application and compared with three existing methods. First, the improvement in the prediction of individual variables obtained by using a multivariate spatial copula is compared with the existing univariate pair copula method. Second, the prediction performance of the mixture copula within the multivariate spatial copula framework is compared with an existing multivariate spatial copula model that uses non-linear principal component analysis. Lastly, the improvement in the prediction of individual variables obtained by using the non-linear non-Gaussian multivariate spatial copula model is compared with the linear Gaussian multivariate cokriging model. The results show that the proposed spatial mixture copula model outperforms the existing methods in the cross-validation of actual and predicted values at the sampled locations.

    Kullback-Leibler Divergence-Guided Copula Statistics-Based Blind Source Separation of Dependent Signals

    In this paper, we propose a blind source separation method for linear mixtures of dependent sources, based on copula statistics that measure the non-linear dependence between source component signals, structured as copula density functions. The source signals are assumed to be stationary. The method minimizes the Kullback-Leibler divergence between the copula density functions of the estimated sources and of the dependency structure. The proposed method is applied to data obtained from the time-domain analysis of the classical 11-Bus 4-Machine system. Extensive simulation results demonstrate that the proposed method based on copula statistics converges faster and outperforms the state-of-the-art blind source separation method for dependent sources in terms of interference-to-signal ratio. Comment: Submitted to the ISGT NA 202

    Bayesian Model Choice of Grouped t-copula

    One of the most popular copulas for modeling dependence structures is the t-copula. Recently the grouped t-copula was generalized to allow each group to have one member only, so that a priori grouping is not required and the dependence modeling is more flexible. This paper describes a Markov chain Monte Carlo (MCMC) method under the Bayesian inference framework for estimating and choosing t-copula models. Using historical data of foreign exchange (FX) rates as a case study, we found that Bayesian model choice criteria overwhelmingly favor the generalized t-copula. In addition, all the criteria agree on the second most likely model, and these inferences are all consistent with classical likelihood ratio tests. Finally, we demonstrate the impact of model choice on the conditional Value-at-Risk for portfolios of six major FX rates.

    Forecasting commodity futures using Principal Component Analysis and Copula

    In this thesis, the ongoing battle to beat the market is fought with mathematics, using a method that reduces information to its core: Principal Component Analysis (PCA). PCA is used to build a model of future commodity prices. To assist PCA, a copula is used: a sort of mathematical glue that joins multiple distributions so that they can be represented as one. The data are 5 years of prices for Brent Oil, WTI Oil, Gold, Copper and Aluminium. The model parameters are fitted to 2.5 years of data and then tested on the remaining 2.5 years. Maximum Likelihood Estimation (MLE) was used for parameter estimation, and the distributions found to fit were the logistic and Student's t distributions. Cramér-von Mises tests were used to determine that the t copula was the most suitable copula. The main results are that the mathematical estimations fit well and that profit can be generated, but with a low Sharpe ratio.

    Introduction: In the financial industry, traders fight daily to outwit their enemy, the market. With today's means of communication, however, traders are flooded with information, and mathematics can be used to sift through it. This study tries to extract the most vital information available from historical data and to build a model of how prices are expected to change. The mathematical tools used are principal component analysis and copula. To see how well the model performs, two trading strategies are mainly used: "mean reversion" and "momentum".

    History: The study focuses on commodities in general and on oil, gold, copper and aluminium specifically. Commodity trading itself goes back several millennia. One has to go back to the Sumerian civilisation, around 4500-4000 BC, to find the first signs of commodity trading. Clay tablets were then used to record, for example, how many goats were to be delivered at a certain time.

    Commodity trading: Nowadays there is no need to write contracts on clay tablets, and a delivery of gold or corn is easily arranged from a computer. The most common way to trade commodities is through a type of contract called futures. These specify exactly how much of a given commodity is to be delivered at a given time for a given price. The price varies and depends mainly on supply and demand, but the quantity and the delivery date remain the same.

    Principal component analysis: The "sifting of information" itself is in this case called principal component analysis. The analysis is carried out to extract the variables that can explain how commodity prices fluctuate. The variables are optimised on the first half of the data and then applied to the second half. This approach, optimising on the first half and testing on the second, is very common when modelling trading strategies. If one optimises over the whole data period, it is easy to "over-optimise" the results and fit the model to what has already happened instead of trying to create something that holds in the future.

    Copula: To model the prices of the different commodities, they are fitted to different kinds of distributions. These distributions can be individual to each commodity and are then "glued" together with a copula. In other words, one examines how the distributions relate to each other, how they are "glued" together. This glue can then be used to predict future price fluctuations.

    Mean reversion: One of the most common trading strategies is called mean reversion. The strategy rests on the belief that all prices eventually level out and return to the price they used to have (their old mean). It is, however, a strategy that takes no account of the human factors that can be present on exchanges.

    Momentum: Momentum is the strategy that takes softer factors into account, such as certain kinds of herd behaviour (people buy when others buy, and likewise sell when others sell). This can be expressed mathematically and is essentially the opposite strategy to mean reversion. In the study, however, the two strategies balance each other, and depending on how the historical price fluctuations have looked, one or the other dominates. This holds until the "copula glue" comes into play. The glue can then become the even more dominant factor and determine the trading strategy.

    Results: When only the principal component analysis with its associated trading strategies, mean reversion and momentum, is taken into account, no profit is generated. This may have many causes, but as is often the case in mathematics, one would want more data. Had the study covered a larger number of commodities, this part might have shown a better result. Once the "copula glue" becomes part of the model, profit can be generated. The profit is admittedly not large, but the model appears to have potential. When all trading strategies are taken into account in the model, it is thus shown that the model can make money. The thoughts and ideas on which the model is built can therefore hopefully help some trader in the battle against the market.
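
    The "glue" idea in this summary, separate marginal distributions joined by a copula, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's model: a Gaussian copula is swapped in for the t copula the thesis selects, both margins are logistic, and the correlation and marginal parameters are hypothetical values chosen only for the example.

```python
import math
import numpy as np

def gaussian_copula_uniforms(corr, n, rng):
    """Draw n samples of dependent uniforms from a Gaussian copula."""
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n)
    # Standard normal CDF applied element-wise turns each margin into U(0, 1).
    phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
    return phi(z)

def logistic_ppf(u, loc, scale):
    """Inverse CDF of the logistic distribution."""
    return loc + scale * np.log(u / (1.0 - u))

rng = np.random.default_rng(1)
corr = np.array([[1.0, 0.7], [0.7, 1.0]])  # hypothetical dependence strength
u = gaussian_copula_uniforms(corr, 2000, rng)

# Hypothetical marginal parameters for two commodities' returns.
returns_a = logistic_ppf(u[:, 0], loc=0.0, scale=0.01)
returns_b = logistic_ppf(u[:, 1], loc=0.001, scale=0.02)
```

    Each series keeps its own logistic margin, while the dependence between them comes only from the copula. Changing `corr` alters how the two series move together without touching either marginal fit.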

    Measuring reproducibility of high-throughput experiments

    Reproducibility is essential to reliable scientific discovery in high-throughput experiments. In this work we propose a unified approach to measure the reproducibility of findings identified from replicate experiments and to identify putative discoveries using reproducibility. Unlike the usual scalar measures of reproducibility, our approach creates a curve, which quantitatively assesses when the findings are no longer consistent across replicates. Our curve is fitted by a copula mixture model, from which we derive a quantitative reproducibility score, which we call the "irreproducible discovery rate" (IDR), analogous to the FDR. This score can be computed at each set of paired replicate ranks and permits the principled setting of thresholds both for assessing reproducibility and for combining replicates. Since our approach permits an arbitrary scale for each replicate, it provides useful descriptive measures in a wide variety of situations to be explored. We study the performance of the algorithm using simulations and give a heuristic analysis of its theoretical properties. We demonstrate the effectiveness of our method in a ChIP-seq experiment. Comment: Published at http://dx.doi.org/10.1214/11-AOAS466 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)