16,333 research outputs found
High Dimensional Semiparametric Scale-Invariant Principal Component Analysis
We propose a new high dimensional semiparametric principal component analysis
(PCA) method, named Copula Component Analysis (COCA). The semiparametric model
assumes that, after unspecified marginally monotone transformations, the
distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA
in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust
to outliers and data contamination; (iii) It is scale-invariant and yields more
interpretable results. We prove that the COCA estimators obtain fast estimation
rates and are feature selection consistent when the dimension is nearly
exponentially large relative to the sample size. Careful experiments confirm
that COCA outperforms sparse PCA on both synthetic and real-world datasets.
Comment: Accepted in IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI)
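The scale-invariance claim in this abstract can be illustrated with a rank-based correlation estimate. The sketch below is not the authors' COCA estimator: it assumes a Gaussian copula, uses Kendall's tau with the sine transform as the latent correlation estimate, and omits the sparsity step entirely. The data, the distortions applied to each margin, and all function names are hypothetical.

```python
import numpy as np

def kendall_tau_matrix(X):
    """Pairwise Kendall's tau for the columns of X (assumes no ties)."""
    n, _ = X.shape
    # signs[i, j, k] = sign(X[i, k] - X[j, k]) for every ordered row pair (i, j)
    signs = np.sign(X[:, None, :] - X[None, :, :])
    # Concordant minus discordant pairs, divided by the n*(n-1) ordered pairs
    return np.einsum("ijk,ijl->kl", signs, signs) / (n * (n - 1))

def rank_based_correlation(X):
    """Latent correlation estimate sin(pi/2 * tau) under a Gaussian copula."""
    return np.sin(0.5 * np.pi * kendall_tau_matrix(X))

def leading_direction(R):
    """Eigenvector of the largest eigenvalue of symmetric R, with a fixed sign."""
    _, vecs = np.linalg.eigh(R)  # eigenvalues come back in ascending order
    v = vecs[:, -1]
    return v * np.sign(v[np.argmax(np.abs(v))])

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
Z = rng.multivariate_normal(np.zeros(3), cov, size=300)

# Hypothetical strictly increasing distortions of each margin
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3, 1000.0 * Z[:, 2]])

v_latent = leading_direction(rank_based_correlation(Z))
v_observed = leading_direction(rank_based_correlation(X))
```

Because Kendall's tau depends only on the ordering within each column, the leading direction computed from the distorted data matches the one computed from the latent Gaussian data, whereas ordinary PCA on X would be dominated by the column scaled by 1000.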
A new mixture copula model for spatially correlated multiple variables with an environmental application
In environmental monitoring, multiple spatial variables sampled at a geographical location often depend on each other in complex ways, such as through non-linear and non-Gaussian spatial dependence. We propose a new mixture copula model that captures these complex relationships among spatially correlated variables and predicts individual variables while accounting for the multivariate spatial relationship. The proposed method is demonstrated on an environmental application and compared with three existing methods. First, prediction of individual variables with the multivariate spatial copula is compared with the existing univariate pair copula method. Second, prediction with the mixture copula in the multivariate spatial copula framework is compared with an existing multivariate spatial copula model that uses non-linear principal component analysis. Last, prediction of individual variables with the non-linear, non-Gaussian multivariate spatial copula model is compared with the linear Gaussian multivariate cokriging model. The results show that the proposed spatial mixture copula model outperforms the existing methods in cross-validation of actual against predicted values at the sampled locations.
Kullback-Leibler Divergence-Guided Copula Statistics-Based Blind Source Separation of Dependent Signals
In this paper, we propose a blind source separation method for a linear mixture of
dependent sources based on copula statistics that measure the non-linear
dependence between source component signals structured as copula density
functions. The source signals are assumed to be stationary. The method
minimizes the Kullback-Leibler divergence between the copula density functions
of the estimated sources and of the dependency structure. The proposed method
is applied to data obtained from the time-domain analysis of the classical
11-Bus 4-Machine system. Extensive simulation results demonstrate that the
proposed method based on copula statistics converges faster and outperforms the
state-of-the-art blind source separation method for dependent sources in terms
of interference-to-signal ratio.
Comment: Submitted to the ISGT NA 202
Bayesian Model Choice of Grouped t-copula
One of the most popular copulas for modeling dependence structures is the
t-copula. Recently the grouped t-copula was generalized to allow each group to
have one member only, so that a priori grouping is not required and the
dependence modeling is more flexible. This paper describes a Markov chain Monte
Carlo (MCMC) method under the Bayesian inference framework for estimating and
choosing t-copula models. Using historical data of foreign exchange (FX) rates
as a case study, we found that Bayesian model choice criteria overwhelmingly
favor the generalized t-copula. In addition, all the criteria also agree on the
second most likely model and these inferences are all consistent with classical
likelihood ratio tests. Finally, we demonstrate the impact of model choice on
the conditional Value-at-Risk for portfolios of six major FX rates.
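To make the last point concrete, here is a minimal sketch of how an empirical conditional Value-at-Risk responds to a tail assumption. It does not reproduce the paper's MCMC procedure or its FX data: the losses are synthetic draws, the 99% level is an arbitrary choice, and `var_cvar` is a hypothetical helper name.

```python
import numpy as np

def var_cvar(losses, alpha=0.99):
    """Empirical VaR (alpha-quantile) and CVaR (mean loss at or beyond VaR)."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)
    cvar = losses[losses >= var].mean()
    return var, cvar

rng = np.random.default_rng(2)
n = 100_000
df = 3  # hypothetical degrees of freedom for the heavy-tailed model

# Synthetic losses with unit variance under two competing tail assumptions
normal_losses = rng.standard_normal(n)
t_losses = rng.standard_t(df, size=n) * np.sqrt((df - 2) / df)

var_n, cvar_n = var_cvar(normal_losses)
var_t, cvar_t = var_cvar(t_losses)
```

Both samples have the same variance, so any gap between `cvar_t` and `cvar_n` comes from the tail shape alone, which is the sense in which the choice of model feeds through to the risk figure.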
Forecasting commodity futures using Principal Component Analysis and Copula
In this thesis, the never-ending battle to beat the market is fought with mathematics, using a method that reduces information to its core: Principal Component Analysis (PCA). PCA is used to build a model of future commodity prices. To assist PCA, a copula is used, a sort of mathematical glue that binds multiple distributions together so that they can be represented as one. The data comprise 5 years of prices for Brent Oil, WTI Oil, Gold, Copper and Aluminium. The model parameters are fitted to 2.5 years of data and then tested on the remaining 2.5 years. Maximum Likelihood Estimation (MLE) was used for parameter estimation, and the distributions found to fit were the logistic and Student's t distributions. Cramér-von Mises tests were used to determine that the t copula was the most suitable copula. The main results are that the mathematical estimates fit well and that a profit can be generated, but with a low Sharpe ratio.
Introduction. In the finance industry, traders fight daily to outwit their enemy, the market. With today's means of communication, however, traders are flooded with information, and mathematics can be used to sift through it all. This study tries to extract the most vital information available from historical data and to build a model of how prices are expected to change. The mathematical tools used are called principal component analysis and copula. To see how well the model performs, two trading strategies are mainly used: "mean reversion" and "momentum".
History. The study focuses on commodities in general and on oil, gold, copper and aluminium in particular. Commodity trading itself stretches back several millennia. One has to go back to the Sumerian civilization, around 4500-4000 BC, to find the first signs of it. At that time, clay tablets were used to record, for example, how many goats were to be delivered at a given time.
Commodity trading. Nowadays there is no need to write contracts on clay tablets, and a delivery of gold or corn is easily arranged from a computer. The most common way to trade commodities is through a type of contract called a future. It specifies exactly how much of a given commodity is to be delivered at a given time for a given price. The price varies, depending mainly on supply and demand, but the quantity and the delivery date stay the same.
Principal component analysis. The "sifting of information" is in this case called principal component analysis. The analysis is carried out to extract the variables that can explain how commodity prices fluctuate. The variables are optimized on the first half of the data and then applied to the second half. This approach, optimizing on the first half and testing on the second, is very common when modelling trading strategies, because optimizing over the whole data period makes it easy to "over-optimize" the results and fit the model to what has already happened instead of building something that holds up in the future.
Copula. To model the prices of the different commodities, they are fitted to various kinds of distributions. These distributions can be individual to each commodity and are then "glued" together with a copula. That is, one looks at how the distributions relate to each other, how they are "glued" together. This glue can then be used to predict future price movements.
Mean reversion. One of the most common trading strategies is called mean reversion ("återgång till medelvärde" in Swedish). It rests on the belief that all prices eventually even out and return to their previous level (their old mean). However, this strategy takes no account of the human factors that can be present on exchanges.
Momentum. Momentum is the strategy that takes softer factors into account, such as certain kinds of herd behaviour (people buy when others buy, and sell when others sell). This can be expressed mathematically and is essentially the opposite of mean reversion. In the study, however, the two strategies balance each other, and depending on how prices have fluctuated historically, one or the other dominates. This holds until the "copula glue" comes into play, at which point the glue can become the even more dominant factor and determine the trading strategy.
Results. When only the principal component analysis and its associated trading strategies, mean reversion and momentum, are taken into account, no profit is generated. This may have many causes, but as is often the case in mathematics, more data would help. Had the study covered a larger number of commodities, this part might have shown a better result. Once the "copula glue" becomes part of the model, a profit can be generated. It is admittedly not very large, but the model appears to have potential. With all the trading strategies included, the model thus shows that money can be made with it. The thoughts and ideas behind the model can hopefully help some trader in the battle against the market.
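The "glue" idea described in this summary can be sketched in a few lines. The example below is an illustration under stated assumptions, not the thesis model: it swaps in a Gaussian copula where the thesis selected a t copula, uses logistic marginals for both assets, and every parameter value and function name is hypothetical.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def logistic_ppf(u, loc, scale):
    """Inverse CDF (quantile function) of the logistic distribution."""
    return loc + scale * np.log(u / (1.0 - u))

rng = np.random.default_rng(3)

# Step 1: draw correlated normals; the correlation carries the dependence.
corr = np.array([[1.0, 0.7],
                 [0.7, 1.0]])  # hypothetical dependence between two assets
z = rng.multivariate_normal(np.zeros(2), corr, size=5000)

# Step 2: map each column to uniform margins. This is the copula sample.
u = np.clip(normal_cdf(z), 1e-12, 1.0 - 1e-12)

# Step 3: push the uniforms through each asset's own marginal quantile function.
# The location and scale values are hypothetical, not fitted to any data.
returns = np.column_stack([
    logistic_ppf(u[:, 0], loc=0.0, scale=0.01),
    logistic_ppf(u[:, 1], loc=0.001, scale=0.02),
])

sample_corr = np.corrcoef(returns[:, 0], returns[:, 1])[0, 1]
```

Each column of `returns` follows its own logistic distribution, while the dependence between the columns comes only from `corr`. Separating the two is what lets the marginals be fitted per commodity and joined afterwards.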
Measuring reproducibility of high-throughput experiments
Reproducibility is essential to reliable scientific discovery in
high-throughput experiments. In this work we propose a unified approach to
measure the reproducibility of findings identified from replicate experiments
and identify putative discoveries using reproducibility. Unlike the usual
scalar measures of reproducibility, our approach creates a curve, which
quantitatively assesses when the findings are no longer consistent across
replicates. Our curve is fitted by a copula mixture model, from which we derive
a quantitative reproducibility score, which we call the "irreproducible
discovery rate" (IDR) analogous to the FDR. This score can be computed at each
set of paired replicate ranks and permits the principled setting of thresholds
both for assessing reproducibility and combining replicates. Since our approach
permits an arbitrary scale for each replicate, it provides useful descriptive
measures in a wide variety of situations to be explored. We study the
performance of the algorithm using simulations and give a heuristic analysis of
its theoretical properties. We demonstrate the effectiveness of our method in a
ChIP-seq experiment.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS466 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)