    Tools for Exploring Multivariate Data: The Package ICS

    Invariant coordinate selection (ICS) has recently been introduced as a method for exploring multivariate data. It includes as a special case a method for recovering the unmixing matrix in independent component analysis (ICA). It also serves as a basis for classes of multivariate nonparametric tests and as a tool in cluster analysis or blind discrimination. The aim of this paper is to briefly explain the ICS method and to illustrate how various applications can be implemented using the R package ICS. Several examples show how the ICS method and the ICS package can be used to analyze a multivariate data set.
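
    The core idea of comparing two scatter matrices can be sketched in a few lines. The block below is a minimal two-dimensional illustration in plain Python, not the implementation in the R package ICS. The function name `ics_2d` is hypothetical, and the choice of scatter pair (the sample covariance and a fourth-moment scatter matrix) is an assumption made for the example. The data are whitened with the first scatter matrix, and the second scatter matrix of the whitened data is then diagonalized.

```python
import math


def ics_2d(points):
    """Illustrative ICS for 2D data; returns (coordinates, generalized kurtoses)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # First scatter matrix S1: the sample covariance [[a, b], [b, c]].
    a = sum(u * u for u, _ in centered) / (n - 1)
    b = sum(u * v for u, v in centered) / (n - 1)
    c = sum(v * v for _, v in centered) / (n - 1)

    # Whiten with the Cholesky factor L of S1 (S1 = L L^T): z = L^{-1} (x - mean).
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    white = [(u / l11, (v - l21 * (u / l11)) / l22) for u, v in centered]

    # Second scatter matrix S2 of the whitened data: the fourth-moment matrix
    # ave(r^2 z z^T) / (p + 2), with p = 2 and r^2 = z^T z.
    m11 = m12 = m22 = 0.0
    for z1, z2 in white:
        r2 = z1 * z1 + z2 * z2
        m11 += r2 * z1 * z1
        m12 += r2 * z1 * z2
        m22 += r2 * z2 * z2
    m11, m12, m22 = (m / (4 * n) for m in (m11, m12, m22))

    # Closed-form eigendecomposition of the symmetric 2x2 matrix S2.
    # (cs, sn) is the eigenvector belonging to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * m12, m11 - m22)
    cs, sn = math.cos(theta), math.sin(theta)
    kurt_1 = m11 * cs * cs + 2.0 * m12 * cs * sn + m22 * sn * sn
    kurt_2 = m11 * sn * sn - 2.0 * m12 * cs * sn + m22 * cs * cs

    # Invariant coordinates: whitened data rotated onto the eigenvectors.
    coords = [(cs * z1 + sn * z2, -sn * z1 + cs * z2) for z1, z2 in white]
    return coords, (kurt_1, kurt_2)
```

    When the two eigenvalues are distinct, applying any nonsingular affine transformation to the input leaves the returned coordinates unchanged up to sign, which is the invariance the method is named for.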

    Nonparametric Tests of Independence Based on a Variation of the Hoeffding Statistic (Testes não-paramétricos de independência baseados em uma variação da estatística de Hoeffding)

    Advisor: Jesus Enrique Garcia. Thesis (doctorate in Statistics), Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica. Abstract: We propose a measure for detecting association within and between continuous random vectors of any dimension. This measure is a variation of the Hoeffding measure D defined in Hoeffding (1948). We focus on types of dependence for which the hypothesis of independence is difficult to reject. The construction of this measure was inspired by the nonparametric test published in Blum et al. (1961). The definition of the measure incorporates a nonlinear function of the joint density as well as a nonlinear function of the marginal densities. Based on this measure, we build two nonparametric tests that use the sample distribution function: the first detects association within a continuous random vector of any dimension, and the second detects association between p parts of such a vector. We study the properties of the measure and compare the performance of the new tests with that of various tests in the literature through a simulation study. Finally, we present applications to real data. For the first test, we present applications for the two-dimensional and three-dimensional cases. The two-dimensional application concerns the association between two pulmonary conditions; the three-dimensional application concerns the relation between temperature, humidity, and carbon dioxide (CO2). For the second test, we present an application to market indices, using US and Asian stock market indices from January and February 2016.
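
    To make the starting point concrete, the sketch below computes a plug-in version of the classical Hoeffding-type quantity, the integral of (F - F1 F2)^2 with respect to F, for a bivariate sample, together with a permutation p-value. This is the classical statistic in the spirit of Hoeffding (1948) and Blum et al. (1961), not the new measure proposed in the thesis, whose definition the abstract does not give. The function names and the use of a permutation test are assumptions made for illustration.

```python
import random


def dependence_statistic(xs, ys):
    """Mean squared gap between the joint empirical CDF and the product
    of the marginal empirical CDFs, evaluated at the sample points."""
    n = len(xs)
    total = 0.0
    for xi, yi in zip(xs, ys):
        f_x = sum(1 for x in xs if x <= xi) / n
        f_y = sum(1 for y in ys if y <= yi) / n
        f_xy = sum(1 for x, y in zip(xs, ys) if x <= xi and y <= yi) / n
        total += (f_xy - f_x * f_y) ** 2
    return total / n


def permutation_p_value(xs, ys, n_perm=199, seed=0):
    """Permutation p-value: shuffling ys breaks any association with xs."""
    rng = random.Random(seed)
    observed = dependence_statistic(xs, ys)
    shuffled = list(ys)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if dependence_statistic(xs, shuffled) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed arrangement as one of the permutations.
    return (at_least_as_large + 1) / (n_perm + 1)
```

    The statistic is close to zero when the joint empirical distribution factorizes and grows as the sample departs from independence. The direct evaluation costs O(n^2) per call, which is adequate only for small samples.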

    Signed-rank tests for location in the symmetric independent component model

    The so-called independent component (IC) model states that the observed p-vector X is generated via X = Λ Z + μ, where μ is a p-vector, Λ is a full-rank matrix, and the centered random vector Z has independent marginals. We consider the problem of testing the null hypothesis H_0: μ = 0 on the basis of i.i.d. observations X_1, ..., X_n generated by the symmetric version of the IC model above (for which all ICs have a distribution symmetric about the origin). In the spirit of [M. Hallin, D. Paindaveine, Optimal tests for multivariate location based on interdirections and pseudo-Mahalanobis ranks, Annals of Statistics, 30 (2002), 1103-1133], we develop nonparametric (signed-rank) tests, which are valid without any moment assumption and are, for adequately chosen scores, locally and asymptotically optimal (in the Le Cam sense) at given densities. Our tests are measurable with respect to the marginal signed ranks computed in the collection of null residuals Λ̂^{-1} X_i, where Λ̂ is a suitable estimate of Λ. Provided that Λ̂ is affine-equivariant, the proposed tests, unlike the standard marginal signed-rank tests developed in [M.L. Puri, P.K. Sen, Nonparametric Methods in Multivariate Analysis, Wiley & Sons, New York, 1971] or any of their obvious generalizations, are affine-invariant. Local powers and asymptotic relative efficiencies (AREs) with respect to Hotelling's T^2 test are derived. Quite remarkably, when Gaussian scores are used, these AREs are always greater than or equal to one, with equality in the multinormal model only. Finite-sample efficiencies and robustness properties are investigated through a Monte Carlo study. © 2008 Elsevier Inc. All rights reserved.
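
    The role of the marginal signed ranks of the null residuals can be illustrated with a simplified sketch. It assumes that an estimate of the inverse mixing matrix is already available (the argument `unmixing` below is a hypothetical input), uses Wilcoxon scores only, and combines the standardized marginal statistics into a sum of squares. That combination is an assumption made for illustration and should not be read as the exact test statistic of the paper.

```python
def signed_ranks(values):
    """Signed ranks: the rank of |v| among the absolute values, carrying the sign of v.
    Assumes a continuous sample, so no ties and no exact zeros."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank if values[i] > 0 else -rank
    return ranks


def signed_rank_statistic(observations, unmixing):
    """Sum over coordinates of squared standardized Wilcoxon signed-rank sums,
    computed on the null residuals unmixing @ x_i."""
    n = len(observations)
    p = len(unmixing)
    residuals = [
        [sum(unmixing[j][k] * x[k] for k in range(p)) for j in range(p)]
        for x in observations
    ]
    # Null variance of a Wilcoxon signed-rank sum: 1^2 + 2^2 + ... + n^2.
    null_variance = n * (n + 1) * (2 * n + 1) / 6.0
    statistic = 0.0
    for j in range(p):
        t_j = sum(signed_ranks([row[j] for row in residuals]))
        statistic += t_j * t_j / null_variance
    return statistic
```

    Under the null hypothesis, with symmetric independent components and a known mixing matrix, each standardized marginal statistic is approximately standard normal, so the sum of squares can be compared with a chi-square distribution with p degrees of freedom. Large values point to a location shift away from the origin.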

    Invariant Coordinate Selection and New Approaches for Independent Component Analysis

    The aim of this doctoral thesis was to explore (asymptotic) characteristics of invariant coordinate system functionals and to introduce new approaches for independent component analysis. Equivariance and invariance issues arise in multivariate statistical analysis. Sometimes statistical procedures have to be modified to obtain an affine equivariant or invariant version. This can be done by preprocessing the data, e.g., by standardizing the multivariate data or by transforming the data to an invariant coordinate system. Two of the original articles deal with invariant coordinate selection and invariant coordinate system (ICS) functionals. Standardization of multivariate distributions and characteristics of ICS functionals and statistics are examined. Invariances up to some groups of transformations are also discussed. Constructions of ICS functionals are addressed and asymptotic properties are explored. Functionals and estimates of multivariate skewness and kurtosis are also addressed. Application areas of ICS transformations are discussed. One important example of such application areas is independent component analysis. Independent component analysis is a very timely research area with a wide field of applications. In the independent component model the elements of a p-variate random vector are assumed to be linear combinations of the elements of an unobservable p-variate vector with mutually independent components. 
In independent component analysis the aim is to recover the independent components by estimating an unmixing matrix that transforms the observed p-variate vector to the independent components. New approaches for independent component analysis are provided in three of the original articles. Deflation-based FastICA, where independent components are extracted one by one, is among the most popular methods for estimating an unmixing matrix in the independent component model. In the literature, it is often seen as an algorithm rather than as an estimator related to a certain objective function, and only recently have its statistical properties been derived. One of the recent findings is that the order in which the independent components are extracted in practice has a strong effect on the performance of the estimator. A new reloaded procedure, which ensures that the independent components are extracted in an optimal order, is proposed in one of the articles. In one of the original articles, new optimal (in the Le Cam sense) inference procedures are developed under a symmetry assumption on the independent components. The inference procedures are based on signed ranks. Hypothesis tests, estimators and confidence regions are provided, and asymptotic properties are examined. The independent component model can be formulated in several ways: if the elements of a vector of independent components are permuted or multiplied by nonzero scalars, the vector still has independent components. Comparing the performance of different unmixing matrix estimates is then difficult, as the estimates are for different population quantities. A new natural performance index is suggested in one of the articles. The index is proven to possess several nice properties compared to previously presented indices, and it is easy and fast to compute. The limiting behavior of the index, as the sample size approaches infinity, is also explored. 
To demonstrate the use of the new methods in practice, a data example is provided in the last chapter of this thesis.

