Selecting the Number of Components for Two-table Multivariate Methods

Abstract

Research in cognition and neuroscience often involves analyzing relationships between two sets of variables collected on the same set of observations. These “two-table” relationships are commonly analyzed using three related component-based methods: partial least squares correlation (PLSC), canonical correlation analysis (CCA), and redundancy analysis (RDA). However, selecting the appropriate number of components to retain in these methods remains a challenge. Several stopping rules (rules that determine the number of components to keep) have been developed for these two-table methods, but their performance has not been thoroughly evaluated. Further, many stopping rules have been applied to only one of the two-table methods despite their relevance to all three, and there has been little exploration of modifications that might improve their performance. Additionally, many rules do not have easily accessible software implementations. To address these gaps, this dissertation evaluated four existing stopping rules and several new modifications to these rules, using simulated data with a known number of true components to estimate the Type I error rates and the power of the stopping rules. Out of 34 variations of these rules, the four or five best-performing rules were identified for each two-table method. The Type I error and power of these best rules were then examined as a function of characteristics of the data, including the number of observations, the number of variables, the number of true components, and the strength of the relationships between the tables, in order to identify one or two rules with superior performance that are recommended for future use. Additionally, the most popular stopping rule, a permutation test that uses the singular values as test statistics, is not supported by this study because it showed high Type I error across the simulated data.
As an illustrative example, a PLSC analysis was included for a real dataset (a subset of the publicly available LEMON dataset). This analysis explored relationships between participants’ cognitive performance and physiological measurements on two components selected by several of the best stopping rules. To facilitate future applications, an R package called componentts was developed. The package implements the stopping rules and data simulation so that researchers can use and test the stopping rules with additional simulated data beyond the data in this study, or test new stopping rules and easily compare their results to the stopping rules evaluated here.
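
To make concrete the kind of rule the abstract refers to, here is a minimal Python sketch of a permutation test that uses PLSC singular values as test statistics. It is an illustration under stated assumptions, not the dissertation's implementation and not the API of the componentts package: the function names, the column centering and unit-norm scaling, the row permutation of one table, and the p-value formula are all choices made here for the example. The abstract reports that this type of rule showed high Type I error, so the sketch is meant only to show what the rule computes.

```python
import numpy as np


def plsc_singular_values(X, Y):
    """Singular values of the cross-product of two preprocessed tables.

    Assumption: each column is centered and scaled to unit norm, so the
    cross-product is the between-table correlation matrix. Columns are
    assumed to be non-constant.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc, axis=0)
    Yc = Yc / np.linalg.norm(Yc, axis=0)
    return np.linalg.svd(Xc.T @ Yc, compute_uv=False)


def permutation_p_values(X, Y, n_perm=500, seed=0):
    """Permutation p-value for each singular value (hypothetical helper).

    Rows of X are shuffled to break the link between the two tables; each
    observed singular value is compared with the permuted singular value
    of the same rank.
    """
    rng = np.random.default_rng(seed)
    observed = plsc_singular_values(X, Y)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        perm = rng.permutation(X.shape[0])
        exceed += plsc_singular_values(X[perm], Y) >= observed
    # Add-one correction keeps every p-value strictly above zero.
    return (exceed + 1) / (n_perm + 1)
```

Under this rule, one would retain the components whose p-values fall below a chosen threshold. Comparing each singular value with the permuted value of the same rank is one common variant; other variants exist, and the sketch does not claim to match any of the 34 variations evaluated in the study.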

Last time updated on 26/04/2025

This paper was published in Treasures @ UT Dallas.