Joint Biplots for CoDa
Compositional data (CoDa) consist of vectors of positive values summing to one or, more generally, to a fixed constant common to all vectors. They appear as proportions, percentages, concentrations, and absolute and relative frequencies. Sometimes compositions arise from non-negative data (such as counts, areas, weights, or volumes) that have been scaled by the total of the components because the analyst is not interested in the total sum of the vector. The multidimensional analysis of this kind of data requires careful consideration because the sample space for CoDa is the simplex. The first consistent methodological proposal for dealing with CoDa was made by Aitchison (1986), who introduced the log-ratio approach. The idea is to move from the simplex to real space by means of log-ratio transformations, apply standard statistical methods, and finally interpret the results in the simplex by means of an inverse log-ratio transformation. Building on this work, the pairwise, centered and additive log-ratio transformations (plr, clr and alr; Aitchison, 1986) and the isometric log-ratio transformation (ilr; Egozcue et al., 2003) have been proposed in the literature. In the context of dimension-reduction techniques, Aitchison (1983) proposed applying principal component analysis (PCA) to CoDa after a centered log-ratio (clr) transformation. Aitchison and Greenacre (2002) suggested an adaptation of the biplot to CoDa. The biplot is a well-established graphical aid in other branches of statistical analysis and can prove to be a useful exploratory and expository tool for compositions. Many papers on dimension-reduction techniques for CoDa have appeared in the literature. Based on the log-ratio strategy, Gallo (2012a, 2012b, 2013) recently proposed using three-mode analysis of compositional data.
Starting from Gallo (2012b), we propose using plr and clr joint biplots. In some cases the plr joint biplot is the only one that clearly shows the correlations
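The log-ratio workflow described in this abstract (transform, analyze in real space, map back) can be illustrated with a minimal sketch. It assumes NumPy and uses hypothetical function names and toy data that are not taken from the paper; it shows the clr transformation, its inverse, and a PCA obtained from the SVD of the centered clr matrix, whose scores and loadings are the ingredients of a clr biplot. It is not the joint biplot procedure the authors propose.

```python
import numpy as np

def closure(x):
    """Rescale each row so that its parts sum to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def clr(x):
    """Centered log-ratio: log of each part minus the row mean of the logs."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=1, keepdims=True)

def clr_inverse(z):
    """Map clr coordinates back to the simplex."""
    return closure(np.exp(z))

def clr_pca(x, n_components=2):
    """PCA of clr-transformed compositions via SVD of the column-centered matrix."""
    z = clr(x)
    zc = z - z.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(zc, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # one row per sample
    loadings = vt[:n_components].T                   # one row per part
    return scores, loadings

# Toy compositions (illustrative values only): 4 samples, 3 parts.
x = np.array([[10.0, 20.0, 70.0],
              [30.0, 30.0, 40.0],
              [5.0, 45.0, 50.0],
              [25.0, 25.0, 50.0]])
z = clr(x)
scores, loadings = clr_pca(x)
```

Each row of `z` sums to zero, which is why a composition with D parts has at most D - 1 informative principal components after the clr transformation.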
Three–way compositional data: a multi–stage trilinear decomposition algorithm
The CANDECOMP/PARAFAC model is an extension of bilinear PCA
and has been designed to model three-way data by preserving their multidimensional
configuration. The Alternating Least Squares (ALS) procedure is the preferred
estimating algorithm for this model because it guarantees stable results. It
can, however, be slow at converging and sensitive to collinearity and over-factoring.
Dealing with these issues is even more pressing when data are compositional and
thus collinear by definition. In this talk the solution proposed is based on a multistage
approach. Here parameters are optimized with procedures that work better for
collinearity and over-factoring, namely ATLD and SWATLD, and then results are
refined with ALS
Analysis of Sentinel Node Biopsy and Clinicopathologic Features as Prognostic Factors in Patients With Atypical Melanocytic Tumors.
BACKGROUND: Atypical melanocytic tumors (AMTs) include a wide spectrum of melanocytic neoplasms that represent a challenge for clinicians due to the lack of a definitive diagnosis and the related uncertainty about their management. This study analyzed clinicopathologic features and sentinel node status as potential prognostic factors in patients with AMTs. PATIENTS AND METHODS: Clinicopathologic and follow-up data of 238 children, adolescents, and adults with histologically proved AMTs consecutively treated at 12 European centers from 2000 through 2010 were retrieved from prospectively maintained databases. The binary association between all investigated covariates was studied by evaluating the Spearman correlation coefficients, and the association between progression-free survival and all investigated covariates was evaluated using univariable Cox models. The overall survival and progression-free survival curves were established using the Kaplan-Meier method. RESULTS: Median follow-up was 126 months (interquartile range, 104-157 months). All patients received an initial diagnostic biopsy followed by wide (1 cm) excision. Sentinel node biopsy was performed in 139 patients (58.4%), 37 (26.6%) of whom had sentinel node positivity. There were 4 local recurrences, 43 regional relapses, and 8 distant metastases as first events. Six patients (2.5%) died of disease progression. Five patients who were sentinel node-negative and 3 patients who were sentinel node-positive developed distant metastases. Ten-year overall and progression-free survival rates were 97% (95% CI, 94.9%-99.2%) and 82.2% (95% CI, 77.3%-87.3%), respectively. Age, mitotic rate/mm2, mitoses at the base of the lesion, lymphovascular invasion, and 9p21 loss were factors affecting prognosis in the whole series and the sentinel node biopsy subgroup. 
CONCLUSIONS: Age >20 years, mitotic rate >4/mm2, mitoses at the base of the lesion, lymphovascular invasion, and 9p21 loss proved to be worse prognostic factors in patients with AMTs. Sentinel node status was not a clear prognostic predictor
Algorithms for compositional tensors of third-order
The PARAFAC-ALS procedure for estimating CP parameters on tridimensional tensors is sensitive to data collinearity. This inefficiency is especially problematic if collinearity is paired with other issues such as data of large dimensions and difficulties in establishing the correct model rank. When dealing with compositional data, i.e. positive values with a covariance bias, multicollinearity is inherent by definition, and it is preserved even when the data are transformed into log-ratios by means of the clr function. For this reason, alternative estimating procedures may be considered, such as INT and INT-2. These dual-step methods use the properties of the SWATLD and ATLD algorithms during initialization to overcome ALS inefficiency while still providing least squares results. Their comparative performance is tested in an extensive simulation study on collinear data
Principal balances for three-way compositions
Orthonormal balances resulting from a sequential binary partition (SBP) are one of the preferred tools for transforming compositional data into real-space coordinates. The interpretability of this approach, however, greatly depends on the relevance of the SBP. SBPs can be chosen with the help of expert knowledge or with data-oriented methods, such as principal balances analysis. The latter yields an SBP whose balances successively maximize the explained variance. Principal balances can be calculated exactly or approximated by using methods based on PCA for compositional data. In this work a method for the approximation of principal balances in the more complex case of three-way compositions is proposed. Here the additional difficulty introduced by third-mode variability is dealt with. In particular, an algorithm based on the Tucker3 model is used, which allows the variability of the third dimension to be kept separate in the definition of principal balances
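As a minimal sketch of how an SBP turns into orthonormal balance coordinates, the code below builds the contrast matrix from a sign matrix and applies it to log-transformed compositions. It assumes NumPy, and the function names and the example partition are hypothetical. The SBP here is supplied by hand; the sketch does not implement the principal balances search or the Tucker3-based approximation described in the abstract.

```python
import numpy as np

def balance_basis(sbp):
    """Build orthonormal balance contrasts from an SBP sign matrix.

    sbp has shape (D-1, D) with entries +1 (numerator group),
    -1 (denominator group) and 0 (part not involved in that split).
    """
    sbp = np.asarray(sbp, dtype=float)
    psi = np.zeros_like(sbp)
    for i, row in enumerate(sbp):
        r = np.sum(row > 0)  # number of parts in the numerator group
        s = np.sum(row < 0)  # number of parts in the denominator group
        psi[i, row > 0] = np.sqrt(s / (r * (r + s)))
        psi[i, row < 0] = -np.sqrt(r / (s * (r + s)))
    return psi

def balances(x, sbp):
    """Balance coordinates of the compositions stored in the rows of x."""
    return np.log(np.asarray(x, dtype=float)) @ balance_basis(sbp).T

# Hypothetical SBP for three parts: {x1, x2} vs {x3}, then {x1} vs {x2}.
sbp = np.array([[1, 1, -1],
                [1, -1, 0]])
x = np.array([[0.1, 0.2, 0.7],
              [0.3, 0.3, 0.4]])
psi = balance_basis(sbp)
b = balances(x, sbp)
```

Because every contrast row sums to zero, the coordinates do not change when a composition is rescaled, so closed and unclosed data give the same balances.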
Three-way compositional analysis of energy intensity in manufacturing
Both the scientific and political communities agree that significant reductions in CO2 emissions are necessary to limit the magnitude and extent of climate change, and energy efficiency is one of the issues most closely analyzed by economists and policy makers within this debate. Different measures of energy efficiency in manufacturing can be defined, but broadly it is the ratio of production output to energy input, usually disaggregated by industry. We create a global data set of energy intensity in manufacturing and analyze its structure by country, time and industry by applying parallel factor analysis (CP). Since we are interested in the structure of energy intensity, the absolute values are no longer relevant for the analysis, and the data set is compositional in nature, which requires specific adaptation of the methodology and suitable software
An ATLD–ALS method for the trilinear decomposition of large third-order tensors
CP decomposition of large third-order tensors can be computationally challenging. Parameters are typically estimated by means of the ALS procedure because it yields least-squares solutions and provides consistent outcomes. Nevertheless, ALS has two major flaws that are particularly problematic for large-scale problems: slow convergence and sensitivity to degeneracy conditions such as over-factoring, collinearity, bad initialization and local minima. More efficient algorithms have been proposed in the literature. They are, however, much less dependable than ALS in delivering stable results because the increased speed often comes at the expense of accuracy. In particular, the ATLD procedure is one of the fastest alternatives, but it is rarely employed because of the unreliable nature of its convergence. As a solution, multi-optimization is proposed. ATLD and ALS steps are concatenated in an integrated procedure with the purpose of increasing efficiency without a significant loss in precision. This methodology has been implemented and tested under realistic conditions on simulated data sets
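The refinement stage of such a concatenated procedure can be sketched as a plain CP-ALS routine that starts from factor matrices supplied by another method. The sketch assumes NumPy; the function names, tolerances and toy data are hypothetical, and the initial factors are simulated by perturbing the true ones as a stand-in for the output of a faster initializer. ATLD itself is not implemented here, and this is not the authors' integrated algorithm.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product, shape (a_rows * b_rows, R)."""
    r = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, r)

def cp_reconstruct(a, b, c):
    """Rebuild the tensor with entries sum_r a[i, r] * b[j, r] * c[k, r]."""
    return np.einsum('ir,jr,kr->ijk', a, b, c)

def cp_als(x, init, n_iter=200, tol=1e-10):
    """Refine initial CP factors (A, B, C) by alternating least squares."""
    a, b, c = (f.copy() for f in init)
    i, j, k = x.shape
    x1 = x.reshape(i, j * k)                     # mode-1 unfolding
    x2 = np.moveaxis(x, 1, 0).reshape(j, i * k)  # mode-2 unfolding
    x3 = np.moveaxis(x, 2, 0).reshape(k, i * j)  # mode-3 unfolding
    loss = prev = np.inf
    for _ in range(n_iter):
        # Each update is the least-squares solution with the other two fixed.
        a = x1 @ np.linalg.pinv(khatri_rao(b, c)).T
        b = x2 @ np.linalg.pinv(khatri_rao(a, c)).T
        c = x3 @ np.linalg.pinv(khatri_rao(a, b)).T
        loss = np.sum((x - cp_reconstruct(a, b, c)) ** 2)
        if abs(prev - loss) <= tol * max(loss, 1.0):
            break
        prev = loss
    return a, b, c, loss

# Toy rank-2 tensor; the perturbed factors stand in for a fast initializer.
rng = np.random.default_rng(0)
true = [rng.normal(size=(n, 2)) for n in (6, 5, 4)]
x = cp_reconstruct(*true)
init = [f + 0.3 * rng.normal(size=f.shape) for f in true]
init_loss = np.sum((x - cp_reconstruct(*init)) ** 2)
a, b, c, loss = cp_als(x, init)
```

Since every update solves an exact least-squares subproblem, the loss cannot increase from one sweep to the next, which is the property that makes ALS a natural final stage after a faster but less reliable method.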
A procedure for the three-way analysis of compositions
The Tucker3 model is one of the most widely used tools for factorial analysis of three-way data arrays. When orthogonal factors are extracted this model can be seen as a three-way PCA (principal component analysis). The Tucker3 model is characterized by extreme flexibility as it allows for the use of a different number of factors in each mode and it yields non-unique results. When this model is applied to vectors of non-negative values with a sum constraint all problems connected with the statistical analysis of compositions must be taken into consideration. Like other standard statistical techniques, this model cannot be directly applied. The aim of this paper is to present the theory behind the Tucker3 model on compositional data and to describe the TUCKALS3 algorithm
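To make the structure of the Tucker3 model concrete, the sketch below computes a truncated higher-order SVD, which allows a different number of orthonormal components in each mode and returns a core array linking them. It assumes NumPy, and the function names are hypothetical. A truncated higher-order SVD is a simpler relative of TUCKALS3 and is offered only as an illustration; it is not the algorithm described in the paper, and for compositional input the data would first need a log-ratio transformation.

```python
import numpy as np

def unfold(x, mode):
    """Matricize a three-way array along the given mode."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def hosvd(x, ranks):
    """Truncated higher-order SVD with ranks = (P, Q, R)."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(x, mode), full_matrices=False)
        factors.append(u[:, :r])  # leading left singular vectors of this mode
    a, b, c = factors
    # Core array: projection of x onto the three component spaces.
    core = np.einsum('ijk,ip,jq,kr->pqr', x, a, b, c)
    return core, factors

def tucker_reconstruct(core, factors):
    """Rebuild the array from the core and the three component matrices."""
    a, b, c = factors
    return np.einsum('pqr,ip,jq,kr->ijk', core, a, b, c)

# Toy array (random values, for illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3, 2))
core, factors = hosvd(x, ranks=(2, 2, 1))
full_core, full_factors = hosvd(x, ranks=(4, 3, 2))
```

Rotating a component matrix and counter-rotating the core leaves the reconstruction unchanged, which is one way to see the non-uniqueness mentioned in the abstract.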
Detecting public social spending patterns in Italy using a three-way relative variation approach
Studies on public social spending often fail to address the issues connected with budgetary constraints. Budget lines require public entities to partition resources among sectors of spending on the basis of preferred combinations and trade-offs. Standard exploratory tools cannot unveil this preference structure because they are hindered by the differences in budget scales and by the bounded nature of sector variability, i.e. an increase in one sector means a missed increase or a decrease in other sectors. In this work Italian public social spending is modeled with an alternative log-ratio methodology that allows relative variation patterns among sectors to be studied. Since the data are collected across time, a three-way approach is recommended so that the variability of each mode is kept separate