358 research outputs found

    Two more things about compositional biplots: quality of projection and inclusion of supplementary elements

    The biplot is a widely used and powerful method for multidimensional data sets that describes and displays the relationships between observations and variables in an accessible way. Compositional data consist of positive vectors, each of which is constrained to have a constant sum. Because of this constraint, standard biplots cannot be applied directly to compositional data: the data must first be transformed and, consequently, special interpretation rules are required. However, these rules can only be safely applied when the elements of a biplot have a good quality of projection, for which a new measure is introduced in this paper. We also extend the compositional biplot defined by Aitchison and Greenacre in 2002 to include the display of supplementary elements that are not used in the definition of the compositional biplot. Different types of supplementary elements are considered: supplementary parts of the composition, supplementary continuous variables external to the composition, supplementary categorical variables and supplementary observations. The projection of supplementary parts of the composition relies on the equivalence of clr and lr biplots; the other supplementary projections use classical methodology. Both the qualities of projection and the supplementary projections are illustrated using real geological data: a sample of 72 observations of soil in an area about 20 km west of Kiev, in the area south of Kiev Polessie.
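As a minimal sketch of the kind of transformation the abstract refers to, the centred log-ratio (clr) transform can be written in a few lines. The function names and the four-part `sample` composition below are hypothetical and are not taken from the paper. The sketch only illustrates two standard facts: clr coordinates do not depend on the constant sum chosen, and the difference between two clr coordinates equals the pairwise log-ratio (lr) of the corresponding parts.

```python
import math

def closure(parts, total=1.0):
    """Rescale a vector of positive parts so that it sums to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def clr(parts):
    """Centred log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical four-part composition expressed in percent (illustration only).
sample = [48.0, 30.0, 12.0, 10.0]
z = clr(sample)

# clr coordinates sum to zero, and rescaling the composition leaves them unchanged.
z_closed = clr(closure(sample, total=1.0))

# The difference of two clr coordinates is the pairwise log-ratio of the parts.
pairwise = math.log(sample[0] / sample[1])
```

The zero-sum property is the reason clr covariance matrices are singular, and the pairwise identity is what links clr coordinates to pairwise log-ratios.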

    Exploration of geochemical data with compositional canonical biplots

    The study of the relationships between two compositions is of paramount importance in geochemical data analysis. This paper develops a compositional version of canonical correlation analysis, called CoDA-CCO, for this purpose. We consider two approaches, using the centred log-ratio transformation and the calculation of all possible pairwise log-ratios within sets. The relationships between both approaches are pointed out, and their merits are discussed. The related covariance matrices are structurally singular, and this is efficiently dealt with by using generalized inverses. We develop compositional canonical biplots and detail their properties. The canonical biplots are shown to be powerful tools for discovering the most salient relationships between two compositions. Some guidelines for compositional canonical biplots construction are discussed. A geochemical data set with X-ray fluorescence spectrometry measurements on major oxides and trace elements of European floodplains is used to illustrate the proposed method. The relationships between an analysis based on centred log-ratios and on isometric log-ratios are also shown. Peer Reviewed. Postprint (author's final draft)

    Characterization of Toxoplasma gondii subtelomeric-like regions: identification of a long-range compositional bias that is also associated with gene-poor regions

    Background: Chromosome ends are composed of telomeric repeats and subtelomeric regions, which are patchworks of genes interspersed with repeated elements. Although chromosome ends display similar arrangements in different species, their sequences are highly divergent. In addition, these regions display a particular nucleosomal composition and bind specific factors, therefore producing a special kind of heterochromatin. Using data from currently available draft genomes, we have characterized these putative Telomeric Associated Sequences in Toxoplasma gondii. Results: An all-vs-all pairwise comparison of T. gondii assembled chromosomes revealed the presence of conserved regions of ∼30 Kb located near the ends of 9 of the 14 chromosomes of the genome of the ME49 strain. Sequence similarity among these regions is ∼70%, and they are also highly conserved in the GT1 and VEG strains. However, they are unique to Toxoplasma, with no detectable similarity in other Apicomplexan parasites. The internal structure of these sequences consists of 3 repetitive regions separated by high-complexity sequences without annotated genes, except for a gene from the Toxoplasma Specific Family. ChIP-qPCR experiments showed that nucleosomes associated with these sequences are enriched in histone H4 monomethylated at K20 (H4K20me1) and in the histone variant H2A.X, suggesting that they are silenced sequences (heterochromatin). A detailed characterization of the base composition of these sequences led us to identify a strong long-range compositional bias, which was similar to that observed in other silenced genomic fragments, such as those containing centromeric sequences, and was negatively correlated with gene density. Conclusions: We identified and characterized a region present in most assembled Toxoplasma chromosomes. Based on their location, sequence features, and nucleosomal markers, we propose that these might be part of subtelomeric regions of T. gondii. The identified regions display a unique trinucleotide compositional bias, which is shared (despite the lack of any detectable sequence similarity) with other silenced sequences, such as those making up the chromosome centromeres. We also identified other genomic regions with this compositional bias (but no detectable sequence similarity) that might be functionally similar.
    Affiliation: Dalmasso, Maria Carolina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); Argentina
    Affiliation: Carmona, Santiago Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); Argentina
    Affiliation: Ángel, Sergio Oscar. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina
    Affiliation: Agüero, Fernan Gonzalo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina

    On the interpretation of differences between groups for compositional data

    Social policies are designed using information collected in surveys, such as the Catalan Time Use survey. Comparisons of time use data among population groups are commonly analysed using statistical methods. The total daily time spent on different activities by a single person is equal to 24 hours. Because this type of data is compositional, its sample space has particular properties that statistical methods should respect. The critical points required to interpret differences between groups are provided and described in terms of log-ratio methods. These techniques facilitate the interpretation of the relative differences detected in multivariate and univariate analysis.
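One common log-ratio summary of a difference between two groups can be sketched as follows. It is offered as an illustration of the general idea, not as the procedure of the paper. Each group is summarised by its closed geometric mean (its centre), and the two centres are compared through the clr of their part-wise ratio. The function names and the tiny time-use records (sleep, work, leisure, in hours summing to 24) are hypothetical.

```python
import math

def closure(parts, total=24.0):
    """Rescale positive parts so that they sum to `total` (24 hours here)."""
    s = sum(parts)
    return [total * p / s for p in parts]

def centre(group, total=24.0):
    """Closed vector of part-wise geometric means of a group of compositions."""
    n = len(group)
    n_parts = len(group[0])
    gm = [math.exp(sum(math.log(row[j]) for row in group) / n)
          for j in range(n_parts)]
    return closure(gm, total)

def clr_difference(group_a, group_b):
    """clr of the part-wise ratio between the centres of two groups."""
    ca, cb = centre(group_a), centre(group_b)
    logs = [math.log(a / b) for a, b in zip(ca, cb)]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical daily records: (sleep, work, leisure) in hours.
group_a = [[8.0, 8.0, 8.0], [7.0, 9.0, 8.0]]
group_b = [[8.0, 6.0, 10.0], [9.0, 5.0, 10.0]]

diff = clr_difference(group_a, group_b)
```

A positive entry in `diff` means that the activity takes a relatively larger share of the day in the first group than in the second, measured against the average of all activities. With the hypothetical records above, only the work entry is positive.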

    A log-ratio biplot approach for exploring genetic relatedness based on identity by state

    The detection of cryptic relatedness in large population-based cohorts is of great importance in genome research. The usual approach for detecting closely related individuals is to plot allele sharing statistics, based on identity-by-state or identity-by-descent, in a two-dimensional scatterplot. This approach ignores the fact that allele sharing data across individuals in reality have a higher dimensionality, and it also disregards the compositional nature of the underlying counts of shared genotypes. In this paper we develop biplot methodology based on log-ratio principal component analysis that overcomes these restrictions. This leads to entirely new graphics that are essentially useful for exploring relatedness in genetic databases from homogeneous populations. The proposed method can be applied in an iterative manner, acting as a looking glass for more remote relationships that are harder to classify. Datasets from the 1,000 Genomes Project and the Genomes For Life-GCAT Project are used to illustrate the proposed method. The discriminatory power of the log-ratio biplot approach is compared with the classical plots in a simulation study. In a non-inbred homogeneous population the classification rate of the log-ratio principal component approach outperforms the classical graphics across the whole allele frequency spectrum, using only identity by state. In these circumstances, simulations show that with 35,000 independent bi-allelic variants, log-ratio principal component analysis, combined with discriminant analysis, can correctly classify relationships up to and including the fourth degree. Postprint (published version)
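To make the "compositional nature of the underlying counts" concrete, the sketch below counts, for one pair of individuals, how many bi-allelic variants share 0, 1 or 2 alleles identical by state, and then applies a clr transform to those three counts. It assumes genotypes coded as 0, 1 or 2 copies of one allele, so that the number of shared alleles is `2 - |g1 - g2|`. The genotype vectors and function names are hypothetical, and the log-ratio principal component analysis that the paper builds on such data is not reproduced here.

```python
import math

def ibs_counts(g1, g2):
    """Count variants sharing 0, 1 and 2 alleles identical by state.

    Genotypes are assumed to be coded as 0, 1 or 2 copies of one allele.
    """
    counts = [0, 0, 0]
    for a, b in zip(g1, g2):
        counts[2 - abs(a - b)] += 1
    return counts

def clr(parts):
    """Centred log-ratio transform; all parts must be strictly positive."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical genotypes of two individuals at eight variants.
g1 = [0, 1, 2, 2, 1, 0, 1, 2]
g2 = [2, 1, 2, 1, 0, 0, 1, 0]

counts = ibs_counts(g1, g2)   # [IBS0, IBS1, IBS2]
coords = clr(counts)          # one row of a pairs-by-3 clr matrix
```

A zero count would make the logarithm undefined, so real data would need a zero-replacement step before the transform. That step is left out of this sketch.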

    Analytical Concentrations of Some Elements in Seeds and Crude Extracts from Aesculus hippocastanum, by ICP-OES Technique

    The metal content in some samples of horse chestnut seeds (Aesculus hippocastanum) was monitored over time (years 2016–2019), considering the two most common and representative Mediterranean varieties: the pure species (AHP, which gives white flowers) and a hybrid one (AHH, which gives pink flowers). The selected elemental composition of the samples was determined by Inductively Coupled Plasma-Optical Emission Spectroscopy (ICP-OES). Several samples obtained from different preliminary treatments of the peeled seeds were examined: (i) floury samples (wild-type) mineralized with the wet method; (ii) the ashes of both AHP and AHH varieties; (iii) the fraction of total inorganic soluble salts (TISS). Furthermore, the hydroalcoholic crude extracts (as a tincture) were obtained according to the official Pharmacopoeia methods, and the relevant results were compared with those of a commercial sample, an herbal product (food supplement) of similar characteristics. The main findings indicate that the two botanical varieties are distinguished by their Fe content (80.05 vs. 1.42 mg/100 g d.s. for AHP and AHH wild-type flour samples, respectively), along with K, Ca, Mn, Ni and Cu, which are more abundant in the AHP samples. Furthermore, Principal Component Analysis (PCA) was applied to the experimental dataset in order to classify and discriminate the samples, which share a similar botanical origin but differ in the color of the bloom. These results can be useful for the traceability of raw materials potentially intended for the production of auxiliary systems of pharmacological interest.

    Phoenician Pottery in the Western Mediterranean: A New Perspective Based on the Early Iron Age (800-550 BC) Settlement of Sant Jaume (Alcanar, Catalonia)

    One of the most important reception sites for Phoenician pottery imports in the NE Iberian Peninsula is the Early Iron Age (800-550 BC) settlement of Sant Jaume. This site is exceptional in terms of preservation and the large number of complete vessels recovered. Moreover, the ceramic assemblage comprises one of the best collections of the earliest wheel-thrown pottery, which is considered evidence of trade from the western Phoenician colonies and of their specific interest in exploiting metallurgical resources. In this research, a sample of 58 individuals of wheel-thrown pottery has been analysed by X-ray fluorescence (XRF), X-ray diffraction (XRD), petrography (PE), and scanning electron microscopy coupled with an energy dispersive X-ray unit (SEM-EDX). It was possible to identify 29 ceramic groups, some of which correspond to known Phoenician workshops of southern Andalusia and Ibiza, though the origin of most groups remains to be determined. The wide variety of sources identified illuminates the patterns of trade and exchange that the Phoenicians developed during the Early Iron Age and the export of their manufactured products. This information is fundamental to our understanding of the economic system developed by the Western Mediterranean Phoenician colonies that affected and transformed indigenous communities in the Mediterranean region.

    Gypsum-exclusive plants accumulate more leaf S than non-exclusive species both in and off gypsum

    Gypsum-exclusive species (gypsophiles) are restricted to gypseous soils in natural environments. However, it is unclear why gypsophiles display greater affinity for gypseous soils than for other soils. These plants are edaphic endemics, growing in alkaline soils with high Ca and S. Gypsophiles tend to show higher foliar Ca and S, lower K and, sometimes, higher Mg than non-exclusive gypsum species, known as gypsovags. Our aim was to test whether the unique leaf elemental signature of gypsophiles could be the result of special nutritional requirements linked to their specificity to gypseous soils. These nutritional requirements could hamper the completion of their life cycle and growth in other soil types. To test this hypothesis, we cultivated five gypsophiles and five gypsovags dominant in Spanish gypsum outcrops on gypseous and calcareous (non-gypseous) field soil for 29 months. We regularly measured growth and phenology, and differences in leaf traits, final biomass, individual seed mass, seed viability, photosynthetic assimilation and leaf elemental composition. We found that all the gypsophiles studied were able to complete their life cycle in non-gypseous soil, producing viable seeds, attaining greater biomass and displaying higher photosynthetic assimilation rates than in gypseous soil. The leaf elemental composition of some species (both gypsophiles and gypsovags) shifted depending on soil, although none of them showed leaf deficiency symptoms. Regardless of soil type, gypsophiles had higher leaf S, Mg, Fe, Al, Na, Mn and Cr, and lower K, than gypsovags. Consequently, gypsophiles have a unique leaf chemical signature compared to gypsovags of the same family, particularly due to their high leaf S regardless of soil conditions. However, these nutrient requirements are not sufficient to explain why gypsophiles are restricted to gypsum soil in natural conditions.
