    A structured overview of simultaneous component based data integration

    Background: Data integration is currently one of the main challenges in the biomedical sciences. Often different pieces of information are gathered on the same set of entities (e.g., tissues, culture samples, biomolecules), with the different pieces stemming, for example, from different measurement techniques. This implies that more and more data appear that consist of two or more data arrays that have a shared mode. An integrative analysis of such coupled data should be based on a simultaneous analysis of all data arrays. In this respect, the family of simultaneous component methods (e.g., SUM-PCA, unrestricted PCovR, MFA, STATIS, and SCA-P) is a natural choice. Yet, different simultaneous component methods may lead to quite different results. Results: We offer a structured overview of simultaneous component methods that frames them in a principal components setting such that both the common core of the methods and the specific elements with regard to which they differ are highlighted. An overview of principles is given that may guide the data analyst in choosing an appropriate simultaneous component method. Several theoretical and practical issues are illustrated with an empirical example on metabolomics data for Escherichia coli as obtained with different analytical chemical measurement methods. Conclusion: Of the aspects in which the simultaneous component methods differ, pre-processing and weighting are consequential. Especially, the type of weighting of the different matrices is essential for simultaneous component analysis. These types are shown to be linked to different specifications of the idea of a fair integration of the different coupled arrays.
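The effect of block weighting mentioned in the abstract above can be illustrated with a minimal sketch. It is not taken from the paper: the two toy blocks, the function names, and the choice of scaling each block to unit sum of squares are assumptions made here to show one possible specification of "fair" integration. The blocks share their row mode (the same entities) and are concatenated column-wise after weighting.

```python
import math

def frobenius_norm(block):
    """Square root of the sum of squared entries of a block (list of rows)."""
    return math.sqrt(sum(x * x for row in block for x in row))

def weight_block(block):
    """Scale a block to unit sum of squares (hypothetical weighting choice)."""
    norm = frobenius_norm(block)
    if norm == 0:
        raise ValueError("cannot weight an all-zero block")
    return [[x / norm for x in row] for row in block]

def concatenate_blocks(blocks):
    """Join blocks column-wise; all blocks must have the same number of rows."""
    n_rows = len(blocks[0])
    if any(len(b) != n_rows for b in blocks):
        raise ValueError("blocks must share the row mode")
    return [sum((b[i] for b in blocks), []) for i in range(n_rows)]

# Two toy blocks measured on the same two entities, on very different scales.
block_a = [[1.0, 2.0], [3.0, 4.0]]
block_b = [[100.0, 200.0, 300.0], [400.0, 500.0, 600.0]]

weighted = [weight_block(b) for b in (block_a, block_b)]
combined = concatenate_blocks(weighted)
```

Without the weighting step, `block_b` would dominate any component analysis of the concatenated matrix simply because of its scale. A component step (for example, a PCA of `combined`) would follow in a full analysis.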

    Community Compensatory Trend Prevails from Tropical to Temperate Forest

    Community compensatory trend (CCT) is thought to facilitate persistence of rare species and thus stabilize species composition in tropical forests. However, whether CCT acts over broad geographical ranges is still in question. In this study, we tested for the presence of negative density dependence (NDD) and CCT in three forests along a tropical-temperate gradient. Inventory data were collected from forest communities located in three different latitudinal zones in China. Two widely used methods were used to test for NDD at the community level. The first method considered relationships between the relative abundance ratio and adult abundance. The second method emphasized the effect of adult abundance on the abundance of established younger trees. Evidence for NDD acting on different growth forms was tested by using the first method, and the presence of CCT was tested by checking whether adult abundance of rare species affected that of established younger trees less than did the abundance of common species. Both analyses indicated that NDD existed in the seedling, sapling and pole stages in all three plant communities and that this effect increased with latitude. However, the extent of NDD varied among understory, midstory and canopy trees in the three communities along the gradient. Additionally, despite evidence of NDD for almost all common species, only a portion of rare species showed NDD, supporting the action of CCT in all three communities. We conclude that NDD and CCT prevail in the three recruitment stages of the tree communities studied; rare species achieve a relative advantage through CCT and thus persist in these communities; and CCT clearly facilitates newly established species and maintains tree diversity within communities across our latitudinal gradient.

    Bootstrap confidence intervals in multi-level simultaneous component analysis

    Multi-level simultaneous component analysis (MLSCA) was designed for the exploratory analysis of hierarchically ordered data. MLSCA specifies a component model for each level in the data, where appropriate constraints express possible similarities between groups of objects at a certain level, yielding four MLSCA variants. The present paper discusses different bootstrap strategies for estimating confidence intervals (CIs) on the individual parameters. In selecting a proper strategy, the main issues to address are the resampling scheme and the non-uniqueness of the parameters. The resampling scheme depends on which level(s) in the hierarchy are considered random, and which fixed. The degree of non-uniqueness depends on the MLSCA variant, and, in two variants, the extent to which the user exploits the transformational freedom. A comparative simulation study examines the quality of bootstrap CIs of different MLSCA parameters. Generally, the quality of bootstrap CIs appears to be good, provided the sample sizes are sufficient at each level that is considered to be random. The latter implies that if more than a single level is considered random, the total number of observations necessary to obtain reliable inferential information increases dramatically. An empirical example illustrates the use of bootstrap CIs in MLSCA.

    Nonnegative coupled matrix tensor factorization for smart city spatiotemporal pattern mining

    With the advancements in smartphones and inbuilt sensors, the day-to-day spatiotemporal activities of people can be recorded. With this available information, the automated extraction of spatiotemporal patterns is crucial to understand people's mobility. These patterns can assist in improving smart city environments, for example in traffic control, urban planning, and transportation facilities. Smartphone-generated spatiotemporal data is enriched with multiple contexts, and efficiently utilizing them in a machine learning process is still a challenging task. In this paper, we propose a Nonnegative Coupled Matrix Tensor Factorization (CMTF) model to integrate and analyze additional contexts with spatiotemporal data to generate meaningful patterns. We also propose an efficient factorization algorithm based on variable selection to solve the Nonnegative CMTF model that yields accurate spatiotemporal patterns. Our empirical analysis highlights the efficiency of the proposed CMTF model in terms of accuracy and factor goodness.

    Approaches to Fault Detection for Heating Systems Using CP Tensor Decompositions

    Two new signal-based and one model-based fault detection methods using canonical polyadic (CP) tensor decomposition algorithms are presented, and application examples of heating systems are given for all methods. The first signal-based fault detection method uses the factor matrices of a data tensor directly; the second calculates expected values from the decomposed tensor and compares these with measured values to generate the residuals. The third fault detection method is based on multi-linear models represented by parameter tensors with elements computed by subspace parameter identification algorithms and data for different but structured operating regimes. In case of missing data or model parameters in tensor representation, an approximation method based on a special CP tensor decomposition algorithm for incomplete tensors is proposed, called the decompose-and-unfold method. As long as all relevant dynamics have been recorded, this method approximates, even from incomplete data, models for all operating regimes; these can be used for residual generation and fault detection, e.g. by parity equations.

    Exploratory Approaches to Seriation by Asymmetric Multidimensional Scaling

    Seriation and multidimensional scaling are two techniques aimed at exploring relationships in dominance or proximity data matrices. Rodgers & Thompson (1992) argued that the two approaches can profitably interact in the analysis of asymmetric proximity matrices; they proposed a method that uses seriation to define an empirical ordering of the objects, and symmetric multidimensional scaling to scale the two separate triangles of the proximity matrix defined by this ordering (an approach anticipated by method 3 in Gower (1977)). Following a similar idea, this paper proposes some procedures to explore seriation by asymmetric multidimensional scaling. The paper focuses on the skew-symmetric components of a particular class of asymmetric matrices (including, e.g., tournament or paired comparison matrices). Two small application examples are provided to illustrate the procedures in the dichotomous and quantitative cases, respectively.
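The skew-symmetric component mentioned in the abstract above comes from the standard decomposition of a square matrix A into a symmetric part (A + Aᵀ)/2 and a skew-symmetric part (A − Aᵀ)/2. The sketch below shows that decomposition on a toy tournament matrix. The ordering step (sorting objects by the row sums of the skew-symmetric part) is a simple illustrative assumption, not the procedure proposed in the paper, and the function names are hypothetical.

```python
def split_symmetric_skew(a):
    """Return (symmetric part, skew-symmetric part) of a square matrix."""
    n = len(a)
    sym = [[(a[i][j] + a[j][i]) / 2 for j in range(n)] for i in range(n)]
    skew = [[(a[i][j] - a[j][i]) / 2 for j in range(n)] for i in range(n)]
    return sym, skew

def dominance_order(skew):
    """Order objects by decreasing row sum of the skew-symmetric part.

    Illustrative seriation heuristic only (an assumption of this sketch).
    """
    scores = [sum(row) for row in skew]
    return sorted(range(len(skew)), key=lambda i: scores[i], reverse=True)

# Toy dichotomous tournament: entry [i][j] = 1 means object i beats object j.
tournament = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

sym, skew = split_symmetric_skew(tournament)
order = dominance_order(skew)
```

Here object 0 beats both others and object 1 beats object 2, so the row sums of the skew-symmetric part decrease from object 0 to object 2 and `order` lists the objects from most to least dominant.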