
    Sparse Linear Identifiable Multivariate Modeling

    In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and for model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component delta-function and continuous mixtures), non-Gaussian latent factors, and a stochastic search over the ordering of the variables. The framework, which we call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and benchmarked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning, and model comparison. Experimentally, SLIM performs as well as or better than LiNGAM with comparable computational complexity. We attribute this mainly to the stochastic search strategy used and to parsimony (sparsity and identifiability), which is an explicit part of the model. We propose two extensions to the basic i.i.d. linear framework: non-linear dependence on observed variables, called SNIM (Sparse Non-linear Identifiable Multivariate modeling), and correlations between latent variables, called CSLIM (Correlated SLIM), for temporal and/or spatial data. The source code and scripts are available from http://cogsys.imm.dtu.dk/slim/. (45 pages, 17 figures.)
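    The two-component slab-and-spike construction is easy to illustrate outside the full SLIM hierarchy. The sketch below is a minimal NumPy illustration, not the authors' code: the mixing weight `pi`, the slab scale, and the function name are illustrative assumptions. It draws a sparse coefficient vector by mixing an exact-zero spike with a continuous Gaussian slab.

```python
import numpy as np

def sample_slab_and_spike(p, pi=0.3, slab_scale=1.0, seed=None):
    """Draw a sparse coefficient vector from a two-component mixture prior:
    with probability pi a coefficient comes from the continuous slab
    (here a Gaussian); otherwise it sits on the delta-function spike at zero.
    Hypothetical helper for illustration only."""
    rng = np.random.default_rng(seed)
    active = rng.random(p) < pi                # slab-membership indicators
    slab = rng.normal(0.0, slab_scale, size=p) # continuous slab draws
    return np.where(active, slab, 0.0), active

# Illustrative use: one sparse loading vector for a 10-dimensional factor model
weights, indicators = sample_slab_and_spike(p=10, pi=0.3, slab_scale=2.0, seed=0)
print(weights)  # most entries are exactly zero, a few are non-zero
```

    Placing exact zeros in the prior, rather than merely shrinking coefficients toward zero, is what lets posterior inference report the sparsity pattern itself as part of the model.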

    Highly Interconnected Subsystems of the Stock Market

    The stock market is a complex system that affects economic and financial activities around the world. Analysis of stock price data can improve our understanding of the past price movements of stocks. In this work, we develop a method to determine the highly interconnected subsystems of the stock market. Our method relies on a k-core decomposition scheme to analyze large networks. Our approach illustrates that the stock market is a nearly decomposable system comprising hierarchic subsystems. This work also presents results from the analysis of a network derived from a large data set of stock prices. This network analysis technique is a promising new approach for analyzing and classifying stocks based on price interactions and for decomposing the complex system embodied in the stock market.
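    As a rough sketch of the k-core idea, not the authors' pipeline, the example below builds a correlation-threshold graph from synthetic returns and extracts its innermost core. The tickers, the threshold value, and the use of NetworkX are assumptions made for illustration; in practice the correlations would come from historical price data.

```python
import numpy as np
import networkx as nx

# Hypothetical tickers and synthetic daily returns (stand-ins for real price data).
tickers = ["AAA", "BBB", "CCC", "DDD", "EEE"]
rng = np.random.default_rng(0)
returns = rng.normal(size=(250, len(tickers)))
corr = np.corrcoef(returns, rowvar=False)

# Connect two stocks if the absolute return correlation exceeds an illustrative threshold.
threshold = 0.05
G = nx.Graph()
G.add_nodes_from(tickers)
for i in range(len(tickers)):
    for j in range(i + 1, len(tickers)):
        if abs(corr[i, j]) > threshold:
            G.add_edge(tickers[i], tickers[j])

# k-core decomposition: core_number[v] is the largest k such that v belongs
# to a subgraph in which every node has degree >= k.
core_number = nx.core_number(G)
k_max = max(core_number.values())
innermost = nx.k_core(G, k=k_max)  # the most tightly interconnected subsystem
print(core_number)
print(sorted(innermost.nodes()))
```

    Increasing k peels away loosely connected stocks layer by layer, which is what yields the hierarchy of subsystems described in the abstract.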

    Organizing the Technical Debt Landscape

    To date, several methods and tools for detecting source code and design anomalies have been developed. While each method focuses on identifying certain classes of source code anomalies that potentially relate to technical debt (TD), the overlaps and gaps among these classes and TD have not been rigorously demonstrated. We propose to construct a seminal technical debt landscape as a way to visualize and organize research on the subject.

    Making open data work for plant scientists

    Despite the clear demand for open data sharing, its implementation within plant science is still limited. This is, at least in part, because open data sharing raises several unanswered questions and challenges to current research practices. In this commentary, some of the challenges encountered by plant researchers at the bench when generating, interpreting, and attempting to disseminate their data are highlighted. The difficulties involved in sharing sequencing, transcriptomics, proteomics, and metabolomics data are reviewed. The benefits and drawbacks of three data-sharing venues currently available to plant scientists are identified and assessed: (i) journal publication; (ii) university repositories; and (iii) community and project-specific databases. It is concluded that community and project-specific databases are the most useful to researchers interested in effective data sharing, since these databases are explicitly created to meet the researchers’ needs, support extensive curation, and embody a heightened awareness of what it takes to make data reusable by others. Such bottom-up and community-driven approaches need to be valued by the research community, supported by publishers, and provided with long-term sustainable support by funding bodies and government. At the same time, these databases need to be linked to generic databases where possible, in order to be discoverable to the majority of researchers and thus promote effective and efficient data sharing. As we look forward to a future that embraces open access to data and publications, it is essential that data policies, data curation, data integration, data infrastructure, and data funding are linked together so as to foster data access and research productivity.

    SEED: a tool for disseminating systematic review data into Wikipedia

    Wikipedia, the free-content online encyclopaedia, contains many heavily accessed pages relating to healthcare. Cochrane systematic reviews contain much high-grade evidence, but dissemination into Wikipedia has been slow. New skills are needed to translate and relocate data from Cochrane reviews for implantation into Wikipedia pages. This letter introduces a programme that greatly simplifies the process of disseminating the summary of findings of Cochrane reviews into Wikipedia pages.