2,793 research outputs found

    MLI: An API for Distributed Machine Learning

    Full text link
    MLI is an Application Programming Interface designed to address the challenges of building Machine Learn- ing algorithms in a distributed setting based on data-centric computing. Its primary goal is to simplify the development of high-performance, scalable, distributed algorithms. Our initial results show that, relative to existing systems, this interface can be used to build distributed implementations of a wide variety of common Machine Learning algorithms with minimal complexity and highly competitive performance and scalability

    Decorrelation of Neutral Vector Variables: Theory and Applications

    Full text link
    In this paper, we propose novel strategies for neutral vector variable decorrelation. Two fundamental invertible transformations, namely serial nonlinear transformation and parallel nonlinear transformation, are proposed to carry out the decorrelation. For a neutral vector variable, which is not multivariate Gaussian distributed, the conventional principal component analysis (PCA) cannot yield mutually independent scalar variables. With the two proposed transformations, a highly negatively correlated neutral vector can be transformed to a set of mutually independent scalar variables with the same degrees of freedom. We also evaluate the decorrelation performances for the vectors generated from a single Dirichlet distribution and a mixture of Dirichlet distributions. The mutual independence is verified with the distance correlation measurement. The advantages of the proposed decorrelation strategies are intensively studied and demonstrated with synthesized data and practical application evaluations

    LBGS: a smart approach for very large data sets vector quantization

    Get PDF
    Abstract In this paper, LBGS, a new parallel/distributed technique for Vector Quantization is presented. It derives from the well known LBG algorithm and has been designed for very complex problems where both large data sets and large codebooks are involved. Several heuristics have been introduced to make it suitable for implementation on parallel/distributed hardware. These lead to a slight deterioration of the quantization error with respect to the serial version but a large improvement in computing efficiency
    • …
    corecore