4,258 research outputs found

    A Discussion on Parallelization Schemes for Stochastic Vector Quantization Algorithms

    This paper studies parallelization schemes for stochastic Vector Quantization algorithms in order to obtain speed-ups using distributed resources. We show that the most intuitive parallelization scheme does not lead to better performance than the sequential algorithm. Another distributed scheme is therefore introduced, which obtains the expected speed-ups. It is then improved to suit distributed architectures where communications are slow and inter-machine synchronization is too costly. The schemes are tested with simulated distributed architectures and, for the last one, with the Microsoft Windows Azure platform, obtaining speed-ups with up to 32 virtual machines.
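
For orientation, the sketch below shows a standard sequential stochastic VQ update (the winning prototype moves towards each sample) alongside a naive merge that averages the prototypes produced by independent workers. The averaging merge is only one plausible reading of an "intuitive" scheme and is an assumption here, as are the function names `vq_step`, `run_sequential` and `run_averaged` and the fixed step size `eps`; the abstract does not specify any of them.

```python
def nearest(prototypes, z):
    """Index of the prototype closest to sample z (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda k: sum((p - x) ** 2 for p, x in zip(prototypes[k], z)),
    )


def vq_step(prototypes, z, eps):
    """One stochastic VQ update: move the winning prototype towards z."""
    k = nearest(prototypes, z)
    prototypes[k] = [p + eps * (x - p) for p, x in zip(prototypes[k], z)]
    return k


def run_sequential(prototypes, samples, eps=0.1):
    """Process samples one at a time on a private copy of the prototypes."""
    protos = [list(p) for p in prototypes]
    for z in samples:
        vq_step(protos, z, eps)
    return protos


def run_averaged(prototypes, shards, eps=0.1):
    """Hypothetical naive scheme: one worker per shard, then average the results."""
    results = [run_sequential(prototypes, shard, eps) for shard in shards]
    m = len(results)
    dim = len(prototypes[0])
    return [
        [sum(r[k][d] for r in results) / m for d in range(dim)]
        for k in range(len(prototypes))
    ]
```

With averaging, each worker's displacement is divided by the number of workers, which gives some intuition for why a scheme of this kind need not progress faster than the sequential algorithm.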

    Using problem frames with distributed architectures: a case for cardinality on interfaces

    Certain classes of problems amenable to description using Problem Frames, in particular those intended to be implemented using a distributed architecture, can benefit from the addition of a cardinality specification on the domain interfaces. This paper presents an example of such a problem, demonstrates the need for relationship cardinality, and proposes a notation to represent cardinality on domain interfaces.

    Distributed data cache designs for clustered VLIW processors

    Wire delays are a major concern for current and forthcoming processors. One approach to deal with this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while the L1 data cache typically remains centralized in what we call partially distributed architectures. However, as technology evolves, the relative latency of such a centralized cache will increase, with a significant impact on performance. In this paper, we propose partitioning the L1 data cache among clusters for clustered VLIW processors. We refer to this kind of design as fully distributed processors. In particular, we propose and evaluate three different configurations: a snoop-based cache coherence scheme, a word-interleaved cache, and flexible L0-buffers managed by the compiler. For each alternative, instruction scheduling techniques targeted to cyclic code are developed. Results for the Mediabench suite show that the performance of such fully distributed architectures is always better than the performance of a partially distributed one with the same amount of resources. In addition, the key aspects of each fully distributed configuration are explored.
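
As a rough illustration of the word-interleaved configuration, the sketch below maps a byte address to the cluster whose cache slice would hold it, by distributing consecutive words round-robin across clusters. The 4-byte word size, the four clusters, and the names `home_cluster` and `is_local` are assumptions chosen for the example, not parameters taken from the paper.

```python
WORD_BYTES = 4     # assumed word size in bytes
NUM_CLUSTERS = 4   # assumed number of clusters


def home_cluster(address):
    """Cluster whose cache slice holds the word containing this byte address."""
    return (address // WORD_BYTES) % NUM_CLUSTERS


def is_local(address, cluster):
    """True if a memory access issued from `cluster` hits its own slice."""
    return home_cluster(address) == cluster
```

Under this mapping, a loop that walks an array word by word touches every cluster in turn, which is why the scheduler's choice of cluster for each memory instruction matters in such a design.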

    Swarm shape manipulation through connection control

    The control of a large swarm of distributed agents is a well-known challenge within the study of unmanned autonomous systems. However, it also presents many new opportunities. The advantages of operating a swarm through distributed means have been assessed in the literature for efficiency from both operational and economic perspectives; in practice, as the number of agents increases, distributed control is favoured over centralised control, as it can reduce agent computational costs and increase the robustness of the swarm. Distributed architectures, however, can have the drawback of requiring knowledge of the whole swarm state, which limits the scalability of the swarm. In this paper a strategy is presented to address the challenges of distributed architectures, changing the way in which the swarm shape is controlled and providing a step towards verifiable swarm behaviour, achieving new configurations while saving communication and computation resources. Instead of applying change at agent level (e.g. modifying its guidance law), the sensing of each agent is restricted to a portion of the agents, differentially driving their behaviour. This strategy is applied to swarms controlled by artificial potential functions, which would ordinarily require global knowledge and all-to-all interactions. Limiting the agents' knowledge is proposed for the first time in this work as a methodology rather than an obstacle to obtaining the desired swarm behaviour.
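
The idea of restricting sensing can be sketched as follows: each agent follows the negative gradient of a pairwise potential, but sums only over the agents it is connected to. The spring-like potential, the `rest` distance, the `adjacency` matrix, and the function names are all assumptions made for illustration; the abstract does not state which potential or connection rule the paper uses.

```python
import math


def velocity(i, positions, adjacency, rest=1.0):
    """Negative gradient of 0.5 * (d - rest)**2 summed over sensed neighbours."""
    vx = vy = 0.0
    xi, yi = positions[i]
    for j, (xj, yj) in enumerate(positions):
        if j == i or not adjacency[i][j]:
            continue  # agent i does not sense agent j
        dx, dy = xj - xi, yj - yi
        d = math.hypot(dx, dy)
        if d == 0.0:
            continue  # coincident agents: direction undefined, skip
        gain = (d - rest) / d  # attract if d > rest, repel if d < rest
        vx += gain * dx
        vy += gain * dy
    return vx, vy


def step(positions, adjacency, dt=0.1, rest=1.0):
    """Advance every agent by one explicit Euler step."""
    vels = [velocity(i, positions, adjacency, rest) for i in range(len(positions))]
    return [(x + dt * vx, y + dt * vy) for (x, y), (vx, vy) in zip(positions, vels)]
```

Setting every off-diagonal entry of `adjacency` to true recovers the all-to-all case, while sparser matrices change which configurations the swarm settles into without touching the guidance law itself.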

    Distributed Bayesian Probabilistic Matrix Factorization

    Matrix factorization is a common machine learning technique for recommender systems. Despite its high prediction accuracy, the Bayesian Probabilistic Matrix Factorization algorithm (BPMF) has not been widely used on large-scale data because of its high computational cost. In this paper we propose a distributed high-performance parallel implementation of BPMF on shared memory and distributed architectures. We show that, by using efficient load balancing with work stealing on a single node and asynchronous communication in the distributed version, we beat state-of-the-art implementations.

    Membrane Dissolution in Distributed Architectures of P-Systems

    The goal of this paper is twofold. Firstly, to survey in a systematic and uniform way the main results regarding the way membranes can be placed on processors in order to get a software/hardware simulation of P-Systems in a distributed environment. Secondly, we improve some results about the membrane dissolution problem, prove that it is connected, and discuss the possibility of simulating this property in the distributed model. Together, these results improve the parallel implementation of the system, since they increase the parallelism of the external communication among processors. Also, the number of processors grows in such a way that the parallelism in the application of the evolution rules and in the internal communications increases notably. The proposed ideas improve on previous architectures in tackling the communication bottleneck problem: they reduce the total time of an evolution step, increase the number of membranes that can run on a processor, and reduce the number of processors.