
    Distributed Bayesian Probabilistic Matrix Factorization

    Matrix factorization is a common machine learning technique for recommender systems. Despite its high prediction accuracy, the Bayesian Probabilistic Matrix Factorization (BPMF) algorithm has not been widely used on large-scale data because of its high computational cost. In this paper we propose a distributed, high-performance parallel implementation of BPMF for shared-memory and distributed architectures. We show that, by using efficient load balancing based on work stealing on a single node and asynchronous communication in the distributed version, our implementation outperforms state-of-the-art implementations.
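
The parallel structure the abstract relies on can be illustrated with a much-simplified sketch: each user's latent vector depends only on the item factors and that user's own ratings, so user updates are independent tasks that can be distributed (and work-stolen) across cores. The sketch below uses a plain gradient step (MAP-style PMF), not the paper's Gibbs sampler; all names and values are illustrative.

```python
# Simplified, non-Bayesian sketch of the per-user update structure that
# makes BPMF-style factorization parallelizable: each user update is an
# independent task (amenable to work stealing across cores/nodes).

def update_user(u_vec, ratings, item_vecs, lr=0.05, reg=0.02):
    """One gradient step on a single user's latent vector.
    ratings: list of (item_index, rating) pairs for this user."""
    d = len(u_vec)
    grad = [reg * u_vec[k] for k in range(d)]
    for i, r in ratings:
        pred = sum(u_vec[k] * item_vecs[i][k] for k in range(d))
        err = pred - r
        for k in range(d):
            grad[k] += err * item_vecs[i][k]
    return [u_vec[k] - lr * grad[k] for k in range(d)]

def rmse(users, items, obs):
    se = [(sum(users[u][k] * items[i][k] for k in range(len(users[u]))) - r) ** 2
          for u, i, r in obs]
    return (sum(se) / len(se)) ** 0.5

# Toy data: 2 users, 2 items, rank-2 factors (made up for illustration).
users = [[0.1, 0.1], [0.1, -0.1]]
items = [[0.5, 0.2], [0.3, 0.7]]
obs = [(0, 0, 4.0), (0, 1, 1.0), (1, 0, 2.0), (1, 1, 5.0)]

before = rmse(users, items, obs)
for _ in range(200):
    # Each user update is independent: in a parallel setting these are
    # the tasks distributed across workers.
    users = [update_user(users[u],
                         [(i, r) for uu, i, r in obs if uu == u],
                         items)
             for u in range(len(users))]
after = rmse(users, items, obs)
```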

    TaskPoint: sampled simulation of task-based programs

    Sampled simulation is a mature technique for reducing simulation time of single-threaded programs, but it is not directly applicable to simulation of multi-threaded architectures. Recent multi-threaded sampling techniques assume that the workload assigned to each thread does not change across multiple executions of a program. This assumption does not hold for dynamically scheduled task-based programming models. Task-based programming models allow the programmer to specify program segments as tasks, which are instantiated many times and scheduled dynamically to available threads. Due to system noise and variation in scheduling decisions, two consecutive executions on the same machine typically result in different instruction streams processed by each thread. In this paper, we propose TaskPoint, a sampled simulation technique for dynamically scheduled task-based programs. We leverage task instances as sampling units and simulate only a fraction of all task instances in detail. Between detailed simulation intervals we employ a novel fast-forward mechanism for dynamically scheduled programs. We evaluate the proposed technique on a set of 19 task-based parallel benchmarks and two different architectures. Compared to detailed simulation, TaskPoint accelerates architectural simulation with 64 simulated threads by an average factor of 19.1 at an average error of 1.8% and a maximum error of 15.0%.

    This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493, SEV-2011-00067), the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), the RoMoL ERC Advanced Grant (GA 321253), the European HiPEAC Network of Excellence, and the Mont-Blanc project (EU-FP7-610402 and EU-H2020-671697). M. Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship JCI-2012-15047. M. Casas is supported by the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the EU FP7 (contract 2013BP B 00243). T. Grass has been partially supported by the AGAUR of the Generalitat de Catalunya (grant 2013FI B 0058).
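
The core sampling idea can be sketched as follows (a hedged illustration, not the authors' implementation): instances of the same task type tend to behave similarly, so only the first few instances per type are simulated in detail, and the rest are fast-forwarded using the mean cost observed so far. All numbers below are made up.

```python
# Sketch of task-instance sampling: detailed simulation for a few
# instances per task type, fast-forwarding for the remainder.
import random

random.seed(0)

def detailed_sim(cost):
    return cost  # stand-in for expensive cycle-accurate simulation

def sampled_sim(instances, warmup=3):
    """instances: list of (task_type, true_cost). Simulate the first
    `warmup` instances per type in detail, estimate the rest."""
    seen = {}          # task type -> list of detailed costs
    estimated = 0.0
    for ttype, cost in instances:
        costs = seen.setdefault(ttype, [])
        if len(costs) < warmup:
            c = detailed_sim(cost)       # detailed interval
            costs.append(c)
            estimated += c
        else:                            # fast-forward interval
            estimated += sum(costs) / len(costs)
    return estimated

# Two task types with noisy per-instance costs (illustrative).
instances = [("A", random.gauss(100, 5)) for _ in range(500)] + \
            [("B", random.gauss(40, 2)) for _ in range(500)]
random.shuffle(instances)

true_total = sum(c for _, c in instances)
est_total = sampled_sim(instances)
err = abs(est_total - true_total) / true_total
```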

    OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF

    The actor model of computation has been designed for a seamless support of concurrency and distribution. However, it remains unspecific about data-parallel program flows, while the available processing power of modern many-core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, and hence operate in a multi-stage fashion on data resident at the GPU. Developers are thus enabled to build complex data-parallel programs from primitives without leaving the actor paradigm or sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second workloads, the efficiency of offloading was found to differ largely between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API.
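
The composition idea described above can be sketched in a language-neutral way (here in Python, deliberately not the CAF/C++ API): a data-parallel kernel is wrapped in an actor, and stages are chained via message passing so intermediate results flow directly to the next stage. `KernelActor` and `send` are illustrative names, not CAF identifiers.

```python
# Minimal actor-style staging sketch: each actor wraps a data-parallel
# "kernel" and forwards its output to a downstream actor, mirroring the
# multi-stage composition of OpenCL kernels described in the abstract.
class KernelActor:
    def __init__(self, kernel, downstream=None):
        self.kernel = kernel          # data-parallel function on a list
        self.downstream = downstream  # next stage, or None
        self.result = None

    def send(self, data):
        out = self.kernel(data)       # stands in for an OpenCL dispatch
        if self.downstream:
            self.downstream.send(out) # data flows stage-to-stage
        else:
            self.result = out

# Two composed stages: square each element, then keep values > 10.
stage2 = KernelActor(lambda xs: [x for x in xs if x > 10])
stage1 = KernelActor(lambda xs: [x * x for x in xs], downstream=stage2)
stage1.send([1, 2, 3, 4, 5])
```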

    Exact Methods for Multi-echelon Inventory Control: Incorporating Shipment Decisions and Detailed Demand Information

    Recent advances in information technologies and an increased environmental awareness have altered the prerequisites for successful logistics. For companies operating on a global market, inventory control of distribution systems is often an essential part of their logistics planning. In this context, the research objective of this thesis is: to develop exact methods for stochastic inventory control of multi-echelon distribution systems incorporating shipment decisions and/or detailed demand information. The thesis consists of five scientific papers (Papers I, II, III, IV and V) preceded by a summarizing introduction. All papers study systems with a central warehouse supplying a number of non-identical local warehouses (retailers) facing stochastic demand. For given replenishment policies, the papers provide exact expressions for evaluating the expected long-run system behavior (e.g., distributions of backorders, inventory levels, shipment sizes and expected costs) and present optimization procedures for the control variables.

    Papers I and II consider systems where shipments from the central warehouse are consolidated to groups of retailers and dispatched periodically. By doing so, economies of scale for the transports can be achieved, reducing both transportation costs and emissions. Paper I assumes Poisson customer demand and considers volume-dependent transportation costs and emissions. The model involves the possibility to reserve intermodal (train) capacity in combination with truck transports available on demand. For this system, the expected inventory costs, the expected transportation costs and the expected transport emissions are determined. Joint optimization procedures for the shipment intervals, the capacity reservation quantities, the reorder points and order-up-to levels in the system are provided, with or without emission considerations. Paper II analyses the expected costs of the same system for compound Poisson demand (where customer demand sizes may vary), but with only one transportation mode and fixed transportation costs per shipment. It also shows how to handle fill rate constraints.

    Paper III studies a system where all stock points use installation-stock (R,Q) ordering policies (batch ordering). This implies that situations can occur when only part of a requested retailer order is available at the central warehouse. In these situations, the models in the existing literature predominantly assume that available units are shipped immediately (partial delivery). An alternative is to wait until the entire order is available before dispatching (complete delivery). The paper introduces a cost for splitting the order and evaluates a system where optimal choices between partial and complete deliveries are made for all orders. A numerical study shows that significant savings can be made by using this policy compared to systems that exclusively use either partial or complete deliveries.

    Paper IV shows how companies can benefit from detailed information about their customer demand. In a continuous review base stock system, the customer demand is modeled with independent compound renewal processes at the retailers. This means that the customer inter-arrival times may follow any continuous distribution and the demand sizes may follow any discrete distribution. A numerical study shows that this model can achieve substantial savings compared to models using the common assumption of exponential customer inter-arrival times.

    Paper V is a short technical note that extends the scope of analysis for several existing stochastic multi-echelon inventory models. These models analyze the expected costs without first determining the inventory level distribution. By showing how these distributions can be obtained from the expected cost functions, this note facilitates the analysis of several service measures, including the ready rate and the fill rate.
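
A hedged illustration (not taken from the thesis) of the single-stage relationship underlying such service-measure analyses: with a base stock level S and Poisson lead-time demand D, the inventory level is IL = S - D, so the ready rate P(IL > 0) follows directly from the demand distribution. The demand mean and target below are made up.

```python
# Base-stock service-level sketch: ready rate for Poisson lead-time
# demand, and the smallest base stock level meeting a target.
from math import exp, factorial

def poisson_pmf(k, mu):
    return exp(-mu) * mu**k / factorial(k)

def ready_rate(S, mu):
    """P(IL > 0) = P(D <= S - 1) for base stock S, mean lead-time demand mu."""
    return sum(poisson_pmf(k, mu) for k in range(S))

def smallest_S(target, mu):
    """Smallest base stock level whose ready rate meets the target."""
    S = 0
    while ready_rate(S, mu) < target:
        S += 1
    return S

mu = 4.0                      # mean lead-time demand (illustrative)
S = smallest_S(0.95, mu)      # smallest S with ready rate >= 95%
```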

    On the Load Balancing Techniques for GPU Applications Based on Prefix-scan

    Prefix-scan is one of the most common operations and building blocks for a wide range of parallel applications on GPUs. It allows GPU threads to efficiently find and access their assigned data in parallel. Nevertheless, the workload decomposition and mapping strategies that make use of prefix-scan can have a significant impact on overall application performance. This paper presents a classification of the state-of-the-art mapping strategies and compares them to understand which problems each applies to best. It then presents Multi-Phase Search, an advanced dynamic technique that addresses the workload-unbalancing problem by fully exploiting the characteristics of the GPU device. In particular, the proposed technique implements a dynamic mapping of work-units to threads through an algorithm whose complexity is significantly reduced with respect to the other dynamic approaches in the literature. The paper shows, compares, and analyses the experimental results obtained by applying all the mapping techniques to different datasets, each with very different characteristics and structure.
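
The generic prefix-scan mapping the paper classifies can be sketched as follows (this is the standard scan-then-search idea, not the authors' Multi-Phase Search): an exclusive prefix scan over per-element work counts gives each element's starting offset, and a thread handling global work-unit t binary-searches the scan to find which element t belongs to, yielding a balanced thread-to-work mapping even for skewed counts.

```python
# Scan-then-search work mapping: exclusive prefix scan of per-element
# work counts, then binary search to map a global work-unit index back
# to its owning input element.
from bisect import bisect_right

def exclusive_scan(counts):
    out, total = [], 0
    for c in counts:
        out.append(total)
        total += c
    return out, total

counts = [3, 1, 0, 5, 2]                  # work-units per input element
offsets, total = exclusive_scan(counts)   # offsets = [0, 3, 4, 4, 9]

def owner(t):
    """Index of the input element that global work-unit t belongs to."""
    return bisect_right(offsets, t) - 1

mapping = [owner(t) for t in range(total)]
```

Note that the element with a zero count is skipped automatically: no work-unit maps to it.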

    Development of an Optimal Replenishment Policy for Human Capital Inventory

    A unique approach is developed for evaluating Human Capital (workforce) requirements. With this approach, new ways of measuring personnel availability are proposed to ensure that an organization remains ready to provide timely, relevant, and accurate products and services in support of its strategic objectives over its planning horizon. This analysis and methodology was established as an alternative to existing studies for determining appropriate hiring and attrition rates and for maintaining personnel levels adequate to support existing and future missions. The contribution of this research is a prescribed method by which a strategic analyst can incorporate a personnel and cost simulation model within the framework of Human Capital forecasting, projecting personnel requirements and evaluating workforce sustainment, at least cost, through time. This allows personnel managers to evaluate multiple resource strategies, present and future, maintaining near-"perfect" hiring and attrition policies in support of future Human Capital assets.
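
The inventory framing above can be illustrated with a toy projection (not the dissertation's model): headcount is treated as an inventory where attrition plays the role of demand, and the steady-state hiring quantity that sustains a target level is solved for directly. All rates below are made up.

```python
# Toy workforce-as-inventory sketch: project headcount under a fixed
# attrition rate and compute the hiring level that sustains a target.
def project(level, hires_per_period, attrition_rate, periods):
    """Project headcount over `periods`, applying attrition then hiring."""
    levels = [level]
    for _ in range(periods):
        level = level - round(level * attrition_rate) + hires_per_period
        levels.append(level)
    return levels

def sustaining_hires(target, attrition_rate):
    """Hires per period that keep headcount at `target` in steady state."""
    return round(target * attrition_rate)

target, attrition = 1000, 0.08            # illustrative values
h = sustaining_hires(target, attrition)   # replaces expected attrition
levels = project(target, h, attrition, 12)
```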