    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions. Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
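
    As a rough illustration of the programming model this abstract describes, the sketch below runs a word count through separate map, shuffle, and reduce steps in a single process. It is not taken from the surveyed article: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would distribute these steps across machines and handle scheduling and fault tolerance.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit one (word, 1) pair per word in a single input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, the step a framework performs between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in-process (hypothetical driver, no cluster involved)."""
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

    The application code only defines the map and reduce functions; everything in `word_count` stands in for what the framework would do on the application's behalf.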

    The Vehicle Routing Problem with Service Level Constraints

    We consider a vehicle routing problem which seeks to minimize cost subject to service level constraints on several groups of deliveries. This problem captures some essential challenges faced by a logistics provider which operates transportation services for a limited number of partners and should respect contractual obligations on service levels. The problem also generalizes several important classes of vehicle routing problems with profits. To solve it, we propose a compact mathematical formulation, a branch-and-price algorithm, and a hybrid genetic algorithm with population management, which relies on problem-tailored solution representation, crossover and local search operators, as well as an adaptive penalization mechanism establishing a good balance between service levels and costs. Our computational experiments show that the proposed heuristic returns very high-quality solutions for this difficult problem, matches all optimal solutions found for small and medium-scale benchmark instances, and improves upon existing algorithms for two important special cases: the vehicle routing problem with private fleet and common carrier, and the capacitated profitable tour problem. The branch-and-price algorithm also produces new optimal solutions for all three problems.

    Reformulation and decomposition of integer programs

    In this survey we examine ways to reformulate integer and mixed integer programs. Typically, but not exclusively, one reformulates so as to obtain stronger linear programming relaxations, and hence better bounds for use in a branch-and-bound based algorithm. First we cover in detail reformulations based on decomposition, such as Lagrangean relaxation, Dantzig-Wolfe column generation and the resulting branch-and-price algorithms. This is followed by an examination of Benders' type algorithms based on projection. Finally we discuss in detail extended formulations involving additional variables that are based on problem structure. These can often be used to provide strengthened a priori formulations. Reformulations obtained by adding cutting planes in the original variables are not treated here. Keywords: Integer program, Lagrangean relaxation, column generation, branch-and-price, extended formulation, Benders' algorithm
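
    To make the role of linear programming relaxations as bounds concrete, here is a small sketch on a 0/1 knapsack problem, which is an illustrative choice and not an example from the survey. The relaxation lets items be taken fractionally, so its value is an upper bound on the integer optimum; branch-and-bound uses such bounds to prune. The function names and the instance data are assumptions, and weights are assumed to be positive integers.

```python
def lp_relaxation_bound(items, capacity):
    """Upper bound from the LP relaxation of 0/1 knapsack.

    items is a list of (value, weight) pairs. Filling the knapsack greedily by
    value/weight ratio, with at most one fractional item, solves the relaxation.
    """
    bound = 0.0
    remaining = capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if remaining <= 0:
            break
        take = min(weight, remaining)  # take the whole item or the fraction that fits
        bound += value * take / weight
        remaining -= take
    return bound


def integer_optimum(items, capacity):
    """Exact 0/1 knapsack value by dynamic programming over capacities."""
    best = [0] * (capacity + 1)
    for value, weight in items:
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


if __name__ == "__main__":
    items = [(60, 10), (100, 20), (120, 30)]  # hypothetical (value, weight) data
    print(lp_relaxation_bound(items, 50), integer_optimum(items, 50))
```

    On this instance the relaxation gives 240 while the best integer solution is 220; a stronger reformulation would aim to narrow that gap.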

    Designing the Liver Allocation Hierarchy: Incorporating Equity and Uncertainty

    Liver transplantation is the only available therapy for any acute or chronic condition resulting in irreversible liver dysfunction. The liver allocation system in the U.S. is administered by the United Network for Organ Sharing (UNOS), a scientific and educational nonprofit organization. The main components of the organ procurement and transplant network are Organ Procurement Organizations (OPOs), which are collections of transplant centers responsible for maintaining local waiting lists, harvesting donated organs and carrying out transplants. Currently in the U.S., OPOs are grouped into 11 regions to facilitate organ allocation, and a three-tier mechanism is utilized that aims to reduce organ preservation time and transport distance to maintain organ quality, while giving sicker patients higher priority. Livers are scarce and perishable resources that rapidly lose viability, which makes their transport distance a crucial factor in transplant outcomes. When a liver becomes available, it is matched with patients on the waiting list according to a complex mechanism that gives priority to patients within the harvesting OPO and region. Transplants at the regional level accounted for more than 50% of all transplants since 2000. This dissertation focuses on the design of regions for liver allocation hierarchy, and includes optimization models that incorporate geographic equity as well as uncertainty throughout the analysis. We employ multi-objective optimization algorithms that involve solving parametric integer programs to balance two possibly conflicting objectives in the system: maximizing efficiency, as measured by the number of viability adjusted transplants, and maximizing geographic equity, as measured by the minimum rate of organ flow into individual OPOs from outside of their own local area.
    Our results show that efficiency improvements of up to 6% or equity gains of about 70% can be achieved when compared to the current performance of the system by redesigning the regional configuration for the national liver allocation hierarchy. We also introduce a stochastic programming framework to capture the uncertainty of the system by considering scenarios that correspond to different snapshots of the national waiting list and maximize the expected benefit from liver transplants under this stochastic view of the system. We explore many algorithmic and computational strategies including sampling methods, column generation strategies, branching and integer-solution generation procedures, to aid the solution process of the resulting large-scale integer programs. We also explore an OPO-based extension to our two-stage stochastic programming framework that lends itself to more extensive computational testing. The regional configurations obtained using these models are estimated to increase expected life-time gained per transplant operation by up to 7% when compared to the current system. This dissertation also focuses on the general question of designing efficient algorithms that combine column and cut generation to solve large-scale two-stage stochastic linear programs. We introduce a flexible method to combine column generation and the L-shaped method for two-stage stochastic linear programming. We explore the performance of various algorithm designs that employ stabilization subroutines for strengthening both column and cut generation to effectively avoid degeneracy. We study two-stage stochastic versions of the cutting stock and multi-commodity network flow problems to analyze the performances of algorithms in this context.

    Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

    In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples---one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network)---that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems. Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201

    Optimizing the Efficiency of the United States Organ Allocation System through Region Reorganization

    Allocating organs for transplantation has been controversial in the United States for decades. Two main allocation approaches developed in the past are (1) to allocate organs to patients with higher priority at the same locale; (2) to allocate organs to patients with the greatest medical need regardless of their locations. To balance these two allocation preferences, the U.S. organ transplantation and allocation network has lately implemented a three-tier hierarchical allocation system, dividing the U.S. into 11 regions, composed of 59 Organ Procurement Organizations (OPOs). At present, a procured organ is offered first at the local level, and then regionally and nationally. The purpose of allocating organs at the regional level is to increase the likelihood that a donor-recipient match exists, compared to the former allocation approach, and to increase the quality of the match, compared to the latter approach. However, the question of which regional configuration is the most efficient remains unanswered. This dissertation develops several integer programming models to find the most efficient set of regions. Unlike previous efforts, our model addresses efficient region design for the entire hierarchical system given the existing allocation policy. To measure allocation efficiency, we use the intra-regional transplant cardinality. Two estimates are developed in this dissertation. One is a population-based estimate; the other is an estimate based on the situation where there is only one waiting list nationwide. The latter estimate is a refinement of the former one in that it captures the effect of national-level allocation and heterogeneity of clinical and demographic characteristics among donors and patients. To model national-level allocation, we apply a modeling technique similar to spill-and-recapture in the airline fleet assignment problem.
    A clinically based simulation model is used in this dissertation to estimate several necessary parameters in the analytic model and to verify the optimal regional configuration obtained from the analytic model. The resulting optimal region design problem is a large-scale set-partitioning problem in which there are too many columns to handle explicitly. Given this challenge, we adapt branch and price in this dissertation. We develop a mixed-integer programming pricing problem that is both theoretically and practically hard to solve. To alleviate this existing computational difficulty, we apply geographic decomposition to solve many smaller-scale pricing problems based on pre-specified subsets of OPOs instead of a big pricing problem. When solving each smaller-scale pricing problem, we also generate multiple "promising" regions that are not necessarily optimal to the pricing problem. In addition, we attempt to develop more efficient solutions for the pricing problem by studying alternative formulations and developing strong valid inequalities. The computational studies in this dissertation use clinical data and show that (1) regional reorganization is beneficial; (2) our branch-and-price application is effective in solving the optimal region design problem.

    A branch-and-price algorithm for a hierarchical crew scheduling problem.

    We describe a real-life problem arising at a crane rental company. This problem is a generalization of the basic crew scheduling problem given in Mingozzi et al. and Beasley and Cao. We formulate the problem as an integer programming problem and establish ties with the integer multicommodity flow problem and the hierarchical interval scheduling problem. After establishing the complexity of the problem we propose a branch-and-price algorithm to solve it. We test this algorithm on a limited number of real-life instances. Keywords: Scheduling