6,029 research outputs found

    Mining optimal item packages using mixed integer programming

    Traditional methods for discovering frequent patterns in large databases assign equal weight to every item in the database. In the real world, managerial decisions are based on the economic values attached to the item sets. In this paper, we introduce the value-based frequent item packages problem. Furthermore, we provide a mixed integer linear programming (MILP) model for this value-based optimization problem in the context of transaction data. The problem discussed in this paper is to find an optimal set of item packages (or item sets making up the whole transaction) that returns maximum profit to the organization under limited resources. Specifying the problem in this way opens the way for applying existing and new MILP solution techniques to a number of practical decision problems. The model has been implemented and tested with real-life retail data. The test results are reported in the paper.
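The abstract does not give the MILP formulation, so the following is only a toy illustration of the kind of decision it describes: choose a set of item packages that maximizes profit without exceeding a resource limit. It enumerates every subset instead of calling a MILP solver, and the package names, profits, resource figures, and the function `best_packages` are all hypothetical.

```python
from itertools import combinations

def best_packages(packages, budget):
    """Return (profit, chosen names) for the most profitable feasible subset.

    packages maps a package name to (profit, resource_use). Exhaustive
    enumeration stands in for a MILP solver and only suits tiny instances.
    """
    names = sorted(packages)
    best_profit, best_choice = 0, ()
    for size in range(1, len(names) + 1):
        for choice in combinations(names, size):
            use = sum(packages[n][1] for n in choice)
            profit = sum(packages[n][0] for n in choice)
            # Keep the subset only if it fits the budget and beats the incumbent.
            if use <= budget and profit > best_profit:
                best_profit, best_choice = profit, choice
    return best_profit, best_choice

# Hypothetical data: name -> (profit, resource units consumed).
toy_packages = {
    "bread+butter": (6, 4),
    "beer+chips": (5, 3),
    "milk": (3, 2),
}
print(best_packages(toy_packages, budget=5))  # (8, ('beer+chips', 'milk'))
```

The single most profitable package is not part of the best selection here, which is the reason to optimize over whole sets rather than rank packages one at a time.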

    Proactive Empirical Assessment of New Language Feature Adoption via Automated Refactoring: The Case of Java 8 Default Methods

    Programming languages and platforms improve over time, sometimes resulting in new language features that offer many benefits. However, despite these benefits, developers may not always be willing to adopt them in their projects for various reasons. In this paper, we describe an empirical study where we assess the adoption of a particular new language feature. Studying how developers use (or do not use) new language features is important in programming language research and engineering because it gives designers insight into the usability of the language to create meaningful programs in that language. This knowledge, in turn, can drive future innovations in the area. Here, we explore Java 8 default methods, which allow interfaces to contain (instance) method implementations. Default methods can ease interface evolution, make certain ubiquitous design patterns redundant, and improve both modularity and maintainability. A focus of this work is to discover, through a scientific approach and a novel technique, situations where developers found these constructs useful and where they did not, and the reasons for each. Although several studies center around assessing new language features, to the best of our knowledge, this kind of construct has not been previously considered. Despite their benefits, we found that developers did not adopt default methods in all situations. Our study consisted of submitting pull requests introducing the language feature to 19 real-world, open source Java projects without altering original program semantics. This novel assessment technique is proactive in that the adoption was driven by an automatic refactoring approach rather than waiting for developers to discover and integrate the feature themselves. In this way, we set forth best practices and patterns of using the language feature effectively earlier rather than later and are able to possibly guide (near) future language evolution. We foresee this technique to be useful in assessing other new language features, design patterns, and other programming idioms.

    An Introduction to Programming for Bioscientists: A Python-based Primer

    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. 
Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences. Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biolog
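The Hamming distance mentioned in this abstract is the number of positions at which two equal-length sequences differ. A minimal sketch of the computation follows, without the graphical interface; the function name `hamming_distance` and the example sequences are illustrative and not taken from the primer.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    # zip pairs up the bases position by position; True counts as 1 in sum().
    return sum(a != b for a, b in zip(seq_a, seq_b))

print(hamming_distance("GATTACA", "GACTATA"))  # 2
```

Raising an error for sequences of unequal length is a design choice made here; the distance is only defined for sequences of the same length.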

    Mixed-Integer Convex Nonlinear Optimization with Gradient-Boosted Trees Embedded

    Decision trees usefully represent sparse, high dimensional and noisy data. Having learned a function from this data, we may want to thereafter integrate the function into a larger decision-making problem, e.g., for picking the best chemical process catalyst. We study a large-scale, industrially-relevant mixed-integer nonlinear nonconvex optimization problem involving both gradient-boosted trees and penalty functions mitigating risk. This mixed-integer optimization problem with convex penalty terms broadly applies to optimizing pre-trained regression tree models. Decision makers may wish to optimize discrete models to repurpose legacy predictive models, or they may wish to optimize a discrete model that particularly well-represents a data set. We develop several heuristic methods to find feasible solutions, and an exact, branch-and-bound algorithm leveraging structural properties of the gradient-boosted trees and penalty functions. We computationally test our methods on a concrete mixture design instance and a chemical catalysis industrial instance.

    Online fulfillment: f-warehouse order consolidation and bops store picking problems

    Fulfillment of online retail orders is a critical challenge for retailers since the legacy infrastructure and control methods are ill suited for online retail. The primary performance goal of online fulfillment is speed or fast fulfillment, requiring received orders to be shipped or ready for pickup within a few hours. Several novel numerical problems characterize fast fulfillment operations and this research solves two such problems. Order fulfillment warehouses (F-Warehouses) are a critical component of the physical internet behind online retail supply chains. Two key distinguishing features of an F-Warehouse are (i) Explosive Storage Policy – A unique item can be stored simultaneously in multiple bin locations dispersed through the warehouse, and (ii) Commingled Bins – A bin can stock several different items simultaneously. The inventory dispersion profile of an item is therefore temporal and non-repetitive. The order arrival process is continuous, and each order consists of one or more items. From the set of pending orders, efficient picking lists of 10-15 items are generated. A picklist of items is collected in a tote, which is then transported to a packaging station, where items belonging to the same order are consolidated into a shipment package. There are multiple such stations. This research formulates and solves the order consolidation problem. At any time, a batch of totes are to be processed through several available order packaging stations. Tote assignment to a station will determine whether an order will be shipped in a single package or multiple packages. Reduced shipping costs are a key operational goal of an online retailer, and the number of packages is a determining factor. The decision variable is which station a tote should be assigned to, and the performance objective is to minimize the number of packages and balance the packaging station workload. 
This research first formulates the order consolidation problem as a mixed integer programming model, and then develops two fast heuristics (#1 and #2) plus two clustering algorithm derived solutions. For small problems, heuristic #2 is on average within 4.1% of the optimal solution. For larger problems, heuristic #2 outperforms all other algorithms. Performance behavior of heuristic #2 is further studied as a function of several characteristics. S-Strategy fulfillment is a store-based solution for fulfilling online customer orders. The S-Strategy is driven by two key motivations: first, retailers have a network of stores where the inventory is already dispersed, and second, the expectation is that forward positioned inventory could be faster and more economical than a warehouse based F-Strategy. Orders are picked from store inventory and then the customer picks up from the store (BOPS). A BOPS store has two distinguishing features: (i) in addition to shelf stock, the layout includes a space constrained back stock of selected items, and (ii) a set of dedicated pickers who are scheduled to fulfill orders. This research solves two BOPS related problems: (i) Back stock strategy: Assignment of items located in the back stock and (ii) Picker scheduling: Effect of the number of pickers and work hours. A continuous flow of incoming orders is assumed for both problems and the objective is fulfillment time and labor cost minimization. For the back-stock problem, an assignment rule based on order frequency, forward location and order basket correlations achieves a 17.6% improvement over a no back-stock store, while a rule based only on order frequency achieves a 12.4% improvement. Additional experiments across a range of order baskets are reported.
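The order consolidation decision described in this abstract, assigning each tote to a packaging station so that orders ship in as few packages as possible, can be illustrated on a toy instance. The sketch below is not the authors' mixed integer model or either of their heuristics. It enumerates every assignment, treats workload balance as a simple cap on totes per station (an assumption made here), and uses hypothetical data and function names.

```python
from itertools import product

def count_packages(tote_orders, assignment):
    """Total packages: for each order, one package per station that receives its items."""
    stations_per_order = {}
    for tote, station in assignment.items():
        for order in tote_orders[tote]:
            stations_per_order.setdefault(order, set()).add(station)
    return sum(len(stations) for stations in stations_per_order.values())

def best_assignment(tote_orders, n_stations, max_totes_per_station):
    """Try every tote-to-station assignment and return (packages, assignment)."""
    totes = sorted(tote_orders)
    best = None
    for combo in product(range(n_stations), repeat=len(totes)):
        # Crude workload balance: skip assignments that overload a station.
        loads = [combo.count(s) for s in range(n_stations)]
        if max(loads) > max_totes_per_station:
            continue
        assignment = dict(zip(totes, combo))
        cost = count_packages(tote_orders, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best

# Hypothetical batch: each tote lists the orders whose items it carries.
toy_totes = {"T1": {"A", "B"}, "T2": {"A"}, "T3": {"B", "C"}, "T4": {"C"}}
packages, plan = best_assignment(toy_totes, n_stations=2, max_totes_per_station=2)
print(packages, plan)
```

On this instance the three orders need four packages in total, because the station cap forces one order to be split. Enumeration grows exponentially with the number of totes, which is why realistic batches call for heuristics like those the abstract reports.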

    On the design of R-based scalable frameworks for data science applications

    This thesis comprises three papers "On the design of R-based scalable frameworks for data science applications". We discuss the design of conceptual and computational frameworks for the R language for statistical computing and graphics and build software artifacts for two typical data science use cases: optimization problem solving and large scale text analysis. Each part follows a design science approach. We use a verification method for the software frameworks introduced, i.e., prototypical instantiations of the designed artifacts are evaluated on the basis of real-world applications in mixed integer optimization (consensus journal ranking) and text mining (culturomics). The first paper introduces an extensible object oriented R Optimization Infrastructure (ROI). Methods from the field of optimization play an important role in many techniques routinely used in statistics, machine learning and data science. Often, implementations of these methods rely on highly specialized optimization algorithms, designed to be only applicable within a specific application. However, in many instances recent advances, in particular in the field of convex optimization, make it possible to conveniently and straightforwardly use modern solvers instead, with the advantage of enabling broader usage scenarios and thus promoting reusability. With ROI one can formulate and solve optimization problems in a consistent way. It is capable of modeling linear, quadratic, conic, and general nonlinear optimization problems. Furthermore, the paper discusses how extension packages can add additional optimization solvers, read/write functions and additional resources such as model collections. Selected examples from the field of statistics conclude the paper. With the second paper we aim to answer two questions. First, it addresses how to construct suitable aggregates of individual journal rankings, using an optimization-based consensus ranking approach. Second, the presented application serves as an evaluation of the ROI prototype. Regarding the first research question, we apply the proposed method to a subset of marketing-related journals from a list of collected journal rankings. Next, the paper studies the stability of the derived consensus solution, and degeneration effects that occur when excluding journals and/or rankings. Finally, we investigate the similarities/dissimilarities of the consensus with a naive meta-ranking and with individual rankings. The results show that, even though journals are not uniformly ranked, one may derive a consensus ranking with considerably high agreement with the individual rankings. In the third paper we examine how we can extend the text mining package tm to handle large (text) corpora. This enables statisticians to answer many interesting research questions via statistical analysis or modeling of data sets that cannot be analyzed easily otherwise, e.g., due to software or hardware induced data size limitations. Adequate programming models like MapReduce facilitate parallelization of text mining tasks and allow for processing large data sets by using a distributed file system possibly spanning over several machines, e.g., in a cluster of workstations. The paper presents a plug-in package to tm called tm.plugin.dc implementing a distributed corpus class which can take advantage of the Hadoop MapReduce library for large scale text mining tasks. We evaluate the presented prototype on the basis of an application in culturomics and show that it can handle data sets of significant size efficiently.

    Prescriptive Analytics: A Survey of Emerging Trends And Technologies


    ControlFlag: A Self-supervised Idiosyncratic Pattern Detection System for Software Control Structures

    Software debugging has been shown to consume upwards of 50% of developers’ time. Machine programming, the field concerned with the automation of software (and hardware) development, has recently made progress in both research and production-quality automated debugging systems. In this paper, we present ControlFlag, a system that detects possible idiosyncratic violations in software control structures. ControlFlag also suggests possible corrections in the event a true error is detected. A novelty of ControlFlag is that it is entirely self-supervised; that is, it requires no labels to learn about the potential idiosyncratic programming pattern violations. In addition to presenting ControlFlag’s design, we also provide an abbreviated experimental evaluation.