    Putting Pandas in a Box

    Pandas, the Python Data Analysis Library, is a powerful and widely used framework for data analytics. In this work we present our approach to pushing the computational part of Pandas scripts down into the DBMS by using a transpiler. In addition to basic data processing operations, our approach also supports access to external data stored in files instead of the DBMS. Moreover, user-defined Python functions are transformed automatically into SQL UDFs executed in the DBMS. The latter allows the integration of complex computational tasks, including machine learning. We show the usage of this feature to implement a so-called model join, i.e., applying pre-trained ML models to data in SQL tables.
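The model-join idea can be illustrated with a minimal sketch. It is not the paper's transpiler: here the standard-library `sqlite3` module stands in for the DBMS, the Python function is registered by hand rather than generated automatically, and the table `samples`, the function `predict`, and the model weights are all hypothetical names and values chosen for illustration.

```python
import sqlite3

# Hypothetical "pre-trained" linear model; the weights and bias are made-up values.
WEIGHTS = (0.5, 2.0)
BIAS = 1.0


def predict(x1, x2):
    """Apply the toy linear model to the features of a single row."""
    return WEIGHTS[0] * x1 + WEIGHTS[1] * x2 + BIAS


def model_join(conn):
    """Score every row of `samples` from within a SQL query."""
    # Register the Python function as a two-argument SQL function.
    conn.create_function("predict", 2, predict)
    return conn.execute(
        "SELECT id, predict(x1, x2) AS score FROM samples ORDER BY id"
    ).fetchall()


# In-memory database standing in for the DBMS that holds the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [(1, 2.0, 1.0), (2, 0.0, 0.0), (3, 4.0, 0.5)],
)

scores = model_join(conn)
print(scores)
```

The point of the sketch is only that the model is invoked by the query itself, so the rows never have to be exported into a separate Python data frame before scoring.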

    The Collection Virtual Machine: An Abstraction for Multi-Frontend Multi-Backend Data Analysis

    Getting the best performance from the ever-increasing number of hardware platforms has been a recurring challenge for data processing systems. In recent years, the advent of data science with its increasingly numerous and complex types of analytics has made this challenge even more difficult. In practice, system designers are overwhelmed by the number of combinations and typically implement only one analysis/platform combination, leading to repeated implementation effort -- and a plethora of semi-compatible tools for data scientists. In this paper, we propose the "Collection Virtual Machine" (or CVM) -- an extensible compiler framework designed to keep the specialization process of data analytics systems tractable. It can capture at the same time the essence of a large span of low-level, hardware-specific implementation techniques as well as high-level operations of different types of analyses. At its core lies a language for defining nested, collection-oriented intermediate representations (IRs). Frontends produce programs in their IR flavors defined in that language, which get optimized through a series of rewritings (possibly changing the IR flavor multiple times) until the program is finally expressed in an IR of platform-specific operators. While reducing the overall implementation effort, this also improves the interoperability of both analyses and hardware platforms. We have used CVM successfully to build specialized backends for platforms as diverse as multi-core CPUs, RDMA clusters, and serverless computing infrastructure in the cloud and expect similar results for many more frontends and hardware platforms in the near future.
    Comment: This paper is currently under review at DaMoN'2
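The rewriting pipeline described in this abstract can be pictured with a toy sketch. Nothing here is CVM's actual IR or API: the operator names (`df.filter`, `rel.select`, `cpu.scan_filter`, and so on), the tuple encoding of nodes, and the two lowering passes are assumptions made up purely to show a nested program changing IR flavor step by step.

```python
# Each pass maps operator names of one hypothetical IR flavor to the next,
# lower-level flavor: frontend ("df.*") -> relational ("rel.*") -> CPU ("cpu.*").
LOWERING_PASSES = [
    {"df.filter": "rel.select", "df.project": "rel.project"},
    {"rel.select": "cpu.scan_filter", "rel.project": "cpu.scan_map"},
]


def rewrite(program, rules):
    """Apply one rewriting pass to a nested program.

    A program is a list of nodes; each node is (op_name, argument, children),
    where children is itself a program. Unknown operators are left unchanged.
    """
    return [
        (rules.get(op, op), arg, rewrite(children, rules))
        for op, arg, children in program
    ]


def lower(program):
    """Run all passes in order until only platform-level operators remain."""
    for rules in LOWERING_PASSES:
        program = rewrite(program, rules)
    return program


# A frontend program: project the "price" column of the rows passing a filter.
frontend_program = [("df.project", "price", [("df.filter", "price > 10", [])])]
lowered = lower(frontend_program)
print(lowered)
```

Supporting another frontend or backend in this toy setting would mean adding one more rule table rather than a new end-to-end implementation, which is the kind of reuse the abstract argues for.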

    What Is Smart about Services? Breaking the Bond between the Smart Product and the Service

    While the conceptual delineation between conventional and smart products is rather conspicuous, the distinction between conventional services and their smart counterparts remains elusive. This study develops a conceptual framework for understanding the distinctive attributes of smart services and their relationship to smart products. In a systematic literature review of publications from top information systems outlets, 30 contributions holding relevant information on smart services are identified and subjected to content analysis. The analysis reveals a variety of definitions and characterizations of smart services, as well as relations to concepts such as data-driven services and services associated with smart products and smart objects. These findings are used to examine artifacts developed in design-oriented papers and to derive five dimensions that affect the level of smartness of services: the richness of the data, the knowledge intensiveness of the engine for decision support, the level of sophistication of the outcome delivered to the service user(s), the architecture of the stakeholders, and the automation level of the service processes. Within this scope, the product can have four roles: sensor, computer, interface, or integrator. The paper concludes by identifying some gaps in the overall research landscape and provides directions for future research.

    An Integrated View on the Future of Logistics and Information Technology

    In this position paper, we present our vision of the future of the logistics business domain and the use of information technology (IT) in this domain. The vision is based on extensive experience with Dutch and European logistics in various contexts and from various perspectives. We expect that the vision also holds for logistics outside Europe. We build our vision in a number of steps. First, we make an inventory of the most important trends in the logistics domain, which we call mega-trends. Next, we do the same for the information technology domain, restricted to technologies that have relevance for logistics. Then, we introduce logistics meta-concepts that we use to describe our vision and relate them to business engineering. We use these three ingredients to analyze leading concepts that we currently observe in the logistics domain. Next, we consolidate all elements into a model that represents our vision of the integrated future of logistics and IT. We elaborate on the role of data platforms and open standards in this integrated vision.
    Comment: 22 pages, 7 figures, 3 tables