Putting Pandas in a Box
Pandas - the Python Data Analysis Library - is a powerful and widely used framework for data analytics. In this work we present our approach to push down the computational part of Pandas scripts into the DBMS by using a transpiler. In addition to basic data processing operations, our approach also supports access to external data stored in files instead of the DBMS. Moreover, user-defined Python functions are transformed automatically to SQL UDFs executed in the DBMS. The latter allows the integration of complex computational tasks including machine learning. We show the usage of this feature to implement a so-called model join, i.e., applying pre-trained ML models to data in SQL tables.
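The idea of running a Python function as a SQL UDF inside the DBMS can be illustrated with a minimal sketch. It is not the transpiler described in the abstract: it assumes SQLite (via Python's standard `sqlite3` module) as a stand-in DBMS, and the `samples` table, the `predict` function, and its weights are hypothetical names and values chosen only for illustration.

```python
import sqlite3


def predict(x1, x2):
    """Hypothetical 'pre-trained' linear model; the weights are made up."""
    return 0.5 * x1 + 2.0 * x2 - 1.0


def model_join(conn):
    """Apply the model to every row of the (hypothetical) samples table."""
    # Register the Python function as a SQL UDF taking two arguments,
    # so the query below can call it from inside the database engine.
    conn.create_function("predict", 2, predict)
    query = "SELECT id, predict(x1, x2) FROM samples ORDER BY id"
    return conn.execute(query).fetchall()


# Build a small in-memory table to run the sketch against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [(1, 2.0, 1.0), (2, 4.0, 0.0)],
)

rows = model_join(conn)
print(rows)
```

Here the registration is done by hand; in the approach the abstract describes, the step from a Pandas script to SQL with UDFs is said to be automatic.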
The Collection Virtual Machine: An Abstraction for Multi-Frontend Multi-Backend Data Analysis
Getting the best performance from the ever-increasing number of hardware
platforms has been a recurring challenge for data processing systems. In recent
years, the advent of data science with its increasingly numerous and complex
types of analytics has made this challenge even more difficult. In practice,
system designers are overwhelmed by the number of combinations and typically
implement only one analysis/platform combination, leading to repeated
implementation effort -- and a plethora of semi-compatible tools for data
scientists.
In this paper, we propose the "Collection Virtual Machine" (or CVM) -- an
extensible compiler framework designed to keep the specialization process of
data analytics systems tractable. It can capture at the same time the essence
of a large span of low-level, hardware-specific implementation techniques as
well as high-level operations of different types of analyses. At its core lies
a language for defining nested, collection-oriented intermediate
representations (IRs). Frontends produce programs in their IR flavors defined
in that language, which get optimized through a series of rewritings (possibly
changing the IR flavor multiple times) until the program is finally expressed
in an IR of platform-specific operators. While reducing the overall
implementation effort, this also improves the interoperability of both analyses
and hardware platforms. We have used CVM successfully to build specialized
backends for platforms as diverse as multi-core CPUs, RDMA clusters, and
serverless computing infrastructure in the cloud and expect similar results for
many more frontends and hardware platforms in the near future.
Comment: This paper is currently under review at DaMoN'2
WHAT IS SMART ABOUT SERVICES? BREAKING THE BOND BETWEEN THE SMART PRODUCT AND THE SERVICE
While the conceptual delineation between conventional and smart products is rather conspicuous, the distinction between conventional services and their smart counterparts remains elusive. This study develops a conceptual framework for understanding the distinctive attributes of smart services and their relationship to smart products. In a systematic literature review of publications from top information systems outlets, 30 contributions holding relevant information on smart services are identified and subjected to content analysis. The analysis reveals a variety of different definitions and characterizations of smart services and relations to concepts like data-driven services and services associated with smart products and smart objects. These findings are used to examine artifacts developed in rather design-oriented papers to derive five dimensions that impact the level of smartness of services: richness of the data, the knowledge intensiveness of the engine for decision support, the level of sophistication of the outcome delivered to the service user(s), the architecture of the stakeholders, and the automation level of the service processes. Within this scope, the product can have four roles: sensor, computer, interface, or integrator. The paper concludes by identifying some gaps in the overall research landscape and provides directions for future research.
An Integrated View on the Future of Logistics and Information Technology
In this position paper, we present our vision on the future of the logistics
business domain and the use of information technology (IT) in this domain. The
vision is based on extensive experience with Dutch and European logistics in
various contexts and from various perspectives. We expect that the vision also
holds for logistics outside Europe. We build our vision in a number of steps.
First, we make an inventory of the most important trends in the logistics
domain - we call these mega-trends. Next, we do the same for the information
technology domain, restricted to technologies that have relevance for
logistics. Then, we introduce logistics meta-concepts that we use to describe
our vision and relate them to business engineering. We use these three
ingredients to analyze leading concepts that we currently observe in the
logistics domain. Next, we consolidate all elements into a model that
represents our vision of the integrated future of logistics and IT. We
elaborate on the role of data platforms and open standards in this integrated
vision.
Comment: 22 pages, 7 figures, 3 tables
Integrating Conversational Agents and Knowledge Graphs Within the Scholarly Domain
In the last few years, chatbots have become mainstream solutions adopted in a variety of domains for automating communication at scale. In the same period, knowledge graphs have attracted significant attention from business and academia as robust and scalable representations of information. In the scientific and academic research domain, they are increasingly used to illustrate the relevant actors (e.g., researchers, institutions), documents (e.g., articles, patents), entities (e.g., concepts, innovations), and other related information. Following the same direction, this paper describes how to integrate conversational agents with knowledge graphs focused on the scholarly domain, a.k.a. Scientific Knowledge Graphs. On top of the proposed architecture, we developed AIDA-Bot, a simple chatbot that leverages a large-scale knowledge graph of scholarly data. AIDA-Bot can answer natural language questions about scientific articles, research concepts, researchers, institutions, and research venues. We have developed four prototypes of AIDA-Bot on Alexa products, web browsers, Telegram clients, and humanoid robots. We performed a user study evaluation with 15 domain experts showing a high level of interest and engagement with the proposed agent.