4 research outputs found

    Stateful Entities: Object-oriented Cloud Applications as Distributed Dataflows

    Full text link
    Programming stateful cloud applications remains a painful experience. Instead of focusing on the business logic, programmers spend most of their time dealing with distributed systems concerns, the most important being consistency, load balancing, failure management, recovery, and scalability. At the same time, we witness an unprecedented adoption of modern dataflow systems such as Apache Flink, Google Dataflow, and Timely Dataflow. These systems are now performant and fault-tolerant, and they offer excellent state management primitives. With this line of work, we aim to investigate the opportunities and limits of compiling general-purpose programs into stateful dataflows. Given a set of easy-to-follow code conventions, programmers can author stateful entities, a programming abstraction embedded in Python. We present StateFlow, a compiler pipeline that analyzes the abstract syntax tree of a Python application and rewrites it into an intermediate representation based on stateful dataflow graphs. StateFlow compiles that intermediate representation to a target execution system: Apache Flink and Beam, AWS Lambda, Flink's Statefun, and Cloudburst. Through an experimental evaluation, we demonstrate that the code generated by StateFlow incurs minimal overhead. While developing and deploying our prototype, we observed important limitations of current dataflow systems in executing cloud applications at scale.
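
    For illustration only, the sketch below shows what a stateful entity could look like under such code conventions; the @stateful decorator and the Account methods are assumed names for this example, not the actual StateFlow API.

        # Hypothetical sketch of a stateful entity in plain Python.
        # The @stateful decorator and these conventions are assumptions for
        # illustration; the actual StateFlow API may differ.
        from dataclasses import dataclass

        def stateful(cls):
            """Placeholder: a compiler such as StateFlow would analyze the
            class AST and rewrite it into a stateful dataflow graph."""
            return cls

        @stateful
        @dataclass
        class Account:
            account_id: str
            balance: int = 0

            def deposit(self, amount: int) -> int:
                # Mutates instance state; the runtime would keep this state
                # consistent and fault-tolerant across invocations.
                self.balance += amount
                return self.balance

            def transfer(self, other: "Account", amount: int) -> bool:
                # A call to another entity would be compiled into dataflow
                # communication between operators.
                if self.balance < amount:
                    return False
                self.balance -= amount
                other.deposit(amount)
                return True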

    Leveraging Large Language Models for Sequential Recommendation

    Full text link
    Sequential recommendation problems have received increasing attention in research during the past few years, leading to a large variety of algorithmic approaches. In this work, we explore how large language models (LLMs), which nowadays have a disruptive impact on many AI-based applications, can be used to build or improve sequential recommendation approaches. Specifically, we devise and evaluate three approaches to leverage the power of LLMs in different ways. Our results from experiments on two datasets show that initializing the state-of-the-art sequential recommendation model BERT4Rec with embeddings obtained from an LLM improves NDCG by 15-20% compared to the vanilla BERT4Rec model. Furthermore, we find that a simple approach that leverages LLM embeddings for producing recommendations can provide competitive performance by highlighting semantically related items. We publicly share the code and data of our experiments to ensure reproducibility.
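
    As a rough illustration of the embedding-initialization idea (not the authors' code), the sketch below derives item embeddings from a text encoder and uses them to initialize a recommender's item-embedding table; the sentence-transformers model and the 64-dimensional hidden size are assumptions chosen for the example.

        # Sketch only: initialize an item-embedding table from text embeddings.
        import torch
        from sentence_transformers import SentenceTransformer  # assumed encoder choice

        item_titles = ["The Matrix (1999)", "Inception (2010)", "Interstellar (2014)"]

        encoder = SentenceTransformer("all-MiniLM-L6-v2")      # 384-dim text embeddings
        llm_emb = torch.tensor(encoder.encode(item_titles))    # shape (num_items, 384)

        # Project down to the recommender's hidden size (e.g. BERT4Rec) and copy
        # the result into a trainable item-embedding layer as its initialization.
        hidden_size = 64
        projection = torch.nn.Linear(llm_emb.shape[1], hidden_size, bias=False)
        item_embedding = torch.nn.Embedding(len(item_titles), hidden_size)
        with torch.no_grad():
            item_embedding.weight.copy_(projection(llm_emb))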

    Anyone Can Cloud: Democratizing Cloud Application Programming

    No full text
    The cloud is widely adopted as a flexible and on-demand computing infrastructure. In recent years, a new and promising cloud paradigm emerged: serverless computing. Serverless computing promises a pay-as-you-go model and offers features such as autoscaling and high availability. Nevertheless, developing scalable cloud applications remains a painstaking task. Current programming models for the cloud mix operational code and business logic, causing developers to spend a significant amount of time on tasks other than implementing the intended functionality. Moreover, the developer must consider distributed systems concerns such as consistency, communication, and persistence. Modern dataflow systems, such as Apache Flink and Google Dataflow, address these concerns but suffer from the same problem: they lack an intuitive programming interface for general-purpose applications. It remains an open problem to design a developer-friendly programming interface for implementing scalable cloud applications with strong guarantees. In this thesis, we solve this problem by presenting an intuitive programming interface for scalable cloud applications in which developers primarily focus on business logic. Given a set of easy-to-follow code conventions, programmers author stateful entities, a programming abstraction embedded in Python. We present StateFlow, a compiler pipeline that analyzes the abstract syntax tree of a Python application and rewrites it into an intermediate representation based on stateful dataflow graphs. In addition, we present a set of building blocks that allow the execution of this intermediate representation on a target runtime system or cloud provider without a tight integration. Supported runtime systems include Apache Flink and Beam, AWS Lambda, Flink's Statefun, and Cloudburst, each providing a different set of guarantees. Finally, we introduce a client-side programming interface and an HTTP server integration to interact with the deployed application. We demonstrate that execution with StateFlow typically incurs less than 1% overhead. Furthermore, in a performance benchmark we identify limitations of current dataflow systems in executing cloud applications at scale. Finally, we compare the expressiveness of StateFlow's programming abstraction to native runtime implementations. We show that StateFlow lets a developer write universal code that does not mix business logic with operational logic or the runtime's API, and prevents vendor lock-in by allowing developers to switch between runtimes in fewer than ten lines of code.
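
    The runtime-portability claim can be pictured with a small, hypothetical sketch: the Dataflow class and the deploy() dispatch below are invented names used only to illustrate that switching runtimes touches the deployment call rather than the business logic.

        # Hypothetical illustration of switching runtimes without touching entities.
        # Dataflow and deploy() are assumed names, not the actual StateFlow API.
        from dataclasses import dataclass, field

        @dataclass
        class Dataflow:
            """Runtime-agnostic intermediate representation of the application."""
            entities: list = field(default_factory=list)

        def deploy(dataflow: Dataflow, runtime: str) -> str:
            # Only this dispatch changes when moving between runtimes; the
            # entities and their business logic stay untouched.
            targets = {
                "flink": "submit a job to an Apache Flink cluster",
                "statefun": "package functions for Flink Statefun",
                "aws-lambda": "create AWS Lambda functions behind an HTTP endpoint",
                "cloudburst": "register functions with a Cloudburst deployment",
            }
            return f"{len(dataflow.entities)} entities -> {targets[runtime]}"

        app = Dataflow(entities=["User", "Order"])
        print(deploy(app, "flink"))   # switching runtime = changing one argument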

    A plug-in infrastructure for the CodeFeedr project

    No full text
    CodeFeedr is a research project at the software engineering division of Delft University of Technology, in collaboration with the Software Improvement Group. The research focuses on a software infrastructure that supports software practitioners in data-driven decision making. Currently, frameworks like Apache Flink are capable of high-performance data streaming. However, these frameworks require considerable setup effort, and adding new streaming queries takes a lot of time. They also have several limitations in combining real-time data with historical data and in performing aggregations on streams from multiple sources. The developed product is a plug-in framework on top of Apache Flink that provides a pipelining system for streaming queries. The product includes abstractions for well-known sources like GitHub, TravisCI, and Twitter, as well as support for historical data in MongoDB. With this framework, users can spend their effort on actually writing streaming queries instead of setting up environments, input sources, and output destinations. The product also includes orchestration tools for running streaming jobs on a distributed system.
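
    To make the pipelining idea concrete, here is a conceptual sketch in Python (the actual CodeFeedr framework is built on Apache Flink and the JVM); the Pipeline class, the toy event source, and the stage lambdas are made up for illustration.

        # Conceptual sketch of a plug-in pipeline; not the CodeFeedr API itself.
        from typing import Callable, Iterable, List

        class Pipeline:
            def __init__(self) -> None:
                self.stages: List[Callable[[Iterable], Iterable]] = []

            def append(self, stage: Callable[[Iterable], Iterable]) -> "Pipeline":
                # Each plug-in stage consumes the stream produced by the previous one.
                self.stages.append(stage)
                return self

            def run(self, source: Iterable) -> Iterable:
                stream = source
                for stage in self.stages:
                    stream = stage(stream)
                return stream

        # Toy "repository events" source plus a filter stage and an enrichment stage.
        events = [{"repo": "org/project-a", "commits": 3},
                  {"repo": "org/project-b", "commits": 5}]

        pipeline = (Pipeline()
                    .append(lambda s: (e for e in s if e["commits"] > 3))
                    .append(lambda s: ({**e, "active": True} for e in s)))

        print(list(pipeline.run(events)))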