12 research outputs found

    User-guided Page Merging for Memory Deduplication in Serverless Systems

    Full text link
    Serverless computing is an emerging cloud paradigm that offers an elastic and scalable allocation of computing resources with pay-as-you-go billing. In the Function-as-a-Service (FaaS) programming model, applications comprise short-lived and stateless serverless functions executed in isolated containers or microVMs, which can quickly scale to thousands of instances and process terabytes of data. This flexibility comes at the cost of duplicated runtimes, libraries, and user data spread across many function instances, and cloud providers do not utilize this redundancy. The memory footprint of serverless forces removing idle containers to make space for new ones, which decreases performance through more cold starts and fewer data caching opportunities. We address this issue by proposing deduplicating memory pages of serverless workers with identical content, based on the content-based page-sharing concept of Linux Kernel Same-page Merging (KSM). We replace the background memory scanning process of KSM, as it is too slow to locate sharing candidates in short-lived functions. Instead, we design User-Guided Page Merging (UPM), a built-in Linux kernel module that leverages the madvise system call: we enable users to advise the kernel of memory areas that can be shared with others. We show that UPM reduces memory consumption by up to 55% on 16 concurrent containers executing a typical image recognition function, more than doubling the density for containers of the same function that can run on a system.Comment: Accepted at IEEE BigData 202

    A Survey of Hashing Techniques for High Performance Computing

    Get PDF
    Hashing is a well-known and widely used technique for providing O(1) access to large files on secondary storage and tables in memory. Hashing techniques were introduced in the early 60s. The term hash function historically is used to denote a function that compresses a string of arbitrary input to a string of fixed length. Hashing finds applications in other fields such as fuzzy matching, error checking, authentication, cryptography, and networking. Hashing techniques have found application to provide faster access in routing tables, with the increase in the size of the routing tables. More recently, hashing has found applications in transactional memory in hardware. Motivated by these newly emerged applications of hashing, in this paper we present a survey of hashing techniques starting from traditional hashing methods with greater emphasis on the recent developments. We provide a brief explanation on hardware hashing and a brief introduction to transactional memory

    Efficient Collection and Processing of Cyber Threat Intelligence from Partner Feeds

    Get PDF
    Sharing of threat intelligence between organizations and companies in the cyber security industry is a crucial part of proactive defense against security threats. Even though some standardization efforts exist, most publishers of cyber security feeds use their own approach and provide data in varying formats, schemata, compression algorithms, through differing APIs etc. This makes every feed unique and complicates their automated collection and processing.Furthermore, the published data may contain a lot of irrelevant records, such as duplicates or data about very exotic files or websites, which are not useful. In this work, we present Feed Automation, a cloud-based system for fully automatic collection and processing of cyber threat intelligence from a variety of online feeds. The system provides two means for reduction of noise in the data: a smart deduplication service based on a sliding window technique, which is able to remove just the duplicates with no important changes in the metadata; and efficient rules, easily configurable by the malware analysts, to remove records, which are not useful for us. Additionally, we propose a filtering solution based on machine learning, which is able to predict how useful a record is for our backend systems based on historic data. We demonstrate how this system can help to unify the feed collection, processing, and data noise reduction into one automated system, speeding up development, simplifying maintenance, and reducing the load for the backend systems

    Secure and efficient processing of outsourced data structures using trusted execution environments

    Full text link
    In recent years, more and more companies make use of cloud computing; in other words, they outsource data storage and data processing to a third party, the cloud provider. From cloud computing, the companies expect, for example, cost reductions, fast deployment time, and improved security. However, security also presents a significant challenge as demonstrated by many cloud computing–related data breaches. Whether it is due to failing security measures, government interventions, or internal attackers, data leakages can have severe consequences, e.g., revenue loss, damage to brand reputation, and loss of intellectual property. A valid strategy to mitigate these consequences is data encryption during storage, transport, and processing. Nevertheless, the outsourced data processing should combine the following three properties: strong security, high efficiency, and arbitrary processing capabilities. Many approaches for outsourced data processing based purely on cryptography are available. For instance, encrypted storage of outsourced data, property-preserving encryption, fully homomorphic encryption, searchable encryption, and functional encryption. However, all of these approaches fail in at least one of the three mentioned properties. Besides approaches purely based on cryptography, some approaches use a trusted execution environment (TEE) to process data at a cloud provider. TEEs provide an isolated processing environment for user-defined code and data, i.e., the confidentiality and integrity of code and data processed in this environment are protected against other software and physical accesses. Additionally, TEEs promise efficient data processing. Various research papers use TEEs to protect objects at different levels of granularity. On the one end of the range, TEEs can protect entire (legacy) applications. This approach facilitates the development effort for protected applications as it requires only minor changes. However, the downsides of this approach are that the attack surface is large, it is difficult to capture the exact leakage, and it might not even be possible as the isolated environment of commercially available TEEs is limited. On the other end of the range, TEEs can protect individual, stateless operations, which are called from otherwise unchanged applications. This approach does not suffer from the problems stated before, but it leaks the (encrypted) result of each operation and the detailed control flow through the application. It is difficult to capture the leakage of this approach, because it depends on the processed operation and the operation’s location in the code. In this dissertation, we propose a trade-off between both approaches: the TEE-based processing of data structures. In this approach, otherwise unchanged applications call a TEE for self-contained data structure operations and receive encrypted results. We examine three data structures: TEE-protected B+-trees, TEE-protected database dictionaries, and TEE-protected file systems. Using these data structures, we design three secure and efficient systems: an outsourced system for index searches; an outsourced, dictionary-encoding–based, column-oriented, in-memory database supporting analytic queries on large datasets; and an outsourced system for group file sharing supporting large and dynamic groups. Due to our approach, the systems have a small attack surface, a low likelihood of security-relevant bugs, and a data owner can easily perform a (formal) code verification of the sensitive code. At the same time, we prevent low-level leakage of individual operation results. For all systems, we present a thorough security evaluation showing lower bounds of security. Additionally, we use prototype implementations to present upper bounds on performance. For our implementations, we use a widely available TEE that has a limited isolated environment—Intel Software Guard Extensions. By comparing our systems to related work, we show that they provide a favorable trade-off regarding security and efficiency

    Large Language Models for Software Engineering: A Systematic Literature Review

    Full text link
    Large Language Models (LLMs) have significantly impacted numerous domains, notably including Software Engineering (SE). Nevertheless, a well-rounded understanding of the application, effects, and possible limitations of LLMs within SE is still in its early stages. To bridge this gap, our systematic literature review takes a deep dive into the intersection of LLMs and SE, with a particular focus on understanding how LLMs can be exploited in SE to optimize processes and outcomes. Through a comprehensive review approach, we collect and analyze a total of 229 research papers from 2017 to 2023 to answer four key research questions (RQs). In RQ1, we categorize and provide a comparative analysis of different LLMs that have been employed in SE tasks, laying out their distinctive features and uses. For RQ2, we detail the methods involved in data collection, preprocessing, and application in this realm, shedding light on the critical role of robust, well-curated datasets for successful LLM implementation. RQ3 allows us to examine the specific SE tasks where LLMs have shown remarkable success, illuminating their practical contributions to the field. Finally, RQ4 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE, as well as the common techniques related to prompt optimization. Armed with insights drawn from addressing the aforementioned RQs, we sketch a picture of the current state-of-the-art, pinpointing trends, identifying gaps in existing research, and flagging promising areas for future study
    corecore