
    Blockchain-Coordinated Frameworks for Scalable and Secure Supply Chain Networks

    Supply chains have evolved over time from small circles of regional traders into complex business networks. As a result, supply chain management (SCM) systems now rely heavily on digital technologies, which raises concerns about the privacy and security of data. Owing to key properties of blockchain, such as transparency, immutability and decentralization, it has recently attracted considerable interest as a way to address security, privacy and scalability problems in supply chains. However, conventional blockchains are not well suited to supply chain ecosystems because they are computationally costly, have limited capacity to scale and fail to provide trust. Consequently, owing to this lack of trust and coordination, supply chains often fail to foster trust among the network's participants. Assuring data privacy in a supply chain ecosystem is another challenge: if information is shared with a large number of participants without establishing data privacy, access control risks arise in the network. Protecting data privacy is a particular concern when transmitting corporate data such as locations, manufacturing supplies and demand information. The third challenge in supply chain management is scalability, which remains a significant barrier to adoption: the number of transactions in a supply chain tends to grow with the number of nodes in the network, so scalability is essential for blockchain adoption in supply chain networks. This thesis seeks to address the challenges of privacy, scalability and trust by providing frameworks for effectively combining blockchains with supply chains. It makes four novel contributions. First, it develops a blockchain-based framework with an Attribute-Based Access Control (ABAC) model to assure data privacy, adopting a distributed architecture to enable fine-grained, dynamic access control management for supply chain management. To address the data privacy challenge, AccessChain is developed. The proposed AccessChain model has two types of ledgers in the system: local and global. Local ledgers are used to store business contracts between stakeholders and to manage the ABAC model, whereas the global ledger is used to record transaction data. Combined with the ABAC model and blockchain technology (BCT), AccessChain enables decentralized, fine-grained and dynamic access control management in SCM. The framework enables a systematic approach that benefits the supply chain, and the experiments yield convincing results. Furthermore, performance monitoring shows that AccessChain's response time with four local ledgers is acceptable, and it therefore provides significantly greater scalability. Next, a framework for reducing the bullwhip effect (BWE) in SCM is proposed, which also focuses on combining data visibility with trust. BWE is first observed in the supply chain, and a blockchain architecture design is then used to minimize it. Full sharing of demand data has been shown to improve the robustness of overall performance in a multi-echelon supply chain environment, especially for BWE mitigation and cumulative cost reduction. It is observed that, when it comes to providing access to data, information sharing using a blockchain has clear benefits in a supply chain. Furthermore, when data sharing is distributed, parties in the supply chain have fair access to other parties' data, even when they are farther downstream.
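    To make the access-control contribution above concrete, the following is a minimal sketch of an ABAC-style policy check of the kind AccessChain's local ledgers could manage; the attribute names, policy format and helper functions are illustrative assumptions, not the thesis's implementation.

```python
# Minimal ABAC sketch (illustrative; attribute names and policy format are assumptions,
# not the AccessChain implementation described in the thesis).
from dataclasses import dataclass, field

@dataclass
class Policy:
    # A policy grants an action on a resource when all required attributes match.
    resource: str
    action: str
    required_attributes: dict = field(default_factory=dict)

    def permits(self, subject_attributes: dict, resource: str, action: str) -> bool:
        if resource != self.resource or action != self.action:
            return False
        return all(subject_attributes.get(k) == v
                   for k, v in self.required_attributes.items())

# Hypothetical policies that a local ledger might store for one business contract.
policies = [
    Policy("demand_forecast", "read", {"role": "retailer", "contract": "C-42"}),
    Policy("demand_forecast", "write", {"role": "manufacturer", "contract": "C-42"}),
]

def check_access(subject_attributes: dict, resource: str, action: str) -> bool:
    """Grant access if any stored policy permits the request (deny by default)."""
    return any(p.permits(subject_attributes, resource, action) for p in policies)

# Example: a retailer on contract C-42 may read the shared demand forecast.
print(check_access({"role": "retailer", "contract": "C-42"}, "demand_forecast", "read"))  # True
print(check_access({"role": "carrier", "contract": "C-42"}, "demand_forecast", "read"))   # False
```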
Sharing customer demand information is important in a supply chain to enhance decision-making, reduce costs and improve the final end product. This work also explores the ability of BCT, as a distributed ledger approach, to create a trust-enhanced environment in which stakeholders can share their information effectively. To provide visibility and coordination alongside a blockchain consensus process, a new consensus algorithm, Reputation-based Proof-of-Cooperation (RPoC), is proposed for blockchain-based SCM; it does not require validators to solve a mathematical puzzle before a new block is stored. RPoC is an efficient and scalable consensus algorithm that selects consensus nodes dynamically and permits a large number of nodes to participate in the consensus process. The algorithm decreases the workload on individual nodes while increasing consensus performance by allocating the transaction verification process to specific nodes. Extensive theoretical analysis and experimentation establish the suitability of the proposed algorithm in terms of scalability and efficiency. The thesis concludes with a blockchain-enabled framework that addresses the issue of preserving privacy and security in an open-bid auction system. This work implements a bid management system in a private blockchain environment to provide a secure bidding scheme. The novelty of this framework derives from an enhanced approach to integrating blockchain structures, replacing the original chain structure with a tree structure. Throughout the online world, user privacy is a primary concern because the electronic environment enables the collection of personal data. Hence, a suitable cryptographic protocol for an open-bid auction on top of a blockchain is proposed. The primary aim is to achieve security and privacy with greater efficiency, which largely depends on the effectiveness of the encryption algorithms used by the blockchain. Essentially, this work considers Elliptic Curve Cryptography (ECC) and a dynamic cryptographic accumulator encryption algorithm to enhance security between the auctioneer and bidders. The proposed e-bidding scheme and the findings from this study should foster the further growth of blockchain strategies.
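    The abstract does not spell out the RPoC selection rule, so the following is only a generic sketch of reputation-weighted validator selection to illustrate the idea of choosing consensus nodes dynamically without a computational puzzle; the scoring, update rule and threshold are assumptions, not the thesis's algorithm.

```python
# Reputation-weighted validator selection: a minimal sketch of the general idea behind
# RPoC-style consensus. The scoring and update rules below are illustrative assumptions.
import random

class Node:
    def __init__(self, node_id: str, reputation: float = 0.5):
        self.node_id = node_id
        self.reputation = reputation  # in [0, 1]

def select_validators(nodes, k: int, min_reputation: float = 0.3):
    """Pick k distinct validators, weighting eligible nodes by reputation."""
    pool = [n for n in nodes if n.reputation >= min_reputation]
    weights = [n.reputation for n in pool]
    chosen = []
    for _ in range(min(k, len(pool))):
        pick = random.choices(pool, weights=weights, k=1)[0]
        idx = pool.index(pick)
        pool.pop(idx)
        weights.pop(idx)
        chosen.append(pick)
    return chosen

def update_reputation(node: Node, verified_correctly: bool, step: float = 0.05):
    """Reward correct verification, penalize misbehaviour, clamped to [0, 1]."""
    if verified_correctly:
        node.reputation = min(1.0, node.reputation + step)
    else:
        node.reputation = max(0.0, node.reputation - 2 * step)

nodes = [Node(f"org-{i}", reputation=0.4 + 0.1 * i) for i in range(5)]
validators = select_validators(nodes, k=3)
print([v.node_id for v in validators])
```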

    Natural Products as Kinase Inhibitors: Total Synthesis, in Vitro Kinase Activity, in Vivo Toxicology in Zebrafish Embryos and in Silico Docking

    Despite significant progress in developing small-molecule kinase inhibitors, most human kinases still lack high-quality selective inhibitors that could be employed as chemical probes to study their biological function and pharmacology. Natural products (NPs) and their synthetic derivatives might offer avenues to overcome this frequently encountered challenge, as they have been demonstrated to target a wide range of kinases, including all subfamilies of the known kinome. Nonetheless, isolating these NPs from their sources necessitates massive harvesting, which is fraught with difficulties and causes enormous ecological harm. Moreover, the challenges encountered while extracting these NPs from their sources are ever-present and have few viable solutions. Considering these aspects, total synthesis and semisynthesis have been employed to replicate the most intriguing compounds of living nature in the laboratory and thereby obtain larger quantities for extended studies. The present work outlines attempts to perform the first total syntheses, and to evaluate the biological activity, of naturally occurring potent anti-cancer compounds: Depsipeptide PM181110, Eudistomidin C, and Fusarithioamide A. Efforts to achieve the first total syntheses of these natural compounds were based on highly convergent and unified approaches. Depsipeptide PM181110 is a bicyclic depsipeptide featuring four stereogenic centres; attempts at its first total synthesis were undertaken by synthesizing its 3R,9R,14R,17R and 3R,9S,14R,17R diastereomers. Similarly, for Eudistomidin C and Fusarithioamide A, whose stereochemistry is known, syntheses were attempted starting from enantiomerically pure reagents. The synthesized compounds BSc5484 and BSc5517 and their analogues were subsequently subjected to biological activity tests. Accordingly, a kinase inhibitory activity test was performed, followed by an in vivo toxicology assay in wild-type and gold-type zebrafish embryos (Danio rerio). The assayed compounds displayed moderate to good inhibition of the kinases, with an apparent selectivity profile, and toxicity in zebrafish embryos illustrated by the observed phenotypes. Finally, an in silico experiment revealed that BSc5484 and BSc5485 might bind as type IV inhibitors, while BSc5517 demonstrated a better binding affinity to human Haspin kinase than the known β-carboline inhibitor Harmine across the panel of tested kinases. This work thus provides first directed insights into the potential of naturally derived compounds as inhibitors of disease-causing proteins that are key players in numerous forms of cancer and other illnesses. Consequently, establishing depsipeptide- and β-carboline-based compounds as therapeutic leads is crucial and will provide a powerful tool to further elucidate their biological function through targeted structural variations.

    Parallel and Flow-Based High Quality Hypergraph Partitioning

    Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits. Given a hypergraph and an integer k, the task is to divide the vertices into k disjoint blocks with bounded size, while minimizing an objective function on the hyperedges that span multiple blocks. In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge. The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases. In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs. Once sufficiently small, an initial partition is computed. Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level. An important aspect in designing practical heuristics for optimization problems is the trade-off between solution quality and running time. The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available to solve the problem. Existing algorithms are either slow, sequential and offer high solution quality, or are simple, fast, easy to parallelize, and offer low quality. While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible. We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, but employed in two different ways. Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines. In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach, and develop highly efficient shared-memory parallel implementations thereof. We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic, and one based on label propagation. For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements. For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on-the-fly. Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, as well as recursive partitioning with work-stealing, we obtain our first parallel multilevel framework. It is the fastest partitioner known, and achieves medium-high quality, beating all parallel partitioners, and is close to the highest quality sequential partitioner. Our second contribution is a parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level. This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential.
We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous uncoarsening, and later a fully asynchronous uncoarsening. In addition, we adapt our refinement algorithms, and also use the preprocessing and portfolio. This scheme is highly scalable, and achieves the same quality as the highest-quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to the fine-grained uncoarsening. The last ingredient for high quality is an iterative improvement algorithm based on maximum flows. In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and is faster due to engineering efforts. Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel. Beyond the pursuit of the highest quality, we present a deterministically parallel partitioning framework. We develop deterministic versions of the preprocessing, coarsening, and label propagation refinement. Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small. All of our claims are validated through extensive experiments, comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets. To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar. While it seems inevitable that, with ever-increasing problem sizes, we must transition to distributed-memory algorithms, the study of shared-memory techniques is not in vain. With the multilevel approach, even the inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense. Similarly, techniques for shared-memory parallelism remain important, both as soon as a coarse graph fits into memory and as local building blocks in the distributed algorithm.
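    To make the objective and the refinement idea concrete, here is a minimal sketch of the connectivity metric (the sum of λ(e) − 1 over all hyperedges e, where λ(e) is the number of blocks a hyperedge touches) together with a greedy, label-propagation-style refinement pass; the plain Python data structures are illustrative rather than Mt-KaHyPar's implementation, and the balance constraint and the parallel gain-accuracy techniques from the thesis are omitted.

```python
# Connectivity objective and a label-propagation-style refinement pass (illustrative sketch;
# not Mt-KaHyPar's implementation). Balance constraints are ignored for brevity.
def connectivity_metric(hyperedges, block_of):
    """Sum over hyperedges of (number of distinct blocks touched - 1)."""
    return sum(len({block_of[v] for v in edge}) - 1 for edge in hyperedges)

def refine_label_propagation(hyperedges, incident, block_of, k, rounds=3):
    """Greedily move each vertex to the block that lowers its local connectivity cost.
    incident: dict vertex -> indices of hyperedges containing it; block_of is modified."""
    def local_cost(v):
        return sum(len({block_of[u] for u in hyperedges[e]}) - 1 for e in incident[v])

    for _ in range(rounds):
        moved = False
        for v in list(block_of):
            original = block_of[v]
            best_block, best_cost = original, local_cost(v)
            for b in range(k):
                if b == original:
                    continue
                block_of[v] = b
                cost = local_cost(v)
                if cost < best_cost:
                    best_block, best_cost = b, cost
            block_of[v] = best_block
            moved |= best_block != original
        if not moved:
            break

# Tiny example: 6 vertices, 3 hyperedges, 2 blocks.
hyperedges = [[0, 1, 2], [2, 3], [1, 4, 5]]
incident = {0: [0], 1: [0, 2], 2: [0, 1], 3: [1], 4: [2], 5: [2]}
block_of = {0: 0, 1: 1, 2: 0, 3: 1, 4: 1, 5: 1}
print(connectivity_metric(hyperedges, block_of))   # before refinement
refine_label_propagation(hyperedges, incident, block_of, k=2)
print(connectivity_metric(hyperedges, block_of))   # after refinement (never larger)
```

    Without the balance constraint, such a pass may empty a block entirely, which is exactly why practical refiners enforce block size bounds alongside the gain computation.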

    The compatible solutes ectoine and 5-hydroxyectoine: Catabolism and regulatory mechanisms

    To cope with osmotic stress, many microorganisms make use of short, osmotically active organic compounds, the so-called compatible solutes. Especially effective members of this class of molecules are the tetrahydropyrimidines ectoine and 5-hydroxyectoine. Both molecules are produced by a large number of microorganisms, not only to fend off osmotic stress but also, for example, against low- and high-temperature challenges. The biosynthetic pathway these organisms use to synthesize ectoines has already been studied intensively, and the enzymes involved are well characterized, both biochemically and structurally. However, synthesis of ectoines is only half the story. Inevitably, ectoines are frequently released from producer cells in different environmental settings. Especially in highly competitive habitats such as the upper ocean layers, some bacteria have specialized in exploiting such a niche. The model organism used in this work is one such species: the marine bacterium Ruegeria pomeroyi DSS-3, which belongs to the Roseobacter clade. Roseobacter species are heterotrophic Proteobacteria that can live in symbiosis with phytoplankton as well as turn against them in a bacterial-warfare fashion to scavenge valuable nutrients. Ectoines can be imported by R. pomeroyi DSS-3 in a high-affinity fashion and used as energy, carbon and nitrogen sources. To achieve this, the rings of both ectoines are hydrolyzed by the hydrolase EutD and the products are deacetylated by the deacetylase EutE. The first hydrolysis products, α-ADABA (from ectoine) and hydroxy-α-ADABA (from hydroxyectoine), are deacetylated to DABA and hydroxy-DABA, which are transformed in further biochemical reactions to aspartate to fuel the cell's central metabolism. The role and functioning of the EutD and EutE enzymes, which work in a concerted fashion, are a central aspect of this work. Both enzymes were characterized biochemically and structurally, and the architecture of the metabolic pathway was illuminated. α-ADABA and hydroxy-α-ADABA are central not only to ectoine catabolism but also to the regulatory mechanisms associated with it. Both molecules serve as inducers of the central regulatory protein of this pathway, the MocR-/GabR-type regulator EnuR. In the framework of this dissertation, molecular details were clarified that enable the EnuR repressor to sense both molecules with high affinity and subsequently derepress the genes for the import and catabolism of ectoines.

    Constraint-based simulation of virtual crowds

    Central to simulating pedestrian crowds are their motion and behaviour. Understanding how pedestrians move is required to simulate and predict scenarios involving crowds of people. Pedestrian behaviours enhance the range of motions people can demonstrate, resulting in greater variety, believability, and accuracy. Models with complex computations and motion are difficult to extend with additional behaviours, because their structure is not designed in a way that is generally compatible with collision avoidance behaviours. To address this issue, this thesis researches a pedestrian model that can simulate collision response together with a wide range of additional behaviours. The model does so by using constraints, i.e., limits on the velocity of a person's movement. The proposed model uses constraints as its core computation; by describing behaviours in terms of constraints, these behaviours can be combined with the proposed model. Pedestrian simulations strike a balance between model complexity and runtime speed. Some models focus entirely on the complexity and accuracy of people, while others focus on creating believable yet lightweight and performant simulations. Believable crowds look realistic to human observation but do not hold up to numerical analysis under scrutiny. The larger the population, and the more complex the motion of people, the slower the simulation runs. One route to improving software performance is the use of Graphics Processing Units (GPUs), devices whose theoretical performance far exceeds that of equivalent multi-core CPUs. The research literature tends to focus on either the accuracy or the performance optimisation of pedestrian crowd simulations, which suggests that there is an opportunity to create more accurate models that still run relatively quickly. Real time is a useful measure of model runtime: a simulation that runs in real time can be interactive and respond live to user input. By increasing the performance of the model, larger and more complex models can be simulated, which in turn increases the range of applications the model can represent. This thesis develops a performant pedestrian simulation that runs in real time and explores how suitable the model is for GPU acceleration and what performance gains can be obtained by implementing it on the GPU.
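    As a rough illustration of the constraint idea, the sketch below projects a pedestrian's preferred velocity onto a set of linear (half-plane) velocity constraints; the half-plane formulation and the simple iterative projection are assumptions for illustration, not the thesis's model or solver.

```python
# Velocity constraints as half-planes: each constraint requires dot(normal, v) <= offset.
# Iteratively projecting the preferred velocity onto violated constraints is a simple
# illustrative scheme, not the solver used in the thesis.
import numpy as np

def constrain_velocity(v_preferred, constraints, iterations=20):
    """constraints: list of (normal, offset) pairs; returns a velocity that approximately
    satisfies all constraints while staying close to v_preferred."""
    v = np.asarray(v_preferred, dtype=float)
    for _ in range(iterations):
        for normal, offset in constraints:
            n = np.asarray(normal, dtype=float)
            violation = np.dot(n, v) - offset
            if violation > 0.0:
                v = v - violation * n / np.dot(n, n)  # project onto the constraint boundary
    return v

# Example: a pedestrian wants to walk at (1.4, 0) m/s, but a neighbour ahead imposes
# a constraint limiting the forward velocity component to 1.0 m/s.
constraints = [((1.0, 0.0), 1.0)]
print(constrain_velocity((1.4, 0.0), constraints))  # ~[1.0, 0.0]
```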

    Distributed Implementation of eXtended Reality Technologies over 5G Networks

    The revolution of Extended Reality (XR) has already started and is rapidly expanding as technology advances. Announcements such as Meta's Metaverse have boosted general interest in XR technologies and produced novel use cases. With the advent of the fifth generation of cellular networks (5G), XR technologies are expected to improve significantly by offloading heavy computational processes from the XR Head Mounted Display (HMD) to an edge server. XR offloading can rapidly boost XR technologies by considerably reducing the burden on the XR hardware, while improving the overall user experience by enabling smoother graphics and more realistic interactions. Overall, the combination of XR and 5G has the potential to revolutionize the way we interact with technology and experience the world around us. However, XR offloading is a complex task that requires state-of-the-art tools and solutions, as well as an advanced wireless network that can meet the demanding throughput, latency, and reliability requirements of XR. The definition of these requirements strongly depends on the use case and the particular XR offloading implementation. Therefore, it is crucial to perform a thorough Key Performance Indicator (KPI) analysis to ensure a successful design of any XR offloading solution. Additionally, distributed XR implementations can be intricate systems with multiple processes running on different devices or virtual instances. All these agents must be well handled and synchronized to achieve XR real-time requirements and ensure the expected user experience while guaranteeing low processing overhead. XR offloading therefore requires a carefully designed architecture that complies with the required KPIs while efficiently synchronizing and handling multiple heterogeneous devices. Offloading XR has become an essential use case for 5G and beyond-5G technologies. However, testing distributed XR implementations requires access to advanced 5G deployments that are often unavailable to most XR application developers. Conversely, the development of 5G technologies requires constant feedback from potential applications and use cases. Unfortunately, most 5G providers, engineers, and researchers lack access to cutting-edge XR hardware or applications, which can hinder the fast implementation and improvement of 5G's most advanced features. Both technology fields require ongoing input and continuous development from each other to fully realize their potential. As a result, XR and 5G researchers and developers must have access to the necessary tools and knowledge to ensure the rapid and satisfactory development of both technology fields. In this thesis, we address these challenges by providing knowledge, tools and solutions towards the implementation of advanced offloading technologies, opening the door to more immersive, comfortable and accessible XR technologies. Our contributions to the field of XR offloading include a detailed study and description of the network throughput and latency KPIs required for XR offloading, an architecture for low-latency XR offloading, and our full end-to-end XR offloading implementation ready for a commercial XR HMD. In addition, we present a set of tools that can facilitate the joint development of 5G networks and XR offloading technologies: our 5G RAN real-time emulator and a multi-scenario XR IP traffic dataset.
First, we thoroughly examine and explain the KPIs that are required to achieve the expected Quality of Experience (QoE) and enhanced immersiveness in XR offloading solutions. Our analysis focuses on individual XR algorithms rather than potential use cases. Additionally, we provide an initial description of feasible 5G deployments that could fulfill some of the proposed KPIs for different offloading scenarios. We also present our low-latency multi-modal XR offloading architecture, which has already been tested on a commercial XR device and on advanced 5G deployments, such as millimeter-wave (mmW) technologies. Furthermore, we describe our full end-to-end XR offloading system, which relies on our offloading architecture to provide low-latency communication between a commercial XR device and a server running a Machine Learning (ML) algorithm. To the best of our knowledge, this is one of the first successful XR offloading implementations of complex ML algorithms on a commercial device. With the goal of providing XR developers and researchers access to complex 5G deployments and accelerating the development of future XR technologies, we present FikoRE, our 5G RAN real-time emulator. FikoRE has been specifically designed not only to model the network with sufficient accuracy but also to support the emulation of a massive number of users and actual IP throughput. As FikoRE can handle actual IP traffic above 1 Gbps, it can be used directly to test distributed XR solutions. As we describe in the thesis, its emulation capabilities make FikoRE a potential candidate to become a reference testbed for distributed XR developers and researchers. Finally, we used our XR offloading tools to generate an XR IP traffic dataset that can accelerate the development of 5G technologies by providing a straightforward way to test novel 5G solutions using realistic XR data. This dataset covers two relevant XR offloading scenarios: split rendering, in which the rendering step is moved to an edge server, and heavy ML algorithm offloading. We also derive the corresponding IP traffic models from the captured data, which can be used to generate realistic XR IP traffic, and present the validation experiments performed on the derived models and their results. This work has received funding from the European Union (EU) Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie ETN TeamUp5G, grant agreement No. 813391.
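    The KPI analysis itself is not reproduced in this abstract, but the kind of budgeting it involves can be sketched as a simple motion-to-photon calculation over the offloading pipeline; every number below is a hypothetical placeholder, not a measured KPI from the thesis.

```python
# Motion-to-photon budget sketch for an offloaded XR pipeline. All numbers are
# hypothetical placeholders; the thesis derives actual KPIs per algorithm and scenario.
budget_ms = 50.0  # assumed end-to-end latency target for a comfortable experience

components_ms = {
    "sensor_capture_and_encode": 8.0,
    "uplink_radio":              6.0,
    "edge_processing_ml":       15.0,
    "downlink_radio":            6.0,
    "decode_and_render":         9.0,
    "display_scanout":           4.0,
}

total = sum(components_ms.values())
slack = budget_ms - total
print(f"total latency: {total:.1f} ms, slack vs. {budget_ms:.0f} ms budget: {slack:+.1f} ms")
for name, value in components_ms.items():
    print(f"  {name:<28} {value:5.1f} ms ({100 * value / total:4.1f}% of total)")
```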

    Improving software quality through non-functional testing

    As code becomes more complex, new functionalities are added every day and bugs are fixed, testing becomes more time-consuming and complex. To maintain the high quality of a heavily loaded system, functional testing alone is no longer enough. Testing new features involves not only functional and regression runs but also resource-intensive non-functional tests. Functional testing verifies a specific piece of functionality, a small business story of the whole product. However, neither full test automation nor even manual testing can guarantee stable operation in production. The best approach is a complete testing solution in which non-functional testing follows functional testing inextricably. Objective: the purpose of this research is to create a system that covers non-functional testing from the performance side. It is proposed that only a combination of functional and non-functional testing can provide high quality. There are many pitfalls and nuances in tools and approaches, so this work strives to understand them and to find the best solution in each case. Method: the thesis covers a comparative analysis of various tools and frameworks for load testing and develops a testing approach for an open application programming interface, which was chosen as the target software, including automated scripted tests and output results. Results: the most suitable tool was chosen and, based on it, load test scripts were implemented, thereby complementing functional testing with non-functional tests and improving the quality of the product as a whole.
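    The abstract does not name the tool that was ultimately selected, so as a neutral illustration here is what a minimal scripted load test against an open API could look like using Locust, a common Python load-testing framework; the endpoints, weights and pacing are hypothetical.

```python
# Minimal Locust load test sketch. The target endpoints and pacing are hypothetical;
# the thesis selects its own tool and API after a comparative analysis.
from locust import HttpUser, task, between

class ApiUser(HttpUser):
    # Each simulated user waits 1-3 seconds between requests.
    wait_time = between(1, 3)

    @task(3)
    def list_items(self):
        # Weighted 3x: listing is assumed to dominate real traffic.
        self.client.get("/api/items")

    @task(1)
    def get_single_item(self):
        # Group all item lookups under one name in the statistics.
        self.client.get("/api/items/1", name="/api/items/[id]")

# Run with, e.g.:  locust -f loadtest.py --host https://example.org --users 100 --spawn-rate 10
```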

    Database System Acceleration on FPGAs

    Relational database systems provide various services and applications with an efficient means for storing, processing, and retrieving their data. The performance of these systems has a direct impact on the quality of service of the applications that rely on them. Therefore, it is crucial that database systems are able to adapt and grow in tandem with the demands of these applications, ensuring that their performance scales accordingly. In the past, Moore's law and algorithmic advancements have been sufficient to meet these demands. However, with the slowdown of Moore's law, researchers have begun exploring alternative methods, such as application-specific technologies, to satisfy the more challenging performance requirements. One such technology is field-programmable gate arrays (FPGAs), which provide ideal platforms for developing and running custom architectures for accelerating database systems. The goal of this thesis is to develop a domain-specific architecture that can enhance the performance of in-memory database systems when executing analytical queries. Our research is guided by a combination of academic and industrial requirements that seek to strike a balance between generality and performance. The former ensures that our platform can be used to process a diverse range of workloads, while the latter makes it an attractive solution for high-performance use cases. Throughout this thesis, we present the development of a system-on-chip for database system acceleration that meets our requirements. The resulting architecture, called CbMSMK, is capable of processing the projection, sort, aggregation, and equi-join database operators and can also run some complex TPC-H queries. CbMSMK employs a shared sort-merge pipeline for executing all these operators, which results in an efficient use of FPGA resources. This approach enables the instantiation of multiple acceleration cores on the FPGA, allowing it to serve multiple clients simultaneously. CbMSMK can process both arbitrarily deep and wide tables efficiently. The former is achieved through the use of the sort-merge algorithm which utilizes the FPGA RAM for buffering intermediate sort results. The latter is achieved through the use of KeRRaS, a novel variant of the forward radix sort algorithm introduced in this thesis. KeRRaS allows CbMSMK to process a table a few columns at a time, incrementally generating the final result through multiple iterations. Given that acceleration is a key objective of our work, CbMSMK benefits from many performance optimizations. For instance, multi-way merging is employed to reduce the number of merge passes required for the execution of the sort-merge algorithm, thus improving the performance of all our pipeline-breaking operators. Another example is our in-depth analysis of early aggregation, which led to the development of a novel cache-based algorithm that significantly enhances aggregation performance. 
Our experiments demonstrate that CbMSMK performs on average 5 times faster than the state-of-the-art CPU-based database management system MonetDB.
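    The cache-based early aggregation idea mentioned above can be illustrated with a generic sketch in which a small direct-mapped cache folds together rows that share a grouping key and evicts partial results on conflicts; the cache geometry, hash function and the SUM aggregate are assumptions for illustration, not the CbMSMK design.

```python
# Generic early-aggregation sketch: a small direct-mapped cache combines rows that share
# a grouping key before they reach the full aggregation step. The cache geometry and the
# SUM aggregate are illustrative assumptions, not the CbMSMK architecture.
def early_aggregate(rows, cache_slots=4):
    """rows: iterable of (key, value). Yields partially aggregated (key, sum) pairs."""
    cache = [None] * cache_slots  # each slot holds (key, running_sum) or None
    for key, value in rows:
        slot = hash(key) % cache_slots
        entry = cache[slot]
        if entry is not None and entry[0] == key:
            cache[slot] = (key, entry[1] + value)      # hit: fold the row into the slot
        else:
            if entry is not None:
                yield entry                            # miss: evict the partial result
            cache[slot] = (key, value)
    for entry in cache:                                # flush remaining partial results
        if entry is not None:
            yield entry

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5)]
print(list(early_aggregate(rows)))  # partial sums; a downstream step merges duplicates
```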

    Optimization of Spin Coherence Time at a Prototype Storage Ring for Electric Dipole Moment Investigations

    The Standard Model of particle physics has thus far fallen short of explaining the observed matter-antimatter asymmetry in the Universe. Electric Dipole Moments (EDMs) of fundamental particles are very sensitive probes of physics beyond the Standard Model. The JEDI collaboration is dedicated to the measurement of the electric dipole moments of charged particles using a polarized beam in storage rings. The goal can be accomplished by performing the measurement in a purely electrostatic storage ring, which can freeze the horizontal spin precession of protons. As an intermediate step, a smaller "prototype" storage ring, capable of using a combination of electric and magnetic fields, is proposed to serve as a proof of principle and to better understand the required systematics. A fundamental parameter to be optimised to reach the highest possible sensitivity in the EDM measurement is the Spin Coherence Time (SCT) of the stored polarized beam, that is, the time interval within which the particles of the stored beam maintain a net polarisation greater than 1/e of its initial value. To identify the working conditions that maximise the SCT, accurate spin dynamics simulations have been performed on the lattice of the prototype ring. This study presents an investigation of the variation of the beam and spin parameters that influence the SCT, as well as an optimisation strategy for the sextupole settings to obtain the highest spin coherence time at any given working condition of the ring. The study provides a data set of many configurations with spin coherence times high enough to meet the sensitivity requirements of the EDM measurement. It also analyses the design factors that may negatively impact the SCT and discusses possible reconfigurations or design upgrades to improve these values in the future.
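    Using only the 1/e definition of the spin coherence time given above, a simple way to extract an SCT estimate from simulated polarization data can be sketched as follows; the exponential decay model and all numbers are purely illustrative and unrelated to the actual JEDI lattice simulations.

```python
# Estimate the spin coherence time (SCT) as the time at which the net polarization of a
# simulated beam drops below 1/e of its initial value. The synthetic decay data below is
# purely illustrative and unrelated to the JEDI prototype-ring simulations.
import math

def spin_coherence_time(times, polarization):
    """Return the first time at which polarization falls below P(0)/e (linear interpolation)."""
    threshold = polarization[0] / math.e
    for i in range(1, len(times)):
        if polarization[i] < threshold:
            # Interpolate between the two samples bracketing the crossing.
            t0, t1 = times[i - 1], times[i]
            p0, p1 = polarization[i - 1], polarization[i]
            return t0 + (threshold - p0) * (t1 - t0) / (p1 - p0)
    return None  # polarization never dropped below the threshold in this window

# Synthetic example: exponential depolarization with a 1000 s time constant.
times = [10.0 * i for i in range(200)]             # 0 .. 1990 s
polarization = [math.exp(-t / 1000.0) for t in times]
print(spin_coherence_time(times, polarization))    # ~1000 s
```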