
    Transactional and analytical data management on persistent memory

    The increasing number of smart devices and sensors, but also social media, causes the volume of data, and thus the demanded processing speed, to grow steadily. At the same time, many applications need to store data persistently or even comply with strict transactional guarantees. The novel storage technology Persistent Memory (PMem), with its unique properties, seems to be a natural candidate to meet these requirements efficiently. Compared to DRAM, it is more scalable, less expensive, and durable. In contrast to disks, it is significantly faster and directly addressable. Therefore, this dissertation investigates the deliberate employment of PMem to fit the needs of modern applications. After presenting the fundamentals of working with PMem, we focus primarily on three aspects of data management. First, we disassemble several persistent data and index structures into their underlying design primitives to reveal the trade-offs for various access patterns. This allows us to identify their best use cases and vulnerabilities, but also to gain general insights into the design of PMem-based data structures. Second, we propose two storage layouts that target analytical workloads and enable efficient query execution on arbitrary attributes. While the first approach employs a linked list of multi-dimensional clustered blocks that potentially span several storage layers, the second approach is a multi-dimensional index that caches nodes in DRAM. Third, we show how to improve stream and event processing systems involving transactional state management using the preceding data structures and insights. In this context, we propose a novel Transactional Stream Processing (TSP) model with appropriate consistency and concurrency protocols adapted to PMem. Together, the discussed aspects are intended to provide a foundation for developing even more sophisticated PMem-enabled systems. At the same time, they show how data management tasks can take advantage of PMem by opening up new application domains; improving performance, scalability, and recovery guarantees; simplifying code complexity; and reducing economic and environmental costs.
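
    As a hedged illustration of the first storage layout, the sketch below models a linked list of blocks that are clustered on multiple attributes and keep per-attribute min/max metadata, so that a query on an arbitrary attribute can skip blocks that cannot match. All names and the metadata shape are assumptions for illustration; the dissertation's actual PMem layout, persistence primitives, and multi-tier placement are not reproduced here.

    # Illustrative sketch (not the dissertation's implementation): a linked
    # list of blocks clustered on several attributes, each block carrying
    # per-attribute (min, max) metadata so that a selection on any attribute
    # can skip whole blocks whose range cannot intersect the predicate.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Block:
        rows: list                      # tuples, e.g. (price, quantity)
        minmax: list                    # one (min, max) pair per attribute
        next: Optional["Block"] = None  # "next" pointer, persistent on PMem

    def scan(head: Optional[Block], attr: int, lo, hi):
        """Return rows with lo <= row[attr] <= hi, skipping whole blocks
        whose min/max range on `attr` cannot overlap [lo, hi]."""
        out = []
        while head is not None:
            bmin, bmax = head.minmax[attr]
            if not (bmax < lo or bmin > hi):    # block may contain matches
                out += [r for r in head.rows if lo <= r[attr] <= hi]
            head = head.next
        return out

    b2 = Block(rows=[(5, 1)], minmax=[(5, 5), (1, 1)])
    b1 = Block(rows=[(1, 9), (3, 4)], minmax=[(1, 3), (4, 9)], next=b2)
    print(scan(b1, attr=0, lo=2, hi=6))   # [(3, 4), (5, 1)]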

    Developing Collaborative XML Editing Systems

    In many areas, the Extensible Markup Language (XML) is becoming the standard exchange and data format. More and more applications not only support XML as an exchange format but also use it as their data model or default file format for graphic, text, and database (such as spreadsheet) applications. Computer Supported Cooperative Work is an interdisciplinary field of research dealing with group work, cooperation, and the information and communication technologies that support them. One part of it is Real-Time Collaborative Editing, which investigates the design of systems that allow several people to work simultaneously, in real time, on the same document without the risk of inconsistencies. Existing collaborative editing research applications specialize in one or, at best, a small number of document types, for example graphic, text, or spreadsheet documents. This research investigates the development of a software framework which allows collaborative editing of any XML document type in real time, presenting a more versatile solution to the problems of real-time collaborative editing. This research contributes a new software framework model which will assist software engineers in the development of new collaborative XML editing applications. The devised framework is flexible in the sense that it is easily adaptable to different workflow requirements covering concurrency control, awareness mechanisms, and optional locking of document parts. Additionally, this thesis contributes a new framework integration strategy that enables enhancing existing single-user editing applications with real-time collaborative editing features without changing their source code.
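
    As a hedged sketch of the underlying idea, an edit to an XML tree can be expressed as a path-addressed operation that peers serialize and broadcast; the operation shape below is an assumption for illustration, not the framework's actual API or protocol.

    # Illustrative sketch: an XML edit expressed as a path-addressed operation
    # that can be serialized and exchanged between collaborating peers. The
    # operation format is an assumption, not the framework's actual protocol.
    import xml.etree.ElementTree as ET

    def apply_insert(root: ET.Element, path: list, index: int, tag: str, text: str):
        """Insert a new child <tag> at `index` under the element reached by
        following child indices in `path` from the root."""
        node = root
        for child_index in path:
            node = list(node)[child_index]
        elem = ET.Element(tag)
        elem.text = text
        node.insert(index, elem)

    doc = ET.fromstring("<doc><sec/></doc>")
    apply_insert(doc, [0], 0, "para", "Hello")   # insert <para> under first <sec>
    print(ET.tostring(doc, encoding="unicode"))  # <doc><sec><para>Hello</para></sec></doc>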

    Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework

    The efficient parallel execution of scientific applications is a key challenge in high-performance computing (HPC). With growing parallelism and heterogeneity of compute resources, as well as increasingly complex software, performance analysis has become an indispensable tool in the development and optimization of parallel programs. This thesis presents a framework for the systematic performance analysis of scalable, heterogeneous applications. Based on event traces, it automatically detects the critical path and inefficiencies that result in waiting or idle time, e.g., due to load imbalances between parallel execution streams. As a prerequisite for the analysis of heterogeneous programs, this thesis specifies inefficiency patterns for computation offloading. Furthermore, an essential contribution was made to the development of tool interfaces for OpenACC and OpenMP, which enable portable data acquisition and subsequent analysis for programs with offload directives. These interfaces are already part of the latest OpenACC and OpenMP API specifications. The aforementioned work, existing preliminary work, and established analysis methods are combined into a generic analysis process that can be applied across programming models. Based on the detection of wait or idle states, which can propagate over several levels of parallelism, the analysis identifies wasted computing resources and their root cause, as well as the critical-path share of each program region. Thus, it determines the influence of program regions on the load balancing between execution streams and on the program runtime. The analysis results include a summary of the detected inefficiency patterns and a program trace enhanced with information about wait states, their cause, and the critical path. In addition, a ranking based on the amount of waiting time a program region caused on the critical path highlights program regions that are relevant for program optimization. The scalability of the proposed performance analysis and its implementation is demonstrated using High-Performance Linpack (HPL), while the analysis results are validated with synthetic programs. A scientific application that uses MPI, OpenMP, and CUDA simultaneously is investigated in order to show the applicability of the analysis.
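
    To make wait-state detection concrete, here is a toy sketch, under assumed trace semantics, of the classic "late sender" inefficiency pattern: a receive is posted before the matching send begins, so the receiver idles, and the waiting time is charged to the region that sent too late. Real traces carry far richer events; the record format below is invented for illustration.

    # Toy sketch of "late sender" wait-state detection on a message trace.
    # Each record: (send_time, send_region, recv_posted). If the receive was
    # posted before the matching send started, the receiver idled for the
    # difference, charged here to the region that sent too late.
    from collections import defaultdict

    def late_sender_waiting_time(messages):
        waiting = defaultdict(float)    # region -> waiting time it caused
        for send_time, send_region, recv_posted in messages:
            if recv_posted < send_time:           # receiver was already waiting
                waiting[send_region] += send_time - recv_posted
        return dict(waiting)

    trace = [(10.0, "solver", 4.0), (12.0, "io", 13.0), (20.0, "solver", 15.0)]
    print(late_sender_waiting_time(trace))   # {'solver': 11.0}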

    A digital vault solution for banking institutions

    Master's project report, Information Security (Segurança Informática), Universidade de Lisboa, Faculdade de Ciências, 2019. This project is a result of Securibox's need to provide a digital vault storage solution for some of their banking clients operating in the French market. Since electronic banking emerged, banking institutions have begun to provide online services that go beyond conventional bank services in order to attract more users. Sometimes those services involve operations with personal data of their customers, which can include data and documents from other services, entities, and companies. All this information must be stored on the banking institution's side, using a secure digital vault, while respecting the legislation of the country where the institution is located. The goal of this work was to develop an initial solution that would address the current needs of the French banking market regarding intelligent data handling and storage. To be compliant with European Union and French legislation, it is necessary to ensure the security and privacy of the customers' documents and data. To address those requirements, a REST API solution was developed using .Net technology. This solution is divided into three layers: the document storage layer, the metadata and log storage layer, and the core layer. The documents are encrypted and stored in the OpenStack Swift environment, while metadata is stored in the MongoDB database as journal log entries. The information processing and the communication between OpenStack and MongoDB occur at the core layer. This solution relies on open-source technologies, is easily scalable, and is compatible with other Securibox products. Conceptually, it can be used not only by banking institutions but also by any organization or company that has to store and deal with large amounts of information.
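
    A minimal sketch of the vault's core flow follows: encrypt a document, store the ciphertext, and journal every access as metadata. The actual system is a .Net REST API backed by OpenStack Swift and MongoDB; this Python sketch replaces both backends with in-memory stand-ins and uses symmetric encryption (Fernet) purely for illustration.

    # Illustrative sketch of the vault's core flow (encrypt, store ciphertext,
    # journal the access). The real system is a .Net REST API backed by
    # OpenStack Swift and MongoDB; here both stores are in-memory stand-ins.
    from datetime import datetime, timezone
    from cryptography.fernet import Fernet   # pip install cryptography

    object_store = {}    # stand-in for OpenStack Swift
    journal = []         # stand-in for the MongoDB log collection
    key = Fernet.generate_key()

    def store_document(doc_id: str, payload: bytes, user: str):
        object_store[doc_id] = Fernet(key).encrypt(payload)
        journal.append({"doc": doc_id, "user": user, "action": "store",
                        "at": datetime.now(timezone.utc).isoformat()})

    def read_document(doc_id: str, user: str) -> bytes:
        journal.append({"doc": doc_id, "user": user, "action": "read",
                        "at": datetime.now(timezone.utc).isoformat()})
        return Fernet(key).decrypt(object_store[doc_id])

    store_document("payslip-2019-04", b"net: 1234.56 EUR", "alice")
    assert read_document("payslip-2019-04", "alice") == b"net: 1234.56 EUR"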

    A Web Component for Real-Time Collaborative Text Editing

    Real-time collaborative software allows physically distributed people to cooperate by working on a shared application state and receiving updates from each other in real time. The goal of this thesis was to create a developer tool which allows web application developers to easily integrate a collaborative text editor into their applications. In order to remain technology-agnostic and to utilize the latest web standards, the product was implemented as a web component, a reusable user interface component built with native web browser features. The main challenge in developing a real-time collaboration tool is the handling of concurrent updates, which might conflict with one another. To tackle this issue, many consistency maintenance algorithms have been presented in the academic literature; most of these techniques are variations of two main approaches, operational transformation and commutative replicated data types. In this thesis, we reviewed some of these methods and chose the GOTO operational transformation algorithm to be implemented in our component. Besides selecting and implementing an appropriate consistency maintenance technique, the contributions of this thesis include the design of an easy-to-use application programming interface (API). Our solution also fulfills some practical requirements of group editors not covered by the consistency maintenance theory, such as session management and cleaning of the message queue. The created web component succeeds in encapsulating the complexity related to concurrency control and the handling of joining peers in the client-side implementation, which allows the application logic to remain simple. This open-source product enables software developers to add a collaborative text editor to their web applications by broadcasting the updates provided by an event-based API to participating peers.
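
    As a simplified illustration of the concurrency-control core, the sketch below shows inclusion transformation for character-level inserts, the elementary step that operational transformation algorithms such as GOTO apply repeatedly. It is a toy, not the component's implementation: the full GOTO algorithm also transforms deletes and works over entire operation histories.

    # Simplified sketch of inclusion transformation for character inserts.
    # Given two concurrent ops made against the same document state, rewrite
    # `op` so it still has its intended effect after `against` was applied.

    def transform_insert(op, against):
        """op, against: (position, text) inserts on the same base state."""
        pos, text = op
        other_pos, other_text = against
        if other_pos <= pos:                 # the other insert shifted us right
            pos += len(other_text)
        return (pos, text)

    # Peers A and B insert concurrently into "abc":
    a = (1, "X")       # A locally produces "aXbc"
    b = (2, "Y")       # B locally produces "abYc"
    # Each peer applies the other's transformed op; both converge on "aXbYc".
    print(transform_insert(a, b))   # (1, 'X') : applied to "abYc" -> "aXbYc"
    print(transform_insert(b, a))   # (3, 'Y') : applied to "aXbc" -> "aXbYc"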

    A Pattern Language for Designing Application-Level Communication Protocols and the Improvement of Computer Science Education through Cloud Computing

    Networking protocols have been developed over time following layered architectures such as the Open Systems Interconnection model and the Internet model; these protocols are grouped in the Internet protocol suite. Most developers do not deal with low-level protocols; instead, they design application-level protocols on top of the low-level ones. Although each application-level protocol is different, there is commonality among them, and developers can apply lessons learned from one protocol to the design of new ones. Design patterns can help by gathering and sharing proven, reusable solutions to common, recurring design problems. The Application-level Communication Protocols Design Patterns language captures this knowledge about application-level protocol design, so developers can create better, more fitting protocols based on these common and well-proven solutions. Another aspect of contemporary development techniques is the need to distribute software artifacts. Most development companies have started using Cloud Computing services to meet this need; both public and private clouds are widely used. Future developers need to manage this technology's infrastructure, software, and platform as services. These two aspects, communication protocol design and cloud computing, represent an opportunity to contribute to the software development community and to the software engineering education curriculum. The Application-level Communication Protocols Design Patterns language aims to help solve communication software design problems, while the use of cloud computing in programming assignments aims to positively influence the Analysis-to-Reuse skills of computer science students.
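
    The thesis's pattern catalogue is not reproduced in this abstract, so here is a generic, hedged example of the kind of recurring problem such patterns address: on a byte stream, an application-level protocol must mark where one message ends and the next begins. One well-proven solution is length-prefix framing, sketched below.

    # A recurring application-level protocol design problem: delimiting
    # messages on a byte stream. One well-proven solution is length-prefix
    # framing. (A generic example; not a pattern taken from the thesis.)
    import struct

    def frame(payload: bytes) -> bytes:
        return struct.pack("!I", len(payload)) + payload   # 4-byte big-endian length

    def unframe(buffer: bytes):
        """Split a byte stream into complete payloads; return (messages, leftover)."""
        messages = []
        while len(buffer) >= 4:
            (length,) = struct.unpack("!I", buffer[:4])
            if len(buffer) < 4 + length:
                break                                      # message incomplete so far
            messages.append(buffer[4:4 + length])
            buffer = buffer[4 + length:]
        return messages, buffer

    stream = frame(b"LOGIN alice") + frame(b"SEND hi")
    print(unframe(stream)[0])   # [b'LOGIN alice', b'SEND hi']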

    Identifying and Harnessing Concurrency for Parallel and Distributed Network Simulation

    Although computer networks are inherently parallel systems, the parallel execution of network simulations on interconnected processors frequently yields only limited benefits. In this thesis, methods are proposed to estimate and understand the parallelization potential of network simulations. Further, mechanisms and architectures for exploiting the massively parallel processing resources of modern graphics cards to accelerate network simulations are proposed and evaluated.

    Efficient Data Streaming Analytic Designs for Parallel and Distributed Processing

    Today, ubiquitous sensing technologies enable the interconnection of physical objects, as part of the Internet of Things (IoT), and provide massive amounts of data streams. In such scenarios, the demand for timely analysis has resulted in a shift of data processing paradigms towards continuous, parallel, and multi-tier computing. However, these paradigms bring several challenges, especially regarding analysis speed, precision, costs, and deterministic execution. This thesis studies a number of such challenges to enable efficient continuous processing of streams of data in a decentralized and timely manner.

    In the first part of the thesis, we investigate techniques aimed at speeding up the processing without a loss in precision. The focus is on continuous machine learning/data mining problems, which appear commonly in IoT applications, and in particular on continuous clustering and monitoring, for which we present novel algorithms: (i) Lisco, a sequential algorithm to cluster data points collected by LiDAR (a distance sensor that creates a 3D mapping of the environment); (ii) p-Lisco, the parallel version of Lisco, which enhances its pipeline and data parallelism; (iii) pi-Lisco, the parallel and incremental version, which reuses information and prevents redundant computations; (iv) g-Lisco, a generalized version of Lisco to cluster any data with spatio-temporal locality by leveraging the implicit ordering of the data; and (v) Amble, a continuous monitoring solution for an industrial process.

    In the second part, we investigate techniques to reduce the analysis costs in addition to speeding up the processing, while also supporting deterministic execution. The focus is on problems associated with the availability and utilization of computing resources, namely reducing the volumes of data, involving concurrent computing elements, and adjusting the level of concurrency. For that, we propose three frameworks: (i) DRIVEN, a framework to continuously compress the data and enable efficient transmission of the compact data in the processing pipeline; (ii) STRATUM, a framework to continuously pre-process the data before transferring it to upper tiers for further processing; and (iii) STRETCH, a framework to enable instantaneous elastic reconfigurations that adjust intra-node resources at runtime while ensuring determinism.

    The algorithms and frameworks presented in this thesis contribute to efficient, online processing of data streams while utilizing the available resources. Using extensive evaluations, we show the efficiency and achievements of the proposed techniques for representative IoT applications that involve a wide spectrum of platforms, and we illustrate that the performance of our work exceeds that of state-of-the-art techniques.
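
    As a toy illustration of the ordering idea behind Lisco, the sketch below clusters a 2D scan in a single pass: because LiDAR points arrive sorted by angle, a new cluster can start exactly where the gap to the preceding point exceeds a threshold. The distance rule and parameters are assumptions for illustration; Lisco's actual criterion, 3D handling, and the parallel/incremental variants are in the thesis.

    # Toy single-pass clustering exploiting the implicit ordering of a LiDAR
    # scan, in the spirit of Lisco: points arrive sorted by angle, so a new
    # cluster starts where the gap to the previous point exceeds a threshold.
    # (The threshold rule is an assumption made for this illustration.)
    import math

    def cluster_scan(points, max_gap=0.5):
        """points: (x, y) tuples in scan order. Returns a list of clusters."""
        clusters = []
        for p in points:
            if clusters and math.dist(clusters[-1][-1], p) <= max_gap:
                clusters[-1].append(p)      # close to predecessor: same object
            else:
                clusters.append([p])        # gap detected: a new object starts
        return clusters

    scan = [(0.0, 1.0), (0.1, 1.0), (0.2, 1.1), (2.0, 2.0), (2.1, 2.0)]
    print(len(cluster_scan(scan)))   # 2 clusters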