
    Archival Information Package (AIP) Pilot Specification

    This report presents the E-ARK AIP format specification as it will be used by the pilots (implementations in pilot organizations). The deliverable is a follow-up version of E-ARK deliverable D4.2. The report describes the structure, metadata, and physical container format of the E-ARK AIP, the container that results from converting an E-ARK Submission Information Package (SIP) into an E-ARK Archival Information Package (AIP). The conversion will be implemented in the Integrated Platform as part of the earkweb component.
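
    The SIP-to-AIP conversion described above is essentially a repackaging step: the submitted content is placed in the AIP directory layout, package-level metadata is added, and the result is wrapped in a physical container file. The sketch below illustrates that flow only; the directory names, the placeholder METS stub, and the tar container are assumptions, not the actual earkweb implementation or the normative E-ARK layout.

```python
import shutil
import tarfile
from pathlib import Path

def build_aip(sip_dir: str, aip_dir: str) -> Path:
    """Repackage a SIP directory as an AIP container (illustrative sketch only)."""
    sip, aip = Path(sip_dir), Path(aip_dir)
    # Carry the submitted representations over into the AIP layout (assumed layout).
    shutil.copytree(sip / "representations", aip / "representations")
    (aip / "metadata").mkdir(parents=True, exist_ok=True)
    # Placeholder for package-level structural metadata; a real AIP carries a
    # full METS document inventorying every file in the package.
    (aip / "METS.xml").write_text("<mets><!-- package inventory goes here --></mets>")
    # Wrap the AIP directory in a single physical container file.
    container = aip.with_suffix(".tar")
    with tarfile.open(container, "w") as tar:
        tar.add(aip, arcname=aip.name)
    return container

# Usage (assuming ./sip contains a 'representations' folder):
# print(build_aip("sip", "example-aip"))
```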

    The impact of microservices: an empirical analysis of the emerging software architecture

    Master's dissertation in Informatics Engineering. The applications' development paradigm has changed in recent years, with modern development characterized by the need to continuously deliver new software iterations. Closely aligned with those principles, microservices is a software architecture whose characteristics potentially promote multiple quality attributes often required by modern, large-scale applications. Its recent growth in popularity and acceptance in industry has led to this architectural style often being described as a way of modernizing applications that allegedly solves all the inconveniences of traditional monolithic applications. However, there are multiple noteworthy costs associated with its adoption, which existing empirical research describes only vaguely, often summarizing them as "the complexity of a distributed system". The adoption of microservices provides the agility needed to achieve its promised benefits, but to actually reach them, several key implementation principles have to be honored. Given that it is still a fairly recent approach to developing applications, the lack of established principles and knowledge in development teams results in misjudging both the costs and the value of this architectural style. The outcome is often implementations that conflict with its promised benefits. Implementing a microservices-based architecture that achieves its alleged benefits involves multiple patterns and methodologies that add a considerable amount of complexity. To evaluate this impact in a concrete and empirical way, the same e-commerce platform was developed from scratch following a monolithic architectural style and two microservices-based architectural patterns with distinct inter-service communication and data management mechanisms. The effort involved in dealing with eventual consistency, maintaining a communication infrastructure, and managing data in a distributed way revealed significant overheads that do not exist in the development of traditional applications. Nonetheless, migrating from a monolithic architecture to a microservices-based one is currently accepted as the modern way of developing software, and this ideology is not often contested, nor are the associated technical challenges appropriately emphasized. Sometimes considered over-engineering, other times necessary, the migration to microservices is examined in this dissertation through empirical data and insights that showcase its impact across several topics. From the trade-offs associated with specific patterns, the distributed development of functionality, and the processes used to assure a variety of quality attributes, to performance benchmark experiments and the use of observability techniques, the entire development process is described and constitutes the object of study of this dissertation.
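
    To make the inter-service communication and eventual-consistency overhead discussed above concrete, the following is a minimal, hypothetical sketch of event-based coordination between an order service and a stock service. The in-memory broker, service names, and event fields are illustrative assumptions, not the dissertation's platform; a real microservices deployment would use a dedicated message broker and a separate database per service.

```python
from collections import defaultdict

class Broker:
    """Minimal in-memory stand-in for a message broker (e.g. Kafka or RabbitMQ in practice)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Delivery is synchronous here; a real broker delivers asynchronously,
        # which is exactly what makes consumers only eventually consistent.
        for handler in self.subscribers[topic]:
            handler(event)

class OrderService:
    """Owns its own data and announces state changes instead of calling other services."""

    def __init__(self, broker):
        self.broker = broker
        self.orders = {}

    def place_order(self, order_id, sku, qty):
        self.orders[order_id] = {"sku": sku, "qty": qty, "status": "PENDING"}
        self.broker.publish("order.created", {"order_id": order_id, "sku": sku, "qty": qty})

class StockService:
    """Keeps its own stock view, updated only when order events arrive."""

    def __init__(self, broker):
        self.stock = {"ABC-1": 10}
        broker.subscribe("order.created", self.reserve)

    def reserve(self, event):
        self.stock[event["sku"]] -= event["qty"]

broker = Broker()
orders, stock = OrderService(broker), StockService(broker)
orders.place_order("o-1", "ABC-1", 2)
print(stock.stock)  # {'ABC-1': 8} once the event has been processed
```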

    Referential Integrity in Cloud NoSQL Databases

    Cloud computing delivers on-demand access to essential computing services, providing benefits such as reduced maintenance, lower costs, and global access. One of its important and prominent services is Database as a Service (DaaS), which includes cloud Database Management Systems (DBMSs). Cloud DBMSs commonly adopt the key-value data model and are called Not only SQL (NoSQL) DBMSs. These provide cloud-suitable features such as scalability, flexibility, and robustness, but to provide them, features such as referential integrity are often sacrificed. In such cases, referential integrity is left to be dealt with by the applications instead of being handled by the cloud DBMSs. Thus, applications are required either to deal with inconsistency in the data (e.g. dangling references) or to incorporate the necessary logic to ensure that referential integrity is maintained. This thesis presents an Application Programming Interface (API) that serves as a middle layer between the applications and the cloud DBMS in order to maintain referential integrity. The API provides the necessary Create, Read, Update and Delete (CRUD) operations to be performed on the DBMS while ensuring that the referential integrity constraints are satisfied. These constraints are represented as metadata, and four different approaches are provided to store it. Furthermore, the performance of these approaches is measured with different referential integrity constraints and evaluated through a set of experiments in Apache Cassandra, a prominent cloud NoSQL DBMS. The results showed significant differences between the approaches in terms of performance; however, which one is better ultimately depends on the application's demands, as each approach presents different trade-offs.
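
    The thesis defines its own API and four metadata storage approaches, which are not reproduced here; purely as an illustration of the general pattern, the sketch below shows a middle layer whose CRUD operations consult constraint metadata before touching an in-memory key-value store, rejecting writes that would introduce dangling references. The class, constraint format, and table names are assumptions, not the thesis's design or Cassandra's API (update is omitted for brevity).

```python
class ReferentialIntegrityError(Exception):
    pass

class IntegrityAwareStore:
    """Middle layer enforcing foreign-key-like constraints over a key-value store."""

    def __init__(self, constraints):
        # constraints: {child_table: {child_column: parent_table}} (illustrative metadata format)
        self.constraints = constraints
        self.tables = {}

    def create(self, table, key, row):
        # Reject inserts whose referenced parent key does not exist (no dangling references).
        for column, parent in self.constraints.get(table, {}).items():
            if row.get(column) not in self.tables.get(parent, {}):
                raise ReferentialIntegrityError(f"{table}.{column} -> {parent} not found")
        self.tables.setdefault(table, {})[key] = row

    def read(self, table, key):
        return self.tables.get(table, {}).get(key)

    def delete(self, table, key):
        # Reject deletes that would leave dangling references in child tables.
        for child, cols in self.constraints.items():
            for column, parent in cols.items():
                if parent == table and any(
                        r.get(column) == key for r in self.tables.get(child, {}).values()):
                    raise ReferentialIntegrityError(f"{child}.{column} still references {table}[{key}]")
        self.tables.get(table, {}).pop(key, None)

store = IntegrityAwareStore({"orders": {"customer_id": "customers"}})
store.create("customers", "c1", {"name": "Ada"})
store.create("orders", "o1", {"customer_id": "c1"})
# store.delete("customers", "c1")  # would raise: an order still references c1
```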

    Analysis of the Impact of Data Normalization on Cyber Event Correlation Query Performance

    A critical capability required in the operation of cyberspace is the ability to maintain situational awareness of the status of the infrastructure elements that constitute cyberspace. Event logs from cyber devices can yield significant information, and when properly utilized they can provide timely situational awareness about the state of the cyber infrastructure. In addition, proper Information Assurance requires the validation and verification of the integrity of results generated by a commercial log analysis tool. Event log analysis can be performed using relational databases. To enhance database query performance, previous literature recommends denormalizing databases. Yet database normalization can also increase query performance. In this work, database normalization improved the majority of the queries performed using very large data sets of router events. In addition, queries performed faster on normalized tables when all the necessary data were contained in the normalized tables. Database normalization improves table organization and maintains better data consistency than an unnormalized design. Nonetheless, there are some tradeoffs when normalizing a database, such as additional preprocessing time and extra storage requirements. Overall, however, normalization improved query performance and must be considered an option when analyzing event logs using relational databases. Three primary research questions are addressed in this thesis: (1) What standards exist for the generation, transport, storage, and analysis of event log data for security analysis? (2) How does database normalization impact query performance when using very large data sets (over 30 million) of router events? (3) What are the tradeoffs between using a normalized versus a non-normalized database in terms of preprocessing time, query performance, storage requirements, and database consistency?
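
    Purely as an illustration of the trade-off discussed above (not data or schemas from the thesis), the sketch below creates a small normalized schema for router events in SQLite and runs a correlation-style query across it; a denormalized alternative would repeat the device attributes in every event row, avoiding the join at the cost of extra storage and weaker consistency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: device attributes live once; events reference them by id.
    CREATE TABLE device (device_id INTEGER PRIMARY KEY, hostname TEXT, site TEXT);
    CREATE TABLE event  (event_id INTEGER PRIMARY KEY,
                         device_id INTEGER REFERENCES device,
                         ts TEXT, severity INTEGER, message TEXT);
""")
conn.executemany("INSERT INTO device VALUES (?, ?, ?)",
                 [(1, "rtr-core-1", "HQ"), (2, "rtr-edge-7", "Branch")])
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, "2024-01-01T10:00", 3, "link down"),
                  (2, 2, "2024-01-01T10:01", 3, "link down"),
                  (3, 1, "2024-01-01T10:02", 5, "link up")])

# Correlation-style query: count high-severity events per site. The normalized
# layout needs a join here, but keeps device data consistent in one place.
for row in conn.execute("""
        SELECT d.site, COUNT(*) FROM event e JOIN device d USING (device_id)
        WHERE e.severity <= 3 GROUP BY d.site"""):
    print(row)
```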

    The Modernization Process of a Data Pipeline

    Data plays an integral part in a company's decision-making; therefore, decision-makers must have the right data available at the right time. Data volumes grow constantly, and new data is continuously needed for analytical purposes. Many companies use data warehouses to store data in an easy-to-use format for reporting and analytics. The challenge with data warehousing is presenting data in one unified structure, because the source data is often gathered from many systems that are structured in various ways. A process called extract, transform, and load (ETL) or extract, load, and transform (ELT) is used to load data into the data warehouse. This thesis describes the modernization process of one such pipeline. The previous solution, which used an on-premises Teradata platform for computation and SQL stored procedures for the transformation logic, is replaced by a new solution. The goal of the new solution is a process that uses modern tools, is scalable, and follows programming best practices. The cloud-based Databricks platform is used for computation, and dbt is used as the transformation tool. Lastly, a comparison is made between the new and old solutions, and their benefits and drawbacks are discussed.
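
    To make the ETL/ELT distinction concrete, the following minimal sketch mimics an ELT flow: raw source rows are loaded unchanged into a staging table, and the transformation then runs as SQL inside the warehouse, which is the role dbt models play on Databricks in the new solution. SQLite stands in for the warehouse here, and all table and column names are illustrative assumptions rather than the thesis's pipeline.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")  # stand-in for the analytical platform

# Extract + Load: source rows land in a raw/staging table without reshaping.
warehouse.execute("CREATE TABLE raw_orders (order_id, customer, amount, loaded_at)")
source_rows = [(1, "acme", 120.0, "2024-01-01"),
               (2, "acme", 80.0, "2024-01-02"),
               (3, "globex", 50.0, "2024-01-02")]
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", source_rows)

# Transform: the modelling logic runs inside the warehouse as SQL. In the new
# solution this statement would live in a dbt model rather than a stored procedure.
warehouse.execute("""
    CREATE TABLE fact_daily_sales AS
    SELECT loaded_at AS day, customer, SUM(amount) AS total_amount
    FROM raw_orders GROUP BY loaded_at, customer
""")
print(warehouse.execute("SELECT * FROM fact_daily_sales ORDER BY day").fetchall())
```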

    Coastal Biophysical Inventory Database for the Point Reyes National Seashore

    The Coastal Biophysical Inventory Database is the repository for data gathered during a rapid assessment of approximately 161 km of intertidal habitat managed by the Point Reyes National Seashore and Golden Gate National Recreation Area. The Coastal Biophysical Inventory Database is modeled after the “Alaska Coastal Resources Inventory and Mapping Database” and the CoastWalker program of Glacier Bay National Park and Preserve. The protocol and database were adapted for this effort to represent the features of the Point Reyes National Seashore and Golden Gate National Recreation Area, located along the northern central coast of California. The database is an integration of spatial data and observation data entered and browsed through an interface designed to complement the methods of the observation protocol. The Coastal Biophysical Inventory (CBI) and Mapping Protocol is the methodology for collecting and storing repeatable observations of the intertidal zone to create a baseline of information useful for resource management, and potentially for damage assessment in the event of an oil spill. The inventory contributes to the knowledge needed for the conservation of coastal resources managed in the public’s trust. The Coastal Biophysical Inventory Database is a Microsoft Access 2003 format relational database with a customized data entry interface programmed in Microsoft Access Visual Basic for Applications. The interface facilitates the entry, storage, and relation of substrate, biology, photographs, and other field observations. Data can be browsed or queried using the query tools common to the Microsoft Access software or using custom spatial query tools built into the interface with ESRI MapObjects LT 2.0 ActiveX COM objects. The Coastal Biophysical Inventory’s GIS data set is useful for collecting, analyzing, and reporting field observations about the intertidal zone. The GIS data set is linked to the observation data set through a unique number, the Segment ID, by using the relate tools found in ArcGIS (9.2-10). The Segment ID is a non-repeating number that references a section of coastline delineated by the type and form of the substrate observed. The Segment ID allows connection to the biological observations and other observation records such as photos or the original data sheets. Through ArcGIS connections to the observation database using the Segment ID, summaries of biodiversity or habitat can be made by location. The Coastal Biophysical Inventory has completed its initial goal of assessing the coastline of two national parks. The data set collected provides a snapshot of information, and the database allows future observations to be recorded. It provides coastal resource managers a broad insight into and orientation to the intertidal resources managed by the National Park Service.
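
    The Segment ID linkage described above is, in relational terms, a plain foreign-key join: one table delineates coastline segments, and the observation tables reference them by the same non-repeating number. The sketch below illustrates that pattern only; SQLite stands in for the Access/ArcGIS data set, and the table names, column names, and sample values are assumptions rather than the actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE segment (segment_id INTEGER PRIMARY KEY, substrate TEXT, length_m REAL);
    CREATE TABLE bio_observation (obs_id INTEGER PRIMARY KEY,
                                  segment_id INTEGER REFERENCES segment,
                                  species TEXT, abundance TEXT);
""")
db.executemany("INSERT INTO segment VALUES (?, ?, ?)",
               [(101, "rocky bench", 420.0), (102, "sand beach", 900.0)])
db.executemany("INSERT INTO bio_observation VALUES (?, ?, ?, ?)",
               [(1, 101, "Mytilus californianus", "abundant"),
                (2, 101, "Pisaster ochraceus", "common"),
                (3, 102, "Emerita analoga", "common")])

# Summarize biodiversity per segment: the kind of by-location roll-up the
# Segment ID relate in ArcGIS makes possible.
for row in db.execute("""
        SELECT s.segment_id, s.substrate, COUNT(DISTINCT b.species) AS species_count
        FROM segment s LEFT JOIN bio_observation b USING (segment_id)
        GROUP BY s.segment_id"""):
    print(row)
```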

    OPC UA Java History Gateway with Inherent Database Integration

    OPC Unified Architecture (OPC UA) is a mature information modelling and management framework used in the automation industry. OPC UA covers the communication, data modelling, and security aspects of information exchange between devices on the factory floor. In practice, the information exchange takes place between server and client instances: servers hold the process data, and clients access it. In this thesis, an intermediate gateway is developed that accesses (and allows managing) several other servers from a single instance. Storing process data within the OPC UA framework is the other main topic of the thesis. The thesis presents an SQL data model for storing time-series process data acquired from multiple servers; the data itself is modelled using OPC UA semantics. Additionally, connectivity and data mapping to a few SQL implementations are solved, including a partial representation of the OPC UA information model in an SQL database. A solution addressing both the storage and integration aspects is introduced in the form of the OPC UA History Gateway. The OPC UA History Gateway illustrates the capabilities of the OPC UA framework for data acquisition and device integration in a modern automation environment. The implemented prototype is shown to aggregate and store plant-floor device information. The OPC UA History Gateway also provides trend data to clients, making more refined data analysis possible.
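
    As a hedged illustration of the gateway's two roles (aggregation and history), and not the thesis's implementation, the sketch below polls values from several notional servers, appends them to a history table keyed by server and node id, and then serves a stored time series back. The read function is a stub standing in for an actual OPC UA client call, and the schema, server names, and node ids are assumptions.

```python
import random
import sqlite3
import time

history = sqlite3.connect(":memory:")
history.execute("CREATE TABLE samples (server TEXT, node_id TEXT, ts REAL, value REAL)")

def read_node(server, node_id):
    """Stub for an OPC UA read; a real gateway would call an OPC UA client stack here."""
    return random.uniform(0.0, 100.0)

# The gateway front-ends several underlying servers behind one access point.
servers = {"plc-a": ["ns=2;s=Temperature"],
           "plc-b": ["ns=2;s=Pressure", "ns=2;s=Flow"]}

# Acquisition: poll each aggregated server and persist the samples.
for server, nodes in servers.items():
    for node_id in nodes:
        history.execute("INSERT INTO samples VALUES (?, ?, ?, ?)",
                        (server, node_id, time.time(), read_node(server, node_id)))

# History access: return the stored time series for one node, the kind of
# trend data the gateway serves to clients.
rows = history.execute("SELECT ts, value FROM samples WHERE server = ? AND node_id = ?",
                       ("plc-b", "ns=2;s=Pressure")).fetchall()
print(rows)
```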