
    Introduction to Multiprocessor I/O Architecture

    The computational performance of multiprocessors continues to improve by leaps and bounds, fueled in part by rapid improvements in processor and interconnection technology. I/O performance thus becomes ever more critical, lest it become the bottleneck of overall system performance. In this paper we provide an introduction to I/O architectural issues in multiprocessors, with a focus on disk subsystems. While we discuss examples from actual architectures and provide pointers to interesting research in the literature, we do not attempt a comprehensive survey. We concentrate on the architectural design issues and the effects of different design alternatives
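
    As an illustration of why I/O can cap overall performance (a hedged sketch with assumed numbers, not an example from the paper), the Python snippet below applies an Amdahl-style split between compute time, which shrinks as processors get faster, and disk time, which does not:

        # Illustrative only: fixed I/O time limits overall speedup as CPUs improve.
        def total_time(compute_s: float, io_s: float, cpu_speedup: float) -> float:
            return compute_s / cpu_speedup + io_s

        base_compute_s, io_s = 90.0, 10.0          # assumed seconds of CPU and disk work
        for speedup in (1, 10, 100):
            t = total_time(base_compute_s, io_s, speedup)
            print(f"CPU x{speedup:>3}: total {t:5.1f}s, overall speedup {100.0 / t:4.1f}x")

    With these assumed numbers, a hundredfold processor speedup yields less than a tenfold overall gain, which is the sense in which I/O becomes the bottleneck.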

    A Data Mesh Implementation

    Data has become central to the economy of every company, thanks to increasingly easy-to-use BI tools and the benefits provided by ML models. Organisations operating in multiple sectors, however, face the problem of managing and extracting value from an enormous and diverse amount of data. The Data Mesh architecture, theorised by Zhamak Dehghani in 2018, is attracting considerable interest because it promises a solution to these problems by favouring the decentralisation of data, rather than hindering it as traditional data management structures have always done. The work carried out for this thesis is divided into two parts: a theoretical part on Data Mesh, why it is needed, where it originated and its principles; and a practical part, carried out within an internal project at the company Bip, which involved implementing a cross-cloud Data Mesh architecture from scratch using various open-source tools. My contribution focused on the specific area of Data Consumption, implementing tools such as GCP Dataplex, Trino and Apache Superset through Terraform and YAML. The thesis concludes with a comparison of the architecture built at Bip against the theory and against the services offered by other large companies, such as Google and Microsoft
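
    As a hedged illustration of the Data Consumption area mentioned above, the sketch below queries a data product through Trino from Python; the coordinator host, catalogue, schema and table names are hypothetical placeholders, and the official trino client package is assumed to be installed.

        # Minimal sketch: consuming a Data Mesh data product through Trino.
        # Host, catalog, schema and table names are hypothetical.
        import trino

        conn = trino.dbapi.connect(
            host="trino.example.internal",   # assumed Trino coordinator endpoint
            port=8080,
            user="data-consumer",
            catalog="sales_domain",          # a data product exposed by one domain team
            schema="curated",
        )
        cur = conn.cursor()
        cur.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
        for region, total in cur.fetchall():
            print(region, total)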

    Data Governance in Data Mesh Infrastructures: The Saxo Bank Case Study

    Data governance (DG) is the management of data in a manner that maximises its value and minimises data-related risks. Three aspects of DG are the data catalogue, data quality, and data ownership; together they aim to provide transparency, foster trust, and manage access to and control of the data. A DG solution involves change management and the alignment of incentives, so technology alone is not enough. In this paper we provide a holistic view of data governance that synthesises academic and practitioner viewpoints, and conclude with a pilot case study (Saxo Bank) in which the authors worked on technical and cultural interventions to address data governance challenges
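
    To make one of the three DG aspects concrete, the following sketch shows a minimal data-quality check of the kind a data catalogue entry might record; the column names and the use of pandas are assumptions for illustration, not details of the Saxo Bank case.

        # Hypothetical sketch of a data-quality rule: completeness and key uniqueness.
        import pandas as pd

        def quality_report(df: pd.DataFrame, key: str) -> dict:
            return {
                "rows": len(df),
                "null_ratio": df.isna().mean().to_dict(),           # completeness per column
                "duplicate_keys": int(df[key].duplicated().sum()),  # uniqueness of the key
            }

        clients = pd.DataFrame({"client_id": [1, 2, 2], "country": ["DK", None, "SE"]})
        print(quality_report(clients, key="client_id"))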

    Data Assets: Tokenization and Valuation

    Your data (the new gold, the new oil) is hugely valuable (estimated at $13T globally) but is not a balance-sheet asset. Tokenization, already used by banks for payments and settlement, lets you manage, value, and monetize your data; data is the ultimate commodity. This position paper outlines our vision and a general framework for tokenizing data and managing data assets and data liquidity, allowing individuals and organizations in the public and private sectors to capture the economic value of data while facilitating its responsible and ethical use. We examine the challenges associated with developing and securing a data economy, as well as the potential applications and opportunities of a decentralised data-tokenized economy. We also discuss the ethical considerations needed to promote the responsible exchange and use of data to fuel innovation and progress
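
    As an illustration only, and not the framework proposed in the paper, the sketch below represents a data asset as a token-like record anchored by a content hash; the fields and the valuation figure are assumptions.

        # Hypothetical sketch: a token-like record for a data asset.
        import hashlib, json
        from dataclasses import dataclass, asdict

        @dataclass
        class DataAssetToken:
            owner: str
            description: str
            content_hash: str      # fingerprint tying the token to the underlying dataset
            valuation_usd: float   # illustrative valuation, not a market price

        dataset_bytes = b"...serialised dataset..."
        token = DataAssetToken(
            owner="org:example",
            description="Anonymised transaction history, 2023",
            content_hash=hashlib.sha256(dataset_bytes).hexdigest(),
            valuation_usd=12500.0,
        )
        print(json.dumps(asdict(token), indent=2))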

    LEAN DATA ENGINEERING. COMBINING STATE OF THE ART PRINCIPLES TO PROCESS DATA EFFICIENTLY

    The present work was developed during an internship, under the Erasmus+ Traineeship programme, at Fieldwork Robotics, a Cambridge-based company that develops robots to operate in agricultural fields. The robots collect data from commercial greenhouses with sensors and RealSense cameras, as well as with gripper cameras placed on the robotic arms. This data is recorded mainly in bag files, consisting of unstructured data, such as images, and semi-structured data, such as metadata associated with both the conditions in which the images were taken and information about the robot itself. Data was uploaded, extracted, cleaned and labelled manually before being used to train Artificial Intelligence (AI) algorithms to identify raspberries during the harvesting process. The amount of available data quickly escalates with every trip to the fields, which creates an ever-growing need for an automated process. This problem was addressed through the creation of a data engineering platform encompassing a data lake, a data warehouse and the processing capabilities they require. The platform was created following a series of principles entitled Lean Data Engineering Principles (LDEP); systems that follow them are called Lean Data Engineering Systems (LDES). These principles urge one to start with the end in mind: process incoming batch or real-time data with no wasted resources, limiting costs to what is absolutely necessary to complete the job, in other words being as lean as possible. The LDEP are a combination of state-of-the-art ideas stemming from several fields, such as data engineering, software engineering and DevOps, with cloud technologies at their core. The proposed custom-made solution enabled the company to scale its data operations, labelling images almost ten times faster while reducing the associated costs by over 99.9% in comparison to the previous process. In addition, the data lifecycle time has been reduced from weeks to hours while maintaining coherent data quality, being able, for instance, to correctly identify 94% of the labels in comparison to a human counterpart.
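
    A minimal sketch of one lean batch stage of the kind described above, discovering newly collected bag files and writing a manifest for downstream labelling jobs; the directory layout and metadata fields are assumptions, and actual bag-file parsing and cloud upload are omitted.

        # Hypothetical sketch: discover new raw files and emit a manifest for later stages.
        import json, pathlib, datetime

        def build_manifest(raw_dir: str, manifest_path: str) -> None:
            records = []
            for path in sorted(pathlib.Path(raw_dir).glob("*.bag")):
                records.append({
                    "file": path.name,
                    "size_bytes": path.stat().st_size,
                    "collected_at": datetime.datetime.fromtimestamp(path.stat().st_mtime).isoformat(),
                })
            with open(manifest_path, "w") as fh:
                json.dump(records, fh, indent=2)

        build_manifest("raw_bags", "manifest.json")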

    Agile Processes in Software Engineering and Extreme Programming

    This open access book constitutes the proceedings of the 23rd International Conference on Agile Software Development, XP 2022, which was held in Copenhagen, Denmark, in June 2022. XP is the premier agile software development conference combining research and practice. It is a unique forum where agile researchers, practitioners, thought leaders, coaches, and trainers get together to present and discuss their most recent innovations, research results, experiences, concerns, challenges, and trends.  XP conferences provide an informal environment to learn and trigger discussions and welcome both people new to agile and seasoned agile practitioners. This year’s conference was held with the theme “Agile in the Era of Hybrid Work”. The 13 full papers and 1 short paper presented in this volume were carefully reviewed and selected from 40 submissions. They were organized in topical sections named: agile practices; agile processes; and agile in the large

    Data aggregation for multi-instance security management tools in telecommunication network

    Communication Service Providers employ multiple instances of network monitoring tools within extensive networks that span large geographical regions, encompassing entire countries. By collecting monitoring data from various nodes and consolidating it in a central location, a comprehensive control dashboard is established, presenting an overall network status categorized under different perspectives. In order to achieve this centralized view, we evaluated three architectural options: polling data from individual nodes to a central node, asynchronous push of data from individual nodes to a central node, and a cloud-based Extract, Transform, Load (ETL) approach. Our analysis leads us to the conclusion that the third option is most suitable for the telecommunication system use case. Remarkably, we observed that the quantity of monitoring results is approximately 30 times greater than the total number of devices monitored within the network. Implementing the ETL-based approach, we achieved favorable performance times of 2.23 seconds, 7.16 seconds, and 27.96 seconds for small, medium, and large networks, respectively. Notably, the extraction operation required the most significant amount of time, followed by the load and processing phases. Furthermore, in terms of average memory consumption, the small, medium, and large networks necessitated 323.59 MB, 497.34 MB, and 1668.59 MB, respectively. It is worth noting that the relationship between the total number of devices in the system and both performance and memory consumption is linear in nature
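
    A hedged sketch of the central ETL step described above: monitoring results are extracted from per-node exports, transformed into counts per status, and loaded as one summary for the central dashboard; the export URLs and record layout are hypothetical.

        # Hypothetical sketch of the extract-transform-load cycle for the central view.
        import json
        from collections import Counter
        from urllib.request import urlopen

        NODE_EXPORTS = [
            "http://node-a.example/monitoring.json",   # assumed per-node export endpoints
            "http://node-b.example/monitoring.json",
        ]

        def extract(url: str) -> list:
            with urlopen(url) as resp:
                return json.load(resp)     # e.g. [{"device": "fw-1", "status": "OK"}, ...]

        results = [row for url in NODE_EXPORTS for row in extract(url)]   # extract
        summary = Counter(row["status"] for row in results)               # transform
        print(dict(summary))                                              # load / publish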

    Providing verifiable oversight for scrutability, assurance and accountability in data-driven systems

    The emergence of data-driven systems that inform decisions or offer recommendations impacts all sectors, including high-stakes settings where judgements affecting health, education and security are made. Little visibility is afforded into the qualities of the constituent components of these systems, or into how they have been prepared and assembled. This makes it difficult for stakeholders to scrutinise systems and build confidence in system quality, which matters because problems resulting from poorly prepared or mismanaged data can have serious consequences. There is motivation to foster trustworthy systems based on transparency and accountability, but current tools fall short of offering the desired scrutability into data-driven systems whilst protecting the confidentiality requirements of providers. This thesis adopts a design research approach to address these shortcomings by designing and demonstrating information systems artefacts that enable providers to take accountability for their contributions to data-driven systems and to provide verifiable assertions of the properties and qualities of systems and components to authorised parties. The outcomes are a framework to help identify parties that contribute to the provision of data-driven systems, and a conceptual model that adopts a bill of materials document to record system supply chains. These artefacts are employed in software architectures that provide verifiable assurance of the qualities of digital assets to authorised parties and offer scrutability into data-driven systems. The software architectures adopt decentralised data models and protocols based on self-sovereign identity paradigms to place accountability on the providers of assets. This enables domain users and other stakeholders to seek assurance on the qualities of systems and assets, whilst protecting sensitive information from unauthorised access. This thesis contributes to the adoption of self-sovereign identity data models and protocols for parties to ratify qualities and take accountability for digital assets, extending their scope beyond the current dominant usage for personal identity information
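
    As a hedged sketch of the kind of verifiable assertion the thesis targets, rather than its actual architecture, the code below has a provider sign a bill-of-materials entry for one component with an Ed25519 key so that an authorised party can verify the claim; the manifest fields and identifiers are assumptions, and the cryptography package is assumed to be installed.

        # Hypothetical sketch: sign and verify one bill-of-materials entry.
        import json
        from cryptography.hazmat.primitives.asymmetric import ed25519

        entry = json.dumps({
            "component": "training-dataset-v3",
            "provider": "did:example:provider-123",     # assumed decentralised identifier
            "quality": {"label_accuracy": 0.94},
        }, sort_keys=True).encode()

        provider_key = ed25519.Ed25519PrivateKey.generate()
        signature = provider_key.sign(entry)

        # The verifier checks the assertion against the provider's public key;
        # verify() raises InvalidSignature if the entry was altered.
        provider_key.public_key().verify(signature, entry)
        print("assertion verified")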