
    Enhancing cyber assets visibility for effective attack surface management : Cyber Asset Attack Surface Management based on Knowledge Graph

    The contemporary digital landscape is filled with challenges, chief among them the management and security of cyber assets, including the ever-growing shadow IT. The evolving nature of the technology landscape has resulted in an expansive system of solutions, making it challenging to select and deploy compatible solutions in a structured manner. This thesis explores the critical role of Cyber Asset Attack Surface Management (CAASM) technologies in managing cyber attack surfaces, focusing on the open-source CAASM tool Starbase, by JupiterOne. It starts by underlining the importance of comprehending the cyber assets that need defending. It acknowledges the Cyber Defense Matrix as a methodical and flexible approach to understanding and addressing cyber security challenges. A comprehensive analysis of market trends and business needs validated the necessity of asset security management tools as fundamental components in firms' security journeys. CAASM has been selected as a promising solution among various tools due to its capabilities, ease of use, and seamless integration with cloud environments using APIs, addressing shadow IT challenges. A practical use case involving the integration of Starbase with GitHub was developed to demonstrate CAASM's usability and flexibility in managing cyber assets in organizations of varying sizes. The use case enhanced the knowledge graph's aesthetics and usability using Neo4j Desktop and Neo4j Bloom, making it accessible and insightful even for non-technical users. The thesis concludes with practical guidelines in the appendices and on GitHub for reproducing the use case.
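    The core idea of a CAASM knowledge graph — assets as nodes, typed relationships as edges, attack-surface questions as graph traversals — can be sketched in a few lines. This is an illustrative toy model, not Starbase's actual data model or API:

```python
# Illustrative sketch (not Starbase's actual API): a tiny cyber-asset
# knowledge graph where nodes are assets and edges are typed relationships,
# queried to surface every asset transitively reachable from a given owner.
from collections import defaultdict

class AssetGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # source -> [(relation, target)]

    def add(self, source, relation, target):
        self.edges[source].append((relation, target))

    def reachable_assets(self, start):
        """All assets transitively connected to `start` (an attack-surface view)."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for _, target in self.edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

g = AssetGraph()
g.add("team:security", "OWNS", "repo:starbase-config")
g.add("repo:starbase-config", "DEPLOYS_TO", "host:prod-01")
g.add("host:prod-01", "RUNS", "service:neo4j")

print(g.reachable_assets("team:security"))
```

    In a real deployment the same question would be expressed as a Cypher traversal over the Neo4j graph that Starbase populates from its integrations.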

    Knowledge-Driven Harmonization of Sensor Observations: Exploiting Linked Open Data for IoT Data Streams

    The rise of the Internet of Things leads to an unprecedented number of continuous sensor observations that are available as IoT data streams. Harmonization of such observations is a labor-intensive task due to heterogeneity in format, syntax, and semantics. We aim to reduce the effort for such harmonization tasks by employing a knowledge-driven approach. To this end, we pursue the idea of exploiting the large body of formalized public knowledge represented as statements in Linked Open Data.
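    The harmonization problem can be made concrete with a minimal sketch: heterogeneous sensor payloads are mapped onto one common observation schema. The work above derives such mappings from Linked Open Data; here the mapping table is hand-written for illustration, and all names are assumptions:

```python
# Minimal illustration of observation harmonization: each source's payload
# uses a different key and unit, and a per-source mapping normalizes both.
# The mapping table is hand-written here; the knowledge-driven approach
# would derive it from formalized public knowledge instead.

# Per-source mapping: (value key, source unit, converter to degrees Celsius)
SOURCE_MAPPINGS = {
    "sensor_a": ("temp_f", "degF", lambda v: (v - 32) * 5 / 9),
    "sensor_b": ("temperature", "degC", lambda v: v),
}

def harmonize(source_id, payload):
    """Map a raw sensor payload onto one common observation schema."""
    key, _unit, convert = SOURCE_MAPPINGS[source_id]
    return {
        "observedProperty": "air_temperature",
        "unit": "degC",
        "value": round(convert(payload[key]), 2),
    }

print(harmonize("sensor_a", {"temp_f": 68.0}))       # 68 degF -> 20.0 degC
print(harmonize("sensor_b", {"temperature": 21.5}))  # already degC
```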

    Blueprint model and language for engineering cloud applications

    Abstract: The research presented in this thesis is positioned within the domain of engineering cloud service-based applications (CSBAs). Its contribution is twofold: (1) a uniform specification language, called the Blueprint Specification Language (BSL), for specifying cloud services across several cloud vendors, and (2) a set of associated techniques, called the Blueprint Manipulation Techniques (BMTs), for publishing, querying, and composing cloud service specifications, with the aim of supporting the flexible design and configuration of a CSBA.
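    The composition idea behind the BMTs can be sketched abstractly: each blueprint declares what it offers and what it requires, and composition resolves requirements against offerings. The structure and field names below are illustrative assumptions, not actual BSL syntax:

```python
# Hedged sketch of blueprint composition (field names are illustrative, not
# the BSL syntax): each cloud service blueprint declares offered and required
# capabilities, and composition links each requirement to an offering.

blueprints = [
    {"name": "web-frontend", "offers": ["http"], "requires": ["sql-db"]},
    {"name": "managed-postgres", "offers": ["sql-db"], "requires": []},
]

def compose(blueprints):
    """Resolve each blueprint's requirements to blueprints offering them."""
    offers = {cap: b["name"] for b in blueprints for cap in b["offers"]}
    links = []
    for b in blueprints:
        for req in b["requires"]:
            if req not in offers:
                raise ValueError(f"unresolved requirement: {req}")
            links.append((b["name"], req, offers[req]))
    return links

print(compose(blueprints))  # [('web-frontend', 'sql-db', 'managed-postgres')]
```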

    Service Abstractions for Scalable Deep Learning Inference at the Edge

    Deep learning-driven intelligent edge has already become a reality, where millions of mobile, wearable, and IoT devices analyze real-time data and transform it into actionable insights on-device. Typical approaches for optimizing deep learning inference mostly focus on accelerating the execution of individual inference tasks, without considering the contextual correlation unique to edge environments and the statistical nature of learning-based computation. Specifically, they treat inference workloads as individual black boxes and apply canonical system optimization techniques, developed over the last few decades, to handle them as yet another type of computation-intensive application. As a result, deep learning inference on edge devices still faces the ever-increasing challenges of customization to edge device heterogeneity, fuzzy computation redundancy between inference tasks, and end-to-end deployment at scale. In this thesis, we propose the first framework that automates and scales the end-to-end process of deploying efficient deep learning inference from the cloud to heterogeneous edge devices. The framework consists of a series of service abstractions that handle DNN model tailoring, model indexing and query, and computation reuse for runtime inference, respectively. Together, these services bridge the gap between deep learning training and inference, eliminate computation redundancy during inference execution, and further lower the barrier for deep learning algorithm and system co-optimization. To build efficient and scalable services, we take a unique algorithmic approach of harnessing the semantic correlation between learning-based computations.
    Rather than viewing individual tasks as isolated black boxes, we optimize them collectively in a white-box approach, proposing primitives to formulate the semantics of deep learning workloads, and algorithms to assess their hidden correlation (in terms of the input data, the neural network models, and the deployment trials) and to merge common processing steps to minimize redundancy.
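    The computation-reuse idea — two inference tasks that share an input also share intermediate computation, so the shared step should run once — can be shown with a toy cache. This is a stand-in illustration; the thesis's services operate on real DNN workloads, and all names below are assumptions:

```python
# Illustrative sketch of cross-task computation reuse: tasks sharing an input
# reuse a cached intermediate result instead of recomputing it. The "feature
# extraction" here is a toy stand-in for a shared DNN processing prefix.
import hashlib

_cache = {}
calls = {"n": 0}

def extract_features(data: bytes):
    """Shared (expensive) step, memoized by content hash of the input."""
    key = hashlib.sha256(data).hexdigest()
    if key not in _cache:
        calls["n"] += 1                 # count actual computations
        _cache[key] = sum(data) % 256   # toy stand-in for a DNN prefix
    return _cache[key]

def classify(data: bytes, threshold: int) -> bool:
    """One 'inference task' built on the shared step."""
    return extract_features(data) > threshold

sample = b"frame-0042"
a = classify(sample, 10)   # computes the shared features
b = classify(sample, 200)  # reuses the cached features
print(calls["n"])          # 1: the shared step ran once for two tasks
```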

    Provenance-aware knowledge representation: A survey of data models and contextualized knowledge graphs

    Expressing machine-interpretable statements in the form of subject-predicate-object triples is a well-established practice for capturing the semantics of structured data. However, the standard used for representing these triples, RDF, inherently lacks a mechanism to attach provenance data, which would be crucial to make automatically generated and/or processed data authoritative. This paper is a critical review of data models, annotation frameworks, knowledge organization systems, serialization syntaxes, and algebras that enable provenance-aware RDF statements. The various approaches are assessed in terms of standard compliance, formal semantics, tuple type, vocabulary term usage, blank nodes, provenance granularity, and scalability. This can be used to advance existing solutions and help implementers select the most suitable approach (or a combination of approaches) for their applications. Moreover, the analysis of the mechanisms and their limitations highlighted in this paper can serve as the basis for novel approaches in RDF-powered applications with increasing provenance needs.
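    One of the mechanism families such surveys cover — extending triples to quads, where the fourth element names a graph that provenance statements can then describe — can be sketched with plain tuples. The identifiers below are illustrative assumptions:

```python
# Minimal sketch of one provenance-attachment mechanism: triples extended to
# quads, where the fourth element names a graph; provenance statements are
# then made about the graph name itself. A plain RDF triple has no such slot.

quads = [
    # (subject, predicate, object, graph)
    ("ex:Alice", "ex:worksFor", "ex:ACME", "graph:hr-import"),
    ("graph:hr-import", "prov:wasDerivedFrom", "ex:hr-database", "graph:meta"),
    ("graph:hr-import", "prov:generatedAtTime", "2023-01-05", "graph:meta"),
]

def provenance_of(graph_name):
    """All provenance statements made about a named graph."""
    return [(p, o) for s, p, o, g in quads
            if s == graph_name and g == "graph:meta"]

print(provenance_of("graph:hr-import"))
```

    Named graphs are only one option; reification, singleton properties, and RDF-star-style annotations trade off granularity, standard compliance, and scalability differently, which is exactly the comparison space the survey maps out.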

    A Data-driven Methodology Towards Mobility- and Traffic-related Big Spatiotemporal Data Frameworks

    The human population is increasing at unprecedented rates, particularly in urban areas. This increase, along with the rise of a more economically empowered middle class, brings new and complex challenges to the mobility of people within urban areas. To tackle such challenges, transportation and mobility authorities and operators are trying to adopt innovative Big Data-driven mobility- and traffic-related solutions. Such solutions will help decision-making processes that aim to ease the load on an already overloaded transport infrastructure. The information collected from day-to-day mobility and traffic can help to mitigate some of these mobility challenges in urban areas. Road infrastructure and traffic management operators (RITMOs) face several limitations in effectively extracting value from the exponentially growing volumes of mobility- and traffic-related Big Spatiotemporal Data (MobiTrafficBD) that are being acquired and gathered. Research on the topics of Big Data, Spatiotemporal Data and especially MobiTrafficBD is scattered, and the existing literature does not offer a concrete, common methodological approach to set up, configure, deploy and use a complete Big Data-based framework to manage the lifecycle of mobility-related spatiotemporal data, mainly focused on geo-referenced time series (GRTS) and spatiotemporal events (ST Events), extract value from it and support the decision-making processes of RITMOs. This doctoral thesis proposes a data-driven, prescriptive methodological approach towards the design, development and deployment of MobiTrafficBD Frameworks focused on GRTS and ST Events.
    Besides a thorough literature review on Spatiotemporal Data, Big Data and the merging of these two fields through MobiTrafficBD, the methodological approach comprises a set of general characteristics, technical requirements, logical components, data flows and technological infrastructure models, as well as guidelines and best practices that aim to guide researchers, practitioners and stakeholders, such as RITMOs, throughout the design, development and deployment phases of any MobiTrafficBD Framework. This work is intended to be a supporting methodological guide, based on widely used Reference Architectures and guidelines for Big Data, but enriched with the inherent characteristics and concerns brought about by Big Spatiotemporal Data, as in the case of GRTS and ST Events. The proposed methodology was evaluated and demonstrated in various real-world use cases that deployed MobiTrafficBD-based Data Management, Processing, Analytics and Visualisation methods, tools and technologies, under the umbrella of several research projects funded by the European Commission and the Portuguese Government.
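    The two data shapes the methodology centres on differ in where location lives: a geo-referenced time series has one fixed location and many timestamped values, while a spatiotemporal event carries its own location and time per record. A hedged sketch of these structures (field names are illustrative assumptions):

```python
# Hedged sketch of the two core data shapes: a geo-referenced time series
# (GRTS: one fixed location, many timestamped samples) versus a
# spatiotemporal event (ST Event: location and time per record).
from dataclasses import dataclass, field

@dataclass
class GeoRefTimeSeries:
    sensor_id: str
    lon: float
    lat: float
    samples: list = field(default_factory=list)  # (iso_timestamp, value)

@dataclass
class STEvent:
    event_type: str
    lon: float
    lat: float
    timestamp: str

# A traffic-counting loop in Lisbon (coordinates are illustrative)
loop = GeoRefTimeSeries("loop-17", -9.139, 38.722)
loop.samples.append(("2021-06-01T08:00:00Z", 1240))  # vehicles/hour

# An accident reported nearby, with its own location and time
crash = STEvent("accident", -9.150, 38.736, "2021-06-01T08:12:00Z")

print(len(loop.samples), crash.event_type)
```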

    Monitoring and analysis system for performance troubleshooting in data centers

    It was not long ago. On Christmas Eve 2012, a war of troubleshooting began in Amazon's data centers. It started at 12:24 PM with a mistaken deletion of the state data of the Amazon Elastic Load Balancing service (ELB for short), which went unnoticed at the time. The mistake first led to a local issue in which a small number of ELB service APIs were affected. In about six minutes, it evolved into a critical one in which EC2 customers were significantly affected. One example was Netflix, which was using hundreds of Amazon ELB services and experienced an extensive streaming outage in which many customers could not watch TV shows or movies on Christmas Eve. It took Amazon engineers 5 hours and 42 minutes to find the root cause, the mistaken deletion, and another 15 hours and 32 minutes to fully recover the ELB service. The war ended at 8:15 AM the next day and brought performance troubleshooting in data centers to the world's attention. As this Amazon ELB case shows, troubleshooting runtime performance issues is crucial in time-sensitive multi-tier cloud services because of their stringent end-to-end timing requirements, but it is also notoriously difficult and time-consuming. To address this challenge, this dissertation proposes VScope, a flexible monitoring and analysis system for online troubleshooting in data centers. VScope provides primitive operations that data center operators can use to troubleshoot various performance issues. Each operation is essentially a series of monitoring and analysis functions executed on an overlay network. We design a novel software architecture for VScope so that the overlay networks can be generated, executed and terminated automatically, on demand. On the troubleshooting side, we design novel anomaly detection algorithms and implement them in VScope, so that data center operators are notified when performance anomalies happen.
    We also design a graph-based guidance approach, called VFocus, which tracks the interactions among hardware and software components in data centers. VFocus provides primitive operations with which operators can analyze these interactions to find out which components are relevant to a performance issue. VScope's capabilities and performance were evaluated on a testbed with over 1,000 virtual machines (VMs). Experimental results show that the VScope runtime perturbs system and application performance negligibly and requires mere seconds to deploy monitoring and analytics functions on over 1,000 nodes. This demonstrates VScope's ability to support fast operation and online queries against a comprehensive set of application- to system/platform-level metrics and a variety of representative analytics functions. When supporting algorithms with high computational complexity, VScope serves as a 'thin layer' that accounts for no more than 5% of their total latency. Further, by using VFocus, VScope can locate problematic VMs that cannot be found through application-level monitoring alone, and in one of the use cases explored in the dissertation it operates with over 400% less perturbation than brute-force and most sampling-based approaches. We also validate VFocus with real-world data center traces; the experimental results show that VFocus achieves a troubleshooting accuracy of 83% on average.
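    The kind of anomaly-detection primitive such a system chains over its monitoring overlay can be illustrated with a simple sliding-window z-score rule. This rule is a generic stand-in, not VScope's actual algorithm:

```python
# Illustrative anomaly-detection primitive of the kind a monitoring overlay
# would run per metric stream (a generic z-score rule, used here as a
# stand-in, not VScope's actual algorithm): flag a sample that deviates
# strongly from the recent window of observations.
from collections import deque
from statistics import mean, stdev

def make_detector(window=20, threshold=3.0):
    history = deque(maxlen=window)
    def observe(value):
        anomalous = False
        if len(history) >= 2:
            mu, sigma = mean(history), stdev(history)
            anomalous = sigma > 0 and abs(value - mu) / sigma > threshold
        history.append(value)
        return anomalous
    return observe

detect = make_detector()
latencies = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11]  # steady baseline (ms)
flags = [detect(v) for v in latencies] + [detect(500)]  # then a latency spike
print(flags[-1])  # the spike is flagged
```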

    Efficient Management for Geospatial and Temporal Data using Ontology-based Data Access Techniques

    The RDF data model and the SPARQL query language have been widely used for the integration of data coming from different sources. Due to the increasing number of geospatial datasets that are becoming available as linked open data, a lot of effort has focused on the development of geospatial (and, accordingly, temporal) extensions of the RDF and SPARQL framework. Two highlights of these efforts are the query language GeoSPARQL, which is an OGC standard, and the framework of stRDF and stSPARQL. Both frameworks can be used for the representation and querying of linked geospatial data, and stSPARQL also includes a temporal dimension. Although many geospatial (and some temporal) RDF stores have started to emerge, converting geospatial data into RDF and then storing it in an RDF store is not always best practice, especially when the data exists in a relational database that is fairly large and/or gets updated frequently. In this thesis, we propose an Ontology-based Data Access (OBDA) approach for accessing geospatial data stored in geospatial relational databases, using the OGC standard GeoSPARQL and R2RML or OBDA mappings. We introduce extensions to an existing SPARQL-to-SQL translation method to support GeoSPARQL features.
    We describe the implementation of our approach in the system Ontop-spatial, an extension of the OBDA system Ontop for creating virtual geospatial RDF graphs on top of geospatial relational databases. Ontop-spatial is the first geospatial OBDA system and outperforms state-of-the-art geospatial RDF stores. We also show how to answer queries with temporal operators in the OBDA framework by utilizing the framework stRDF and the query language stSPARQL, which we extend with some new features. Next, we extend the data sources supported by Ontop-spatial beyond relational database management systems, and we present our OBDA solutions for creating virtual RDF graphs on top of various web data sources (e.g., HTML tables, Web APIs) using ontologies and mappings. We compared the performance of our approach with a related implementation, and the evaluation results showed that not only does Ontop-spatial support more functionality (e.g., more data sources and a simpler workflow), but it also achieves better performance. Last, we describe how the work presented in this thesis is applied in real-world application scenarios.
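    The essence of GeoSPARQL-to-SQL rewriting is mapping spatial filter functions onto the spatial operators of the underlying database, so the query is evaluated entirely inside the geospatial DBMS. The toy rewriter below only handles a single filter and uses PostGIS-style function names; the real translation works on full SPARQL algebra and R2RML mappings:

```python
# Hedged sketch of the GeoSPARQL-to-SQL rewriting idea: a GeoSPARQL spatial
# filter function is mapped onto the corresponding PostGIS-style SQL
# predicate, so evaluation can be pushed down into the spatial DBMS.
# (Toy rewriter for one filter only; not the actual Ontop-spatial code.)

GEOF_TO_SQL = {
    "geof:sfWithin": "ST_Within",
    "geof:sfIntersects": "ST_Intersects",
    "geof:distance": "ST_Distance",
}

def rewrite_filter(geof_function, left_col, right_wkt):
    """Rewrite one GeoSPARQL filter into a SQL spatial predicate string."""
    sql_fn = GEOF_TO_SQL[geof_function]
    return f"{sql_fn}({left_col}, ST_GeomFromText('{right_wkt}'))"

where = rewrite_filter("geof:sfWithin", "city.geom",
                       "POLYGON((23 37, 24 37, 24 38, 23 38, 23 37))")
print("SELECT city.name FROM city WHERE " + where)
```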