219 research outputs found

    Benchmarking Big Data SQL Frameworks

    Wiki-Health: from quantified self to self-understanding

    Today, healthcare providers are experiencing explosive growth in data, and medical imaging represents a significant portion of that data. Meanwhile, the pervasive use of mobile phones and the rising adoption of sensing devices enable people to collect data independently at any time and place, leading to a torrent of sensor data. The scale and richness of the sensor data currently being collected and analysed are growing rapidly. The key challenge we face is how to effectively manage and make use of this abundance of easily generated and diverse health data. This thesis investigates the challenges posed by the explosive growth of available healthcare data and proposes a number of potential solutions. As a result, a big data service platform, named Wiki-Health, is presented to provide a unified solution for collecting, storing, tagging, retrieving, searching and analysing personal health sensor data. Additionally, it allows users to reuse and remix data, along with analysis results and analysis models, to make health-related knowledge discovery more available to individual users on a massive scale. To tackle the challenge of efficiently managing the high volume and diversity of big data, Wiki-Health introduces a hybrid data storage approach capable of storing structured, semi-structured and unstructured sensor data and sensor metadata separately. A multi-tier cloud storage system, CACSS, has been developed and serves as a component of the Wiki-Health platform, allowing it to manage the storage of unstructured and semi-structured data, such as medical imaging files. CACSS provides features such as global data de-duplication, performance awareness and data caching services. The design of this hybrid approach allows Wiki-Health to handle heterogeneous formats of sensor data. To evaluate the proposed approach, we have developed an ECG-based health monitoring service and a virtual sensing service on top of the Wiki-Health platform. The two services demonstrate the feasibility and potential of using the Wiki-Health framework to enable better utilisation and comprehension of the vast amounts of sensor data available from different sources, and both show significant potential for real-world applications.
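
    As a conceptual illustration of the hybrid storage approach described above, the sketch below routes records to separate back ends by data shape. This is hypothetical Python, not Wiki-Health's actual code; the store names and routing rules are assumptions.

        # Hypothetical sketch of routing sensor data to separate stores by shape;
        # the back-end names and routing rules are illustrative, not Wiki-Health's API.
        import json

        class HybridStore:
            def __init__(self):
                self.structured = {}       # flat readings, destined for an RDBMS
                self.semi_structured = {}  # JSON metadata, destined for a document store
                self.unstructured = {}     # raw blobs, destined for an object store such as CACSS

            def put(self, key, value):
                if isinstance(value, bytes):       # e.g. imaging files, raw waveform dumps
                    self.unstructured[key] = value
                elif isinstance(value, dict):      # e.g. tagged sensor metadata
                    self.semi_structured[key] = json.dumps(value)
                else:                              # e.g. a single numeric reading
                    self.structured[key] = value

        store = HybridStore()
        store.put("ecg/0001", b"\x00\x17\x2a")                      # unstructured waveform blob
        store.put("ecg/0001/meta", {"subject": "anon", "hz": 250})  # semi-structured metadata
        store.put("ecg/0001/bpm", 72)                               # structured scalar reading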

    An Industry-Based Study on the Efficiency Benefits of Utilising Public Cloud Infrastructure and Infrastructure as Code Tools in the IT Environment Creation Process

    The traditional approaches to IT infrastructure management typically involve procuring, housing and running company-owned and maintained physical servers. In recent years, alternative solutions to IT infrastructure management based on public cloud technologies have emerged. Infrastructure as a Service (IaaS), also known as public cloud infrastructure, allows for the on-demand provisioning of IT infrastructure resources via the Internet. Cloud Service Providers (CSPs) such as Amazon Web Services (AWS) offer integration of their cloud-based infrastructure with Infrastructure as Code (IaC) tools. These tools allow the entire configuration of public cloud-based infrastructure to be scripted out and defined as code. This thesis hypothesises that the correct utilisation of IaaS and IaC can offer an organisation a more efficient IT infrastructure creation process than its traditional method. To investigate this claim, an industry-based case study and a survey questionnaire were carried out as part of this body of work. The case study involved the replacement of a manually managed IT infrastructure with a public cloud infrastructure whose creation was automated via a framework consisting of IaC and related automation tools. The survey questionnaire was created to corroborate or refute the results obtained in the case study in the context of a wider audience of organisations. The results show that the correct utilisation of IaaS and IaC technologies can provide greater efficiency in the management of IT networks than the traditional approach.
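
    To make the Infrastructure as Code idea concrete, here is a minimal, hypothetical Python sketch using boto3, the AWS SDK for Python: the environment is defined entirely in code, so it can be recreated on demand. The AMI ID, instance type and tag values are illustrative assumptions, not values from the case study.

        # Minimal IaC-style sketch with boto3 (AWS SDK for Python): the environment
        # is declared in code and can be provisioned identically on every run.
        # The AMI ID, instance type and tag values are illustrative placeholders.
        import boto3

        ec2 = boto3.resource("ec2", region_name="eu-west-1")

        def create_test_environment():
            instances = ec2.create_instances(
                ImageId="ami-0123456789abcdef0",  # placeholder AMI
                InstanceType="t3.micro",
                MinCount=1,
                MaxCount=1,
                TagSpecifications=[{
                    "ResourceType": "instance",
                    "Tags": [{"Key": "Environment", "Value": "test"}],
                }],
            )
            return instances[0]

        server = create_test_environment()
        print("provisioned:", server.id)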

    Self-Reliance for the Internet of Things: Blockchains and Deep Learning on Low-Power IoT Devices

    The rise of the Internet of Things (IoT) has transformed common embedded devices from isolated objects into interconnected devices, enabling applications in smart cities, smart logistics, and digital health, to name but a few. These Internet-enabled embedded devices have sensors and actuators that interact with the real world. IoT interactions produce an enormous amount of data, typically stored on cloud services due to the resource limitations of IoT devices. These limitations have made IoT applications highly dependent on cloud services. However, cloud services face several challenges, especially in terms of communication, energy, scalability, and transparency regarding their information storage. In this thesis, we study how to enable the next generation of IoT systems with transaction automation and machine learning capabilities while reducing reliance on cloud communication. To achieve this, we look into architectures and algorithms for data provenance, automation, and machine learning that conventionally run on powerful high-end devices. We redesign and tailor these architectures and algorithms to low-power IoT, balancing the computational, energy, and memory requirements. The thesis is divided into three parts. Part I presents an overview of the thesis and states four research questions addressed in later chapters. Part II investigates and demonstrates the feasibility of data provenance and transaction automation with blockchains and smart contracts on IoT devices. Part III investigates and demonstrates the feasibility of deep learning on low-power IoT devices. We provide experimental results for all high-level proposed architectures and methods. Our results show that algorithms designed for high-end cloud nodes can be tailored to IoT devices, and we quantify the main trade-offs in terms of memory, computation, and energy consumption.
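
    One standard technique for fitting deep learning into the memory budget of a low-power device is post-training quantization. The sketch below illustrates the general memory/accuracy trade-off with symmetric 8-bit quantization; it is an illustrative example, not the specific method used in the thesis.

        # Symmetric 8-bit post-training quantization: store weights as int8 plus one
        # float scale, cutting memory 4x at the cost of a small rounding error.
        import numpy as np

        def quantize(weights):
            scale = np.abs(weights).max() / 127.0   # map the largest weight to +/-127
            q = np.round(weights / scale).astype(np.int8)
            return q, scale

        def dequantize(q, scale):
            return q.astype(np.float32) * scale

        w = np.random.randn(128, 64).astype(np.float32)   # a toy layer's weights
        q, s = quantize(w)
        print("memory:", w.nbytes, "->", q.nbytes, "bytes")               # 32768 -> 8192
        print("max abs error:", float(np.abs(w - dequantize(q, s)).max()))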

    PoRt: Non-Interactive Continuous Availability Proof of Replicated Storage

    Secure cryptographic storage is one of the most important issues that both businesses and end-users take into account before moving their data to either centralized clouds or blockchain-based decentralized storage marketplaces. Recent work [4] formalizes the notion of Proof of Storage-Time (PoSt), which enables storage servers to demonstrate non-interactive continuous availability of outsourced data in a publicly verifiable way. That work also proposes a stateful compact PoSt construction, while leaving stateless and transparent PoSt with support for proof of replication as an open problem. In this paper, we consider this problem by constructing a proof system that enables a server to simultaneously demonstrate continuous availability and dedication of unique storage resources for encoded replicas of a data file in a stateless and publicly verifiable way. We first formalize Proof of Replication-Time (PoRt) by extending the PoSt formal definition and security model to support replication. Then, we provide a concrete instantiation of PoRt by designing a lightweight replica encoding algorithm in which replica failures are located through an efficient comparison-based verification process after the data deposit period ends. PoRt's proofs are aggregatable: the prover can take several sequentially generated proofs and efficiently aggregate them into a single, succinct proof. The protocol is also stateless in the sense that the client can efficiently extend the deposit period by incrementally updating the tags, without having to download the outsourced file replicas. We also demonstrate feasible extensions of PoRt to support dynamic data updates and to be transparent, enabling its direct use in decentralized storage networks, a property not supported in previous proposals. Finally, PoRt's verification cost is independent of both the outsourced file size and the deposit length.
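
    To picture what "aggregatable" means here, the hypothetical Python sketch below folds a sequence of per-epoch proofs into one constant-size digest with a hash chain. It is a conceptual analogy only; none of these functions come from the paper, and the actual PoRt proofs are publicly verifiable cryptographic objects, not plain hashes.

        # Conceptual sketch of aggregating sequentially generated proofs into one
        # succinct digest by hash chaining. This only illustrates the "aggregatable
        # proofs" shape (many epochs, one constant-size value), not the PoRt scheme.
        import hashlib

        def prove_epoch(replica, challenge):
            # Stand-in per-epoch proof: a hash binding the stored replica to the challenge.
            return hashlib.sha256(replica + challenge).digest()

        def aggregate(proofs):
            # Fold a sequence of epoch proofs into a single 32-byte digest.
            acc = b"\x00" * 32
            for proof in proofs:
                acc = hashlib.sha256(acc + proof).digest()
            return acc

        replica = b"encoded-replica-bytes"
        proofs = [prove_epoch(replica, i.to_bytes(8, "big")) for i in range(10)]
        print(aggregate(proofs).hex())   # one succinct value covering all ten epochs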

    Leveraging Machine Learning Techniques towards Intelligent Networking Automation

    In this thesis, we address some of the challenges that the Intelligent Networking Automation (INA) paradigm poses. Our goal is to design schemes leveraging Machine Learning (ML) techniques to cope with situations that involve hard decision-making actions. The proposed solutions are data-driven and consist of an agent that operates at network elements such as routers, switches, or network servers. The data are gathered from realistic scenarios, either actual network deployments or emulated environments. To evaluate the enhancements that the designed schemes provide, we compare our solutions to non-intelligent ones. Additionally, we assess the trade-off between the obtained improvements and the computational costs of implementing the proposed mechanisms. Accordingly, this thesis tackles the challenges presented by four specific research problems. The first topic addresses the problem of balancing traffic in dense Internet of Things (IoT) network scenarios where the end devices and the Base Stations (BSs) form complex networks. By applying ML techniques to discover patterns in the association between the end devices and the BSs, the proposed scheme can balance the traffic load in an IoT network to increase the packet delivery ratio and reduce the energy cost of data delivery. The second research topic proposes an intelligent congestion control for Internet connections at edge network elements. The design includes a congestion predictor based on an Artificial Neural Network (ANN) and an Active Queue Management (AQM) parameter tuner. Similarly, the third research topic proposes an intelligent solution to inter-domain congestion. Unlike the second topic, this problem considers the preservation of private network data by means of Federated Learning (FL), since network elements of several organizations participate in the intelligent process. Finally, the fourth research topic presents a framework for efficiently gathering network telemetry (NT) data. The proposed solution takes a traffic-aware approach so that NT data are intelligently collected and transmitted by the network elements. All the proposed schemes are evaluated through use cases considering standardized networking mechanisms. Therefore, we envision that the solutions to these specific problems encompass a set of methods that can be utilized in real-world scenarios towards the realization of the INA paradigm.
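
    The second topic's predictor-plus-tuner design can be sketched as follows. This is a simplified hypothetical Python illustration: the features, constants and update rule are assumptions, and the stand-in function replaces the thesis' trained ANN.

        # Simplified sketch of the predictor-plus-tuner pattern: a model estimates
        # near-future congestion from queue statistics, and the estimate adjusts an
        # AQM knob (here, a RED-style maximum drop probability).

        def predict_congestion(queue_len, arrival_rate, drain_rate):
            # Stand-in for the trained ANN: returns a congestion score in [0, 1].
            utilisation = min(arrival_rate / max(drain_rate, 1e-9), 1.0)
            backlog = min(queue_len / 100.0, 1.0)   # normalised against a 100-packet buffer
            return 0.5 * utilisation + 0.5 * backlog

        def tune_red_max_p(score, lo=0.02, hi=0.20):
            # Map the predicted congestion score onto RED's maximum drop probability.
            return lo + score * (hi - lo)

        score = predict_congestion(queue_len=65, arrival_rate=950.0, drain_rate=1000.0)
        print("RED max_p set to %.3f" % tune_red_max_p(score))   # higher score -> drop earlier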

    Consensus protocols exploiting network programmability

    Services rely on replication mechanisms to be available at all times. A service demanding high availability is replicated on a set of machines called replicas. To maintain the consistency of replicas, a consensus protocol such as Paxos or Raft is used to synchronize the replicas' state. As a result, failures of a minority of replicas will not affect the service, as the other non-faulty replicas continue serving requests. A consensus protocol is a procedure for achieving agreement among processors in a distributed system involving unreliable processors. Unfortunately, achieving such an agreement involves extra processing on every request, imposing a substantial performance degradation. Consequently, performance has long been a concern for consensus protocols. Although many efforts have been made to improve consensus performance, it continues to be an important problem for researchers. This dissertation presents a novel approach to improving consensus performance. Essentially, it exploits the programmability of a new breed of network devices to accelerate consensus protocols that traditionally run on commodity servers. The benefits of using programmable network devices to run consensus protocols are twofold: network switches process packets faster than commodity servers, and consensus messages travel fewer hops in the network. This means that system throughput is increased and request latency is reduced. The evaluation of our network-accelerated consensus approach shows promising results. Individual components of our FPGA-based and switch-based consensus implementations can process 10 million and 2.5 billion consensus messages per second, respectively. Our FPGA-based system as a whole delivers 4.3 times the performance of a traditional software consensus implementation. Its latency is also only one third that of the software implementation when both systems are under half of their maximum throughputs. To drive performance even higher, we apply a partitioning mechanism to our switch-based system, leading to 11 times better throughput and 5 times better latency. By dynamically switching between software-based and network-based implementations, our consensus systems not only improve performance but also use energy more efficiently. Encouraged by those benefits, we developed a fault-tolerant non-volatile memory system. A prototype using a software memory controller demonstrated reasonable overhead over local memory access, showing great promise as scalable main memory. Our network-based consensus approach would have a great impact in data centers. It not only improves the performance of replication mechanisms that rely on consensus, but also enhances the performance of services built on top of those replication mechanisms. Our approach also motivates others to move new functionality into the network, such as key-value stores and stream processing. We expect that in the near future, applications that typically run on traditional servers will be folded into the network for performance.
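
    The agreement core that this work accelerates can be pictured with the toy Python sketch below: a request is committed only once a majority of replicas accept it, which is why every request pays extra message rounds. The sketch is illustrative only; it omits leader election, ballots and recovery, and is not the dissertation's implementation.

        # Minimal sketch of majority-quorum voting, the agreement core of protocols
        # such as Paxos and Raft.

        class Replica:
            def __init__(self, alive=True):
                self.alive = alive
                self.log = []

            def accept(self, value):
                if self.alive:
                    self.log.append(value)
                return self.alive

        def run_round(replicas, value):
            # Ask every replica to accept `value`; succeed on a majority of acks.
            acks = sum(1 for r in replicas if r.accept(value))
            return acks > len(replicas) // 2

        replicas = [Replica(), Replica(), Replica(alive=False)]  # a faulty minority
        print(run_round(replicas, "PUT x=1"))   # True: 2 of 3 acks form a quorum

    In the network-accelerated designs above, this quorum logic moves from replica CPUs into switches and FPGAs, shortening both the per-message processing time and the number of hops each consensus message traverses.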