129 research outputs found
Modular Remote Reprogramming of Sensor Nodes
Wireless sensor networks are envisioned to be deployed in the absence of permanent network infrastructure and in environments with limited or no human accessibility. Hence, such deployments demand mechanisms to remotely (i.e., over the air) reconfigure and update the software on the nodes. In this paper we introduce DyTOS, a TinyOS based remote reprogramming approach that enables the dynamic exchange of software components and thus incrementally update the operating system and its applications. The core idea is to preserve the modularity of TinyOS, i.e., its componentisation, which is lost during the normal compilation process, and enable runtime composition of TinyOS components on the sensor node. The proposed solution integrates seamlessly into the system architecture of TinyOS: It does not require any changes to the programming model of TinyOS and all existing components can be reused transparently. Our evaluation shows that DyTOS incurs a low performance overhead while keeping a smaller – up to one third – memory footprint than other comparable solutions
Recommended from our members
Bespoke Security for Resource Constrained Cyber-Physical Systems
Cyber-Physical Systems (CPSs) are critical to many aspects of our daily lives. Autonomous cars, life saving medical devices, drones for package delivery, and robots for manufacturing are all prime examples of CPSs. The dual cyber/physical operating nature and highly integrated feedback control loops of CPSs means that they inherit security problems from traditional computing systems (e.g., software vulnerabilities, hardware side-channels) and physical systems (e.g., theft, tampering), while additionally introducing challenges of their own. The challenges to achieving security for CPSs stem not only from the interaction of the cyber and physical domains, but from the additional pressures of resource constraints imposed due to cost, limited energy budgets, and real-time nature of workloads. Due to the tight resource constraints of CPSs, there is often little headroom to devote for security. Thus, there is a need for low overhead deployable solutions to harden resource constrained CPSs. This dissertation shows that security can be effectively integrated into resource constrained cyber-physical system devices by leveraging fundamental physical properties, & tailoring and extending age-old abstractions in computing.
To provide context on the state of security for CPSs, this document begins with the development of a unifying framework that can be used to identify threats and opportunities for enforcing security policies while providing a systematic survey of the field. This dissertation characterizes the properties of CPSs and typical components (e.g., sensors, actuators, computing devices) in addition to the software commonly used. We discuss available security primitives and their limitations for both hardware and software. In particular, we focus on software security threats targeting memory safety. The rest of the thesis focuses on the design and implementation of novel, deployable approaches to combat memory safety on resource constrained devices used by CPSs (e.g., 32-bit processors and microcontrollers). We first discuss how cyber-physical system properties such as inertia and feedback can be used to harden software efficiently with minimal modification to both hardware and software. We develop the framework You Only Live Once (YOLO) that proactively resets a device and restores it from a secure verified snapshot. YOLO relies on inertia, to tolerate periods of resets, and on feedback to rebuild state when recovering from a snapshot. YOLO is built upon a theoretical model that is used to determine safe operating parameters to aid a system designer in deployment. We evaluate YOLO in simulation and two real-world CPSs, an engine and drone.
Second, we explore how rethinking of core computing concepts can lead to new fundamental abstractions that can efficiently hide performance overheads usually associated with hardening software against memory safety issues. To this end, we present two techniques: (i) The Phantom Address Space (PAS) is a new architectural concept that can be used to improve N-version systems by (almost) eliminating the overheads associated with handling replicated execution. Specifically, PAS can be used to provide an efficient implementation of a diversification concept known as execution path randomization aimed at thwarting code-reuse attacks. The goal of execution path randomization is to frequently switch between two distinct program variants forcing the attacker to gamble on which code to reuse. (ii) Cache Line Formats (Califorms) introduces a novel method to efficiently store memory in caches. Califorms makes the novel insight that dead spaces in program data due to its memory layout can be used to efficiently implement the concept of memory blacklisting, which prohibits a program from accessing certain memory regions based on program semantics. Califorms not onlyconsumes less memory than prior approaches, but can provide byte-granular protection while limiting the scope of its hardware changes to caches. While both PAS and Califorms were originally designed to target resource constrained devices, it's worth noting that they are widely applicable and can efficiently scale up to mobile, desktop, and server class processors.
As CPSs continue to proliferate and become integrated in more critical infrastructure, security is an increasing concern. However, security will undoubtedly always play second fiddle to financial concerns that affect business bottom lines. Thus, it is important that there be easily deployable, low-overhead solutions that can scale from the most constrained of devices to more featureful systems for future migration. This dissertation is one step towards the goal of providing inexpensive mechanisms to ensure the security of cyber-physical system software
Exploring Superpage Promotion Policies for Efficient Address Translation
Address translation performance for modern applications depends heavily upon the number of translation entries cached in the hardware TLB (translation look-aside buffer). Therefore, the efficiency of address translation relies directly on the TLB hit rate. The number of TLB entries continues to fall further behind the growth of memory consumption for modern applications. Superpages, which are pages with larger sizes, can increase the efficiency of the TLB by enabling each translation entry to cover a larger memory region. Without requiring more TLB entries, using superpages can increase the TLB hit rate and benefit address translation. However, using superpages can bring overhead. The TLB uses a single dirty bit to mark a page as dirty during address translation before modifying the page, so the granularity of the dirty bit corresponds to the coverage of the translation entry. As a result, the OS (operating system) will pay extra I/O effort when it allocates or writes an underutilized superpage back to disk. Such extra overhead can easily surpass the address translation benefits of superpages. This thesis discusses the performance trade-offs of superpages by exploring the design space of superpage promotion policies in the OS. A data collection infrastructure is built based on QEMU with kernel instrumentation on FreeBSD to collaboratively collect both memory accesses and kernel events. Then, the TLB behavior of Intel Skylake x86 family processors is simulated. The simulation has been validated to be faithful and consistent with the real-world performance. Last, this thesis evaluates and compares both TLB performance benefits and I/O overheads among the superpage promotion policies to discuss the trade-offs in the design space
Profile Guided Dataflow Transformation for FPGAs and CPUs
This paper proposes a new high-level approach for optimising field programmable gate array (FPGA) designs. FPGA designs are commonly implemented in low-level hardware description languages (HDLs), which lack the abstractions necessary for identifying opportunities for significant performance improvements. Using a computer vision case study, we show that modelling computation with dataflow abstractions enables substantial restructuring of FPGA designs before lowering to the HDL level, and also improve CPU performance. Using the CPU transformations, runtime is reduced by 43 %. Using the FPGA transformations, clock frequency is increased from 67MHz to 110MHz. Our results outperform commercial low-level HDL optimisations, showcasing dataflow program abstraction as an amenable computation model for highly effective FPGA optimisation
A Survey on Virtualization of Wireless Sensor Networks
Wireless Sensor Networks (WSNs) are gaining tremendous importance thanks to their broad range of commercial applications such as in smart home automation, health-care and industrial automation. In these applications multi-vendor and heterogeneous sensor nodes are deployed. Due to strict administrative control over the specific WSN domains, communication barriers, conflicting goals and the economic interests of different WSN sensor node vendors, it is difficult to introduce a large scale federated WSN. By allowing heterogeneous sensor nodes in WSNs to coexist on a shared physical sensor substrate, virtualization in sensor network may provide flexibility, cost effective solutions, promote diversity, ensure security and increase manageability. This paper surveys the novel approach of using the large scale federated WSN resources in a sensor virtualization environment. Our focus in this paper is to introduce a few design goals, the challenges and opportunities of research in the field of sensor network virtualization as well as to illustrate a current status of research in this field. This paper also presents a wide array of state-of-the art projects related to sensor network virtualization
Genetic improvement of GPU software
We survey genetic improvement (GI) of general purpose computing on graphics cards. We summarise several experiments which demonstrate four themes. Experiments with the gzip program show that genetic programming can automatically port sequential C code to parallel code. Experiments with the StereoCamera program show that GI can upgrade legacy parallel code for new hardware and software. Experiments with NiftyReg and BarraCUDA show that GI can make substantial improvements to current parallel CUDA applications. Finally, experiments with the pknotsRG program show that with semi-automated approaches, enormous speed ups can sometimes be had by growing and grafting new code with genetic programming in combination with human input
Defining interfaces between hardware and software: Quality and performance
One of the most important interfaces in a computer system is the interface between hardware and software. This interface is the contract between the hardware designer and the programmer that defines the functional behaviour of the hardware. This thesis examines two critical aspects of defining the hardware-software interface: quality and performance.
The first aspect is creating a high quality specification of the interface as conventionally defined in an instruction set architecture. The majority of this thesis is concerned with creating a specification that covers the full scope of the interface; that is applicable to all current implementations of the architecture; and that can be trusted to accurately describe the behaviour of implementations of the architecture. We describe the development of a formal specification of the two major types of Arm processors: A-class (for mobile devices such as phones and tablets) and M-class (for micro-controllers). These specifications are unparalleled in their scope, applicability and trustworthiness. This thesis identifies and illustrates what we consider the key ingredient in achieving this goal: creating a specification that is used by many different user groups. Supporting many different groups leads to improved quality as each group finds different problems in the specification; and, by providing value to each different group, it helps justify the considerable effort required to create a high quality specification of a major processor architecture. The work described in this thesis led to a step change in Arm's ability to use formal verification techniques to detect errors in their processors; enabled extensive testing of the specification against Arm's official architecture conformance suite; improved the quality of Arm's architecture conformance suite based on measuring the architectural coverage of the tests; supported earlier, faster development of architecture extensions by enabling animation of changes as they are being made; and enabled early detection of problems created from architecture extensions by performing formal validation of the specification against semi-structured natural language specifications. As far as we are aware, no other mainstream processor architecture has this capability. The formal specifications are included in Arm's publicly released architecture reference manuals and the A-class specification is also released in machine-readable form.
The second aspect is creating a high performance interface by defining the hardware-software interface of a software-defined radio subsystem using a programming language. That is, an interface that allows software to exploit the potential performance of the underlying hardware. While the hardware-software interface is normally defined in terms of machine code, peripheral control registers and memory maps, we define it using a programming language instead. This higher level interface provides the opportunity for compilers to hide some of the low-level differences between different systems from the programmer: a potentially very efficient way of providing a stable, portable interface without having to add hardware to provide portability between different hardware platforms. We describe the design and implementation of a set of extensions to the C programming language to support programming high performance, energy efficient, software defined radio systems. The language extensions enable the programmer to exploit the pipeline parallelism typically present in digital signal processing applications and to make efficient use of the asymmetric multiprocessor systems designed to support such applications. The extensions consist primarily of annotations that can be checked for consistency and that support annotation inference in order to reduce the number of annotations required. Reducing the number of annotations does not just save programmer effort, it also improves portability by reducing the number of annotations that need to be changed when porting an application from one platform to another. This work formed part of a project that developed a high-performance, energy-efficient, software defined radio capable of implementing the physical layers of the 4G cellphone standard (LTE), 802.11a WiFi and Digital Video Broadcast (DVB) with a power and silicon area budget that was competitive with a conventional custom ASIC solution.
The Arm architecture is the largest computer architecture by volume in the world. It behooves us to ensure that the interface it describes is appropriately defined
High performance annotation-aware JVM for Java cards
Early applications of smart cards have focused in the area of per-sonal security. Recently, there has been an increasing demand for networked, multi-application cards. In this new scenario, enhanced application-specific on-card Java applets and complex cryptographic services are executed through the smart card Java Virtual Machine (JVM). In order to support such computation-intensive applica-tions, contemporary smart cards are designed with built-in micro-processors and memory. As smart cards are highly area-constrained environments with memory, CPU and peripherals competing for a very small die space, the VM execution engine of choice is often a small, slow interpreter. In addition, support for multiple applica-tions and cryptographic services demands high performance VM execution engine. The above necessitates the optimization of the JVM for Java Cards
- …