Abstract. Memory-safety attacks have been one of the most critical threats against computing systems. Although a wide range of defense techniques have been developed against these attacks, the existing mitigation strategies have several limitations. In particular, most existing mitigation approaches abort or restart the victim program when a memory-safety attack is detected, thus making the system unavailable. This might not be acceptable in systems with stringent timing constraints, such as cyber-physical systems (CPS), since system unavailability leaves the control system in an unsafe state. To address this problem, we propose CIMA, a resilient and lightweight mitigation technique that prevents invalid memory accesses at runtime. CIMA manipulates the compiler-generated control-flow graph to automatically detect and bypass unsafe memory accesses at runtime, thereby mitigating memory-safety attacks in the process. An appealing feature of CIMA is that it also ensures system availability and resilience of the CPS even in the presence of memory-safety attacks. To this end, we design our experimental setup based on two realistic testbeds, Secure Water Treatment (SWaT) and Secure Urban Transportation System (SecUTS), and evaluate the effectiveness and efficiency of our approach. The experimental results reveal that CIMA handles memory-safety attacks effectively with low overhead. Moreover, it meets the real-time constraints and physical-state resiliency of the CPS under test.
Introduction
Software systems with stringent real-time constraints are often written in C/C++, since these languages aid in generating efficient program binaries. However, because C/C++ relies on manual memory management and lacks bounds checking, programs written in these languages often suffer from memory-safety vulnerabilities, such as buffer over/underflows, dangling pointers, double-free errors and memory leaks. Because these vulnerabilities are difficult to discover, they often slip into production. This leads to memory corruptions and runtime crashes even in deployed systems. In the worst case, memory-safety vulnerabilities can be exploited by a class of cyber attacks, which we refer to as memory-safety attacks [1, 2]. Memory-safety attacks, such as code injection [3, 4] and code reuse [5, 6, 7], can cause devastating effects by compromising vulnerable programs in any computing system. These attacks can hijack or subvert specific operations of a system or take over the entire system. A detailed account of such attacks is provided in existing works [7, 8, 9].
Given a system vulnerable to memory-safety attacks, its runtime behavior depends, among other factors, on the type of memory accesses being made. For instance, an access to an array element beyond the declared length of the array (a buffer overflow) could be exploited to overwrite the return address of a function. However, a safety check generated at compile time [10] can be added before each memory access to ensure that the memory about to be accessed is valid. This can be accomplished via compiler-assisted program analysis. Traditionally, such memory-safe compilers generate an exception and abort when a violation is found at runtime.
However, we observe that one could also react to such violations by bypassing the illegal instructions, i.e., instructions that attempt to access memory illegally, thus favoring the availability of the system. This is the main intuition on which our CIMA approach is based. In particular, CIMA is a resilient mitigation strategy that effectively and efficiently prevents memory-safety attacks and guarantees system availability with minimal overhead (13.83%).
To realize the aforementioned intuition behind CIMA, we face several technical challenges. Firstly, it is infeasible (in general) to statically compute the exact set of illegal memory accesses in a program. Consequently, a fully static approach, which modifies the program to eliminate the illegal instructions, is unlikely to be effective. Moreover, such an approach will inevitably face scalability bottlenecks due to its heavy reliance on sophisticated program analysis. Secondly, even if the illegal instructions are identified during execution, it is challenging to prevent the illegal memory access from taking effect. This is because such a strategy demands full control over the normal flow of program execution. Finally, to bypass the execution of certain instructions, we need to modify the control flow of the program. From a technical perspective, such modifications involve manipulating the program control flow across the many passes of mainstream compilers.
To alleviate the technical challenges, CIMA systematically combines compile-time instrumentation and runtime monitoring to defend against memory-safety attacks. Specifically, at compile time, each instrumented memory access is guarded via a conditional check to detect its validity at runtime. In the event where the conditional check fails, CIMA skips the respective memory access at runtime. To implement such a twisted flow of control, CIMA automatically transforms the program control flow logic within mainstream compilers. This makes CIMA a proactive mitigation strategy against a large class of memory-safety attacks.
It is the novel mitigation strategy that sets our CIMA approach apart from the existing works [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. Most existing mitigation schemes against memory-safety attacks are programmed to abort or restart the victim system when an attack or a memory-safety violation is detected. Other schemes, such as self-healing [22, 23, 24] or live patching [25], are based on directly detecting exploitations or attacks and resuming the corrupted system from a previous safe state. In contrast, CIMA follows a fundamentally different approach: it continues execution by skipping only the illegal instructions. As a result, CIMA never aborts the system and avoids the heavy overhead of maintaining system states for checkpoints.
Although our proposed security solution is applicable to any computing system that involves C/C++ programs, we mainly focus on the CPS domain. This is because, unlike mainstream systems, CPS often impose conflicting design constraints involving real-time guarantees, physical-state resiliency of the physical dynamics, and security. In CPS, memory-safety vulnerabilities might be found in the firmware (or sometimes in the control software) of programmable logic controllers (PLCs). Such firmware is commonly implemented in C/C++ for the sake of efficiency. Consequently, buffer over/underflows and dangling pointers are regularly discovered even in modern PLCs. In fact, recent trends in Common Vulnerabilities and Exposures (CVEs) show a high volume of interest in exploiting these vulnerabilities in PLCs [26, 27, 28, 29, 30, 31, 32, 33]. This shows that the mitigation of memory-safety attacks in CPS should not be restricted to academic research. Instead, it is a domain that requires urgent and practical security solutions to protect a variety of critical infrastructures. Nonetheless, attacks that manipulate sensor and actuator values in CPS (either directly on sensor/actuator devices or on communication channels) are orthogonal issues that are out of our scope.
In summary, our work tackles the problem of ensuring that critical systems and services remain available and effective while successfully mitigating a wide range of memory-safety attacks. We make the following technical contributions:
1. We effectively prevent system crashes that could arise from memory-safety violations.
2. We effectively and efficiently prevent memory-safety attacks in any computing system.
3. We define the notion of physical-state resiliency that is crucial for CPS and should be met alongside strong security guarantees.
4. Our mitigation solution ensures physical-state resiliency and system availability with reasonable runtime and storage overheads. Thus, it is practically applicable to systems with stringent timing constraints, such as CPS.
5. We evaluate the effectiveness and efficiency of our approach on two real-world CPS testbeds.
Background
In this section, we introduce the necessary background in the context of our CIMA approach.

containing sensor inputs, control commands, diagnostic information and alarms transmit from one CPS entity to another.

- SCADA: A software entity designed for monitoring and controlling different processes in a CPS. It often comprises a human-machine interface (HMI) and a historian server. The HMI is used to display state information of plants and physical processes in the CPS. The historian server is used to store all operational data and the history of alarms.
CPS
An abstraction of a typical CPS architecture is shown in Figure 2. In Figure 2, x denotes the physical state of the plant, y captures the sensor measurements and u denotes the control command computed by the PLC at any given point in time.
ASan
Despite the presence of several memory error detector tools, their applicability in CPS is limited for several reasons. These include the lack of error coverage, significant performance overhead and other technical compatibility issues. After researching and experimenting with various memory-safety tools, we chose ASan [10] as our memory error detector tool. Our choice is motivated by its broad error coverage, high detection accuracy and relatively low runtime overhead when compared with other code-instrumentation-based tools [10, 34].
ASan is a compile-time memory-safety tool based on code instrumentation. It instruments C/C++ programs at compile time. The instrumented program then contains additional ASan checks and runtime libraries, which are used to detect possible memory-safety violations at runtime. Such instrumented code can detect buffer over/underflows, use-after-free errors (dangling pointers), use-after-return errors, initialization order bugs and memory leaks.
Since ASan was primarily designed for x86 architectures, it has compatibility issues with RISC-based architectures such as ARM and AVR. Therefore, we adapted ASan to the ARM-based architecture used in our system.

Limitations of ASan: Although ASan covers a wide range of memory errors, it does not cover some memory-safety errors, such as uninitialized memory reads and some use-after-return bugs. Such errors are less critical and rarely exploited in practice. ASan also has a minor limitation in detection accuracy. Although it offers high detection accuracy for most memory-safety vulnerabilities, there are rare false negatives for global buffer overflow and use-after-free vulnerabilities. This might allow memory-safety attacks to bypass the checks enforced by ASan with low probability. The other major limitation of ASan is its ineffective mitigation strategy: it simply aborts the system whenever a memory-safety violation or an attack is detected. This makes ASan inapplicable in systems with stringent availability constraints.
ASan as a debugging and monitoring tool: Because of the limitations mentioned in the preceding paragraph, ASan is often considered a debugging tool rather than a runtime monitoring tool. However, using ASan only as a debugging tool does not guarantee memory safety, because most memory-safety vulnerabilities (e.g. buffer overflows) can be probed by attackers with systematically crafted inputs. Since such carefully tailored inputs might not be used during debugging, ASan might miss important memory bugs that can be exploited by an attacker at runtime. Therefore, debugging the program alone does not offer a sufficient guarantee of detecting critical memory-safety vulnerabilities.
In our CIMA approach, we adopt ASan for the dual purpose of debugging and runtime monitoring, with a specific focus on mitigating memory-safety vulnerabilities. As a runtime monitoring tool, CIMA leverages ASan to detect attacker-injected memory-safety bugs. Moreover, CIMA enhances the capability of ASan to mitigate memory-safety bugs on the fly. This ensures the availability of the underlying system, in stark contrast to a plain system abort.
Attacker and system models
In this section, we first discuss the attacker model. Then, we discuss the different traits of formally modeling our system and the related design constraints.
Attacker model
The main objective of memory-safety attacks, such as code injection and code reuse, is to gain privileged access to, or take control of, the vulnerable system (or otherwise to hijack or subvert specific operations). To achieve this, vulnerabilities of a program, such as buffer overflows and dangling pointers, are targeted. For example, to exploit a buffer overflow, the attacker sends carefully crafted input to the buffer. When the buffer overflows, the attacker can manipulate important memory addresses, such as the return address of a function, and divert the control flow of the program. A detailed account of such exploitations can be found in [7, 35]. In general, a typical memory-safety attack proceeds through the following steps (see Figure 1):
1. Interacting with the victim PLC, e.g., via network connection (for remote attacks).
2. Finding a memory-safety vulnerability (e.g., buffer overflow, dangling pointers) in the PLC firmware or control software with the objective of exploiting it.
3. Triggering a memory-safety violation on the PLC, e.g., overflowing a buffer.
4. Overwriting critical addresses of the vulnerable program, e.g., overwriting the return address of the PLC program.
5. Using the modified address, diverting the control flow of the program to injected (malicious) code (i.e. code injection attacks) or to an existing module of the vulnerable program (i.e. code reuse attacks). In the former case, the attacker can take over the PLC with the injected malicious code. In the latter case, the attacker still needs to collect appropriate gadgets from the program (basically by scanning the program's text segment); she will then synthesize a shellcode that allows her to take over the PLC.
Modeling CPS timing constraints
Most cyber-physical systems are highly time-critical. The communication between their different components, such as sensors, controllers (PLCs) and actuators, is synchronized by system time. Therefore, a delay in these CPS nodes could disrupt the control system or damage the physical plant. In particular, PLCs form the main control devices and computing nodes of a typical CPS. As such, PLCs often impose hard real-time constraints to maintain the stability of the control system in a CPS. In the following section, we define and discuss the notion of real-time constraints imposed on a typical CPS.
Modeling real-time constraints. In general, PLCs undergo a cyclic process called the scan cycle. This involves three major operations: input scan, PLC program (logic) execution and output update. The time it takes to complete these operations is called the scan time ($T_s$). A typical CPS defines an upper bound on the time taken by the PLC scan cycle, called the cycle time ($T_c$). This means that the scan cycle must be completed within the specified cycle time, i.e., $T_s \le T_c$. We refer to this constraint as the real-time constraint of the PLC. By design, PLCs meet this constraint. However, due to additional overheads, such as the memory-safety overhead (MSO), a PLC might not satisfy its real-time constraint. As discussed above, hardening the PLC with our memory-safety protection (detection + mitigation) increases the scan time. This increase in the scan time is attributed to the MSO. Concretely, the memory-safety overhead is computed as follows:

$$\mathrm{MSO} = \hat{T}_s - T_s \tag{1}$$

where $\hat{T}_s$ and $T_s$ are the scan times with and without memory-safe compilation, respectively. A detailed account of modeling $T_s$ is provided in our earlier works [8, 9]. It is crucial to check whether the scan time, including the MSO induced by the memory-safe compilation, still satisfies the real-time constraint imposed by the PLC. To this end, we compute the MSO for 1) average-case and 2) worst-case scenarios. In particular, after enabling memory-safe compilation, we measure the scan time (i.e. $\hat{T}_s$) $n$ times and compute the respective average and worst-case scan times. Formally, we say that the MSO is acceptable in the average case if the following condition is satisfied:

$$\frac{1}{n} \sum_{i=1}^{n} \hat{T}_s(i) \le T_c \tag{2}$$

In a similar fashion, the MSO is acceptable in the worst case if the following condition is satisfied:

$$\max_{1 \le i \le n} \hat{T}_s(i) \le T_c \tag{3}$$

where $\hat{T}_s(i)$ captures the scan time for the $i$-th measurement after the memory-safe compilation.
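The average-case and worst-case acceptance conditions can be checked mechanically once the scan times have been measured. The following is a minimal sketch under stated assumptions: the function names and the sample values are hypothetical and are not taken from our measurements.

```python
from statistics import mean


def memory_safety_overhead(hardened_scan_times, baseline_scan_time):
    """Average MSO: mean hardened scan time minus the baseline scan time."""
    return mean(hardened_scan_times) - baseline_scan_time


def meets_real_time_constraint(hardened_scan_times, cycle_time):
    """Return (average_ok, worst_ok) for n measured scan times against T_c."""
    average_ok = mean(hardened_scan_times) <= cycle_time
    worst_ok = max(hardened_scan_times) <= cycle_time
    return average_ok, worst_ok


# Hypothetical scan-time measurements in milliseconds.
samples = [4.0, 5.0, 6.0, 11.0]
print(meets_real_time_constraint(samples, cycle_time=10.0))  # (True, False)
```

In this hypothetical run the average-case condition holds while the worst-case condition does not, which is why both conditions are evaluated separately.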
Physical-state resiliency
The stability of CPS controllers (i.e. PLCs in our case) plays a crucial role in enforcing the dynamics of a cyber-physical system to be compliant with its requirements. For example, assume that a PLC issues a control command every $T_c$ time units and an actuator receives these commands at the same rate. Therefore, the cycle time of the PLC is $T_c$. If the PLC is down for an arbitrary amount of time, say $\tau$, then the actuator will not receive fresh control commands for the duration $\tau$. Consequently, the physical dynamics of the respective CPS will be affected for a total of $\tau / T_c$ scan cycles. We note that the duration $\tau$ might be arbitrarily large. Existing solutions [10], albeit in a non-CPS environment, typically resort to aborting the underlying process or restarting the entire system when a memory-safety attack is detected. Due to the critical importance of availability constraints in CPS, our CIMA approach never aborts the underlying system. Nevertheless, CIMA induces an overhead on the scan time of the PLC, as discussed in the preceding section. Consequently, the scan time of PLCs with CIMA-enabled memory safety (i.e. $\hat{T}_s$) may increase beyond the cycle time (i.e. $T_c$). In general, to accurately formulate $\tau$ (i.e. the amount of downtime for a PLC), we need to consider the following three mutually exclusive scenarios:

1. The system is aborted or restarted.
2. The system is neither aborted nor restarted and $\hat{T}_s \le T_c$. In this case, there will be no observable impact on the physical dynamics of the system. This is because the PLCs, despite having increased scan time, still meet the real-time constraint $T_c$. Thus, they are not susceptible to downtime.
3. The system is neither aborted nor restarted and $\hat{T}_s > T_c$. In this scenario, the PLCs will have a downtime of $\hat{T}_s - T_c$, as the scan time violates the real-time constraint $T_c$.

Based on the intuitions mentioned in the preceding paragraphs, we now formally define $\tau$, i.e., the downtime of a PLC, as follows:

$$\tau = \begin{cases} \Delta, & \text{if the system is aborted or restarted} \\ 0, & \text{if the system is neither aborted nor restarted and } \hat{T}_s \le T_c \\ \hat{T}_s - T_c, & \text{if the system is neither aborted nor restarted and } \hat{T}_s > T_c \end{cases} \tag{4}$$

where $\Delta$ captures a non-deterministic threshold on the downtime of PLCs when the underlying system is aborted or restarted.
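The three scenarios translate directly into a small case analysis. The sketch below is illustrative only: the function name and the `restart_delay` parameter (standing in for the non-deterministic threshold on abort or restart) are assumptions introduced for this example.

```python
def plc_downtime(aborted_or_restarted, hardened_scan_time, cycle_time, restart_delay):
    """Downtime of a PLC for the three mutually exclusive scenarios."""
    if aborted_or_restarted:
        # Scenario 1: the downtime is the abort/restart threshold, which is
        # non-deterministic in practice and passed in here as an assumption.
        return restart_delay
    if hardened_scan_time <= cycle_time:
        # Scenario 2: the real-time constraint still holds, so no downtime.
        return 0.0
    # Scenario 3: the scan time exceeds the cycle time.
    return hardened_scan_time - cycle_time


# Hypothetical values in milliseconds.
print(plc_downtime(False, 8.0, 10.0, restart_delay=500.0))   # 0.0
print(plc_downtime(False, 12.5, 10.0, restart_delay=500.0))  # 2.5
print(plc_downtime(True, 8.0, 10.0, restart_delay=500.0))    # 500.0
```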
Example. As an example, let us consider the first process in SWaT (discussed in Section 6.1). This process controls the inflow of water from an external water supply to a raw water tank. PLC1 controls this process by opening (with an "ON" command) and closing (with an "OFF" command) a motorized valve, i.e., the actuator, connected to the inlet pipe of the tank. If the valve is "ON" for an arbitrarily long duration, then the raw water tank might overflow, often causing severe damage to the system. This might occur due to the PLC1 downtime τ, during which the control command (i.e. "ON") computed by PLC1 may not change. Similarly, if the actuator receives the command "OFF" from PLC1 for an arbitrarily long duration, then the water tank may underflow. Such a phenomenon also affects the system dynamics, because tanks in other processes may expect raw water from the underflowing tank. In the context of SWaT, the tolerability of the PLC1 downtime τ (cf. Eq. (4)) depends on the physical state of the water tank (i.e. the water level) and the control commands (i.e. ON or OFF) computed by PLC1. In the next section, we formally introduce this notion of tolerance, termed physical-state resiliency, for a typical CPS.
Modeling physical-state resiliency. To formally model physical-state resiliency, we take a control-theoretic approach. To simplify the presentation, we assume that the dynamics of a typical CPS, without considering the noise and disturbance on the controller, are modeled as a linear time-invariant system. This is formally captured as follows (cf. Figure 2):

$$x_{t+1} = A x_t + B u_t \tag{5}$$

$$y_t = C x_t \tag{6}$$

where $t \in \mathbb{N}$ captures the index of the discretized time domain, $x_t \in \mathbb{R}^k$ is the state vector of the physical plant at time $t$, $u_t \in \mathbb{R}^m$ is the control command vector at time $t$ and $y_t \in \mathbb{R}^k$ is the measured output vector from sensors at time $t$. $A \in \mathbb{R}^{k \times k}$ is the state matrix, $B \in \mathbb{R}^{k \times m}$ is the control matrix and $C \in \mathbb{R}^{k \times k}$ is the output matrix. We now consider a duration $\tau \in \mathbb{R}$ for the PLC downtime. To check the tolerance of $\tau$, we need to validate the physical state vector $x_t$ at any discretized time index $t$. To this end, we first assume an upper bound $\omega \in \mathbb{R}^k$ on the physical state vector $x_t$ at an arbitrary time $t$. Therefore, to satisfy physical-state resiliency, $x_t$ must not exceed the threshold $\omega$. In a similar fashion, we define a lower bound $\theta \in \mathbb{R}^k$ on the physical state vector $x_t$.

With the PLC downtime $\tau$, we revisit Eq. (5) and the state estimation is refined as follows:

$$x'_{t+1} = A x_t + B u_{t-1}[[t, t+\tau]] \tag{7}$$

where $x'_{t+1} \in \mathbb{R}^k$ is the estimated state vector at time $t+1$ when the PLCs were down for a maximum duration $\tau$. The notation $u_{t-1}[[t, t+\tau]]$ captures that the control command $u_{t-1}$ was active for the time interval $[t, t+\tau]$ due to the PLC downtime. In Eq. (7), we assume, without loss of generality, that $u_{t-1}$ is the last control command received from the PLC before its downtime.

Given the foundation introduced in the preceding paragraphs, we say that a typical CPS (cf. Figure 2) satisfies physical-state resiliency if and only if the following condition holds at an arbitrary time index $t$:

$$\theta \le x'_{t+1} \le \omega \tag{8}$$

Figure: Timeline over the time indices $t-1$, $t$ and $t+1$, illustrating the impact of PLC downtime. If the downtime $\tau_1 = 0$, then $u_t$ (i.e. the control command at time $t$) is correctly computed and $x'_{t+1} = x_{t+1}$. If the downtime $\tau_2 \in (1, 2]$, then the control command $u_t$ will be the same as $u_{t-1}$. Consequently, $x'_{t+1}$ is unlikely to be equal to $x_{t+1}$. Finally, when the downtime $\tau_3 > 2$, the control command vector $u_{t+i}$ for $i \ge 0$ will be the same as $u_{t-1}$. As a result, the estimated state vectors $x'_{t+j}$ for $j \ge 1$ are unlikely to be identical to $x_{t+j}$.
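To make the resiliency condition concrete, the sketch below propagates a linear model while the last control command before the downtime stays active, and counts how many stale steps keep the state within its lower and upper bounds. It is a toy illustration: the single-tank matrices, the bounds and all numeric values are hypothetical and are not derived from SWaT.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def estimate_next_state(A, B, x_t, u_prev):
    """Estimated next state when the stale command u_prev is still active."""
    return [p + q for p, q in zip(mat_vec(A, x_t), mat_vec(B, u_prev))]


def is_resilient(x_est, lower, upper):
    """Check that every state component stays within its bounds."""
    return all(lo <= x <= hi for lo, x, hi in zip(lower, x_est, upper))


def tolerated_stale_steps(A, B, x_t, u_prev, lower, upper, max_steps=1000):
    """Count the consecutive stale steps that keep the state within bounds."""
    steps, x = 0, x_t
    while steps < max_steps:
        x = estimate_next_state(A, B, x, u_prev)
        if not is_resilient(x, lower, upper):
            break
        steps += 1
    return steps


# Hypothetical tank: the state is the water level, the command is the inflow per step.
A, B = [[1.0]], [[1.0]]
print(tolerated_stale_steps(A, B, [780.0], [15.0], [200.0], [800.0]))  # 1
```

With the valve left "ON", the level moves from 780 to 795 (within bounds) and then to 810 (above the upper bound), so only one stale step is tolerated in this hypothetical setting.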
CIMA: Countering Illegal Memory Accesses
CIMA is a mitigation technique designed to counter illegal memory accesses at runtime. We first outline a high-level overview and the key ingredients of our approach. Later in this section, we discuss the specific implementation traits in more detail.
Overview of CIMA
Objective. CIMA follows a proactive approach to mitigate memory-safety attacks and thereby ensure system availability and physical-state resilience in CPS. The key insight behind CIMA is to prevent any operation that attempts to illegally access memory. We accomplish this by skipping (i.e. not executing) the instructions that attempt to access memory illegally. For example, consider the exploitation of a memory-safety attack shown in Figure 1. With our CIMA approach, the attack is stopped at step 3 (i.e. "Trigger a memory-safety violation") of the exploitation process.
Workflow of CIMA. CIMA systematically manipulates the compiler-generated control flow graph (CFG) to accomplish its objective, i.e., to skip illegal memory accesses at runtime. The CFG of a program is represented by a set of basic blocks with incoming and outgoing edges. A basic block [36] is a straight-line sequence of code with only one entry point and only one exit, but it can have multiple predecessors and successors. CIMA works on a common intermediate representation of the underlying code, thus making our approach applicable to multiple high-level programming languages and low-level target architectures.
To manipulate the CFG, CIMA needs to instrument the set of memory-access operations so that memory-safety violations are detected when they are triggered at runtime. To this end, CIMA leverages an off-the-shelf technology, namely AddressSanitizer (ASan). A high-level workflow of our entire approach appears in Figure 4. The implementation of CIMA replaces the ineffective mitigation strategy (i.e. system abort) of ASan with an effective, yet lightweight scheme. This scheme allows the program control flow to jump to the next instruction that does not exhibit an illegal memory access. In this way, CIMA not only ensures high accuracy of memory-safety attack mitigation, but also provides confidence in system availability and resilience. In the following section, we describe our CIMA approach in detail.
Approach of CIMA
An outline of our CIMA approach appears in Algorithm 1. As input, CIMA takes a function (in the GIMPLE format) instrumented by ASan. As output, CIMA generates a GIMPLE function with memory-safety mitigation enabled. We note that the modification is directly injected into the compiler workflow. To this end, CIMA modifies the internal data structures of GCC to reflect the changes of control flow. CIMA also ensures that the static single assignment (SSA) form is maintained during the manipulation, so as not to disrupt the compiler workflow. We now elaborate the crucial steps of CIMA in detail and discuss the specific choices taken during its implementation.
Capture memory access instructions: CIMA validates memory accesses at runtime to detect and mitigate illegal memory access instructions. To achieve this, CIMA combines compile-time code instrumentation with a runtime validity check. The code instrumentation covers memory addresses (via ASan) and memory access instructions (via CIMA). The memory address instrumentation creates poisoned (i.e. illegal) memory regions, known as redzones, around stack, heap and global variables. Since these redzones are inaccessible to the running program, any memory instruction that attempts to access them at runtime is detected as an illegal instruction.
To mitigate the potential illegal instructions, CIMA instruments memory access instructions at compile-time. To this end, CIMA captures memory access instructions from the ASan instrumented code. These instructions serve as the potential candidates for illegal instructions at runtime.
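The role of redzones in the validity check can be illustrated with a toy model. The sketch below is not ASan's actual shadow-memory encoding: the `ShadowedMemory` class, the redzone width and the flat address space are simplifying assumptions made only for illustration.

```python
REDZONE = 8  # hypothetical redzone width in bytes


class ShadowedMemory:
    """Toy flat memory in which every allocation is surrounded by poisoned bytes."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.poisoned = [True] * size  # every byte is illegal until allocated
        self.next_free = 0

    def allocate(self, length):
        """Reserve `length` bytes, leaving a redzone before and after them."""
        start = self.next_free + REDZONE
        end = start + length
        if end + REDZONE > len(self.data):
            raise MemoryError("toy memory exhausted")
        for addr in range(start, end):
            self.poisoned[addr] = False  # unpoison only the object itself
        self.next_free = end
        return start

    def is_legal(self, addr):
        """Runtime validity check: the address must lie inside an allocation."""
        return 0 <= addr < len(self.data) and not self.poisoned[addr]


mem = ShadowedMemory(64)
buf = mem.allocate(16)
print(mem.is_legal(buf + 15))  # True: last byte of the buffer
print(mem.is_legal(buf + 16))  # False: first byte of the right redzone
```

An access one byte past the buffer lands in a redzone and is therefore flagged, which is the event that the instrumented memory access instructions react to.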
Compute target instruction:
To prevent the execution of an illegal memory access instruction $i$, CIMA computes its corresponding target instruction $T_i$. We note that the set of potentially illegal instructions is computed via the technique explained in the preceding paragraph. The target instruction $T_i$ for an illegal instruction $i$ is the instruction that will be executed right after $i$ is bypassed. The compile-time analysis of CIMA generates an instrumented binary in such a fashion that if an illegal instruction $i$ is detected at runtime, then $i$ is bypassed and control reaches the target instruction $T_i$. If $T_i$ is detected as illegal too, then the successor of $T_i$ will be executed. This process continues until an instruction without an illegal memory access is found at runtime.
Computing the target instruction is a critical step in our CIMA approach. The target instruction $T_i$ is computed as the successor of the memory access instruction $i$ in the CFG. We note that a potentially illegal instruction must access memory, as the objective of CIMA is to mitigate memory-safety attacks. At the GIMPLE IR level, any memory access instruction has a single successor. Such a successor can either be the next instruction in the same basic block (see Figure 5(a)) or the first instruction of the successor basic block (see Figure 5(c)). Since CIMA works at the GIMPLE IR level, it can identify the target instruction $T_i$ for any instruction $i$ by walking through the static control flow graph.
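The bypass rule, including the case where the target instruction is itself illegal, can be illustrated with a small simulation. This is a sketch under assumptions: the instruction names, the successor map and the legality predicate are hypothetical stand-ins for the GIMPLE-level structures.

```python
def execute(entry, successor, accesses, is_legal):
    """Walk single-successor instructions, bypassing illegal memory accesses.

    successor: maps an instruction to its target (next) instruction.
    accesses:  maps an instruction to the address it touches, if any.
    """
    executed = []
    current = entry
    while current is not None:
        addr = accesses.get(current)
        if addr is None or is_legal(addr):
            executed.append(current)
        # Whether executed or bypassed, control moves on to the target.
        current = successor.get(current)
    return executed


# Hypothetical straight-line code: i2 and i3 access memory outside [0, 1024).
successor = {"i1": "i2", "i2": "i3", "i3": "i4"}
accesses = {"i1": 100, "i2": 1030, "i3": 1031, "i4": 101}
print(execute("i1", successor, accesses, lambda a: 0 <= a < 1024))  # ['i1', 'i4']
```

Here the target of the illegal instruction i2 is i3, which is illegal as well, so execution resumes at i4.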
We modify the control flow graph in such a fashion that the execution is diverted to $T_i$ (i.e. the successor of $i$) when instruction $i$ is detected to be illegal at runtime. However, the modification of the CFG depends on the location of the illegal and target instructions, as discussed below.

Compute target basic block: From the discussion in the preceding paragraph, we note that the execution of memory access instruction $i$ is conditional (depending on whether it is detected as illegal at runtime). In the case where the instruction exhibits an illegal memory access, a jump to the respective target instruction $T_i$ is carried out. However, it is not possible to simply divert the execution flow to the target instruction $T_i$. This needs to be accomplished via a systematic modification of the original program CFG at compile time.
Consider the case when the illegal instruction $i$ and the target instruction $T_i$ reside in the same basic block (say $bb$). In this case, since the execution of $i$ is conditional, it is always the first instruction of its holding basic block, whereas the target instruction $T_i$ appears as the next instruction in the basic block $bb$ (see Figure 5(a)). However, it is not possible to make a conditional jump to $T_i$ within the same basic block $bb$, as this breaks the structure of the control flow graph. Thus, to be able to make a conditional jump to $T_i$, a target basic block (say $T_{bb}$) that contains $T_i$ as its head instruction is created. In particular, we split the basic block $bb$ into two basic blocks, $i_{bb}$ and $T_{bb}$ (cf. Algorithm 1, Lines 9-11). $i_{bb}$ holds the potentially illegal instruction $i$ and $T_{bb}$ holds the instructions of $bb$ following $i$, i.e., starting from the target instruction $T_i$ and up to the last instruction of $bb$. As such, control jumps to $T_{bb}$ if instruction $i$ is detected as illegal at runtime.
If the illegal instruction $i$ and the respective target instruction $T_i$ are located in different basic blocks in the original CFG (see Figure 5(c)), then there is no need to split any basic block. In such a case, the target basic block $T_{bb}$ is the basic block holding $T_i$ as its first instruction. Therefore, CIMA diverts the control flow to $T_{bb}$, should $i$ exhibit an illegal memory access at runtime.

Modify and maintain CFG: CIMA ensures the diversion of control flow of the program whenever an illegal memory access is detected. To accomplish such a twisted control flow, CIMA directly modifies and maintains the CFG (cf. Algorithm 1, Lines 14-16 and 20), as explained in the preceding paragraph. Figure 5 illustrates excerpts of control flow graphs that are relevant to the modification performed by CIMA. In particular, Figure 5 demonstrates how the control flow is systematically manipulated to guarantee system availability, while still mitigating memory-safety attacks.
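The two cases for computing the target basic block can be sketched on a toy CFG. This is not the GCC implementation or Algorithm 1 itself: the `BasicBlock` class, the function names and the convention that a check block lists its successors as [valid, invalid] are assumptions made for illustration, and SSA maintenance is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    name: str
    instructions: list
    successors: list = field(default_factory=list)


def target_block(cfg, bb_name):
    """Return the target block for the memory access that heads `bb_name`."""
    bb = cfg[bb_name]
    if len(bb.instructions) > 1:
        # Same-block case: split bb into i_bb (only i) and T_bb (the rest).
        t_bb = BasicBlock(bb_name + ".target", bb.instructions[1:], bb.successors)
        cfg[t_bb.name] = t_bb
        bb.instructions = bb.instructions[:1]
        bb.successors = [t_bb.name]
        return t_bb.name
    # Different-block case: the single successor already starts with T_i.
    return bb.successors[0]


def add_bypass_edge(cfg, check_name, bb_name):
    """Redirect the failing branch of the validity check to the target block."""
    target = target_block(cfg, bb_name)
    cfg[check_name].successors = [bb_name, target]  # [valid, invalid]
    return target


# Hypothetical CFG: the check originally falls through to an abort block.
cfg = {
    "check": BasicBlock("check", ["if valid(p)"], ["bb", "abort"]),
    "bb": BasicBlock("bb", ["*p = v", "x = x + 1", "return x"]),
    "abort": BasicBlock("abort", ["report_error"]),
}
print(add_bypass_edge(cfg, "check", "bb"))  # bb.target
print(cfg["check"].successors)              # ['bb', 'bb.target']
```

After the transformation, the abort block is no longer reachable from the check, and a failed check resumes execution at the instruction following the guarded access.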
Illustrative Example
CIMA detected and mitigated two global buffer overflow vulnerabilities in the firmware of the OpenPLC controller. Due to space limitations, we discuss only one such vulnerability here. A code fragment relevant to the vulnerability is shown in Program 2. In Line 3, a However, due to a coding error, this operation exhibits a memory-safety violation. Such a violation occurs in the 1024th iteration, when the operation attempts to write data beyond the buffer limit. CIMA successfully mitigates this memory-safety violation. This was possible because the illegal memory access operation was bypassed in each iteration, starting from the 1024th iteration of the loop.
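The effect of the mitigation on such a loop can be mimicked with a short simulation. Program 2 is not reproduced here, so the buffer size, the loop bound and the interpretation of the 1024th iteration as loop index 1024 (counting from zero) are assumptions made purely for illustration.

```python
BUFFER_SIZE = 1024  # hypothetical buffer length


def fill_buffer(iterations):
    """Write one value per iteration, bypassing writes outside the buffer."""
    buffer = [0] * BUFFER_SIZE
    skipped = []
    for i in range(iterations):
        if 0 <= i < BUFFER_SIZE:   # runtime validity check on the access
            buffer[i] = i
        else:
            skipped.append(i)      # illegal write is bypassed, the loop goes on
    return buffer, skipped


buffer, skipped = fill_buffer(1026)
print(skipped)  # [1024, 1025]
```

The loop runs to completion, all in-bounds writes take effect, and only the out-of-bounds writes are skipped.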
Validating CIMA
In this section, we discuss the impact of using CIMA to mitigate memory-safety attacks.
Breaking program semantics. The mitigation strategy of CIMA is inherently based on breaking the semantics of the program when an invalid memory access is detected. This is accomplished by skipping the respective memory access. In the presence of memory-safety attacks, we believe that rolling back to a semantics-preserving state is extremely time and memory consuming (given the CPS context). Thus, CIMA aims for a lightweight solution (incurring only 13% overhead in a real-world CPS) that works for most common cases, and we validate CIMA with a real-world CPS.

of y can be recursively defined as follows:
where y 0 is the initial value of y. At a broader level, CIMA may encounter memory safety attacks either due to an inherent vulnerability in the program or due to an illegal memory access caused by the attacker (e.g. via well crafted attack inputs). For inherent program vulnerabilities, we hypothesize that the underlying program semantics was already flawed and therefore, skipping the illegal memory accesses (due to such vulnerabilities), despite affecting the program semantics, will not further impair the system functionality. For illegal memory accesses via carefully crafted attack inputs, we hypothesize that in most common cases the program functionality will be unaffected when CIMA skips such illegal memory accesses. This is due to the reason that such illegal memory accesses were executed for the sole purpose of launching memory-safety attacks and skipping them should not affect the functionality of the system. The examples discussed in the preceding paragraphs may not happen in practice. This is because providing loop bound and loop counter values via attacker controlled memory accesses (e.g. arr [x] in Program 3) is certainly considered as a bad coding practice. The occurrence of such corner cases could result in an undesirable outcome (e.g. program crash) even in the absence of CIMA. In essence, avoiding such bad coding practices will also help CIMA to maintain continuity of program execution in the presence of memory-safety violations or attacks.
Continuity of program execution
Affecting the system dynamics
CIMA may affect the overall system dynamics (or the physical-state resiliency in the context of CPS) when it skips too many illegal memory access instructions. This is because skipping too many illegal instructions creates a delay in program execution. This, in turn, may result in the following scenarios:
- Lack of fresh input: Skipping instructions impedes the delivery of fresh input to the system. For example, actuators in a CPS may not receive a new control command while instructions are skipped due to a memory-safety violation. This, in turn, may affect the system dynamics.
- Denial-of-service (DoS) attack: When CIMA skips too many illegal instructions, the program execution may get stuck for a long period. This may result in a denial of service.
As an example, consider the case where CIMA detected the OpenPLC vulnerability (cf. Program 2, Line 27). CIMA skipped the illegal memory write instruction (i.e. int_memory[i] = &mb_holding_regs[i]) 1024 times in the for loop. This creates a delay equivalent to 1024 iterations of the loop. Let δ denote the total time elapsed while skipping illegal instructions in the loop. During the period δ, the PLC is busy skipping invalid memory accesses and does not issue a control command to the actuator. The number of PLC scan cycles (and, equivalently, the number of control commands) missed during this delay can be computed as δ/T c. As such, the effect of δ on the CPS dynamics (or the physical-state resiliency) is analogous to that of the downtime (i.e. τ) of the PLC discussed in Section 3.3. Therefore, the level of delay (δ) that can be tolerated while satisfying the physical-state resiliency of a typical CPS can be modeled similarly, using Eq. (8), as follows:
To minimize the effect of CIMA on system dynamics (i.e. to satisfy Eq. (10)), we propose the following two approaches:
- Detect long skips with debugging: As discussed in Section 2.2, we make use of CIMA both as a debugging tool and as a runtime memory-safety tool. If the illegal memory access occurs due to an existing vulnerability in a loop, i.e., the vulnerability is in the existing source code of the program and is not attacker-injected, then CIMA automatically detects this vulnerability while debugging the program. This is experimentally validated by accurately detecting the vulnerabilities found in the OpenPLC firmware. We also developed an informative report (supported by an alarm) for detected and mitigated illegal memory access instructions. Therefore, the inherent vulnerabilities in the source code should be manually fixed once they are discovered by our framework. On the other hand, as discussed in Section 5, attackers may manipulate memory accesses in the loop bound or loop counter via untrusted inputs. However, as discussed in the preceding sections, such vulnerabilities are caused by bad coding practices and can be eliminated by avoiding those practices. For illegal memory accesses that occur outside loops (e.g. a substantial number of illegal memory access instructions in the source code), the skipping time is unlikely to affect the system dynamics. As evidenced by our experiments, the skipping time of a single instruction is negligible. As such, skipping thousands of instructions is still tolerable in the context of a real-world CPS. Moreover, it is unlikely to have tens of thousands of illegal memory accesses in the absence of loops.
- Exiting loop: In certain cases, it might be possible to exit the loop when a sufficient number of illegal memory accesses are detected inside it. This will reduce the delay δ. However, a more involved analysis is required to identify potential loops that can be skipped altogether as soon as a certain number of illegal memory accesses are detected within them. We plan to extend CIMA along this direction in the future.
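To make the delay estimate concrete, the following sketch computes the number of scan cycles missed during a delay δ as the integer part of δ/T c, and simulates a skip budget for the loop-exit idea. The function names, the per-skip cost and the budget value are illustrative assumptions; they are neither measurements nor parts of CIMA, and the loop-exit policy shown here is only one conceivable variant of the future work mentioned above.

```python
def missed_scan_cycles(delta_us, cycle_time_us):
    """Whole PLC scan cycles missed during a delay: floor(delta / T_c)."""
    if cycle_time_us <= 0:
        raise ValueError("cycle time must be positive")
    return int(delta_us // cycle_time_us)


def run_loop_with_skip_budget(indices, buffer_size, max_skips):
    """Hypothetical loop-exit policy.

    Counts legal accesses, skips illegal ones, and leaves the loop as soon
    as max_skips illegal accesses have been observed.
    """
    writes = 0
    skips = 0
    for i in indices:
        if 0 <= i < buffer_size:
            writes += 1
        else:
            skips += 1
            if skips >= max_skips:
                break              # stop early to bound the delay
    return writes, skips


# Assumed cost of 1 microsecond per skipped instruction (illustrative only).
ASSUMED_SKIP_COST_US = 1.0
delta_us = 1024 * ASSUMED_SKIP_COST_US
print(missed_scan_cycles(delta_us, cycle_time_us=10_000.0))
print(run_loop_with_skip_budget(range(2048), buffer_size=1024, max_skips=16))
```

With a budget of 16 skips, the simulated loop performs its 1024 legal writes and then exits after 16 illegal accesses instead of 1024, which shortens δ accordingly.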
Case Studies and Experimental Design

SWaT
SWaT [37] is a fully operational water purification testbed for research in the design of secure cyber-physical systems. It produces five gallons/minute of doubly filtered water. A detailed account of the water purification process and some salient features and design considerations of SWaT can be found in our prior work [8]. However, SWaT is based on closed-source, proprietary Allen-Bradley PLCs. Hence, it is not possible to directly modify the firmware of these PLCs and to enforce memory-safety solutions. To alleviate this problem in our experimental evaluation, an open platform, named Open-SWaT, was designed. Open-SWaT [8, 9] is a mini CPS based on open-source PLCs that mimics the features and operational behaviors of SWaT. The PLCs are designed using OpenPLC [38], an open-source PLC that runs on top of Linux on a Raspberry Pi (RPI). With Open-SWaT, we reproduce operational profiles and details of SWaT. In particular, we reproduce the main factors that have a substantial effect on the scan time and MSO of PLCs, such as hardware specifications of PLCs (e.g. 200MHz of CPU speed and 2MB of user memory), a Remote Input/Output (RIO) terminal (containing 32 digital inputs (DI), 13 analog inputs (AI) and 16 digital outputs (DO)), real-time constraints (i.e. cycle time of PLCs), a PLC program (containing 129 instructions of several types), communication frequencies and a full SCADA system. The main purpose of designing Open-SWaT was to employ our CIMA approach on a realistic CPS. A high-level architecture of Open-SWaT is shown in Figure 6. Due to space limitations, interested readers are referred to a detailed account of Open-SWaT in our prior papers [8, 9].
SecUTS
The Secure Urban Transportation System (SecUTS) is a CPS testbed designed for research on the security of a Metro SCADA system. The Metro SCADA system [39] comprises an Integrated Supervisory Control System (ISCS) and a train signaling system. The ISCS integrates localized and centralized control and supervision of mechanical and electrical subsystems located at remote tunnels, depots, power substations and passenger stations. The entire Metro system can be remotely communicated with, monitored, and controlled from the operation control center via the communication network. The signaling system, on the other hand, facilitates communications between train-borne and track-side controllers. It also controls track-side equipment and train position localization. Modbus is used as the communication protocol among the devices in the ISCS. A detailed account of the Metro SCADA system can be found in [39].
The SecUTS testbed provides facilities to examine several types of cyber attacks on the ISCS, such as message replay, forged message and memory-safety attacks, and to enforce proper countermeasures against such attacks. However, the SecUTS testbed is also based on closed-source, proprietary Siemens PLCs, hence we cannot directly apply CIMA to these PLCs to detect and mitigate memory-safety attacks. Consequently, we similarly designed the Open-SecUTS testbed (by mimicking SecUTS) using the OpenPLC controller. It consists of 6 DI (emergency and control buttons) and 9 DO (tunnel and station lightings, ventilation and alarms). Subsequently, we applied CIMA to Open-SecUTS and evaluated its practical applicability in a Metro SCADA system (see Section 7).
Measuring runtime overheads
To measure the scan time and compute the memory-safety overheads of the PLCs in Open-SWaT and Open-SecUTS, a function is implemented using POSIX clocks (in nanosecond resolution). The function measures the execution time of each operation in the PLC scan cycle. The results are then exported to external files for further processing, e.g., computing the MSO and plotting graphs. We run 50,000 scan cycles for each PLC operation to measure the overall performance of the PLC.
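A measurement harness of this kind can be sketched as follows. This is an illustration in Python rather than the function used in our testbeds: the names now_ns, measure_operation, summarize and relative_overhead_percent are assumptions, and the relative-overhead formula shown is one common way of expressing an overhead as a percentage, not necessarily the exact MSO definition used in this paper.

```python
import statistics
import time


def now_ns():
    """Monotonic timestamp in nanoseconds (POSIX clock where available)."""
    if hasattr(time, "clock_gettime_ns") and hasattr(time, "CLOCK_MONOTONIC"):
        return time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    return time.perf_counter_ns()  # portable fallback on non-POSIX platforms


def measure_operation(operation, cycles=50_000):
    """Run `operation` once per cycle and return per-cycle durations in ns."""
    samples_ns = []
    for _ in range(cycles):
        start = now_ns()
        operation()
        samples_ns.append(now_ns() - start)
    return samples_ns


def summarize(samples_ns):
    """Mean, minimum and maximum duration, converted to microseconds."""
    return {
        "mean_us": statistics.fmean(samples_ns) / 1000.0,
        "min_us": min(samples_ns) / 1000.0,
        "max_us": max(samples_ns) / 1000.0,
    }


def relative_overhead_percent(baseline_us, hardened_us):
    """Relative increase of the hardened time over the baseline, in percent."""
    return (hardened_us - baseline_us) / baseline_us * 100.0


# Placeholder workload standing in for one PLC operation (assumption).
stats = summarize(measure_operation(lambda: sum(range(100)), cycles=1_000))
print(stats)
```

In such a setup, the same operation would be measured once in the unmodified build and once in the hardened build, and the two means passed to relative_overhead_percent.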
Evaluation
This section presents a detailed evaluation of our CIMA approach on Open-SWaT and Open-SecUTS. We discuss the experimental results to determine whether our proposed approach is accurate enough to detect and mitigate memory-safety violations. We also discuss the efficiency of our approach in the context of a CPS environment. In brief, we evaluate the proposed approach along four dimensions: 1) Security guarantees: the detection and mitigation accuracy of ASan and CIMA, respectively, 2) Performance: the tolerability of the runtime overhead of the proposed security measure in a CPS environment, 3) Resilience: its capability to ensure system availability and maintain physical-state resiliency in a CPS even in the presence of memory-safety attacks, and 4) Memory usage overheads.
Security guarantees
To stress test the accuracy of our approach, we have evaluated CIMA against a wide range of memory-safety vulnerabilities. The aim is to explore how accurately our CIMA approach mitigates such vulnerabilities. As CIMA is built on top of ASan, it is crucial that ASan detects a wide range of memory-safety vulnerabilities accurately. According to the original results published for ASan [10], it detects memory-safety violations with high accuracy, without false positives, for vulnerabilities such as (stack, heap and global) buffer under/overflows, use-after-free errors (dangling pointers), use-after-return errors, initialization order bugs and memory leaks. Only rare false negatives may appear for global buffer overflow and use-after-free vulnerabilities due to some exceptions [10].
CIMA effectively mitigates memory-safety violations, given that such a violation is detected by ASan at runtime. Therefore, the mitigation accuracy of our CIMA approach is exactly the same as the detection accuracy of ASan.
As discussed in detail in Section 4.2, we discovered two global buffer overflow vulnerabilities in the OpenPLC firmware. Both vulnerabilities were successfully mitigated by our CIMA approach. Moreover, throughout our evaluation, we did not observe any false positives or false negatives in mitigating the memory-safety violations detected by ASan.
Performance
According to the original article published for ASan [10], the average MSO of ASan is 73%. However, all measurements were taken on benchmarks different from ours and, more importantly, in a non-CPS environment. In our CPS environment, integrated in Open-SWaT and Open-SecUTS, the average overhead induced by ASan is 53.19% and 52.81%, respectively. Additionally, our proposed CIMA approach induces 13.83% and 9.92% runtime overheads on Open-SWaT and Open-SecUTS, respectively. Thus, the overall runtime overhead of our security measure is 67.02% (for Open-SWaT) and 62.73% (for Open-SecUTS). A more detailed performance report, including the performance overhead of each PLC operation in both testbeds, is given in Tables 1 and 2.
It is crucial to check whether the overhead induced by ASan and CIMA (T s) is tolerable in a CPS environment. To this end, we evaluate whether this overhead respects the real-time constraints of SWaT and SecUTS. First, consider tolerability in the average-case scenario. We observe that our proposed approach satisfies the condition of tolerability, as defined in Eq. (2). In particular, from Table 1, mean(T s) = 282.15µs and T c = 10000µs; and from Table 2, mean(T s) = 256.67µs and T c = 150000µs. Consequently, Eq. (2) is satisfied and the overhead induced by our CIMA approach is tolerable in both SWaT and SecUTS.
Similarly, considering the worst-case scenario, we evaluate whether Eq. (3) is satisfied. From Table 1, max(T s) = 2629.27µs and T c = 10000µs; and from Table 2, max(T s) = 3425.03µs and T c = 150000µs. Both values remain below the respective cycle times, thus the proposed security measure satisfies the real-time constraints of SWaT and SecUTS in both scenarios. Therefore, despite the high security guarantees provided by CIMA, its overhead is still tolerable in a CPS environment.
CIMA and ASan together induce a runtime overhead of 67.02% in Open-SWaT and 62.73% in Open-SecUTS, whereas the overhead due to CIMA alone is only 13.83% and 9.92%, respectively. Despite this overhead, our proposed approach meets the hard real-time constraints of SWaT and SecUTS in both the average-case and worst-case scenarios.
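The two tolerability checks can be reproduced with a few lines of arithmetic. The sketch below assumes that tolerability amounts to the comparison T s ≤ T c, applied to the mean and to the maximum; the numerical values are those quoted from Tables 1 and 2, while the function names and the dictionary layout are illustrative assumptions.

```python
# Values quoted in the text from Tables 1 and 2 (all in microseconds).
REPORTED = {
    "Open-SWaT": {"mean_ts_us": 282.15, "max_ts_us": 2629.27, "tc_us": 10_000.0},
    "Open-SecUTS": {"mean_ts_us": 256.67, "max_ts_us": 3425.03, "tc_us": 150_000.0},
}


def is_tolerable(ts_us, tc_us):
    """Assumed tolerability condition: the time T_s fits in the cycle time T_c."""
    return ts_us <= tc_us


def cycle_share_percent(ts_us, tc_us):
    """Share of the cycle time consumed by T_s, in percent."""
    return ts_us / tc_us * 100.0


for name, row in REPORTED.items():
    average_ok = is_tolerable(row["mean_ts_us"], row["tc_us"])
    worst_ok = is_tolerable(row["max_ts_us"], row["tc_us"])
    worst_share = cycle_share_percent(row["max_ts_us"], row["tc_us"])
    print(f"{name}: average-case ok={average_ok}, worst-case ok={worst_ok}, "
          f"worst-case share of cycle={worst_share:.2f}%")
```

Even in the worst case, the measured time occupies roughly a quarter of the cycle time in Open-SWaT and a few percent in Open-SecUTS, which is why both conditions hold with a comfortable margin.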
Resilience
One of the main contributions of our work is to empirically show the resilience of our CIMA approach against memory-safety attacks. Here, we evaluate how our mitigation strategy ensures availability and physical-state resiliency of a real-world CPS. As discussed in the preceding sections, CIMA does not render the system unavailable. This is because it does not abort or restart the PLC when mitigating memory-safety attacks.
In such a fashion, the availability of PLCs is ensured by our approach. As discussed in Section 3.3, physical-state resiliency of a CPS can be affected by the memory-safety overhead (when the overhead is not tolerable due to the real-time constraint of the PLC) or the downtime of the PLC (when the PLC is unavailable for some reason).
In the preceding section, we showed that our CIMA approach keeps the memory-safety overhead tolerable. Hence, the additional overhead induced by CIMA does not affect the physical-state resiliency. In addition, the availability of SWaT and SecUTS is ensured by CIMA by construction, as CIMA never aborts the system or leads to PLC downtime. Given that our proposed solution ensures availability of the PLC and also maintains the physical-state resiliency, we ensure the resilience of SWaT even in the presence of memory-safety attacks.
CIMA ensures physical-state resiliency of SWaT and SecUTS, as T s ≤ T c (cf. Eq. (4) and Eq. (8)) holds in all of our experiments. This makes CIMA a security solution that keeps the SWaT and SecUTS systems resilient even in the presence of memory-safety attacks.
Memory usage overheads
Finally, we evaluated the memory usage overheads of our CIMA approach. Tables 3 and 4 summarize the increased virtual memory usage, real memory usage, binary size and shared library usage for the Open-SWaT and Open-SecUTS testbeds, respectively. The reported statistics are collected by reading the VmPeak, VmRSS, VmExe and VmLib fields, respectively, from /proc/self/status. In general, we observe a significant increase in virtual memory usage (8.85× for Open-SWaT and 8.70× for Open-SecUTS). This is primarily because of the allocation of large redzones with malloc (as part of the ASan approach). However, the real memory usage overhead is only 1.37× (for Open-SWaT) and 1.17× (for Open-SecUTS). We believe these overheads are still acceptable since most PLCs nowadays come with at least 1GB of memory. Moreover, the increased memory usage is an acceptable trade-off in light of the strong mitigation mechanisms provided by our CIMA approach. Finally, we observe that CIMA introduces negligible memory usage overhead over ASan, meaning that the majority of the memory usage overhead is attributable to ASan.
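Collecting these four fields can be sketched as follows. The snippet assumes a Linux system, where /proc/self/status reports each field as a line of the form "VmRSS: <value> kB"; the function names parse_status, read_self_status and overhead_factor are illustrative and are not taken from our measurement scripts.

```python
import os

FIELDS = ("VmPeak", "VmRSS", "VmExe", "VmLib")


def parse_status(text, fields=FIELDS):
    """Extract the selected fields (in kB) from /proc/<pid>/status content."""
    usage_kb = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in fields:
            parts = rest.split()
            if parts:
                usage_kb[key] = int(parts[0])   # first token is the size in kB
    return usage_kb


def read_self_status(path="/proc/self/status"):
    """Read the memory fields of the current process (Linux only)."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return parse_status(handle.read())


def overhead_factor(hardened_kb, baseline_kb):
    """Ratio between hardened and baseline usage, e.g. 1.37 for a 1.37x increase."""
    return hardened_kb / baseline_kb


if os.path.exists("/proc/self/status"):
    print(read_self_status())
```

The overhead factors reported in Tables 3 and 4 correspond to ratios of this kind, computed between the hardened and the original builds of the PLC firmware.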
Related work
CIMA is built on top of ASan. ASan [10] is a fast memory-safety tool based on code instrumentation. It covers a wide range of temporal memory errors, such as use-after-free, use-after-return and memory leaks, and spatial memory errors, such as stack, heap and global buffer overflows. It is also a standard tool included in the GCC and LLVM compiler infrastructures as a memory-safety checker. However, its mitigation approach (in normal mode) is to simply abort the program whenever a memory-safety violation is detected. This is not acceptable in most critical systems with stringent time constraints, such as CPS and ICS. In such systems, availability is of the utmost importance. This makes ASan impractical for critical systems with hard real-time constraints. ASan also provides a special mode to continue execution even after detecting a memory-safety error. However, this mode does not provide any protection against memory-safety attacks.
Softbound [16] and its extension CETS [17] guarantee complete memory safety. However, such guarantees come at the cost of a very high runtime overhead (116%). Such a high performance overhead is unlikely to be tolerable due to the real-time constraints imposed on a typical CPS. Moreover, Softbound and CETS do not implement a mitigation strategy that considers the physical-state resiliency in a CPS environment.
SafeDispatch [11] is a fast memory-safety tool developed within the LLVM infrastructure. SafeDispatch also involves exhaustive performance optimizations to make the overhead just 2.1%. However, SafeDispatch is not supported by an appropriate mitigation strategy that guarantees system availability in the presence of memory-safety attacks. Thus, its applicability to the CPS environment is limited.
Sting [22, 23] is an end-to-end self-healing architecture developed against memory-safety attacks. It detects attacks with address space layout randomization (ASLR) and system-call-based anomaly detection techniques. However, ASLR can be defeated by code-reuse attacks, and system-call-based detections (e.g. control-flow integrity) can be bypassed by data injection attacks. To diagnose the root cause of the attack, Sting leverages a heavy-weight static taint analysis. Furthermore, it performs periodic checkpointing and continuously records system calls to resume the victim program from an earlier safe state. These techniques bring significant performance and memory usage overheads. As reported in [22], there are cases where the corrupted program cannot be recovered and must be restarted. Therefore, the performance and memory usage overheads and the system unavailability problems limit the applicability of Sting to a CPS environment.
ROPocop [12] is a dynamic binary code instrumentation framework against code injection and code reuse attacks. It relies on Windows x86 binaries to detect such attacks. Its runtime overhead is 240%, which is very high and is unlikely to be tolerable due to the real-time constraints in CPS. Moreover, it is unclear what kind of mitigation strategies were incorporated into ROPocop.
Over the past decades, a number of control-flow integrity (CFI) based solutions (e.g., [13, 14, 15, 40]) have been developed to defend against memory-safety attacks. The main objective behind these solutions is to ensure the control-flow integrity of a program. Therefore, they aim to prevent attacks from redirecting the flow of program execution. These solutions also offer a slight performance advantage over other countermeasures, such as code-instrumentation-based countermeasures. However, CFI-based solutions generally have the following limitations: (i) determining the required control-flow graph (often using static analysis) is difficult and requires a substantial amount of memory; (ii) attacks that do not divert the control flow of the program cannot be detected (e.g. data-oriented attacks [41]); (iii) finally, these solutions mainly detect memory-safety attacks, but do not implement mitigation strategies against the attacks. Consequently, the applicability of CFI-based solutions is limited in a CPS environment.
In summary, to the best of our knowledge, there is no prior work that efficiently detects and mitigates memory-safety attacks without compromising the real-time constraints or availability of the underlying system. Moreover, we are not aware of any work that designs and evaluates memory-safety attack mitigation techniques in light of the real-time constraints and physical-state resiliency imposed on critical systems. In this paper, CIMA bridges this research gap.
Threats to validity
The effectiveness of CIMA critically depends on the following: timing-related constraints. Hence, we can provide high confidence on our results obtained from Open-SWaT and Open-SecUTS to be applicable also to the real SWaT and SecUTS systems.
Conclusion
In this paper, we propose CIMA, a resilient mitigation strategy to ensure protection against a wide variety of memory-safety attacks. The main advantage of CIMA is that it mitigates memory-safety attacks in a time-critical environment and ensures system availability by skipping only the instructions that exhibit illegal memory accesses. Such an advantage makes CIMA an attractive choice of security measure for cyber-physical systems and critical infrastructures, which often impose strict timing constraints. To this end, we evaluate our approach on real-world CPS testbeds. Our evaluation reveals that CIMA mitigates memory-safety errors with acceptable runtime and memory-usage overheads. Moreover, it ensures that the resiliency of the physical states is maintained despite the presence of memory-safety attacks. Although we evaluated our approach only in the context of CPS, CIMA is also applicable and useful for any system under the threat of memory-safety attacks.
From a conceptual point of view, CIMA provides a fresh outlook on mitigating memory-safety attacks. In the future, we plan to build upon our approach to understand the value of CIMA across a variety of systems beyond CPS. We also plan to leverage our CIMA approach for live patching. In particular, in its current state, CIMA does not automatically fix the memory-safety vulnerabilities of the victim code (although CIMA does ensure that the vulnerabilities are not exploited at runtime). Instead, it generates a report for the developer to help produce a patched version of the code. In the future, we will use the report generated by CIMA to automatically identify the pattern of illegal memory accesses and fix the code accordingly.
