attackers need only find a single entry point out of an exponentially growing number of options.
The root cause of the Meltdown-Spectre issue is a failed abstraction. Early computers were imagined to be state machines, where the computer's current state is held in a register and logic computes the next state from the current one. When originally conceived, state machines changed state on the rising edge of a clock, which was usually driven by a precise oscillator. The 68000 fit this model pretty well, as shown by the three instructions (load, add, and store) in Figure 1a. Each 68000 instruction used a specific number of clock cycles, illustrated as two, and the clock was precise, so the rate of instruction processing was also precise. It was so obvious that the instruction processing time carried no information that the community stopped thinking about the issue, setting the stage for a surprise several decades later.
The state machine abstraction became more complicated as computer architecture advanced. Faster transistors allowed a processor's internal clock rate to increase substantially, but the memory bus couldn't keep up. The fix was a cache memory that stored some data on the processor chip, where it could be accessed quickly; going off-chip to access real memory was only necessary in cases of a "cache miss." This led to the situation shown in Figure 1b, where the length of time required to load data from memory depended on whether the data was in cache or not.
But do microprocessors still satisfy the state machine abstraction? In the historical rush to keep up with Moore's law, the answer was "absolutely yes." Caches led to two versions of the state machine abstraction, and gave architects a new degree of freedom to slow down one state machine to get more work from the other. In fact, the computer architecture community made a big deal out of the relative speeds of the two state machines. The cache miss rate could be reduced through improved hardware design, yielding more performance. While Figure 1b shows a load instruction taking either two or three clock cycles, computer architects ultimately represented load time as a single real number: the average of the two possibilities weighted by the cache hit and miss rates.
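The weighted average described above can be written compactly. In this sketch, the symbols h (cache hit rate), t_hit, and t_miss are notation introduced here for illustration and do not come from the article. The two- and three-cycle values are those of Figure 1b, while the 90 percent hit rate is an assumed example value.

```latex
\begin{align*}
\bar{t}_{\text{load}} &= h \, t_{\text{hit}} + (1 - h) \, t_{\text{miss}} \\
% t_hit = 2 and t_miss = 3 cycles (Figure 1b); h = 0.9 is an assumed value
\bar{t}_{\text{load}} &= 0.9 \times 2 + 0.1 \times 3 = 2.1 \text{ cycles}
\end{align*}
```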
Reanalysis of the issue in the last year led some out-of-the-box thinkers to take a different view. Although a single real number was convenient, in reality, it was a mixture of two discrete values, giving it attributes of a random number. A random number has a mean and standard deviation, making it look noisy if graphed over time. Further out-of-the-box thinking revealed that the purportedly random number is somewhat predictable, based on the layout of data in memory affecting whether a miss occurs. Once Meltdown and Spectre were disclosed to the public, computer engineers instantaneously changed their view of the amount of time required to execute an instruction on a complex processor. It's now a base time duration with a superimposed noisy signal that might carry information about internal processor state. This makes the hit-versus-miss ratio a covert "side channel," in computer security lingo.
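To make the "mixture of two discrete values" concrete, the following minimal sketch computes the mean and standard deviation of a load time that is either a hit or a miss. The function name `load_time_stats` and the 90 percent hit rate are assumptions made for illustration; only the two- and three-cycle latencies come from Figure 1b.

```python
import math

def load_time_stats(hit_rate, t_hit=2, t_miss=3):
    """Mean and standard deviation (in cycles) of a two-valued load time."""
    miss_rate = 1.0 - hit_rate
    mean = hit_rate * t_hit + miss_rate * t_miss
    # Variance of a two-point distribution around its mean.
    variance = hit_rate * (t_hit - mean) ** 2 + miss_rate * (t_miss - mean) ** 2
    return mean, math.sqrt(variance)

# Assumed example: 90 percent of loads hit in the cache.
mean, std = load_time_stats(hit_rate=0.9)
print(f"mean = {mean:.2f} cycles, std = {std:.2f} cycles")
```

The nonzero standard deviation is the "noise" the paragraph refers to. It vanishes only when every load hits or every load misses, which is when the timing carries no information.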
Most modern processors use speculative execution to improve the cache's hit-versus-miss ratio and hence performance. Yet we now discover that most implementations of speculation allow data in otherwise inaccessible memory to change a subsequent cache hit into a miss, or vice versa, creating an opportunity for adversaries to craft malware that exports sensitive data to a lower-security environment.
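The mechanism can be illustrated with a toy simulation. This is a deliberately simplified model, not an exploit and not a description of any real processor: `ToyCache`, `victim_speculative_access`, and `attacker_recover` are hypothetical names, latencies are simulated cycle counts taken from Figure 1b rather than real measurements, and the cache is assumed to be large enough that nothing is ever evicted.

```python
HIT_CYCLES = 2   # simulated latency of a cache hit (Figure 1b)
MISS_CYCLES = 3  # simulated latency of a cache miss (Figure 1b)

class ToyCache:
    """Unbounded toy cache that only tracks which lines are present."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def load(self, line):
        """Return the simulated latency of a load and fill the line."""
        if line in self.lines:
            return HIT_CYCLES
        self.lines.add(line)
        return MISS_CYCLES

def victim_speculative_access(cache, secret):
    # Models a speculatively executed load whose address depends on a
    # secret. The architectural result is discarded, but the cache fill
    # it caused remains.
    cache.load(secret)

def attacker_recover(cache, num_lines):
    # The attacker times a load of every line; the one fast load
    # reveals which line the victim touched.
    latencies = [cache.load(line) for line in range(num_lines)]
    return latencies.index(HIT_CYCLES)

def demo(secret, num_lines=16):
    cache = ToyCache()
    cache.flush()
    victim_speculative_access(cache, secret)
    return attacker_recover(cache, num_lines)

print(demo(secret=11))  # the attacker recovers 11 from timing alone
```

The attacker never reads the secret directly. It is inferred from which probe was a hit rather than a miss, and that inference is the side channel.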
The specific cause of Meltdown and Spectre was a timing side channel, but the root cause can be traced back to the state machine abstraction splitting into two abstractions without enough attention to unintended consequences. As we'll discuss below, Meltdown and Spectre can be addressed in future chips through design changes; addressing the root cause in complexity will require deeper thinking.
Figure 1. Microprocessors follow the state machine abstraction at two levels: (a) state machines at one level advance at the rate of the system clock, while (b) state machines at the other level advance at the rate of instruction processing. Before the advent of cache memory, the two state machines advanced in time at a fixed rate relative to each other; however, the rate varies in today's complex processors and becomes a side channel that can convey sensitive data to hackers.
FIXING MELTDOWN AND SPECTRE
Different groups around the world are busy finding the most efficient way to address the Meltdown and Spectre security bugs. For example, one of the authors, Tom Conte at Georgia Tech, suggests a deterministic VLIW approach. Today's caches decide whether to use on-chip cache or off-chip main memory at execution time. The current cache approach could be replaced by temporary registers and a more sophisticated compiler that essentially precomputes the cache misses. Because the compiler runs long before sensitive information is on the scene, the cache hit-versus-miss decision can't depend on it.
Fine-grained memory protection is another possibility. The discussion above showed how variance in the rate of instruction processing can carry information, but this only becomes a security problem if it allows access to sensitive information. Microarchitectural changes to the processor and additional memory tagging, together called fine-grained memory protection, applied to both prefetching and speculative execution would allow the OS to block the side channel in places where it could convey sensitive data. Although microarchitectural changes are costly to implement, they would also make programs substantially more resilient to other bugs and exploits, such as stack or buffer overflows.
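The deterministic, compiler-scheduled approach described above can be sketched with a toy static scheduler. This is an illustration under stated assumptions, not the authors' design: the `Op` type, the `static_schedule` function, the single-issue machine, and the fixed three-cycle memory load are all hypothetical. The point is only that the schedule is computed from latencies known at compile time, so it cannot depend on runtime data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    latency: int      # fixed cycle count, assumed known at compile time
    deps: tuple = ()  # names of ops whose results this op needs

def static_schedule(ops):
    """Greedy single-issue list scheduler: returns {op name: issue cycle}.

    Assumes the dependency graph is acyclic and every dep names an op.
    """
    latency = {op.name: op.latency for op in ops}
    issue = {}
    pending = list(ops)
    cycle = 0
    while pending:
        for op in pending:
            ready = all(
                d in issue and issue[d] + latency[d] <= cycle for d in op.deps
            )
            if ready:
                issue[op.name] = cycle
                pending.remove(op)
                break  # at most one op issues per cycle
        cycle += 1
    return issue

# Load-add-store sequence plus one independent "other" instruction.
program = [
    Op("load", latency=3),                  # load from memory, fixed latency
    Op("add", latency=1, deps=("load",)),
    Op("store", latency=1, deps=("add",)),
    Op("other", latency=1),
]
schedule = static_schedule(program)
print(schedule)
```

In this sketch the scheduler moves the independent instruction into the shadow of the load. No data values appear anywhere in the computation, so the resulting instruction timing is the same on every run.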
STEMMING THE GROWTH OF COMPLEXITY
The two methods just described might be good options that deserve to be implemented, but they have a common vulnerability: when a system that's too complex to understand develops undesired behavior, adding a slew of patches makes the system even more complex, risking the introduction of additional undesired behaviors.
It's possible to create computer systems with enormous numbers of transistors that still function securely through the mundane "air gap" method. The design of a data center illustrates this principle. Data centers typically have many servers with just one application per server, turning off all unnecessary TCP/IP ports. This might lead to one server for email, one for passwords, one for the database, and so on. The air gap comprises inches or feet of space between servers, isolating the servers on either side in a way that's not subject to the exponential complexity growth found in microprocessor architectures.
Air gaps aren't perfect barriers, but the methods for detecting and eliminating breaches in them are very different from those used for microarchitectural vulnerabilities. For example, an air gap was recently breached by malware flashing the activity light on a hard drive while the attacker stationed a drone carrying a video camera outside a window to record the flashing pattern [3]. These types of attacks often have simple remedies because of their inherently physical origin, such as closing the curtain over the window. Just as with air gaps or other hardware barriers, we can partition systems in software by running OS instances on individual cores and having them communicate through message passing, as Barrelfish [4] did nine years ago.
If performance is an issue, the message passing can be implemented using shared memory. The same is true for tight communication channels between user space and kernels, which are another choke point for preventing information leaks. Similar approaches are being adopted to partition OS memory into different degrees of trust. This approach has always been the basis of object-based systems, which, if supported by hardware mechanisms such as capabilities and call gates, can provide access to data only through well-defined interfaces while protecting an object's inner data.
Partitioning is becoming popular, although not in computers readily accessible to most programmers. Mobile devices achieve high performance with long battery life by having many chips specialized to specific functions separated by air gaps, just like the servers in a data center.
Figure. A potential remedy for Meltdown-Spectre class issues: replace today's automatically operating cache with one created by the compiler using temporary registers. Using the same load-add-store sequence from Figure 1, the compiler could put a value in a temporary register and (a) load it in one cycle or (b) reorder instructions so that the "other" instruction does useful work while leaving time for a load from memory. This approach should be about as fast as current methods, but the instruction rate won't depend on data because just one state machine abstraction is in use.
Software can also add noise to decrease the signal-to-noise ratio and make it more difficult for hackers to extract sensitive information. For example, OSs randomly map virtual pages to physical page locations to prevent adversaries from knowing where sensitive data resides. Random time delays can be effective at higher levels of software as well, as was done for Spectre in the JavaScript interpreters in Google's Chrome and Mozilla's Firefox browsers.
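The effect of added noise on a timing measurement can be sketched as follows. This is a generic illustration and not the browsers' actual implementation: the functions `coarsen` and `noisy_duration`, the 100-tick clock resolution, and the 50-cycle maximum delay are all assumptions chosen for the example, with the two- and three-cycle latencies borrowed from Figure 1b.

```python
import random

def coarsen(timestamp, resolution):
    """Round a timestamp down to a multiple of `resolution` (same units)."""
    return (timestamp // resolution) * resolution

def noisy_duration(true_duration, max_delay, rng):
    """Add a random delay in [0, max_delay] to a true duration."""
    return true_duration + rng.randint(0, max_delay)

# Coarse clock: a 2-cycle hit and a 3-cycle miss that both start at
# tick 1000 appear to take 0 ticks on a clock with 100-tick resolution.
start = 1000
hit_seen = coarsen(start + 2, 100) - coarsen(start, 100)
miss_seen = coarsen(start + 3, 100) - coarsen(start, 100)
print(hit_seen, miss_seen)

# Random delay: the 1-cycle difference between hit and miss is buried
# under up to 50 cycles of noise in any single measurement.
rng = random.Random(0)  # seeded only to make the sketch repeatable
hit_samples = [noisy_duration(2, 50, rng) for _ in range(5)]
miss_samples = [noisy_duration(3, 50, rng) for _ in range(5)]
```

Neither measure closes the channel completely. A measurement that straddles a clock tick boundary still distinguishes the two cases, and averaging many noisy samples recovers the underlying difference, so noise raises the attacker's cost rather than removing the leak.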
There's an expectation that computers should be secure, but is this realistic? We increasingly view computers as artificial intelligences, expecting them to keep secrets just like humans. But humans aren't particularly good at keeping secrets. Poker, for example, is based on players developing the skill to see into another person's mind by using the cues that people emit but don't pay attention to, just like the timing side channels exploited by Meltdown and Spectre. Furthermore, humans can be tortured, polygraphed, and subjected to drugs, vulnerabilities analogous to those available for analyzing computers when physical access is possible. So while it's clear that security must be a top-level requirement when designing future architectures, computer security will remain an ongoing challenge.
