This is huge. Really. —

Meltdown and Spectre: Here’s what Intel, Apple, Microsoft, others are doing about it

Intel, Microsoft, ARM, and others have responded. We dig in.

Meltdown and Spectre: Here’s what Intel, Apple, Microsoft, others are doing about it

The Meltdown and Spectre flaws—two related vulnerabilities that enable a wide range of information disclosure from every mainstream processor, with particularly severe flaws for Intel and some ARM chips—were originally revealed privately to chip companies, operating system developers, and cloud computing providers. That private disclosure was scheduled to become public some time next week, enabling these companies to develop (and, in the case of the cloud companies, deploy) suitable patches, workarounds, and mitigations.

With researchers figuring out one of the flaws ahead of that planned reveal, that schedule was abruptly brought forward, and the pair of vulnerabilities was publicly disclosed on Wednesday, prompting a rather disorderly set of responses from the companies involved.

There are three main groups of companies responding to the Meltdown and Spectre pair: processor companies, operating system companies, and cloud providers. Their reactions have been quite varied.

What Meltdown and Spectre do

A brief recap of the problem: modern processors perform speculative execution. To maximize performance, they try to execute instructions even before it is certain that those instructions need to be executed. For example, the processors will guess at which way a branch will be taken and execute instructions on the basis of that guess. If the guess is correct, great; the processor got some work done without having to wait to see if the branch was taken or not. If the guess is wrong, no big deal; the results are discarded and the processor resumes executing the correct side of the branch.

While this speculative execution does not alter program behavior at all, the Spectre and Meltdown research demonstrates that it perturbs the processor's state in detectable ways. This perturbation can be detected by carefully measuring how long it takes to perform certain operations. Using these timings, it's possible for one process to infer properties of data belonging to another process—or even the operating system kernel or virtual machine hypervisor.

This information leakage can be used directly; for example, a malicious JavaScript in a browser could steal passwords stored in the browser. It can also be used in tandem with other security flaws to increase their impact. Information leakage tends to undermine protections such as ASLR (address space layout randomization), so these flaws may enable effective exploitation of buffer overflows.

Meltdown, applicable to virtually every Intel chip made for many years, along with certain high-performance ARM designs, is the easier to exploit and enables any user program to read vast tracts of kernel data. The good news, such as it is, is that Meltdown also appears easier to robustly guard against. The flaw depends on the way that operating systems share memory between user programs and the kernel, and the solution—albeit a solution that carries some performance penalty—is to put an end to that sharing.

Spectre, applicable to chips from Intel, AMD, and ARM, and probably every other processor on the market that offers speculative execution, too, is more subtle. It encompasses a trick testing array bounds to read memory within a single process, which can be used to attack the integrity of virtual machines and sandboxes, and cross-process attacks using the processor's branch predictors (the hardware that guesses which side of a branch is taken and hence controls the speculative execution). Systemic fixes for some aspects of Spectre appear to have been developed, but protecting against the whole range of fixes will require modification (or at least recompilation) of at-risk programs.

Intel

Intel
So, on to the responses. Intel is the company most significantly affected by these problems. Spectre hits everyone, but Meltdown only hits Intel and ARM. Moreover, it only hits the highest performance ARM designs. For Intel, virtually every chip made for the last five, ten, and possibly even 20 years is vulnerable to Meltdown.

The company's initial statement, produced on Wednesday, was a masterpiece of obfuscation. It contains many statements that are technically true—for example, "these exploits do not have the potential to corrupt, modify, or delete data"—but utterly beside the point. Nobody claimed otherwise! The statement doesn't distinguish between Meltdown—a flaw that Intel's biggest competitor, AMD, appears to have dodged—and Spectre and, hence, fails to demonstrate the unequal impact on the different companies' products.

Follow-up material from Intel has been rather better. In particular, this whitepaper describing mitigation techniques and future processor changes to introduce anti-Spectre features appears sensible and accurate.

For the Spectre array bounds problem, Intel recommends inserting a serializing instruction (lfence is Intel's choice, though there are others) in code between testing array bounds and accessing the array. Serializing instructions prevent speculation: every instruction that appears before the serializing instruction must be completed before the serializing instruction can begin to execute. In this case, it means that the test of the array bounds must have been definitively calculated before the array is ever accessed; no speculative access to the array that assumes that the tests succeed is allowed.

Less clear is where these serializing instructions should be added. Intel says that heuristics can be developed to figure out the best places in a program to include them but warns that they probably shouldn't be used with every single array bounds test; the loss of speculative execution imposes too high a penalty. One imagines that perhaps array bounds that come from user data should be serialized and others left unaltered. This difficulty underscores the complexity of Spectre.

For the Spectre branch prediction attack, Intel is going to add new capabilities to its processors to alter the behavior of branch prediction. Interestingly, some existing processors that are already in customer systems are going to have these capabilities retrofitted via a microcode update. Future generation processors will also include the capabilities, with Intel promising a lower performance impact. There are three new capabilities in total: one to "restrict" certain kinds of branch prediction, one to prevent one HyperThread from influencing the branch predictor of the other HyperThread on the same core, and one to act as a kind of branch prediction "barrier" that prevents branches before the "barrier" from influencing branches after the barrier.

These new restrictions will need to be supported and used by operating systems; they won't be available to individual applications. Some systems appear to already have the microcode update; everyone else will have to wait for their system vendors to get their act together.

The ability to add this capability with a microcode update is interesting, and it suggests that the processors already had the ability to restrict or invalidate the branch predictor in some way—it was just never publicly documented or enabled. The capability likely exists for testing purposes.

Intel also suggests a way of representing certain branches in code with "return" instructions. Patches to enable this have already been contributed to the gcc compiler. Return instructions don't get branch predicted in the same way so aren't susceptible to the same information leak. However, it appears that they're not completely immune to branch predictor influence; a microcode update for Broadwell processors or newer is required to make this transformation a robust protection.

This approach would require every vulnerable application, operating system, and hypervisor to be recompiled.

For Meltdown, Intel is recommending the operating system level fix that first sparked interest and intrigue late last year. The company also says that future processors will contain some unspecified mitigation for the problem.

AMD

AMD's response has a lot less detail. AMD's chips aren't believed susceptible to the Meltdown flaw at all. The company also says (vaguely) that it should be less susceptible to the branch prediction attack.

The array bounds problem has, however, been demonstrated on AMD systems, and for that, AMD is suggesting a very different solution from that of Intel: specifically, operating system patches. It's not clear what these might be—while Intel released awful PR, it also produced a good whitepaper, whereas AMD so far has only offered PR—and the fact that it contradicts both Intel's (and, as we'll see later, ARM's) response is very peculiar.

AMD's behavior before this all went public was also rather suspect. AMD, like the other important companies in this field, was contacted privately by the researchers, and the intent was to keep all the details private until a coordinated release next week, in a bid to maximize the deployment of patches before revealing the problems. Generally that private contact is made on the condition that any embargo or non-disclosure agreement is honored.

It's true that AMD didn't actually reveal the details of the flaw before the embargo was up, but one of the company's developers came very close. Just after Christmas, an AMD developer contributed a Linux patch that excluded AMD chips from the Meltdown mitigation. In the note with that patch, the developer wrote, "The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault."

It was this specific information—that the flaw involved speculative attempts to access kernel data from user programs—that arguably led to researchers figuring out what the problem was. The message narrowed the search considerably, outlining the precise conditions required to trigger the flaw.

For a company operating under an embargo, with many different players attempting to synchronize and coordinate their updates, patches, whitepapers, and other information, this was a deeply unhelpful act. While there are certainly those in the security community that oppose this kind of information embargo and prefer to reveal any and all information at the earliest opportunity, given the rest of the industry's approach to these flaws, AMD's action seems, at the least, reckless.

ARM

The inside of the ExoKey, with its Atmel ARM-based CPU.
Enlarge / The inside of the ExoKey, with its Atmel ARM-based CPU.
ARM's response was the gold standard. Lots of technical detail in a whitepaper, but ARM chose to let that stand alone, without the misleading PR of Intel or the vague imprecision of AMD.

For the array bounds attack, ARM is introducing a new instruction that provides a speculation barrier; similar to Intel's serializing instructions, the new ARM instruction should be inserted between the test of array bounds and the array access itself. ARM even provides sample code to show this.

ARM doesn't have a generic approach for solving the branch prediction attack, and, unlike Intel, it doesn't appear to be developing any immediate solution. However, the company notes that many of its chips already have systems in place for invalidating or temporarily disabling the branch predictor and that operating systems should use that.

ARM's very latest high-performance design, the Cortex A-75, is also vulnerable to Meltdown attacks. The solution proposed is the same as Intel suggests and the same that Linux, Windows, and macOS are known to have implemented: change the memory mapping so that kernel memory mappings are no longer shared with user processes. ARM engineers have contributed patches to Linux to implement this for ARM chips.

Channel Ars Technica