Single-threaded scaling is back

Intel unveils a new architecture for 2019: Sunny Cove

Finally, a move away from just bundling more cores together.

OK, it's not all that sunny, but it's a nice picture of a cove.

In 2019, Intel will release Core and Xeon chips built around a new architecture. The chips will add a bunch of new instructions to accelerate certain popular workloads such as cryptography and compression, with the company demonstrating a 75-percent improvement in compression performance relative to prior-generation parts.

Since 2015, Intel's mainstream processors under the Core and Xeon brands have been based around the Skylake architecture. Intel's original intent was to release Skylake on its 14nm manufacturing process and then follow that up with Cannon Lake on its 10nm process. Cannon Lake would add a handful of new features (it includes more AVX instructions, for example) but otherwise be broadly the same as Skylake.

However, delays in getting its 10nm manufacturing process running effectively forced Intel to stick with 14nm for longer than anticipated. Accordingly, the company followed Skylake (with its maximum of four cores in consumer systems) with Kaby Lake (with higher clock speeds and much greater hardware acceleration of modern video codecs), Coffee Lake (as many as eight cores), and Whiskey Lake (improved integrated chipset). The core Skylake architecture was unchanged across these variations, meaning that while their clock speeds differ, the number of instructions per cycle (IPC) is essentially identical.

Looking on the sunny side of 10nm

Intel says that Sunny Cove, by contrast, is an enhanced microarchitecture to be built on the company's 10nm process. While still derived from Skylake, it has been improved to execute more instructions in parallel and with lower latency, and certain buffers and caches have also been enlarged. The level 1 data cache is 50 percent larger than in Skylake, as is the cache for decoded micro-ops and the level 2 cache (with the exact size depending on market positioning). Where Skylake has two reservation stations dispatching instructions across eight ports with a maximum of four instructions dispatched per cycle, Sunny Cove has four reservation stations, ten ports, and up to five instructions per cycle. The execution units have also been reorganized slightly, with Sunny Cove having two extra units capable of handling LEA instructions (a very versatile x86 instruction that can perform various arithmetic operations, as well as calculating memory addresses), and another for vector shuffles. This should give the out-of-order machinery more options as to how it can schedule instructions and, hence, extract greater parallelism.
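To make the LEA point concrete, here is a minimal Python sketch of the arithmetic that a single LEA can express: a base plus a scaled index plus a constant displacement. The function name `lea` and its parameters are illustrative, not anything from Intel's materials, and the model ignores instruction encodings and operand-size details.

```python
MASK64 = (1 << 64) - 1  # results wrap to 64 bits, as in a 64-bit register


def lea(base, index=0, scale=1, disp=0):
    """Model the value LEA computes: base + index*scale + disp (mod 2**64).

    Hypothetical helper for illustration only; x86 addressing restricts
    the scale factor to 1, 2, 4, or 8.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 addressing only allows scale 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK64


x = 7
times5 = lea(x, index=x, scale=4)                # x + x*4 == 5*x
times9_plus3 = lea(x, index=x, scale=8, disp=3)  # x + x*8 + 3 == 9*x + 3
```

Compilers commonly use this trick for small multiplications and combined add-and-shift operations, which is why extra LEA-capable units give the scheduler more room.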

Where Skylake can perform two loads and one store per cycle, Sunny Cove ups this to two loads and two stores. The reorder buffer is larger, enabling more out-of-order instructions in flight, and the load and store buffers are also larger, enabling more in-flight memory operations.

Like the oddball Cannon Lake processor that's built on 10nm and shipping in limited quantities, Sunny Cove includes support for AVX-512 instructions. AVX-512 spans many different extensions and capabilities; some are general-purpose vector arithmetic, others are specialized for workloads such as neural networks. In addition to these, Sunny Cove will include new instructions for accelerating encryption and data compression workloads—it's these new instructions that are responsible for the 75-percent performance improvement.

Petabytes of RAM

Sunny Cove also makes the first major change to x64 virtual memory support since AMD introduced its x86-64 64-bit extension to x86 in 2003. Although the virtual memory addresses used on these systems take 64 bits to store, they only actually contain 48 useful bits of information. Bits 0 through 47 are used, with the top 16 bits, 48 through 63, all copies of bit 47. This limits virtual address space to 256TB. These virtual addresses are mapped to physical addresses using a page table structure with four levels, with physical memory addresses also limited to 48 bits. This means that these systems can support a maximum of 256TB of physical memory.
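The "top bits copy bit 47" rule can be sketched in a few lines of Python. The helper names `is_canonical` and `sign_extend` are made up for this illustration, and the `bits` parameter is an assumption added so the width is not hard-coded.

```python
def is_canonical(addr, bits=48):
    """True if bit (bits-1) and every bit above it in a 64-bit address agree."""
    upper = addr >> (bits - 1)                # bit (bits-1) through bit 63
    all_ones = (1 << (64 - bits + 1)) - 1     # that many bits, all set
    return upper == 0 or upper == all_ones


def sign_extend(addr, bits=48):
    """Copy bit (bits-1) into all higher bits of a 64-bit value."""
    low = addr & ((1 << bits) - 1)
    if low >> (bits - 1):                     # top useful bit is set
        low |= ((1 << 64) - 1) ^ ((1 << bits) - 1)
    return low


# 48 useful bits give 2**48 bytes of virtual address space.
VIRTUAL_TB = (1 << 48) // (1 << 40)           # 256
```

With 48 useful bits, `0x00007FFFFFFFFFFF` is the highest address in the lower half and `0xFFFF800000000000` is the lowest in the upper half; everything in between is non-canonical.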

Both Intel and AMD have shared these limits since 2003. No longer: Sunny Cove extends virtual addresses to 57 meaningful bits (with the top 7 bits again either all zeroes or all ones, copying bit 56), with physical memory addresses of up to 52 bits. To handle this requires a fifth level in the page table. The new limits enable 128PB of virtual address space and 4PB of physical memory.
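The arithmetic behind the fifth level is easy to check. The sketch below assumes the usual x86-64 layout of 4KB pages (a 12-bit offset) and 512-entry tables (9 index bits per level); neither figure is stated in Intel's announcement, and the function names are illustrative.

```python
PAGE_OFFSET_BITS = 12   # assumed: 4KB pages
INDEX_BITS = 9          # assumed: 512 entries per page table


def address_bits(levels):
    """Number of meaningful virtual address bits for a given table depth."""
    return PAGE_OFFSET_BITS + INDEX_BITS * levels


def split_virtual_address(addr, levels):
    """Return ([top-level index, ..., lowest-level index], page offset)."""
    addr &= (1 << address_bits(levels)) - 1   # drop the sign-extension bits
    offset = addr & ((1 << PAGE_OFFSET_BITS) - 1)
    indices = []
    for level in range(levels - 1, -1, -1):   # walk from the top table down
        shift = PAGE_OFFSET_BITS + INDEX_BITS * level
        indices.append((addr >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


VIRTUAL_PB = (1 << address_bits(5)) // (1 << 50)   # 2**57 bytes in PB
PHYSICAL_PB = (1 << 52) // (1 << 50)               # 2**52 bytes in PB
```

Four levels give 12 + 4 × 9 = 48 bits and five give 12 + 5 × 9 = 57, which is where the 256TB and 128PB figures come from.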

The various iterations of Skylake have given us improved clock speeds and ever-larger core counts. What they haven't done, however, is improve the IPC of single-threaded code. For the first time since 2015, that's what Sunny Cove will do, making every workload faster, not merely those that can spread to ever-larger numbers of threads.

Intel is promising Core-branded Sunny Cove CPUs in the second half of 2019. In 2020 this will be followed by Willow Cove, a Sunny Cove with a redesigned cache, new security features, and new transistor optimization. In 2021, the company will release Golden Cove, again with more security features but also promising improved single-threaded performance, better machine-learning performance, and better networking and 5G performance.

Sunny Cove is also coming to Xeon. The roadmap here is vaguer (Intel doesn't offer any dates), but it will see Cascade Lake arrive in the earlier part of 2019, bringing with it some new AVX-512 instructions for neural networks and as many as 48 cores. This will be followed by Cooper Lake, which will include support for bfloat16 data, a reduced-precision floating-point format that's used in neural networks. Cooper Lake will in turn be followed by Sunny Cove in its Xeon guise: Ice Lake. A "next-gen" processor will follow from there.
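For a sense of what "reduced precision" means with bfloat16, the format keeps the sign bit and 8-bit exponent of a 32-bit float but only the top 7 bits of its mantissa, so it is effectively the upper half of a float32. The Python sketch below converts by simple truncation; that is a simplifying assumption, since hardware typically rounds rather than truncates, and the function names are illustrative.

```python
import struct


def float32_to_bfloat16_bits(x):
    """Return the 16-bit bfloat16 pattern for x, truncating the low mantissa bits."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16          # keep sign, 8 exponent bits, top 7 mantissa bits


def bfloat16_bits_to_float(b):
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


approx_pi = bfloat16_bits_to_float(float32_to_bfloat16_bits(3.14159))
```

Here `approx_pi` comes out as 3.140625: the range of a float32 is preserved, but only about two to three decimal digits of precision survive, a trade that neural network workloads generally tolerate.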
