
AMD’s moment of Zen: Finally, an architecture that can compete

Intel's architecture is still better, but AMD has significantly narrowed the gap.

Before the company's new Zen offerings, it's fair to say that AMD's last attempt at building a performance desktop processor was not tremendously successful.

The Bulldozer core released in 2011 had a design that can, at best, be described as idiosyncratic. AMD made three bets with Bulldozer: that general purpose workloads would become increasingly multithreaded, that floating point intensive workloads would become increasingly GPU-driven, and that it would be able to aggressively scale clock speeds.

Accordingly, AMD created processors with oodles of simultaneous threads, relatively long but narrow pipelines, and relatively few floating point resources. The idea was that clock speed and the GPU would make up for the narrowness and lack of floating point capability. AMD hoped all those threads would be working hard.

Each Bulldozer module could run two threads simultaneously, with two independent integer pipelines and one shared floating point pipeline within a module. Desktop processors shipped with two, three, or four modules for four, six, or eight threads in total. Compared to Bulldozer's predecessor, the K10, each integer pipeline was narrow: two arithmetic logic units (ALUs) and two address generating units (AGUs), instead of three of each in K10. So, too, was the floating point pipeline, with two 128-bit fused multiply-add (FMA) units that could be paired together to perform a single 256-bit AVX FMA instruction. AMD designed the processor with a base clock speed goal of 4.4GHz.
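To make the resource sharing concrete, here is a rough sketch that tallies threads and 256-bit FMA capacity for the desktop configurations described above. The class and function names are made up for illustration, and the model assumes the two 128-bit units always pair up for 256-bit work; it is a back-of-the-envelope count, not a description of how the hardware schedules instructions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BulldozerModule:
    """Per-module resources as described in the article (names are illustrative)."""
    threads: int = 2            # two independent integer pipelines
    alus_per_thread: int = 2    # K10 had three
    agus_per_thread: int = 2    # K10 had three
    fma_units_128bit: int = 2   # shared by both threads in the module


def chip_summary(modules: int) -> dict:
    """Tally threads and shared floating point capacity for a chip."""
    m = BulldozerModule()
    threads = modules * m.threads
    # Assumption: two 128-bit units pair up to issue one 256-bit AVX FMA.
    fma256_per_cycle = modules * (m.fma_units_128bit // 2)
    return {
        "threads": threads,
        "fma256_per_cycle": fma256_per_cycle,
        "fma256_per_thread": fma256_per_cycle / threads,
    }


for modules in (2, 3, 4):
    print(modules, chip_summary(modules))
```

Under these assumptions, every configuration ends up with half a 256-bit FMA per thread per cycle, which is the crux of the floating point squeeze.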

Bulldozer? Bull💩, more like

None of AMD's gambles paid off. The high-end desktop parts, with their four modules and eight threads, had an abundance of integer threads. But most consumer workloads still can't be distributed evenly across eight threads. Single-threaded performance continues to matter a great deal. On the other hand, the sharing of the floating point units means that applications stuffed with floating point arithmetic have too few resources to work with. While GPU-based computing is important in certain workloads—such as scientific supercomputing—mainstream applications still require the CPU for floating point number crunching. Bulldozer leaves them short.

Even these issues might have been tolerable if the clock speed goals had been reached. A processor can get away with low instructions per cycle (IPC) if it runs at a high enough clock speed, but AMD came nowhere close to its 4.4GHz base goal. The top-end, four-module part had a base frequency of 3.6GHz. It could boost up to 4.2GHz under reduced workloads. This is a long way short of the design goal.
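The size of that miss is easy to put in numbers. The sketch below assumes the usual first-order model in which throughput scales with IPC multiplied by clock speed; real workloads also depend on memory, caches, and boost behavior, so treat the results as rough. The clock figures come from the article, and the function name is hypothetical.

```python
TARGET_BASE_GHZ = 4.4    # Bulldozer's base clock design goal
SHIPPED_BASE_GHZ = 3.6   # base clock of the top four-module part
SHIPPED_BOOST_GHZ = 4.2  # boost clock under reduced workloads


def relative_throughput(ipc_ratio: float, clock_ghz: float, ref_clock_ghz: float) -> float:
    """Throughput relative to a reference chip, assuming performance = IPC x clock."""
    return ipc_ratio * clock_ghz / ref_clock_ghz


# How far the shipped clocks fall short of the goal at equal IPC.
base_shortfall = 1 - relative_throughput(1.0, SHIPPED_BASE_GHZ, TARGET_BASE_GHZ)
boost_shortfall = 1 - relative_throughput(1.0, SHIPPED_BOOST_GHZ, TARGET_BASE_GHZ)

# Extra IPC the shipped part would have needed to match the design target.
ipc_needed = TARGET_BASE_GHZ / SHIPPED_BASE_GHZ

print(f"Base clock shortfall vs. goal:  {base_shortfall:.1%}")
print(f"Boost clock shortfall vs. goal: {boost_shortfall:.1%}")
print(f"IPC uplift needed to compensate: {ipc_needed - 1:.1%}")
```

In this simple model, the shipped base clock is about 18 percent short of the goal, and even the boost clock does not reach the intended base frequency.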

As a result, the first Bulldozer processors were, in many workloads, slower yet more expensive than their K10 predecessors. They were wholly uncompetitive with contemporaneous Intel parts.

AMD did iterate the design. The top-end second-generation part, named Piledriver, raised the base clock to 4.7GHz, with boost up to 5.0GHz. Combined with some internal improvements, this made it about 40 percent faster than the top Bulldozer. This came at a power cost, however: to hit those clock speeds, the processor drew 220W, compared to 125W for the top Bulldozer.
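A quick calculation shows what that trade did to efficiency. The sketch below takes the article's "about 40 percent faster" figure at face value and assumes the two wattage figures are directly comparable measures of power draw; the variable names are illustrative.

```python
BULLDOZER_WATTS = 125.0
PILEDRIVER_WATTS = 220.0
PILEDRIVER_SPEEDUP = 1.40  # "about 40 percent faster" than the top Bulldozer

# Power rose faster than performance, so performance per watt fell.
power_ratio = PILEDRIVER_WATTS / BULLDOZER_WATTS
perf_per_watt_ratio = PILEDRIVER_SPEEDUP / power_ratio

print(f"Power draw ratio:          {power_ratio:.2f}x")
print(f"Performance per watt ratio: {perf_per_watt_ratio:.2f}x")
```

On these numbers, the top Piledriver delivers roughly 0.8 times the performance per watt of the top Bulldozer: faster in absolute terms, but about a fifth less efficient.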

The third-generation Steamroller made improvements to IPC and gained about nine percent over Piledriver. Fourth-generation Excavator added as much as 15 percent more IPC over Steamroller. However, neither Steamroller nor Excavator was used in high-end desktop processors. The performance desktop space was ceded entirely to Intel.
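Generational gains multiply rather than add, so it is worth seeing what the two steps come to together. This sketch simply compounds the two figures quoted above; because the Excavator number is an "as much as" figure, the result is an upper bound, not a typical value.

```python
STEAMROLLER_OVER_PILEDRIVER = 1.09  # "about nine percent"
EXCAVATOR_OVER_STEAMROLLER = 1.15   # "as much as 15 percent", so a best case

# Compounding: each gain applies on top of the previous generation.
excavator_over_piledriver = STEAMROLLER_OVER_PILEDRIVER * EXCAVATOR_OVER_STEAMROLLER

print(f"Best-case Excavator IPC vs. Piledriver: +{excavator_over_piledriver - 1:.1%}")
```

At best, then, Excavator's IPC is around 25 percent higher than Piledriver's.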

AMD did use Steamroller and Excavator in some of its APUs—"Accelerated Processing Units," which is to say, CPUs with integrated GPUs. But even in this space, the Bulldozer family has proved limiting. Mobile-oriented APUs in the 10-25W space only have one Excavator module (two threads). Their performance is substantially lower than that of Intel's chips in the same power envelope, and Intel manages to squeeze four threads (albeit only on two cores) onto its low-power processors.

And to reach the ultra-low-power 3-7W space, AMD offers nothing at all with a Bulldozer-family core. The company has had chips operating in very low-power envelopes, but these have all used derivatives of the Bobcat core. This is a completely different processor design, developed for mobile and low-power operation. Bobcat derivatives are also used in the PlayStation 4, PlayStation 4 Pro, Xbox One, and Xbox One S.

By comparison, Intel's designs run the gamut (albeit with staggered release schedules); its Broadwell design ranges from two-core, four-thread mobile parts with a power draw of as little as 3.5W, up to 22-core, 44-thread server chips drawing 145W (or higher clocked 12-core, 24-thread parts drawing 160W).

Time for something new

By 2013, AMD had realized that Bulldozer was never going to be the processor that the company wanted it to be. A new architecture was necessary. AMD had a few particular goals for this: the new architecture had to be a viable challenger in the high-end desktop market, and it had to offer at least 40 percent better IPC than Excavator.

Like Intel before it, AMD wanted its new design to span the full range from fanless mobile through server and high-end desktop. So this improved IPC had to be wedded to improved power efficiency. But AMD isn't giving up on the Bulldozer ideas completely: the company still believes that large numbers of simultaneous threads are the future, and some of the design decisions suggest that AMD still sees GPUs as being central to serious floating point number crunching.

Four years in the making, the Zen core is the result of this new approach. And where Bulldozer failed to meet its objectives, AMD says that it has soundly beaten its 40 percent IPC improvement goal. In the single-threaded Cinebench R15 benchmark at a constant 3.4GHz, Zen achieves a score 58 percent higher than Excavator and 76 percent higher than Piledriver. The typical IPC improvement, when compared to Excavator, is around 52 percent. It does this at significantly lower power draw, too: in multithreaded Cinebench R15, the performance per watt is more than double what it was for Piledriver.
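Those two Cinebench figures also imply how far apart Excavator and Piledriver are in the same test. The sketch below derives that ratio and the margin by which AMD's claimed typical gain beats its goal. It uses only the percentages quoted above; the derived Excavator-to-Piledriver figure is an inference from them, not a number AMD published here.

```python
ZEN_OVER_EXCAVATOR = 1.58    # single-threaded Cinebench R15 at 3.4GHz
ZEN_OVER_PILEDRIVER = 1.76   # same benchmark, same clock
IPC_GOAL = 0.40              # AMD's target over Excavator
TYPICAL_IPC_GAIN = 0.52      # AMD's claimed typical gain over Excavator

# If Zen = 1.58 x Excavator and Zen = 1.76 x Piledriver,
# then Excavator = (1.76 / 1.58) x Piledriver in this benchmark.
excavator_over_piledriver = ZEN_OVER_PILEDRIVER / ZEN_OVER_EXCAVATOR

# Margin over the goal, in percentage points.
goal_margin_points = (TYPICAL_IPC_GAIN - IPC_GOAL) * 100

print(f"Implied Excavator vs. Piledriver: +{excavator_over_piledriver - 1:.1%}")
print(f"Typical gain beats the goal by {goal_margin_points:.0f} percentage points")
```

Since both scores were taken at a constant clock, the implied gap of roughly 11 percent reflects per-clock differences in this one workload only.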

Compared to the Bulldozer family, Zen is so much better across the board that it makes for an interesting—if uneven—competitor to what Intel is offering. Years have passed since AMD could even hope to be considered a performance rival to its much larger competitor, but with Zen, AMD finally has an architecture that can compete.
