Craig A. Hunter

Feb 02, 2018

iMac Pro 18-core Follow Up Review


I closed out my 10-core iMac Pro Review back in December wondering how an 18-core model would perform, and happily, one finally showed up for testing! As the two machines are otherwise very similar, refer back to that original review for a general overview of the iMac Pro (and background on my various benchmarks). Here, I am focusing on performance differences between the 10-core and 18-core models, as I suspect most buyers eyeing the 18-core juggernaut will be doing so for performance.

This new 18-core test unit has an Intel Xeon W-2195 CPU running at 2.3GHz (Turbo Boost up to 4.3GHz) with a single 24.8MB L3 cache, 1MB of L2 cache per core, 128GB of 2666MHz DDR4 ECC memory, a 4TB SSD, and an AMD Radeon Pro Vega 64 graphics chipset with 16GB of VRAM. For reference, the 10-core test unit has an Intel Xeon W-2155 CPU running at 3.0GHz (Turbo Boost up to 4.5GHz), a single 13.8MB L3 cache, 1MB of L2 cache per core, 128GB of 2666MHz DDR4 ECC memory, a 2TB SSD, and the same AMD Radeon Pro Vega 64 graphics chipset with 16GB of VRAM. For the purposes of my tests, the only notable difference between these two machines is the CPU.

I want to start off by revisiting the computational fluid dynamics (CFD) benchmark I ran before, using NASA's USM3D flow solver. In that previous benchmark, the 10-core iMac Pro was the fastest system I'd tested by a good margin, but it was limited to 10 MPI processes on its 10 cores. Below, we can see that the 18-core iMac Pro is a teeny bit faster than the 10-core machine up to 10 cores, but then goes on to show a healthy increase in performance as it marches up to 18 MPI processes on 18 cores. The 18-core machine topped out at 62 GFLOPS of performance, a nice 27% increase over the 10-core machine's 49 GFLOPS. In terms of overall performance across the range, the two machines are really neck and neck up to 8 cores, at which point the 10-core machine starts to fall off the trend while the 18-core machine keeps plowing ahead.

CFD tends to be a good real-world benchmark because it's memory- and disk-intensive, and also because it involves a lot of inter-core communications (in this case, using MPI, which stands for "message passing interface"). It pushes the limits of a computer and shows how things scale as more and more cores are utilized and they begin to compete with each other for system resources, communication, and throughput. But it's not a highly optimizable computation, and it's not even close to being the kind of embarrassingly parallel computation that scientists and engineers fantasize about. If this were a car test drive, running CFD would be like showing up at the dealership with three tons of gravel on a trailer and hitching it up. This is evident in the parallel scaling from the previous benchmark, where the 10-core iMac Pro showed only a 6.7X benefit with 10 cores, and the 18-core machine showed only an 8X benefit with 18 cores, both well off the ideal speedups of 10X and 18X we'd like to see.
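To put those scaling numbers in perspective, here is a small sketch that turns the quoted speedups into parallel efficiency (observed speedup divided by core count). The function name and data layout are just for illustration; the only inputs are the 6.7X and 8X figures quoted above.

```python
# Speedups quoted in the text: (cores used, observed speedup over one core).
cfd_scaling = {
    "10-core": (10, 6.7),
    "18-core": (18, 8.0),
}


def parallel_efficiency(cores, speedup):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


for name, (cores, speedup) in cfd_scaling.items():
    eff = parallel_efficiency(cores, speedup)
    print(f"{name}: {speedup:.1f}X on {cores} cores = {eff:.0%} of ideal")
```

By this measure the CFD run uses about two thirds of the 10-core machine's ideal speedup and a bit under half of the 18-core machine's, which is the gravel trailer in numbers.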

So, I'd like to step through two other benchmarks, gradually moving through cases that are more easily optimized and parallelized, until we get to one of those embarrassing cases. First up is the well-known LINPACK benchmark, which solves a dense system of linear equations (by LU decomposition with partial pivoting, for you math nerds). Here, I ran the benchmark for a system of 15,000 equations on both 10-core and 18-core iMac Pro systems. I made use of Intel's Math Kernel Library (MKL) benchmark code, available here.
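For readers who haven't met it, here is a minimal pure-Python sketch of LU decomposition with partial pivoting on a tiny system. This is not Intel's MKL code, which is heavily optimized and threaded; the function name lu_solve and the 3x3 example are made up purely to illustrate the technique.

```python
def lu_solve(a, b):
    """Solve a x = b by LU decomposition with partial pivoting."""
    n = len(a)
    lu = [row[:] for row in a]   # work on copies, leave the inputs alone
    x = list(b)

    for k in range(n):
        # Partial pivoting: move the largest entry in column k onto the diagonal.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            x[k], x[pivot] = x[pivot], x[k]
        # Eliminate below the pivot, storing the multipliers (L) in place.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]

    # Forward substitution with the unit lower-triangular factor L.
    for i in range(n):
        x[i] -= sum(lu[i][j] * x[j] for j in range(i))
    # Back substitution with the upper-triangular factor U.
    for i in reversed(range(n)):
        x[i] = (x[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


a = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
print(lu_solve(a, b))
```

The triple loop in the elimination step is where the work is: it grows with the cube of the number of equations, which is why a 15,000-equation system makes a meaningful workout.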

Again, we see that the 18-core machine is a teeny bit faster than its 10-core counterpart up to 10 cores, and then keeps on trucking past that point, building up a pretty big advantage by 18 cores. Here, the 18-core iMac Pro topped out at 686 GFLOPS, a healthy 54% higher than the 10-core machine's 445 GFLOPS. If we look at scaling for this case, we see it's a little better than the CFD benchmark, showing 8X benefit with 10 cores and 12X benefit with 18 cores.

Finally, let's look at a benchmark that truly is embarrassingly parallel. If you're lucky, you may get to run some of these types of computations in your line of work, but in my case, I have to make up a benchmark to have this much fun. For this case, I took the single-core AVX-512 vector-add benchmark from my original review and made a slight modification to split it up among multiple cores. As the benchmark runs easily on a single core, splitting that up over multiple cores is all gravy, and this becomes a good case to test the theoretical benefits of 18 cores over 10 cores. Results are shown below, and are so linear that I don't even need to bother showing scaling! Here, the machines again follow each other up to 10 cores, at which point the 10-core machine tops out at 233 GFLOPS. The 18-core machine goes on to reach 418 GFLOPS, a whopping 79% increase in performance.
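To show what "embarrassingly parallel" means in practice, here is a rough Python sketch of splitting a vector add into independent slices, one per worker. It is not the AVX-512 benchmark code used for the results above, and Python threads won't deliver a real speedup on this kind of loop; the names chunk_bounds and parallel_vector_add are hypothetical and only illustrate the decomposition.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, workers):
    """Split range(n) into `workers` contiguous, nearly equal [start, stop) slices."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_vector_add(a, b, workers):
    """Add two equal-length lists, each worker handling its own slice."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    out = [0.0] * len(a)

    def add_slice(bounds):
        start, stop = bounds
        for i in range(start, stop):
            out[i] = a[i] + b[i]   # each slice writes only its own indices

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(add_slice, chunk_bounds(len(a), workers)))
    return out


a = list(range(100))
b = [2 * v for v in a]
print(parallel_vector_add(a, b, workers=18)[:5])
```

No slice ever needs data from another slice, so there is no communication to slow things down as workers are added. That is the property that lets this kind of benchmark scale almost linearly with core count.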

So in summary, we see performance increases ranging from 27% to 79% for the 18-core iMac Pro when compared to the 10-core model. I suspect many computations and applications will be in the middle of that range depending on how well they can take advantage of multiple cores, but there will certainly be some hot rod uses that get closer to that 79% end of the scale (and may do even better). Though I haven't mentioned it, if you look back through the various benchmark results, you'll see that the 18-core iMac Pro shows no disadvantage for single-core performance, despite running at a lower clock speed (2.3GHz/4.3GHz) than the 10-core iMac Pro (3.0GHz/4.5GHz). Often, the price of scaling a CPU architecture to more cores is a loss of single-core performance, but no such penalty seems to exist here. The 18-core iMac Pro brings 8 more cores to the table on the high end with no loss of performance on the low end.
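The percentages in this summary follow directly from the peak GFLOPS figures quoted for each benchmark. A quick sketch of the arithmetic (the dictionary layout and function name are just for illustration):

```python
# Peak GFLOPS quoted in the text: (10-core, 18-core).
peak_gflops = {
    "USM3D": (49, 62),
    "LINPACK": (445, 686),
    "VEC-ADD": (233, 418),
}


def percent_gain(base, new):
    """Percentage increase of `new` over `base`."""
    return (new / base - 1.0) * 100.0


for name, (g10, g18) in peak_gflops.items():
    print(f"{name}: {percent_gain(g10, g18):.0f}% faster on 18 cores")

# For comparison, the increase in core count alone.
print(f"Extra cores: {percent_gain(10, 18):.0f}%")
```

Going from 10 to 18 cores is an 80% increase in core count, so the 79% gain on the vector-add benchmark is about as close to a one-for-one return on the extra cores as you could ask for.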

Based on pricing from Apple's website, it costs $1,600 more to go from a 10-core model to an 18-core model. My 10-core test unit prices out at $9,599, and a comparable 18-core unit would be $11,199, a 17% increase in cost. Those are both lofty prices for sure, but considering the performance increases we've seen here, the upcharge for the 18-core model is a bargain. If you're mainly interested in performance, looking at dollars per GFLOPS is a good way to compare the two machines, and confirms that the 18-core machine is indeed a better deal:


           USM3D    LINPACK    VEC-ADD
10-core    $196     $22        $41
18-core    $181     $16        $27

$/GFLOPS for 10-core and 18-core iMac Pro models, based on the three benchmarks tested here
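The table above can be reproduced from the prices and peak GFLOPS figures quoted in this review. A short sketch, with variable and function names chosen only for illustration:

```python
# Prices and peak GFLOPS as quoted in the review.
prices = {"10-core": 9599, "18-core": 11199}
peak_gflops = {
    "10-core": {"USM3D": 49, "LINPACK": 445, "VEC-ADD": 233},
    "18-core": {"USM3D": 62, "LINPACK": 686, "VEC-ADD": 418},
}


def dollars_per_gflops(model, benchmark):
    """System price divided by peak benchmark throughput."""
    return prices[model] / peak_gflops[model][benchmark]


for model in prices:
    cells = "  ".join(
        f"{bench}: ${dollars_per_gflops(model, bench):.0f}"
        for bench in peak_gflops[model]
    )
    print(f"{model}  {cells}")
```

Lower is better here, and the 18-core machine comes out ahead in every column because its 17% price increase is smaller than even the smallest performance gain (27%).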

Whichever you choose, these new iMac Pro models are outstanding machines, and represent a great value for a true workstation-class Mac.
