
AMD destroys Nvidia at Bitcoin mining, can the gap ever be bridged?

One of the questions that's swirled around Bitcoin mining performance for the past few years is why Nvidia GPUs are so thoroughly outclassed by AMD products. We dig into the question, test some CUDA-optimized kernels, and discuss the particulars of GPU architectures -- and why AMD has such an overwhelming advantage.
By Joel Hruska

If you typically follow GPU performance as it relates to gaming but have become curious about Bitcoin mining, you've probably been surprised to find that AMD GPUs are the uncontested performance leaders in the market. This is in stark contrast to the PC graphics business, where AMD's HD 7000 series has been playing a defensive game against Nvidia's GK104 / GeForce 600 family of products. In Bitcoin mining, the situation is almost completely reversed -- the Radeon 7970 is capable of 550MHash/second, while the GTX 680 is roughly 1/5 as fast.

There's an article at the Bitcoin Wiki that attempts to explain the difference, but the original piece was written in 2010-2011 and hasn't been updated since. It refers to Fermi and AMD's VLIW architectures and implies that AMD's better performance is due to having far more shader cores than the equivalent Nvidia cards. This isn't quite accurate, and it doesn't explain why the GTX 680 is actually slower than the GTX 580 at BTC mining, despite having far more cores. This article is going to explain the difference, address whether better CUDA miners would dramatically shift the performance delta between AMD and Nvidia, and touch on whether Nvidia's GPGPU performance is generally comparable to AMD's these days.

Topics not discussed here include:
  • Bubbles
  • Investment opportunity
  • Whether ASICs -- whenever they arrive: next month, this summer, or further down the road -- will destroy the GPU mining market.
These are important questions, but they're not the focus of this article. We will discuss power efficiency and MHash/watt to an extent, because those factors have a direct impact on the AMD vs. Nvidia mining comparison.

The mechanics of mining

Bitcoin mining is a specific application of the SHA-256 hash algorithm (applied twice to each block header). One of the reasons AMD cards excel at mining is because the company's GPUs have a number of features that enhance their integer performance. This is actually something of an oddity; GPU workloads have historically been floating-point heavy, because textures are stored in half (FP16) or full (FP32) precision.
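For readers unfamiliar with what a miner actually computes, here's a minimal CPU-side sketch of the work each GPU thread performs: hash an 80-byte block header twice with SHA-256 and check whether the result falls below the network's difficulty target. The header prefix and target below are placeholders for illustration, not real block data.

```
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """Bitcoin's proof-of-work hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces until the double SHA-256 of the 80-byte block header
    falls below the difficulty target. header_prefix is the first 76
    bytes (version, previous hash, merkle root, time, bits); the 4-byte
    nonce fills the remainder."""
    for nonce in range(max_nonce):
        header = header_prefix + struct.pack('<I', nonce)
        # The digest is compared to the target as a little-endian 256-bit integer
        if int.from_bytes(double_sha256(header), 'little') < target:
            return nonce
    return None

if __name__ == '__main__':
    prefix = b'\x00' * 76       # placeholder 76-byte header prefix
    easy_target = 1 << 240      # artificially easy target so the demo finishes quickly
    print(mine(prefix, easy_target))
```

A real miner does exactly this, billions of times per second, which is why per-clock integer throughput matters so much.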

The issue is made more confusing by the fact that when Nvidia started pushing CUDA, it emphasized password cracking as a major strength of its cards. It's true that GeForce GPUs, starting with G80, offered significantly higher cryptographic performance than CPUs -- but AMD's hardware now blows Nvidia's out of the water.

The first reason AMD cards outperform their Nvidia counterparts in BTC mining (and the current Bitcoin Wiki entry does cover this) is that the SHA-256 algorithm makes heavy use of a 32-bit integer right rotate operation. A rotate is like a shift, except that the bits pushed off one end of the value are re-attached at the other; in a right rotation, bits that fall off the right reappear at the left. AMD GPUs can do this operation in a single step. Prior to the launch of the GTX Titan, Nvidia GPUs required three steps -- two shifts and an add.
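Here's what that emulation looks like in scalar code -- a sketch for illustration, not any particular miner's implementation. SHA-256 applies rotates like this to 32-bit words dozens of times per hash:

```
MASK32 = 0xFFFFFFFF

def rotr32_native(x: int, n: int) -> int:
    """What GCN does in a single instruction: a true 32-bit right rotate."""
    n &= 31
    return ((x >> n) | (x << (32 - n))) & MASK32

def rotr32_emulated(x: int, n: int) -> int:
    """What pre-Titan Nvidia hardware had to do: shift right, shift left,
    then combine the two halves (the 'two shifts and an add' described
    above -- an OR works just as well, since the bits never overlap)."""
    n &= 31
    right = (x & MASK32) >> n
    left = (x << (32 - n)) & MASK32
    return right + left

assert rotr32_native(0x12345678, 8) == rotr32_emulated(0x12345678, 8) == 0x78123456
```

Three dependent operations instead of one, repeated constantly in the inner loop, adds up quickly.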

We say "prior to Titan," because one of the features Nvidia introduced with Compute Capability 3.5 (only supported on the GTX Titan and the Tesla K20/K20X) is a funnel shifter. The funnel shifter can combine operations, shrinking the 3-cycle penalty Nvidia significantly. We'll look at how much performance improves momentarily, because this isn't GK110's only improvement over GK104. GK110 is also capable of up to 64 32-bit integer shifts per SMX (Titan has 14 SMX's). GK104, in contrast, could only handle 32 integer shifts per SMX, and had just eight SMX blocks.

Kepler instruction capability

We've highlighted the 32-bit integer shift capability difference between CC 3.0 and CC 3.5.

AMD plays things close to the chest when it comes to Graphics Core Next's (GCN) 32-bit integer capabilities, but the company has confirmed that GCN executes INT32 code at the same rate as double-precision floating point. This implies a theoretical peak INT32 dispatch rate of 64 per clock per CU -- double GK104's base rate. AMD's other advantage, however, is the sheer number of Compute Units (CUs) that make up one GPU. The Titan, as we've said, has 14 SMXes, compared to the HD 7970's 32 CUs. The number of Compute Units / SMXes may be far more important than the total number of cores in these contexts.
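To put those per-clock figures in perspective, here's a rough back-of-the-envelope calculation of theoretical peak shift/rotate throughput. The clock speeds are reference values and the assumption that every unit hits its peak on every cycle is generous; treat the output as an upper bound, not a benchmark.

```
def peak_int32_shift_rate(units, ops_per_clock, clock_ghz):
    """Theoretical peak 32-bit shift/rotate throughput, in billions of ops/sec."""
    return units * ops_per_clock * clock_ghz

# Reference base clocks -- approximate assumptions, not measured values.
gtx_680 = peak_int32_shift_rate(units=8,  ops_per_clock=32, clock_ghz=1.006)  # GK104
titan   = peak_int32_shift_rate(units=14, ops_per_clock=64, clock_ghz=0.837)  # GK110
hd_7970 = peak_int32_shift_rate(units=32, ops_per_clock=64, clock_ghz=0.925)  # Tahiti

print(f"GTX 680: {gtx_680:.0f}  Titan: {titan:.0f}  HD 7970: {hd_7970:.0f} Gops/s")
# Roughly 258, 750, and 1894 Gops/s respectively -- the same ordering, and
# roughly the same gaps, that show up in the hashing results below.
```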


Mining performance

First, we'll look at the Titan's performance against the GTX 680 in an unoptimized OpenCL kernel (using cgminer 2.11) and a more recent CUDA-optimized kernel based on rpcminer. Rpcminer and cgminer share a common code base; performance between the two is identical when using OpenCL. For the unoptimized test, we opted for the poclbm kernel. The optimized test used a modified CUDA-capable kernel. This kernel was allowed to auto-configure for the Nvidia GeForce cards, but we also tested various manual settings for the number of threads and grid size. Hand-tuning these options failed to meaningfully improve performance.

The baseline testbed was an Intel Core i7-3770K with 8GB of RAM and an Asus P8Z77-V Deluxe motherboard, powered by the Thermaltake 1275W 80 Plus Platinum power supply we reviewed last autumn. The AMD Radeon cards were all configured to use the diakgcn kernel. Performance and power consumption were logged over two hours, which gave the erratic CUDA miner's performance time to stabilize.

Typically in Bitcoin mining, the hash rate of a given card remains stable. The GTX 680 and Titan both "bounced" when running the CUDA miner, though the cause of the fluctuation is unclear. The performance figures for these cards reflect their average hash rate over time.

Nvidia BTC performance

The first thing to notice is that the Titan is much faster than increased core counts or clock speed alone would account for. Optimizing for CUDA improves performance on both cards by roughly 20%. Nvidia promised that GK110 would deliver significantly improved performance in mathematical workloads, and that fact is borne out here as well. The size of the improvement is virtually identical for the two cards, at ~17%, which implies that Nvidia's driver can auto-optimize code to run on the GTX Titan.

GK110 is significantly faster than GK104, but look at what happens when we add Radeon performance data...

BTC performance


Ouch. The Radeon 7790, a $149 GPU, offers 80% of the GTX Titan's performance for 15% of its price. The Radeon 7970 is twice as fast at half the price. Even the CUDA-accelerated kernel doesn't bring Nvidia hardware into the same league as AMD's -- a point hammered home if we compare system power consumption. Keep in mind that the Titan is a 7.1B-transistor GPU with a 561 sq. mm die. The fact that the Radeon 7790 nearly matches its performance at 112 sq. mm and 2B transistors points to a fundamental bottleneck within Titan's architecture as the source of the problem.

The situation is just as lopsided if we consider GPU efficiency based on power consumption (MHash/Watt) or initial purchase price vs. hashrate, as shown below.

BTC power efficiency and GPU price / performance ratio
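For reference, both metrics in those charts are straightforward to compute. The sketch below plugs in the round numbers quoted in the text ($1,000 for the Titan at ~427MHash, $149 for the 7790 at roughly 80% of the Titan's rate); treat the outputs as approximations, not chart data.

```
def mhash_per_watt(mhash_per_s, system_watts):
    """Mining efficiency: hash rate divided by logged wall power draw."""
    return mhash_per_s / system_watts

def mhash_per_dollar(mhash_per_s, card_price_usd):
    """Purchase-price efficiency: hash rate per dollar spent on the GPU."""
    return mhash_per_s / card_price_usd

print(mhash_per_dollar(427, 1000))        # GTX Titan: ~0.43 MHash per dollar
print(mhash_per_dollar(0.8 * 427, 149))   # Radeon 7790: ~2.3 MHash per dollar
```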

A full discussion of GPGPU performance between AMD and Nvidia is beyond the scope of this article, but some performance checking is in order. The OpenCL-based Luxmark 2.0 benchmark now runs on the Titan (when we first reviewed the card, the program crashed at launch), so let's see how performance compares there. The GTX 680, GTX Titan, HD 7970, and HD 7790 are all shown below.

Luxmark 2.0

The Titan is a huge improvement over the GTX 680, but it still delivers only half the performance of the HD 7970.


SiSoft Sandra now includes a number of financial transaction tests, some of which are designed to leverage the floating-point calculations where the Titan, theoretically, should excel.

The new financial tests in SiSoft Sandra 2013 are designed to measure "the metrics of a financial entity, be it a business, asset, option, etc. Here, various models are used to determine the future worth of "options" in organized option trading. An "option" is a contract to buy/sell an asset at a specified price ("strike price") at (or before) an expiration date... Mathematical models are employed to estimate option worth and are implemented in most financial or trading software; some are compute intensive, which is where GPGPU acceleration comes in."
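Sandra doesn't publish its kernel source, but the binomial test is built on the same lattice technique as this deliberately simple, single-threaded sketch; the GPGPU version prices thousands of such options in parallel, and all parameters here are purely illustrative.

```
import math

def binomial_call(spot, strike, rate, vol, years, steps=1024):
    """Price a European call with a Cox-Ross-Rubinstein binomial tree --
    the same class of model Sandra's 'binomial' GPGPU test accelerates."""
    dt = years / steps
    u = math.exp(vol * math.sqrt(dt))          # up-move factor per step
    d = 1.0 / u                                # down-move factor per step
    p = (math.exp(rate * dt) - d) / (u - d)    # risk-neutral up probability
    disc = math.exp(-rate * dt)

    # Option payoffs at expiry for every node in the final layer of the tree
    values = [max(spot * u**j * d**(steps - j) - strike, 0.0)
              for j in range(steps + 1)]

    # Walk the tree backwards, discounting expected values layer by layer
    for _ in range(steps):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]

# Illustrative parameters only
print(binomial_call(spot=100, strike=105, rate=0.03, vol=0.25, years=1.0))
```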

SiSoft Sandra - binomial float

The GTX Titan isn't much faster than the GTX 680 in this test, possibly due to a need for further optimizations in these workloads. When we flip to 64-bit performance, the match-up changes.

Binomial Double/FP64 Sandra

Everyone takes a performance hit, but the GTX Titan goes from less than half AMD's performance to only about 25% behind it. Other data sets, like the encryption performance estimates linked above and the downloadable CLBenchmark GPGPU tests, show the HD 7970 generally ahead of the GTX Titan in raw performance. Factor in die size or card price, and the HD 7970 is nearly always the better value.

Bitcoin: a worst-case example of a general trend

There are several reasons why this lopsided performance trend hasn't gotten more play. GPGPU performance is still in its infancy; games are still the go-to metric for consumer GPU comparisons, and workstation applications fill a similar role in the professional space. Then there's the fact that Nvidia still owns the high-performance GPU computing space. AMD's efforts to ramp its own accelerated computing portfolio only began recently and have focused almost exclusively on consumer applications. That makes sense, given that APUs are the future of AMD's computing products, but it also means that the handful of people using GPGPU solutions are almost certainly using Nvidia hardware. Some 50-odd systems on the TOP500 use Nvidia accelerators, compared to just three for AMD.

The relative performance differences between AMD's GCN architecture and Titan are interesting because they echo the marked performance differences we commonly see in the CPU market. When games were the only metric of interest, GPU performance depended solely on how well the graphics card's features mapped to DX standards and game engine demands. Our Bitcoin performance and OpenCL tests demonstrate that while the Titan crushes the HD 7970's performance in gaming, it can lag by up to 50% in other tests, despite a far larger transistor budget, more cores, and twice the price.

Can the gap be closed?

Earlier, we noted that the GTX 680 and GTX Titan had a tendency to "bounce" when benchmarked using a CUDA-optimized mining program. Even if we assume that the miner could be further improved to deliver peak performance at a constant rate, the GTX 680 would only reach 180MHash, while the GTX Titan topped out at 427MHash. $1000 for 427MHash/second is never going to be a good deal when two Radeon 7970s can be bought for the same price and deliver 2.2x the performance.

For now, we're betting that the high number of cores per SMX (192 for Kepler, versus 64 per CU for GCN) is part of the problem. Each SMX has to work harder to extract sufficient parallelism to keep the entire processor block fed, which makes peak utilization problematic. Further CUDA optimizations might improve the overall performance picture slightly, but there's no miracle kernel waiting in the wings that would double performance. Even if there were, it would scarcely matter -- GK104's performance would need to quintuple for the GTX 680 to even be competitive.

Should you mine if you have an Nvidia card? You can, but be aware that power costs make this a losing proposition if Bitcoin prices decline to historical levels. Even at $90 per BTC, and even with a Titan, mining efficiency barely breaks 1.2MHash/watt. Modern AMD cards backed by efficient power supplies are much better, in the 2.2 - 2.5MHash/watt range.
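If you want to run the numbers for your own card, the back-of-the-envelope math is simple: your expected share of the block reward is your hash rate divided by the network's, and your cost is wall power times your electricity rate. Every input below is a placeholder -- plug in the current network hash rate, BTC price, and your own power and hardware figures.

```
def daily_profit_usd(card_mhash, network_ghash, btc_price_usd,
                     system_watts, power_cost_per_kwh,
                     block_reward_btc=25.0, blocks_per_day=144):
    """Rough expected daily profit for a single GPU miner."""
    share = (card_mhash / 1000.0) / network_ghash            # your slice of the network
    revenue = share * blocks_per_day * block_reward_btc * btc_price_usd
    power_cost = (system_watts / 1000.0) * 24 * power_cost_per_kwh
    return revenue - power_cost

# Placeholder inputs for illustration only.
print(daily_profit_usd(card_mhash=427, network_ghash=60_000,
                       btc_price_usd=90, system_watts=350,
                       power_cost_per_kwh=0.12))
```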

Now read: Bitcoin isn't illegal because it's not real money

Special thanks go to Adrian Silasi of SiSoft (makers of SiSoft Sandra), who helped extensively with the analysis of this data and contributed some of the benchmark results discussed above.
