Posted by: theovalich | October 24, 2008

Nvidia’s $50 card destroys ATI’s $500 one or “Why ATI sucks in Folding?”

As you might already know, I am a bit of an enthusiast when it comes to distributed computing. I looked for aliens through SETI@home, later with BOINC… but then Folding@Home showed up and I became an enthusiast for this valuable project from Stanford University. My family has had its share of dealings with Alzheimer’s (aka AD) and Parkinson’s (aka PD) diseases, and I won’t go into the psychological and ultimately financial stress that families around the world, including my own, have to endure.
Folding@Home is also a project that pioneered the use of GPUs for distributed computing (if I am wrong on this one, feel free to correct me). Back in the summer of 2006, I heard that ATI and Stanford were working on a Folding@Home GPGPU client. I still remember my own articles, and those from a lot of colleagues, criticizing Nvidia for not having a F@H client.

Nvidia's client may not look as nice as ATI's, but it's the efficiency that counts...


Fast forward to the GTX280 launch, when Vijay Pande’s team debuted the Folding@Home client for Nvidia chips as well. Nvidia and ATI waged a short marketing war over who can fold better, and then things went quiet… apparently, for a reason.
The reason things went quiet is probably the “inconvenient truth”: ATI showed up with the Radeon 4800 series and demolished Nvidia’s dominance in the segment, with the GTX260 and 280 going through radical price drops in order to stay competitive. However, there is one field where ATI’s Radeon 4800 series loses to cards that are 5-10x cheaper: Folding@Home.
The 10x argument comes from a comparison between ATI’s current flagship, the Radeon 4870X2, and Nvidia’s GeForce 9600GSO. This $50 card can easily out-fold the ATI Radeon 4870X2, which retails for more than 500 USD/450 EUR in the respective markets.
In the past weeks, I’ve conducted a series of tests with various graphics cards (all that I own or could get my hands on), and the results are quite depressing if you own an ATI card. I asked some of my contacts at AMD why the performance is so bad, and the answers ranged from “we wanted to make the best gamer’s card, not a card for Folding” to sad silence. It seems to me that the difference lies in shader type and clock: ATI’s R6xx and RV7xx architectures are built around big “fat” units and a lot of tiny ones (64+256 in the case of Radeon 3800, 80+720 in the case of Radeon 4800), and the clock is much lower than on GeForce cards. Nvidia went the other route and came up with a large number of “fat” units, while the company didn’t even count the “thin” (MADD) ones.
When we compare the GTX280 and the 4870X2, the results are just astounding: over a period of a month, EVGA’s GTX280 SSC achieved an average of 6,802 points per day (PPD), while the ATI Radeon 4870X2 managed a puny 3,870 PPD. At the same time, I’ve witnessed higher PPD scores achieved even by the two-year-old GeForce 8800GTS 640 MB, which was quite a surprise. Around two weeks ago, I started following PPD numbers using FahMon on a large number of systems that mostly share the same configuration: a dual-core processor or better, 2GB of system memory or more, and the graphics card in question. In all, with the help of my friends, I’ve managed to check FahMon and KakaoStats for roughly 25 cards and came to a surprising result.
With the recent update to the GPU2 client and the new Fah_Core11.exe (ATI uses v1.17, Nvidia v1.15), the community witnessed a further fall in the number of completed packets per day. If you’re not familiar with Folding@Home packets, every packet contains a certain number of mathematical simulations for the tested protein: in the case of Nvidia, a packet consists of 25 million operations, while ATI’s features 10 million. However, due to the different type of mathematical operations, Nvidia’s packet will usually result in 480 points, while ATI’s 10 million will return 548 points (or 338 points for the recently introduced ATI packets).
As I wrote previously, the table below is not the result of a single packet score and an Excel calculation, but rather of continuous number crunching over the course of several weeks, with one week used for measurement.
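For readers who want to check the arithmetic themselves, here is a minimal Python sketch of how points per packet, packet time and PPD relate. The PPD and points-per-packet figures are the ones quoted above; the function names and the one-hour packet time are purely illustrative assumptions, and the ATI figure assumes only 548-point packets, so treat the implied packet counts as rough.

```python
# Sketch only: relates points per packet, packets per day and PPD.
# The PPD and points-per-packet values come from the article text;
# the function names and the one-hour example are illustrative assumptions.

SECONDS_PER_DAY = 24 * 60 * 60


def ppd_from_packet_time(points_per_packet: float, seconds_per_packet: float) -> float:
    """Points per day if every packet takes the same time and scores the same."""
    if seconds_per_packet <= 0:
        raise ValueError("seconds_per_packet must be positive")
    return points_per_packet * SECONDS_PER_DAY / seconds_per_packet


def packets_per_day(measured_ppd: float, points_per_packet: float) -> float:
    """Average number of packets per day implied by a measured PPD."""
    if points_per_packet <= 0:
        raise ValueError("points_per_packet must be positive")
    return measured_ppd / points_per_packet


# Figures quoted in the article.
GTX280_PPD = 6802.0      # EVGA GTX280 SSC, monthly average
HD4870X2_PPD = 3870.0    # ATI Radeon 4870X2
NVIDIA_POINTS = 480.0    # points per Nvidia packet
ATI_POINTS = 548.0       # points per ATI packet (ignores the 338-point ones)

if __name__ == "__main__":
    print(f"GTX280 / 4870X2 PPD ratio: {GTX280_PPD / HD4870X2_PPD:.2f}")
    print(f"GTX280 packets per day:    {packets_per_day(GTX280_PPD, NVIDIA_POINTS):.1f}")
    print(f"4870X2 packets per day:    {packets_per_day(HD4870X2_PPD, ATI_POINTS):.1f}")
    # Hypothetical: a card that finishes one 480-point packet every hour.
    print(f"One packet per hour:       {ppd_from_packet_time(NVIDIA_POINTS, 3600):.0f} PPD")
```

Because the two vendors' packets differ in size and score, the PPD ratio says who earns points faster and nothing about how much science each card does per packet.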


Improvised Top 20 Folding@Home GPUs:

  1. Nvidia GeForce GTX280 1GB (EVGA SSC)
  2. Nvidia GeForce GTX260-216 898MB (EVGA SSC)
  3. Nvidia GeForce GTX260 898MB (EVGA Superclocked)
  4. Nvidia GeForce 9800GTX+ 512MB (ASUS TOP)
  5. Nvidia Quadro FX 4600 SDI 768MB (PNY)
  6. Nvidia GeForce 9800GTX 512MB (ASUS TOP)
  7. Nvidia GeForce 8800GTX 768MB (Zotac AMP! Edition)
  8. Nvidia GeForce 8800Ultra 768MB (XFX XXX Edition)
  9. Nvidia GeForce 8800GTS 512MB (Gainward)
  10. Nvidia GeForce 8800GT 512MB (Gainward)
  11. Nvidia GeForce 9600GSO 768MB (EVGA)
  12. Nvidia GeForce 8800GTS 640MB (LeadTek)
  13. ATI Radeon 4870X2 2GB (PowerColor)
  14. ATI Radeon 4870 512MB (PALIT)
  15. Nvidia GeForce 9600GT 256MB (Zotac)
  16. ATI Radeon 4850 512MB (PALIT)
  17. ATI Radeon 3870 512MB (Sapphire Atomic)
  18. ATI FireGL V8600 1GB (ATI)
  19. Nvidia GeForce 8600GTS 256MB (XFX XXX Edition)
  20. ATI Radeon 3850 256MB (Sapphire)

This is not a complete table by any means, since I am missing several new GPUs. But as you can see for yourself, the results are quite dramatic for the red team. Two-year-old GeForce GPUs demolished the otherwise brilliant Radeon series, and it is incredible that even the GeForce 9600 will outfold the Radeon 4850. This is a rude wake-up call for the guys in Markham, because this is just unbelievable.
Personally, I am running a combination of the AMD Spider platform (9850BE + 790GX + ATI Radeon 4870X2) and Intel’s hybrid V8-Skulltrail platform with a Quadro FX 4600 SDI.
Of course, everything can be changed with a simple driver update. I don’t understand what happened with AMD/ATI, the company that led the field of GPGPU computing for so long – why should AMD work on optimizing the Folding@Home client… I am aware that AMD poached Mike Houston from Stanford to work on Brook+ and now the OpenCL APIs, but surely the performance didn’t go downhill under the influence of just one person. Or just maybe…
Overall, I hope that Catalyst 8.11 or 8.12 will bring more performance for ATI cards, since I do not believe that it would be so hard to optimize drivers for GPGPU/GPU Computing usage. For now, in Folding@Home, ATI is a complete washout.

To close this article: if you feel that your GPU cycles could be used for something good, I invite you to read the following article and join the F@H family, regardless of which client (CPU or GPU) or team you choose in the end. Intel, AMD, ATI, Nvidia, Windows, Linux or Mac OS – it does not matter, just join. If you want to, of course.


Responses

  1. Excellent article, colleague. I hope a GPU client for Linux will emerge soon; only then will we see how Linux behaves versus the Windows platform.

    Linux CPU folding gives better results than Windows folding on the CPU, and I hope it is the same story with the GPU client.

    Boris

  2. In the case of Nvidia, that depends on the development of CUDA for the Linux platform. In the case of ATI, that depends on Brook+. Ideally, once OpenCL starts rolling in, we’ll have a wider choice of operating systems.

    So far, Mac owners are limited to CPU folding only, with no GPU client at all. That’s the next big step: getting GPU folding on alternative OSes. We’ll see how far Stanford can stretch.

  3. [...] really didn’t know already. It is pretty funny that the 9600GSO can out-fold cards 10x it’s price. Nvidia’s $50 card destroys ATI’s $500 one or “Why ATI sucks in Folding?” The… __________________ E6600 @ 2.7 | 9600GT 512MB | AB9 Pro | 3.325Gb Corsair Value DDR2 @600MHz [...]

  4. You left off the most powerful folding card EVER:

    9800GX2

  5. Well, sadly, my ASUS EN9800GX2 went the way of the dodo, with corrupted textures in Crysis/C: Warhead and crashes in Folding@Home.

    But even during its normal life span, it just didn’t deliver a two-GPU experience or double the folding rate of a 9800GTX. Like I wrote in the article, this was solely based on cards I own(ed). I was shocked at the results my 4870X2 was achieving, and I am now seriously considering ditching AMD Spider as my primary platform and promoting my Intel+Nvidia testbed to the main one.

    I’ll hold judgement until I receive an answer from ATI.

  6. Should have had overclocking results too, i.e. how much gains one would expect in PPD by overclocking the GPU, so instead of going out and buying an 8800GT, they can just OC the shit out of their 8800GS.

  7. I also fold on a 9800gx2. It is good for like 10k ppd. Considering how far they have dropped in price, they are a much better buy than a gtx280 for folding. You can get Bstock evga ones for like $230. From what I have seen the GTX2xx series is not that great of a folder compared to even a 9800gtx.

  8. Theovalich, you should come chill at xcpus.com you’re a beast for spending the time putting together a nasty set of information like this.

    Good job man, this is useful stuff.

    • ATI or AMD now just seems to get worse and worse. I sell video cards on eBay and 80% of my sales are now nvidia.

  9. Theo – if you were not getting 10,000 PPD out of your 9800GX2, then you simply didn’t have it set up right.

    I have 3 of these cards folding, and have folded on the 9800GTX, the 8800GT, and the 9600GSO, to name a few, and nothing at all compares to it. It’s a monster.

  10. Guys, thanks for all the comments, I will try to answer them all, feel free to comment:

    @AFROWUT: This is exactly something I am working on. Stay tuned, but you should see a “GPU Overclocking for F@H” article coming in November.

    @logain: While 9800GX2 is an excellent folder, I had some issues with this card, and it is currently in RMA procedure. I wanted to include it, but it seems to me like I got a dud. Will do an update as soon as I can ;-)

    @Lionhardt: Thanks for the invite, I will join your forum as soon as I find some time – constant travelling and preparation for the launch just takes time – plus, I will do an updated table with PPD measurements as soon as I land in San Fran.

    @BP: Well, the board is now in RMA – green dots appearing in Crysis Warhead are good enough reason for alert.

    I am folding with 4870X2 and 9800GTX, I gave away a lot of cards to my friends for folding – including GTX280 that is now folding 8600 PPD (EVGA SSC).

  11. To the people who post opinions here and then write in another forum
    that Theo folded for only 2 days and suspect that he did not test these cards…

    here are some better stats:

    http://kakaostats.com/t.php?t=69864

    I see that Theo folds a lot, as opposed to that link, where someone got something wrong:

    http://folding.extremeoverclocking.com/user_summary.php?s=&u=388961

  12. Well, I saw the responses in some forums, and well, you cannot write something without causing a reaction.

    This story was written in good faith, but first and foremost to find out why my ATI Radeon 4870X2 sucks so bad when compared to Boris’s 9800GX2 and Ivan’s GTX280. The result was, well, surprising. I started digging, gathered every card I could get, and ran benchmarks through FahMon.

    If that is not good enough, I am sorry. But for Monday, expect an update on the situation :)

    Of course, Monday Pacific time, since I am in California – evening in Europe. Thanks for all the visits, didn’t exactly expect that around 1000 people will read this story… :D

  13. Theo, get an email address up. Something to be said in private.

  14. [...] Folding performance explained, future development revealed Following the article about Top graphics cards for Folding@Home, it seems that I managed to get some doors opened and receive answers  from the people closely [...]

  15. Hello Theo, I have read articles from you at theinquirer and I absolutely love it.

    But regarding ATI performance in Folding@Home I have a question. If you see client statistics and divide TFLOPS by active CPU’s you have:

    Client statistics by OS

    OS Type          TFLOPS*   Active CPUs   GFLOPS/CPU
    ATI GPU              481          4373       109,99
    NVIDIA GPU          1769         16084       109,98
    PLAYSTATION®3       1699         60257        28,19

    Last updated at Mon, 03 Nov 2008 05:04:10
    DB date 2008-11-03 05:00:02

    So for ATI we have 109,99 GFLOPS/GPU and for Nvidia 109,98: there is no real difference here. How can this be explained if Nvidia Hardware is much faster than ATI? Maybe I don’t fully understand it so please correct me if necessary.

    Best regards!

  16. [...] As you might already know, I am a bit enthusiastic when it comes to distributed computing. I’ve been looking for aliens through SETI@home, later with BOINC… but then, Folding@Home showed up and I became an enthusiast for this valuable project from Stanford University. My family had some share of dealings with Alzheimer’s (aka AD) and Parkinson’s diseases (aka PD) and I won’t go here into what psychological and ultimately financial stress that families around the world, including my own – have to endure. Folding@Home is also a project that pioneered the use of GPUs for distributed computing (if I am wrong on this one, feel free to correct me). Back in the summer of 2006, I heard that ATI and Stanford are working Folding@Home GPGPU client. I now remember my articles and articles from a lot of colleagues who all criticized Nvidia for not having a F@H client. Link: Here [...]

  17. [...] the article about Top graphics cards for Folding@Home, it seems that I managed to get some doors opened and receive answers from the people closely [...]

  18. This article is TERRIBLE. If you were serious about folding, then you would realize that the nvidia clients are folding SMALLER atoms, while ATI clients are folding LARGER atoms. As of November 10, new larger atoms for nvidia have been released and have put the points per day in line with ATI. This is the reason why people who use folding at home as a benchmark for gpu performance are IDIOTS.

  19. While I value your opinion, showing up anonymously and not giving any real contribution to the discussion is something that I don’t appreciate.

    It is true that as of 11/10, Nvidia hardware is folding similar packets as ATI ones (10M WU), but the PPD is still much higher than ATI.

    Like I stated, I am using an ATI Radeon 4870X2 and a GeForce GTX 280, and I managed to get hold of a 9800GX2. Even with the new packets, the GTX280 will score at least 2000 PPD higher than ATI. Nv will score 511 points per new packet, while ATI scores 548 all the time.

  20. Just so everyone knows, points do not mean scientific value, output, or anything for that matter.

    nvidia has been (was) folding small proteins which did not even fully utilize the ati cards that they were benched on.

    nvidia’s fewer, higher speed shaders are much better at smaller proteins because the shaders can process a small amount each at great speed.

    however, ati cards (with about 4X the shaders at half the speed of nvidia’s) are better on huge proteins with enough atoms to actually tap into the additional power.

    Also, let’s not forget the optimization differences between the two and the numerous problems on nvidia’s side that almost completely prevent certain cards from folding.

    more fixes, tweaks, and most importantly larger proteins, will allow ati to come closer to nvidia while nvidia gets more stable.

  21. Well it only makes sense the ATI card gets owned.

    You remember the fanfare nVidia made about CUDA and the extra double-precision units they put into the GTX 280? They worked more on general computing for the GTX series than they did for graphics, which is why we have a huge die that doesn’t increase performance as drastically as you would think.

  22. Well, there is a two-fold approach to the matter. ATI does not feature hardware FP64 DP units and FP64 DP runs at 1/4th the speed. So, a 1 TFLOPS chip will get you 250 GFLOPS. If all things go ideally.

    Every 8-shader pack in nV comes with 1 FP64 unit, and the transistor cost for that was hellish. And yet, nV is able to do 1/8th of the speed (logical, given the number of DP units), e.g. their 933 GFLOPS chip gives out 150 GFLOPS.
    But it always gives 150 GFLOPS, and that is added to those 933 GFLOPS, so nV goes from 933 GFLOPS to 1.1 TFLOPS.

    And in the worst-case scenario, nV still outputs more GFLOPS than ATI. ATI will fix that and keep improving, but they’re only in their 1st gen of GPGPU-friendly hardware.

    But things will only get better for both from here.

  23. Right now there is a bug in the ATI Vista config that does not allow running of two GPU2 clients simultaneously with 4870×2. You should go into that a bit, otherwise the article sounds fanboyish, though I know this is not your intention.

    I run a couple of 4850s overclocked to 4870 speeds and they are pulling an average of 3500PPD each, and I’m running one 4870 at 775Mhz which pulls in closer to 4000PPD.

    I am not sure what double precision FP64 units can do for folding though, but I guess we will find out.

  24. [...] could still be ahead of it; for more info on folding on ATI and Nvidia GPUs, look here: Nvidia’s $50 card destroys ATI’s $500 one or “Why ATI sucks in Folding?” The… AMD’s Folding performance explained, future development revealed Theo’s Bright Side Of [...]

  25. 1st of all, the HD4800 family doesn’t have 80 fat + 720 thin shaders. One SP consists of 1 fat (which can do FP64, trigonometrics etc.) and 4 thin (MADD) SPUs – so it DOES HAVE HARDWARE FP64 capabilities. DP performance is 1/5 of SP – so 240 GFLOPS on RV770XT, not PRO. RV770 has 160 fat SPUs and 640 thin ones. Why is ATI getting a lower score, then? Because just like in video transcoding, in folding@home the speed of one SP is more important than the count of them. That’s why the 9800GX2 (128×2 SP at high speed) is much faster than the GTX280 (240 SP at medium speed) and even the HD4870X2 (160 SP×2 at low speed). Those thin units in ATI cards plus the fat one work as one big shader unit; you can’t address them separately so can’t be counted as independent units (so those 800 shaders are just marketing bull shit). So now… if the folding project uses all 160 SPs in RV770 but only the fat ones, we get 1/5 of theoretical performance, while nvidia cards using all available SPs always run at top performance. ATI chips calculate very complex, highly parallelized tasks better than nv, but nv calculates smaller and even more highly parallelized tasks MUCH faster than ATI (because such tasks can’t saturate ATI’s SPs).
    Got it?

  26. “you can’t address them separately so can’t be counted as independent units”

    Looking at RV770 architecture diagram, you can easily see 10 groups of 80 shaders. The problem in F@H performance lies in the fact that Radeon 4800 is the first chip to support local memory store, and their F@H client was a victim of supporting non-GPGPU friendly chips such as 3000, 2000 and X1K series.

    When an application is written to use the local memory store in 4800 series, it “kicks ass and chew bubble-gum”, just like Russians proved with ElcomSoft’s WPA 1.0 application, where a single 4870 beats GTX280 by almost 50%.

    You are right on the F@H performance, though, we are getting absolute minimum performance out of ATI cards. When Pande Group releases F@H based on OpenCL, not on ATi’s flawed Stream, we will have something to write about.

  27. “Looking at RV770 architecture diagram, you can easily see 10 groups of 80 shaders”
    Not SHADERS, SHADER PROCESSING UNITS (SPUs), there are only 16 real shaders (SP) in each group!
    It’s like a 2 lane highway with speed limit 300mph on nvidia or 6 lane highway with speed limit to 150mph. What is better? For a single car the 1st one, for 100 cars? The second one. Easy to understand. ATI shaders aren’t as fast as nvidia ones but can (if well optimized) process much more data.

  28. Interesting article but I do find the article completely fanboyish.. First off I would never spend 500 bucks on a video card just so I can do folding… The answer you got from AMD is correct.. Nobody cares about folding even if they use such programs such as seti.. AMD’s target market is not a bunch of geeks looking for aliens, its target audience is Gamers and HD HTPC owners.. And in Gaming and HD playback the Ati 4870 x2 spanks Nvidia’s video cards…

    And then you come to image quality and 2D rendering with programs such as Photoshop and Illustrator.. Nvidia video cards are about 35-350 percent slower than Ati video cards in 2D rendering tasks.. In fact when working with project files over 1 gb in illustrator or photoshop the Nvidia video cards cause extreme 2D rendering lag.. The Color fidelity in Nvidia’s video cards is also severely lacking compared to Ati Boards.. You can not get true or near true color representation on Nvidia video cards…

    It’s a driver issue Nvidia has never bothered to address in either its old or new video cards.. Some older Nvidia cards do not have this issue.. But all the new ones do..

  29. btw when I mean some of their older cards don’t have that 2D issue, I am talking TNT old…

  30. [...] [...]

  31. [...] 9.3: ATI now loves Folding@Home Back in late October, I wrote a piece where I took some 24 cards and did a test run using couple identical work units and saw that a $49.99 GeForce 9600 GSO (now known as the GeForce GT130 inside those new Mac Pros) [...]

  32. Ill just continue to fold, who gives a rats ass how fast it is. ;-)

  33. [...] [...]

  34. [...] http://theovalich.wordpress.com/2008…olding-at-hom/ Best guide ever…sort of XD __________________ [...]

  35. Well I only buy cards that fold. I have 3 @ 9600GSO, 2 @ 8800GS, 1 @ gtx260, 1 @ gtx465 and my old almost useless Radeon 1950Pro. They set me back about $2000 over the course of 3 years. There are about 100K folding donors in the world. If it were me..I would market/develop with me in mind.

  36. A couple years ago I was an admin at the New Egg forum. Nvidia took part in a “town meeting” style thread. At that time ATI was the only GPU that folded so I challenged them to do something with theirs as well and “shut up the ATI fanboys”. I’m not going to take credit but it may have influenced a decision they were already making. (I believe once if you folded an ATI you voided the warranty) It seemed strange there was very little response for a long time from Radeon

  37. Speaking as the guy who wrote the NVIDIA Folding@Home CUDA kernels back in 2007-2008, the reason a $50 NVIDIA card destroyed ATI’s high-end was that CUDA is a fundamentally better approach to harnessing GPU firepower than Brook. Brook is trapped wrapping itself around rendering its computations while CUDA allows one to treat the SMs of a GPU as tiny independent CPUs and set them on their merry way asynchronously and independently computing a final result without any need for synchronization, something which *hammers* Brook.

    Fast forward 3 years and my more recent work just set the world’s record for several molecular dynamics benchmarks in AMBER. And I remain mystified as to how in the age of OpenCL that AMD has allowed NVIDIA to retain such a commanding lead for so long because the same features that made CUDA clobber Brook in 2008 are the ones that make NVIDIA GPUs continue to be the better platform for most GPGPU applications.

  38. [...] [...]

  39. An excellent article. Thank you!

  40. Thanks for a marvelous posting! I definitely enjoyed reading it, you could be a great
    author. I will always bookmark your blog and will eventually come back someday.
    I want to encourage you to continue your great writing,
    have a nice day!

