As “Theo’s Bright Side of IT” publishes its 100th story after just five weeks of existence, it seems fitting to write about a technology set to become an everyday word over the next couple of years: GDDR5.
Over the next four years, this memory standard will become pervasive in many more fields than “just” graphics. Just as GDDR3 ended up in all three consoles, network switches, cellphones and even cars and planes, GDDR5 brings a raft of new features that are bound to win customers from different markets.
The radical ideas inside GDDR5 stem from ATI looking at future GPU architectures and concluding that the DRAM industry had to take a radical step in design and offer an interface more flexible than any previous memory standard. Then ATI experienced huge issues with R600 and its huge monolithic die. After a lot of internal struggle, the engineering teams agreed that a change of course was necessary for the generations to come: R700/RV770, R800/RV870, R900, R1K… all of these engineering designs were reshaped and refocused. The current and future goal is a compact, affordable transistor design that does not play Russian roulette with yields coming out of AMD’s, TSMC’s and UMC’s foundries.
Development of this JEDEC-certified standard happened under the lead of Joe Macri, Director of Engineering at AMD and chairman of JEDEC’s Future DRAM Task Group JC42.3. Joe and his small ex-ATI/AMD team are best known for developing the GDDR3 and GDDR4 memory standards, the former being probably the best thing ever to come out of the former ATI. ATI worked in solitude for a whole year before it sent the initial specification to JEDEC in 2005. Then Hynix, Qimonda and Samsung joined the effort to bring the new memory standard to life. When AMD acquired ATI in 2006, the new management didn’t touch GDDR5 development and let the team work in peace. The reason was simple: the R&D team warned management that GDDR5 development was much more difficult than the work done on GDDR3 and GDDR4.
GDDR5 was seen as a path towards next-generation clients, be it consoles, desktop computing, networking equipment, the HPC arena, handhelds… all of these roads start with a single memory standard. At the time, engineers at ATI saw the path to success that GDDR3 took and decided to create a spec that would outlive and outshine it.
In May 2008, AMD finally announced the launch of the GDDR5 memory standard. Soon after, the company revealed its Radeon 4800 series, with cards equipped with GDDR5 memory. Given the performance of the Radeon 4870 512MB, 4870 1GB and 4870X2 2GB, it is obvious that the future of graphics (and not just graphics!) belongs to GDDR5 memory.
At its very core, the main difference between LP-DDR (handhelds, PDAs), DDR (one size fits all) and GDDR (graphics) is a matter of priorities: for GDDR, capacity is not crucial, but performance is. Low-Power DDR and standard DDR are geared towards enabling as much capacity as possible, while GDDR is usually referred to as the “Ferrari of the bunch.”
DDR, DDR2, DDR3, GDDR3, GDDR4, GDDR5 … got it?
If you can’t find your way through the jungle of different memory standards, don’t worry, you’re not alone. There is a lot of confusion in the world of DRAM memory, and sadly, there is no simple explanation. The most important thing to remember is that GDDR and DDR are not the same memory and do not operate on the same data sets.
As you can see, GDDR memory transfers data in 32-bit chunks, while conventional DRAM transfers 64-bit chunks. Previous generations of graphics memory (GDDR2, GDDR3) were loosely based on the DDR2 SDRAM standard, while GDDR5 heads in a new direction.
In fact, the GDDR5 standard actually defines two different ways the DRAM can operate: single-ended and differential. This is a revolutionary step for GDDR memory, since it was widely expected that single-ended signaling was the only way to go. In a way, you could say that ATI developed GDDR5 and GDDR “5.5” or “6” at the same time. Single-ended operation is compatible with existing memory standards such as DDR1/2/3 and GDDR3/4 and represents the evolutionary path for DRAM. The first products to market will use single-ended chips, but as soon as Hynix, Qimonda and Samsung start manufacturing differential modules (2009-10), a new era will begin.
Differential clock signaling is a method similar to that of interconnect buses such as HyperTransport, PCI Express, or Intel’s QuickPath Interconnect from Core i7. Differential mode introduces a reference clock that the memory cell follows. Instead of using the ground wire as a passive reference, differential mode enables precise communication, and exactly this feature is why the available bandwidth is set for a dramatic change during the lifetime of GDDR5.
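To see why differential signaling matters, here is a toy sketch (our own illustration, not anything from the GDDR5 spec): interference couples onto both wires of a differential pair equally, so it cancels out when the receiver subtracts them, while a single-ended line compared against a fixed reference has no such protection.

```python
# Toy model: the same common-mode noise breaks a single-ended read
# but cancels out in a differential read. All voltages are made up.

def single_ended_bit(level, noise, vref=0.4):
    # Receiver compares the (noisy) line against a fixed reference voltage.
    return 1 if (level + noise) > vref else 0

def differential_bit(v_plus, v_minus, noise):
    # The same noise couples onto both wires and cancels in the subtraction.
    return 1 if (v_plus + noise) - (v_minus + noise) > 0 else 0

noise = 0.5  # volts of common-mode interference
# A transmitted "0" (0.0 V on the single-ended line; 0.2/1.0 V on the pair):
print(single_ended_bit(0.0, noise))       # noise pushes the line above VREF: misread as 1
print(differential_bit(0.2, 1.0, noise))  # still reads correctly as 0
```

The subtraction is done in analog circuitry on a real receiver, of course; the point is simply that common-mode noise drops out of the difference, which is what lets differential links run at much higher data rates.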
The sheer bandwidth gain from one GDDR generation to the next is impressive. GDDR3 peaked at 2.4 Gbps per pin, and GDDR4 concluded at 3.2 Gbps. GDDR5 chips split into two camps: single-ended parts will offer between 3.4 and 6.4 Gbps, while differential chips will yield between 5.6 and 12.8 Gbps.
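Translating those per-pin rates into card-level numbers is simple arithmetic; the following back-of-the-envelope sketch (our numbers, not vendor figures) shows what they mean on a 256-bit bus like the Radeon 4870’s.

```python
# Aggregate bandwidth = per-pin data rate x bus width / 8 bits per byte.

def bus_bandwidth_gbs(per_pin_gbps, bus_width_bits):
    """Aggregate bus bandwidth in GB/s."""
    return per_pin_gbps * bus_width_bits / 8

# 256-bit bus (Radeon 4870 class):
print(bus_bandwidth_gbs(3.6, 256))   # 115.2 GB/s, roughly today's 4870
print(bus_bandwidth_gbs(6.4, 256))   # 204.8 GB/s at the single-ended ceiling
print(bus_bandwidth_gbs(12.8, 256))  # 409.6 GB/s at the differential ceiling
```

The same formula explains the 512-bit projections later in this article: 5 Gbps across 512 pins already works out to 320 GB/s.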
Besides differential mode, GDDR5 also introduces an error detection protocol based on a progressive algorithm that actually enables more aggressive overclocking. Major changes in internal chip design also include a quarter-data-rate command clock, a continuous WRITE clock, CDR-based READ (no read clock/strobe information), DRAM interface training, internal and external VREF, and x16 mode.
One of the most important aspects of GDDR5 is power reduction. Take GDDR3 and GDDR5 modules clocked at 1.0 GHz each: GDDR3 has to operate at 2.0V, while GDDR5 needs only 1.5V. This results in a roughly 30% reduction in power consumption while raising available per-pin bandwidth by almost 100%.
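A rough sanity check on that figure (our arithmetic, not AMD’s): dynamic switching power scales roughly with voltage squared (P ~ C·V²·f), so the 2.0V-to-1.5V drop alone cuts the voltage-dependent portion by about 44%. A ~30% chip-level saving is plausible once termination and logic that do not scale with V² are factored in.

```python
# First-order CMOS estimate: dynamic power ratio ~ (V_new / V_old)^2.

def dynamic_power_ratio(v_new, v_old):
    return (v_new / v_old) ** 2

ratio = dynamic_power_ratio(1.5, 2.0)
print(f"{ratio:.4f}")                     # 0.5625
print(f"{(1 - ratio) * 100:.2f}% saved")  # 43.75% saved on the V^2-dependent portion
```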
GDDR5 is designed to operate at low, medium and high frequencies. Low frequency (0.2-1.5 Gbps) calls for low voltage (0.8-1.0V), while medium (1.0-3.0 Gbps) and high (2.5-5.0 Gbps) frequencies call for higher voltage, in the 1.4-1.6V range.
High-frequency operation is the only mode that utilizes CDR (clock-data recovery) circuitry, while medium and low frequencies use the conventional mode (RDQS with preamble).
Seeing power drop below the levels of FB-DIMM DDR2-800 only makes us wonder what would happen if CPU manufacturers implemented differential GDDR5 as system memory. Would we really need gigabytes of system memory if system memory offered higher bandwidth than L2 and L3 cache? Intel is looking in a similar direction and is considering replacing SRAM cache with DRAM technology.
Sadly, the changes required in the memory controller mean the only place GDDR5 will see the light of day as system memory is in closed designs, such as consoles, set-top boxes and so on. There is hope that some future AMD Fusion designs might implement GDDR support, but it is too early to tell.
How to lower the cost of manufacturing?
During the design stages of GDDR5 memory, one of the main concerns was how to simplify trace routing on the PCB (printed circuit board). On current GDDR3 and GDDR4 graphics boards, synchronization issues are solved by using traces of the same length from every pin on the DRAM chip to the GPU. This results in quite a messy design, with traces going everywhere.
If you’re a PCB designer, there is one thing you don’t want: complex trace routing. It eventually leads to more PCB layers, higher cost and, most importantly, more ways for *something* to go wrong. With GDDR5, every trace has increased isolation from electromagnetic interference (EMI), while the asymmetrical interface compensates for differences in trace length. Several optimizations were made in order to preserve signal integrity.
As you can see in the picture above, GDDR5 PCB routing is much cleaner than GDDR3 routing, something you can verify by comparing a Radeon 4850 to a Radeon 4870, for instance. The price was additional resistors around the memory chips, but the second generation of GDDR5 graphics cards should feature a cleaner design.
Memory designed for overclocking?
Given the power-saving and performance-related tweaks, it is obvious that this memory is designed for overclocking. Merely looking at slides from AMD and Qimonda confirmed this to us.
The GDDR5 specification delivers a combination of three technologies: adaptive training with CDR, error detection, and an on-die thermal sensor. Adaptive training is combined with the error detection algorithm and enables the GPU’s memory controller to keep thermals on a tight leash. If you want to overclock the memory, clocks will go up until the error detection algorithm hits a thermal wall.
Error detection works on both read and write operations, offering real-time repeat and resend. Thanks to asynchronous clocks, the memory controller can control the flow of data and resend bits of information that fail to arrive in time (or arrive corrupted). The error detection algorithm will try to avoid a crash until the error rate passes 1 error/sec.
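The detect-and-resend idea can be sketched in a few lines. This is purely illustrative: the real protocol computes a CRC per burst in hardware, while here a toy checksum and a lossy fake “bus” stand in, and all names and rates are our own.

```python
import random

def checksum(data: bytes) -> int:
    # Toy stand-in for the hardware per-burst CRC.
    return sum(data) & 0xFF

def noisy_bus(data: bytes, error_rate: float) -> bytes:
    # Occasionally flips a bit, as a marginal overclock might.
    if random.random() < error_rate:
        corrupted = bytearray(data)
        corrupted[0] ^= 0x01
        return bytes(corrupted)
    return data

def write_with_retry(data: bytes, error_rate=0.3, max_retries=8):
    """Resend the burst until the receiver's checksum matches."""
    for attempt in range(1, max_retries + 1):
        received = noisy_bus(data, error_rate)
        if checksum(received) == checksum(data):
            return attempt  # burst accepted on this attempt
    raise RuntimeError("error rate too high; back off the memory clock")

random.seed(1)
print(write_with_retry(b"\x12\x34\x56\x78"))  # number of attempts needed
```

The key property is that a corrupted burst costs only a resend cycle rather than corrupting the frame buffer, which is exactly why the controller can push clocks until the retry rate, not the first bit error, becomes the limit.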
In order to maintain signal stability, additional resistors were placed inside and outside the memory chip (take a look at the back of a 4870 and compare it to a 4850). AMD also addressed an issue spotted with GDDR4: overclocking of GDDR4 memory was limited because the DRAM timing loop would run out of power. GDDR5 changes the way the clock is generated and maintained, so the memory chip should never starve for power. No timing-loop issue = no memory freeze. According to our sources, GDDR5 memory clocking ultimately depends on the manufacturing process used by the chip maker and the amount of voltage provided to the chip.
But the main difference in clocking between GDDR3 and GDDR5 is that PVT (process, voltage and temperature) is no longer the unbreakable barrier. Now it is the GPU’s memory controller that will keep (or fail to keep) the data flowing.
Coalition between the GPU and the RAM
Unlike with previous memory standards, in order to extract the best possible performance the memory controller has to support ALL of the GDDR5 features. This goes especially for the asymmetrical interface, since the WRITE and READ clocks are programmed by the GPU. Advanced clock training calibrates GPU-RAM signals; without this feature, you cannot count on high clocks or overclocking headroom. With four bits of data being sent per clock (instead of two), the memory controller is exposed to a lot of stress and has to be able to do error checking on the fly. Any misses on the GPU side will lead to lost cycles and, with them, instability.
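The “four bits per clock” claim follows from GDDR5’s clocking scheme: data toggles on both edges of a write clock (WCK) that runs at twice the command clock (CK), giving four bits per CK cycle per pin. A quick sketch of that arithmetic (the 900 MHz example is ours, chosen to match the Radeon 4870’s 3.6 Gbps figure):

```python
# GDDR5 clocking arithmetic: WCK = 2 x CK, data on both WCK edges,
# so each pin moves 4 bits per CK cycle.

def per_pin_rate_gbps(ck_mhz):
    wck_mhz = ck_mhz * 2        # write clock runs at twice the command clock
    bits_per_sec = wck_mhz * 2  # double data rate on WCK (in Mbit/s)
    return bits_per_sec / 1000  # Mbit/s -> Gbps per pin

print(per_pin_rate_gbps(900))   # 3.6 Gbps per pin from a 900 MHz command clock
```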
A good example is the memory controller tucked inside the Radeon 4800. This 256-bit controller supports the DDR2, DDR3, GDDR3, GDDR4 and GDDR5 memory standards. It is tuned to the point where the bandwidth and clock limitations sit on the side of the SGRAM chips: if the fastest GDDR5 memory chips were available today, you could build a 4800-series card with them. This also opens up revenue opportunities for Hynix, Samsung and Qimonda: all three manufacturers could earn a small fortune selling gold-sample memory chips to premium graphics card manufacturers.
When it comes to Nvidia, the answer to why the company went with GDDR3 for the GTX 200 series of cards is not a simple one: according to our sources, the GT200 chip supports GDDR3 and GDDR4, but engineers ran out of time to adapt the memory controller to the asymmetrical interface (advanced interface training), a key feature for stable operation. But if Nvidia sticks with a 512-bit memory controller for the NV70 generation (GT300?), we should see Nvidia GPUs featuring bandwidth in excess of 300 GB/s, more than twice what is available today. There is also the question of what Nvidia will do with its two refreshes, the 55nm GT206 and 40nm GT212 chips.
Intel is not giving out any details about Larrabee’s architecture, but we know for sure that its 1024-bit internal/512-bit external memory controller will support GDDR5 and its advanced features. Given the late-2009 release, support for differential mode should be a given. When it comes to christening, Larrabee with GDDR5 memory will debut this winter, with the first graphics cards delivered to DreamWorks.
Capacity – just how big can we go?
Now that you’ve seen all of the performance elements, it is time to write about capacity. While Joe told us that GDDR should be considered “the Ferrari of the DDR world,” GDDR5 also introduces an x16 mode. To kill any potential confusion: this mode has nothing to do with PCI Express x16.
As you can see on the slide above, clamshell mode is introduced to let two memory chips share a single x32 node. Take the ATI Radeon 4800 series: the GPU features eight x32 I/O controllers. In theory, this tops out at 16 memory chips per GPU, or 1GB of onboard memory using conventional 512Mbit (64MB) chips. With x16 mode, a card designer can fit up to 32 chips (good luck finding the board space), or 2GB of memory with 512Mbit chips. With 1Gbit (128MB) chips, that number grows to 4GB, and Qimonda is expected to ship 2Gbit (256MB) chips during 2009, enabling 8GB of on-board memory.
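The capacity figures above reduce to simple multiplication; here is the arithmetic as a sketch (chip counts from the article, the helper function and its name are ours).

```python
# Board capacity = controllers x chips per x32 node x chip density.
# Clamshell puts 2 chips on an x32 node; x16 mode doubles that to 4.

def board_capacity_mb(controllers, chips_per_x32_node, chip_mbit):
    chips = controllers * chips_per_x32_node
    return chips * chip_mbit // 8  # Mbit -> MB

# Radeon 4800 class: eight x32 I/O controllers.
print(board_capacity_mb(8, 2, 512))   # clamshell, 16 x 512Mbit chips -> 1024 MB (1 GB)
print(board_capacity_mb(8, 4, 512))   # x16 mode, 32 x 512Mbit chips  -> 2048 MB (2 GB)
print(board_capacity_mb(8, 4, 1024))  # 32 x 1Gbit chips              -> 4096 MB (4 GB)
print(board_capacity_mb(8, 4, 2048))  # 32 x 2Gbit chips              -> 8192 MB (8 GB)
```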
This number is increasingly important for the GPGPU market, which wants as much on-board memory as possible. Bear in mind that the Tesla 10-series features 4GB of GDDR3 memory, and some contacts we’ve talked with claim they would fill even more.
Eight gigabytes of video memory may sound like too much for the consumer space, but if the world is to usher in the era of ray tracing, we need room for gigabytes of data. Jules Urbach of JulesWorld explained that he is working with datasets bigger than 300 GB and has to resort to AMD’s CAL (Compute Abstraction Layer) to fit all the data within 1GB per GPU (Jules uses R700 boards).
GDDR5 ramped up during 2008, and we expect the technology to become the standard for GPU add-in boards in 2009. ATI will migrate to GDDR5, and so will Nvidia. With Intel joining the pack with Larrabee, volumes should be high enough to drive the cost of GDDR5 into the budget for the next generation of game consoles, starting in the 2010-11 timeframe.
This is by far the most developed and well-thought-out memory standard yet, free of the childhood illnesses that plagued DDR2 and DDR3. GDDR5 is coming to market as a complete product and offers a solid future roadmap, with differential GDDR5 even surpassing XDR2 DRAM in the quest for the highest possible per-pin bandwidth.
By that time, Differential GDDR5 should be cheaper than GDDR3 is today.