AMD and the APU, 785G performance shortcomings

Intel is moving the GPU next to the CPU, starting with the "Clarkdale" processors based on the 32nm "Westmere" architecture, which should cover at least Core i3 and Core i5 models by the end of the year.

As I mentioned in the article about Nvidia's missing chipsets for the LGA 1156 platform, Intel desperately needs "Clarkdale" if it wants "Nehalem"-based CPUs with integrated graphics in laptops. With the memory controller now on the CPU, there simply isn't an easy and cheap way to fit an integrated graphics core on the chipset, now called the PCH (Platform Controller Hub).

AMD is also moving in the same direction, probably more out of necessity than to claim to be the first to integrate the CPU and GPU, or to come up with something truly innovative. AMD currently calls its integrated CPU/GPU the APU (Accelerated Processing Unit).
AMD's need to integrate comes from the same motive that drives Intel: the memory controller now sits on the CPU, and the ever-increasing bandwidth it provides has to be carried to a distant component, the chipset. AMD relies on HyperTransport (HT) to do this, which has been enough up to now but is starting to become a bottleneck for the integrated graphics core on the chipset.

In theory, HT 3.0 can provide 10.4GB/s per direction when running at 2.6GHz (5.2GT/s) on the 16-bit links AMD currently uses. The AM2+ and AM3 platforms have been engineered for up to that HT clock and no more. That is enough for a processor interconnect, but AMD hasn't even scaled that high yet: the high-end Phenom II X4 processors still run HT at only 2GHz (4GT/s).
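These figures follow from HT being double data rate: every clock carries two transfers, and a 16-bit link moves 2 bytes per transfer. A minimal sketch of that arithmetic, where the helper name is my own and the results are theoretical per-direction peaks rather than measured throughput:

```python
def ht_link_bandwidth_gbs(clock_ghz: float, width_bits: int = 16) -> float:
    """Peak one-direction HyperTransport bandwidth in GB/s.

    HT transfers on both clock edges, so GT/s = 2 * clock (GHz),
    and each transfer carries width_bits / 8 bytes.
    """
    transfers_per_s = 2 * clock_ghz   # GT/s
    bytes_per_transfer = width_bits / 8
    return transfers_per_s * bytes_per_transfer


for label, clock in [("HT 3.0 maximum", 2.6), ("Phenom II X4 default", 2.0)]:
    print(f"{label}: {clock} GHz -> {2 * clock:.1f} GT/s -> "
          f"{ht_link_bandwidth_gbs(clock):.1f} GB/s per direction")
```

Traffic in both directions at once doubles the aggregate figure, but a GPU reading from memory mostly loads one direction, so the per-direction number is the one that matters here.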

At 2GHz on a 16-bit link, HT provides 8GB/s per direction, which is exactly what a single PCI-e 2.0 x16 slot can use (500MB/s per lane per direction), leaving little room for anything else. The current push for SATA 6Gb/s will add up to 600MB/s per SATA port flowing to the CPU over the HT interconnect, and then the bottlenecks will become more apparent than they are today. That wasn't an issue when hard disks were the only option, but SSDs have been pushing the envelope and are already saturating SATA 3Gb/s ports.
AMD could use 32-bit wide HT links, but those would raise the cost of AMD motherboards, and low platform cost is currently one of the strongest reasons to choose AMD over Intel.
Another issue is PCI-e 3.0, which will push roughly 16GB/s per direction per x16 slot (8GT/s per lane with 128b/130b encoding), a bandwidth target that not even a 5.2GT/s HT link (10.4GB/s) can sustain for a single card, let alone when transferring data to two cards at the same time. This is not a big issue for games, but it is much more apparent when using GPUs to offload computation, where it frequently becomes a bottleneck when synchronizing data across more than one GPU at once. Six months of coding in CUDA have made this far more apparent to me than I had expected, and GPGPU work is in dire need of the move to PCI-e 3.0.
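SATA and PCI-e 2.0 both use 8b/10b encoding, so every ten bits on the wire carry eight bits of data. The sketch below converts line rates into usable bandwidth and checks a hypothetical load, one graphics card plus two SSDs on SATA 6Gb/s ports, against a 2GHz HT link. The helper name and the workload are my own assumptions, and protocol overhead beyond line encoding is ignored:

```python
def encoded_link_mbs(gbaud: float, lanes: int = 1,
                     payload_bits: int = 8, line_bits: int = 10) -> float:
    """Usable one-direction bandwidth of a serial link in MB/s (decimal units).

    gbaud is the raw line rate per lane in Gb/s; payload_bits / line_bits is
    the encoding efficiency, 8b/10b by default (SATA, PCI-e 1.x and 2.0).
    """
    return gbaud * lanes * payload_bits * 1000 / (line_bits * 8)


HT_2GHZ_MBS = 8000.0  # 16-bit HT at 2GHz (4GT/s), per direction

sata3 = encoded_link_mbs(3.0)          # SATA 3Gb/s
sata6 = encoded_link_mbs(6.0)          # SATA 6Gb/s
pcie2_x16 = encoded_link_mbs(5.0, 16)  # PCI-e 2.0 x16, 5GT/s per lane

# Hypothetical workload: one graphics card plus two SSDs on SATA 6Gb/s ports.
demand = pcie2_x16 + 2 * sata6
print(f"SATA 3Gb/s: {sata3:.0f} MB/s, SATA 6Gb/s: {sata6:.0f} MB/s")
print(f"PCI-e 2.0 x16: {pcie2_x16:.0f} MB/s per direction")
print(f"x16 slot + 2 SATA 6Gb/s ports: {demand / HT_2GHZ_MBS:.0%} of a 2GHz HT link")
```

Real devices rarely run flat out simultaneously, so this is a worst-case ceiling check, not a prediction of observed throughput.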
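PCI-e 3.0 raises the per-lane rate to 8GT/s and replaces 8b/10b with 128b/130b encoding, which works out to roughly 15.75GB/s per direction for an x16 slot. The following is a deliberately simplified model under my own assumptions (hypothetical helper names, no protocol or driver overhead, every card hanging off one shared CPU-chipset link): each copy is capped by its own slot, while all copies together are capped by the shared link.

```python
def pcie_x16_gbs(gt_per_s: float, payload_bits: int, line_bits: int) -> float:
    """One-direction x16 bandwidth in GB/s for a per-lane rate and encoding."""
    return gt_per_s * 16 * payload_bits / line_bits / 8


PCIE2_X16 = pcie_x16_gbs(5.0, 8, 10)     # 8b/10b encoding
PCIE3_X16 = pcie_x16_gbs(8.0, 128, 130)  # 128b/130b encoding
HT_5_2GT = 10.4                          # 16-bit HT 3.0 at 2.6GHz, GB/s per direction


def broadcast_time_s(size_gb: float, gpus: int,
                     slot_gbs: float, link_gbs: float) -> float:
    """Time to copy size_gb to each of `gpus` cards at the same time.

    Each card is limited by its own slot, but all copies share the
    upstream link, so the slower of the two limits sets the time.
    """
    return max(size_gb / slot_gbs, gpus * size_gb / link_gbs)


slot_limited = 1.0 / PCIE3_X16  # time if only the slot mattered
for gpus in (1, 2):
    t = broadcast_time_s(1.0, gpus, PCIE3_X16, HT_5_2GT)
    print(f"{gpus} GPU(s), 1 GB each: {t * 1000:.0f} ms, "
          f"{t / slot_limited:.1f}x the slot-limited time")
```

In this model, once the shared link is the limit, adding a second card doubles the copy time instead of running in parallel, which matches the multi-GPU synchronization pain described above.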

Viewed this way, it becomes apparent why AMD may have refrained from upgrading the graphics core in the new RS800 series chipsets, commercially known as AMD 785G. Shader model support is now DX10.1 compliant, but performance hasn't increased much because the GPU keeps the same 40 shaders as the 780G. AMD may have stuck with the rather old 40-shader core because there isn't enough bandwidth to feed a bigger one, rather than because of die space and cost constraints.
Just look at the numbers: AMD's integrated memory controller can deliver around 9GB/s on AM3, yet the integrated graphics core couldn't use all of it even if it wanted to (short of overclocking), because the current AM3 HT 3.0 implementation tops out at 8GB/s.
AMD needs to move the GPU closer to the CPU soon, where high-bandwidth interconnects are cheaper to build. Until it does, it will be very hard for another integrated GPU to push the envelope the way the 780G did when it was released; there simply isn't enough bandwidth. You can test this yourself by changing the HT bus speed and watching the effect on the integrated core's performance. Hopefully I'll be able to provide some benchmarks, since I've already found some unexpected situations (XBMC, for instance) where the integrated GPU is starved for bandwidth doing something as simple as browsing through the interface.
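A chipset GPU reaches system memory through the CPU's memory controller and then across the HT link, so its bandwidth ceiling is the smaller of the two. A rough sketch of that bound across a few HT clocks, taking the ~9GB/s memory figure above as given; the helper names are hypothetical, and the model assumes the GPU has the whole link to itself with no disk or PCI-e traffic:

```python
MEMORY_GBS = 9.0  # approximate AM3 memory controller throughput quoted above


def ht_link_bandwidth_gbs(clock_ghz: float, width_bits: int = 16) -> float:
    """Peak one-direction HT bandwidth: two transfers per clock, width_bits/8 bytes each."""
    return 2 * clock_ghz * width_bits / 8


def igp_ceiling_gbs(ht_clock_ghz: float, memory_gbs: float = MEMORY_GBS) -> float:
    """Upper bound on memory bandwidth available to a chipset GPU.

    Data has to pass through both the memory controller and the HT link,
    so the slower of the two caps what the GPU can ever see.
    """
    return min(memory_gbs, ht_link_bandwidth_gbs(ht_clock_ghz))


# 2.0GHz is the stock Phenom II X4 HT clock; 2.2GHz stands in for a mild overclock.
for clock in (1.0, 1.6, 2.0, 2.2, 2.6):
    limit = igp_ceiling_gbs(clock)
    bound = "HT link" if ht_link_bandwidth_gbs(clock) < MEMORY_GBS else "memory controller"
    print(f"HT {clock:.1f} GHz: ceiling {limit:.1f} GB/s ({bound}-bound)")
```

Below roughly 2.25GHz the HT link, not memory, sets the limit, which is why lowering the HT clock is a quick way to see how bandwidth-sensitive the integrated core is.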

In the end, AMD could still push for higher bandwidth with HT 3.1, which could deliver 25.6GB/s per direction running at 6.4GT/s on a 32-bit link, but it would have to accept more expensive motherboards and a whole new platform. At that point, why not just move a GPU closer to the CPU in a multi-chip module (MCM), just as Intel is doing with "Clarkdale"? AMD has no MCM solution planned, but the APU is on its roadmap. Either way, performance is expected to go up considerably, and Intel looks set to be the first to capitalize on the early move, courtesy of its MCM approach.
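HT 3.1 tops out at a 3.2GHz clock, i.e. 6.4GT/s. A short comparison of how much headroom each link upgrade would buy over today's 2GHz, 16-bit link; the helper name is hypothetical, and these are theoretical per-direction peaks, not what a shipping board would measure:

```python
def ht_link_bandwidth_gbs(clock_ghz: float, width_bits: int) -> float:
    """Peak one-direction HT bandwidth: two transfers per clock, width_bits/8 bytes each."""
    return 2 * clock_ghz * width_bits / 8


CURRENT = ht_link_bandwidth_gbs(2.0, 16)  # stock Phenom II X4 link

options = {
    "HT 3.0, 16-bit, 2.6 GHz": ht_link_bandwidth_gbs(2.6, 16),  # AM3 ceiling
    "HT 3.0, 32-bit, 2.6 GHz": ht_link_bandwidth_gbs(2.6, 32),  # wider, pricier boards
    "HT 3.1, 32-bit, 3.2 GHz": ht_link_bandwidth_gbs(3.2, 32),  # new platform
}
for name, gbs in options.items():
    print(f"{name}: {gbs:.1f} GB/s ({gbs / CURRENT:.1f}x today's link)")
```

Only the options that need 32-bit links, and therefore new, more expensive boards, offer more than a modest gain, which is the trade-off behind the argument for putting the GPU next to the CPU instead.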

The Bit Speek
