Processors

Thoughts On "Bulldozer", "Sandy Bridge"


I have already spilled some beans about the two upcoming architectures from the big players in the x86 game:
But it was not until just now, while doing some research, that I stumbled on some clarifications from Intel about AVX support in "Sandy Bridge". It turns out that while Intel will have 256-bit FPUs in each core, there is one big problem with the first-generation processors:

Hi Igor,
Sandy Bridge will not have FMA, it's targeted for a future processor. 
...
It sounds like you are an FMA supporter - beyond the raw FLOPS improvement, do you have any sensitivity to the numerical advantages FMA can provide? There are obviously a lot of tradeoffs in the implementations we can provide, and having some data to understand how you would use it would be very helpful.
Regards,
Mark Buxton

FMA (fused multiply-add: a floating-point multiply followed by an add, with a single rounding at the end) is not part of any SSE generation, so leaving it out of the first AVX processors is a missed opportunity for the new instruction set. High-performance kernels like matrix multiplication, FFTs and dot products make extensive use of accumulators, and because each FMA instruction performs both a multiply and an add, FMA support raises the peak achievable performance. Besides, there's one big trick up AMD's sleeve: not just FMA support but "single cycle" FMA.
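The "numerical advantages" Mark Buxton mentions come from that single rounding. The sketch below is a Python emulation only, not how either vendor's hardware works: the hypothetical `fma_emulated` computes a*b + c exactly with `fractions.Fraction` and rounds once when converting back to `float`, while the hypothetical `mul_then_add` rounds after the multiply and again after the add. `dot` is a hypothetical dot product written as a chain of multiply-accumulate steps.

```python
from fractions import Fraction
from typing import Callable, Sequence


def fma_emulated(a: float, b: float, c: float) -> float:
    # Exact product and sum via Fraction; the only rounding happens in the
    # final float() conversion, mimicking a fused multiply-add.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def mul_then_add(a: float, b: float, c: float) -> float:
    # Two roundings: one after the multiply, one after the add.
    return a * b + c


def dot(xs: Sequence[float], ys: Sequence[float],
        madd: Callable[[float, float, float], float]) -> float:
    # Dot product as a chain of multiply-accumulate steps: acc = x*y + acc.
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = madd(x, y, acc)
    return acc


if __name__ == "__main__":
    # Exact answer: -1 + (1 + 2**-30) * (1 - 2**-30) = -2**-60.
    xs = [1.0, 1.0 + 2.0**-30]
    ys = [-1.0, 1.0 - 2.0**-30]
    print(dot(xs, ys, mul_then_add))               # 0.0: product rounded to 1.0 first
    print(dot(xs, ys, fma_emulated) == -2.0**-60)  # True
```

In C, the C99 `fma()` function from `<math.h>` has the same single-rounding semantics, and compilers can map it to a hardware FMA instruction when the target supports one.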
Intel will have two 256-bit AVX-capable FPUs per core; AMD will have two 128-bit, single-cycle, FMA-capable FPUs per module (each module is shared by two integer cores). My understanding is that while peak 256-bit performance per Intel core and per AMD module will be the same, "Bulldozer" will effectively double peak 128-bit FP performance. When AVX-capable software starts to roll out, AMD will remain competitive, since it will also support the instruction set.
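The peak-throughput argument can be checked with simple arithmetic. This is a sketch under loudly simplified assumptions: the two Intel units are assumed to be one FP adder and one FP multiplier (so Intel's peak assumes a balanced add/multiply mix), AMD's peak assumes all-FMA code, each pipe issues one vector instruction per cycle, and nothing else (decode, loads, clocks) is modeled. The function names are hypothetical.

```python
DP_BITS = 64  # width of one double-precision lane


def snb_core_flops(vector_bits: int) -> int:
    # Assumed: one FP adder and one FP multiplier per core, each as wide
    # as the instruction (up to 256 bits). No FMA: one FLOP per lane.
    assert vector_bits in (128, 256)
    lanes = vector_bits // DP_BITS
    return 2 * lanes


def bulldozer_module_flops(vector_bits: int, use_fma: bool = True) -> int:
    # Assumed: two 128-bit FMA pipes per module. A 256-bit instruction
    # occupies both pipes, so the module processes 2 x 2 doubles per cycle
    # for either vector width.
    assert vector_bits in (128, 256)
    lanes_per_cycle = 2 * (128 // DP_BITS)
    flops_per_lane = 2 if use_fma else 1  # an FMA counts as multiply + add
    return lanes_per_cycle * flops_per_lane


if __name__ == "__main__":
    for bits in (128, 256):
        print(bits, snb_core_flops(bits), bulldozer_module_flops(bits))
    # 128 4 8
    # 256 8 8
```

Under this model the 256-bit peaks match (8 double-precision FLOPs per cycle), while for 128-bit code the FMA-capable module doubles the Intel core's 4 FLOPs per cycle. Note that the comparison is per Intel core versus per AMD module.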

Intel's reasons for not supporting FMA in "Sandy Bridge" relate to the fact that widening registers to 256 bits already doubles performance and increases the transistor budget.
AMD holds patents on an FMA FPU design and, although that design had drawbacks in die size and power consumption, AMD has mostly been able to mitigate those problems.

As a follow-up, it is speculated that "Haswell", Intel's successor to the "Sandy Bridge" architecture, will support FMA for AVX operations (ed: Haswell did ship with FMA, as the FMA3 extension, alongside AVX2).

It seems AMD will have a big leg up in web server and desktop performance, and it should come close in the world's fastest computers, assuming we see competition among:
  • "Bulldozer" with 8 modules from AMD (16 integer cores, 8 shared FPUs with two 128-bit FMA units each)
  • Intel designs with up to 8 cores.
This early in the game, it is hard to predict which architecture will come out on top. It will depend on factors not yet known, such as core/module count, clock speed and other architectural improvements.
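Per clock, the two line-ups above come out even at peak, which is why clock speed and unit count end up deciding. A minimal sketch: the hypothetical `peak_gflops` multiplies unit count, FLOPs per cycle per unit and clock in GHz. It assumes 8 double-precision FLOPs per cycle per Intel core with 256-bit AVX, 4 with 128-bit code, and 8 per AMD module with FMA at either width; the 3.0 GHz clock is a placeholder, not a real product figure.

```python
def peak_gflops(units: int, flops_per_cycle: int, clock_ghz: float) -> float:
    # Theoretical peak: every unit retires its maximum FLOPs every cycle.
    return units * flops_per_cycle * clock_ghz


if __name__ == "__main__":
    CLOCK_GHZ = 3.0  # hypothetical placeholder; real clocks are not known yet
    intel_avx = peak_gflops(8, 8, CLOCK_GHZ)  # 8 cores, 256-bit AVX
    intel_sse = peak_gflops(8, 4, CLOCK_GHZ)  # 8 cores, 128-bit code
    amd_fma = peak_gflops(8, 8, CLOCK_GHZ)    # 8 modules, 128-bit FMA
    print(intel_avx, intel_sse, amd_fma)      # 192.0 96.0 192.0
```

Since the per-clock figures tie for AVX code, any real difference in peak comes from the clock speeds and core/module counts the two vendors actually ship.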

2 comments:

Anonymous said...

http://en.wikipedia.org/wiki/Haswell_(microarchitecture)

Intel Haswell (the "tock" replacement for Sandy Bridge) will have FMA.

Tiago Marques said...

Thank you. I actually knew that but didn't address it explicitly, since it is implied when Intel's Mark Buxton says it will be available in a "future processor".

Best regards
