AMD PRs Counter Paper Launch With Slides

Marketing slides, from AMD to Nvidia, with too little love in between. These are excerpts, the most interesting ones, so let's start at the end. It's the most important slide, yet it was left at the back of a stack of more than twenty, where it's easily disregarded. It shouldn't be, since it sheds much light on what follows. Remember: "may contain technical inaccuracies".

Yup, all valid points. That Fermi was built for HPC, you already knew. You overshot on die size yourselves, but Nvidia is in worse shape.

What??? AMD's PR people don't have engineers they can talk to? They don't know how much bandwidth Fermi will have? Here's a hint: 4.8GHz GDDR5 on a 384-bit bus equals 230GB/s.
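The hint is simple arithmetic: the per-pin data rate times the bus width in bytes. A minimal sketch, assuming the 4.8Gbps GDDR5 and 384-bit bus quoted above (the function name is just for illustration), and working in integer megabytes so the result isn't blurred by floating-point rounding:

```python
def peak_bandwidth_gbs(data_rate_mbps: int, bus_width_bits: int) -> float:
    """Peak memory bandwidth in decimal GB/s.

    data_rate_mbps: effective per-pin data rate in Mbit/s ("4.8GHz" GDDR5 = 4800)
    bus_width_bits: total memory bus width in bits
    """
    megabytes_per_second = data_rate_mbps * bus_width_bits // 8  # bits -> bytes
    return megabytes_per_second / 1000

# Assumed Fermi memory configuration, as quoted in the text.
fermi = peak_bandwidth_gbs(4800, 384)
print(f"Fermi: {fermi:.1f} GB/s")  # Fermi: 230.4 GB/s
```

This is a theoretical peak; real transfers always land somewhat below it.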
2.72 TFLOPS? Yeah, right, as if the card wouldn't blow up running code that efficient... Where's the FireStream card that can really do that?
I already mentioned that AMD was marketing this card the way Intel marketed the Pentium 4. The worst part is that we have to swallow it, because it's still the fastest card around.

Predicting just 1.5 TFLOPS for Fermi is harsh. The GT200 can manage 900+ GFLOPS, the core count has more than doubled, and the FMA instruction helps out in some situations. Fermi may well exceed 2 TFLOPS too but, as with the AMD card, that will hardly be achievable in real-world scenarios (though not for PWM-capping reasons!).
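All of these peak figures come from the same back-of-the-envelope formula: ALUs × FLOPs per ALU per clock × clock. A minimal sketch, assuming the shipping specs of the HD 5870 (1600 ALUs doing one 2-FLOP MAD at 850MHz) and GTX 280 (240 SPs at a 1296MHz shader clock, counting GT200's dual-issued MUL as a third FLOP). Fermi's shader clock wasn't public, so instead of guessing one, the sketch solves for the clock each prediction implies with 512 cores doing one FMA (two FLOPs) per clock. The function names are hypothetical.

```python
def peak_gflops(alus: int, flops_per_clock: int, clock_mhz: int) -> float:
    """Theoretical single-precision peak in GFLOPS."""
    return alus * flops_per_clock * clock_mhz / 1000

def required_clock_mhz(target_gflops: float, alus: int, flops_per_clock: int) -> float:
    """Clock needed to reach target_gflops with the given ALU count."""
    return target_gflops * 1000 / (alus * flops_per_clock)

hd5870 = peak_gflops(1600, 2, 850)  # MAD = 2 FLOPs per ALU per clock
gtx280 = peak_gflops(240, 3, 1296)  # MAD + dual-issued MUL = 3 FLOPs
print(f"HD 5870: {hd5870:.0f} GFLOPS, GTX 280: {gtx280:.0f} GFLOPS")

# Fermi: 512 cores, FMA = 2 FLOPs per core per clock; clock unknown.
for target in (1500, 2000):
    mhz = required_clock_mhz(target, 512, 2)
    print(f"Fermi at {target} GFLOPS needs a ~{mhz:.0f} MHz shader clock")
```

It prints 2720 and 933 GFLOPS for the two shipping cards, and roughly 1465MHz and 1953MHz for the two Fermi predictions. These are peaks only; as noted above, real code won't sustain them.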

The Radeon 5870 doesn't have 1600 shader cores, get over it; that's PR talk. It has 320 cores, each a 5-way vector unit. That's not the same thing. Nvidia's cores aren't independent either: they work in groups of 32 that execute the same instruction on different data. It's not ideal, but it's still more efficient than vector units. We also don't see the 6.67x difference in "shader cores" versus the GeForce GTX 285 (1600 vs. 240) translate into performance gains that big, and we won't when Fermi debuts. Bandwidth also plays a role: too many transistors are wasted on too many shader cores for the bandwidth available.
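Why 1600 "cores" overstates things can be shown with a toy model: each of the 320 units only does five operations per clock when five independent operations are available to fill it (AMD's units are VLIW, so it's the compiler that has to find them). A minimal sketch; the fill levels below are hypothetical, not measured, and like the 6.67x figure it counts units only, ignoring the two cards' different clocks.

```python
VLIW_UNITS = 320      # Radeon HD 5870 units
SLOTS_PER_UNIT = 5    # ALU slots per unit
GTX285_SPS = 240      # GeForce GTX 285 scalar processors

def effective_alus(avg_slots_filled: float) -> float:
    """ALUs doing useful work per clock, given the average slots filled per unit."""
    if not 1 <= avg_slots_filled <= SLOTS_PER_UNIT:
        raise ValueError("a unit fills between 1 and 5 slots per clock")
    return VLIW_UNITS * avg_slots_filled

marketing_ratio = VLIW_UNITS * SLOTS_PER_UNIT / GTX285_SPS
print(f"Marketing ratio: {marketing_ratio:.2f}x")  # Marketing ratio: 6.67x

for filled in (5.0, 3.5, 2.0):  # hypothetical average fill levels
    alus = effective_alus(filled)
    print(f"{filled} slots filled -> {alus:.0f} ALUs, {alus / GTX285_SPS:.2f}x GTX 285")
```

At a hypothetical average of 3.5 slots the 6.67x shrinks to 4.67x, and at 2 slots to 2.67x, before bandwidth even enters the picture.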

The global data share seems to be a requirement of AMD's architecture because of its split L2 cache, whose slices are tied to the memory controllers. Nvidia has a single 768KiB L2; AMD has four of 128KiB each (old slide):

Coherency issues? Apparently, the AMD-only Global Data Share solves that. I wouldn't call that a victory. They're making too much of a fuss about something that, rightly so, is not targeted at compute. The Radeon 5000 is targeted at gaming. End of story. If they don't care for that market, it would be wise to stop spreading FUD.

Nvidia shows "L2 cache (per SM)", which is an error: the L2 is global, a single cache shared by all SMs. The L1 and shared memory split the same 64KB per SM, configurable as 16KB/48KB or 48KB/16KB, which is great.

All valid points. AMD can't touch Fermi for HPC applications, no problem there.

AMD must stop skewing the compute numbers. They are first to market with a good card: it's not a great card, but it's not terrible either. It's not up to HD 4800 standards, but they still have a full three months to fix the lack of bandwidth before Nvidia releases anything Fermi-related. By then, yields should be pretty good on the 5800 series, and the dual-GPU 5800 is shaping up to rule the high end. A replacement for the current high-end card should be around by then, too. It's unlikely to get a wider bus, but GDDR5 can reach around 7GHz, if the memory controller allows it, which on the same 256-bit bus translates to 224GB/s.
Desperate moves from a company that's having trouble supplying enough 5800 cards to its customers.
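How much would faster memory alone buy? A minimal sketch, assuming the HD 5870's 256-bit bus and 4.8Gbps GDDR5 stay the baseline and the refresh only raises the data rate to the roughly 7Gbps mentioned above (a hypothetical figure, not an announced spec); the function name is illustrative.

```python
def peak_bandwidth_gbs(data_rate_mbps: int, bus_width_bits: int) -> float:
    """Peak bandwidth in decimal GB/s: per-pin rate (Mbit/s) x bus width (bits) / 8 / 1000."""
    return data_rate_mbps * bus_width_bits // 8 / 1000

BUS_BITS = 256  # HD 5870 bus width, assumed unchanged on the refresh

today = peak_bandwidth_gbs(4800, BUS_BITS)    # HD 5870's 4.8 Gbps GDDR5
refresh = peak_bandwidth_gbs(7000, BUS_BITS)  # hypothetical ~7 Gbps GDDR5
gain = refresh / today - 1

print(f"{today:.1f} GB/s -> {refresh:.1f} GB/s (+{gain:.1%})")
# 153.6 GB/s -> 224.0 GB/s (+45.8%)
```

That's a healthy jump, but still short of the 230GB/s Fermi gets from its wider 384-bit bus at only 4.8Gbps.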

Source: PCGH

The Bit Speek