Some sites around the web, most notably Fudzilla, have been posting FUD about a TLB bug in Intel's new Nehalem (Core i7) processor. AFAIK, this matter is being blown way out of proportion.
Fudzilla's Fuad Abazovic refers to AAJ1, in the chapter of specification clarifications, which states:
In rare instances, improper TLB invalidation may result in unpredictable system
behavior, such as system hangs or incorrect data. Developers of operating systems
should take this documentation into account when designing TLB invalidation
algorithms. For the processors affected, Intel has provided a recommended update to
system and BIOS vendors to incorporate into their BIOS to resolve this issue.
So developers of operating systems and BIOSes should take the updated documentation into account; if they do, nothing goes wrong. The documentation in question is the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, which will be updated soon, if it hasn't been already.
As for when TLB invalidation should occur, the Intel documentation says:
As noted in Section 3 and Section 4, the processor may create entries in the TLBs and the paging-structure caches when linear addresses are being translated and may retain these entries even after the paging structures used to create them have been modified. To ensure that address translation uses the modified paging structures, software should take action to invalidate any cached entries that may contain information that has since been modified.
More information is available in this Intel document. To put it bluntly, "Nehalem" has a slightly modified paging structure, which I won't go into in much detail, and it needs to be taken into account whenever software performs a TLB or paging-structure cache invalidation, a fairly common procedure. A TLB invalidation may occur when software, such as the operating system, performs a context switch (a task change) that replaces one process's virtual address space with another; the entries cached in the TLB are then stale and subject to removal.
Only if the invalidation algorithm doesn't take the "Nehalem" architectural changes into account will the system, "in rare instances", suffer "unpredictable behavior" or a crash. Hence, this is not a bug. Don't worry, your Core i7 will be safe.
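To make the stale-entry problem concrete, here is a minimal sketch of a toy model in Python. It is not how any real processor or operating system is implemented; the class `ToyMMU` and its method names are made up for illustration, and the only real x86 details referenced are the INVLPG instruction and the flush that happens when the address space is switched.

```python
class ToyMMU:
    """Toy model: a page table plus a TLB that caches its translations."""

    def __init__(self):
        self.page_table = {}  # virtual page number -> physical frame number
        self.tlb = {}         # cached copies of page_table entries

    def map_page(self, vpn, pfn):
        # Software edits the paging structure; the TLB is NOT updated.
        self.page_table[vpn] = pfn

    def translate(self, vpn):
        # Hardware checks the TLB first and walks the page table on a miss.
        if vpn not in self.tlb:
            self.tlb[vpn] = self.page_table[vpn]
        return self.tlb[vpn]

    def invalidate(self, vpn):
        # Stand-in for a single-entry invalidation (INVLPG on x86).
        self.tlb.pop(vpn, None)

    def flush(self):
        # Stand-in for a full flush, e.g. on an address-space switch.
        self.tlb.clear()


mmu = ToyMMU()
mmu.map_page(0x10, 0xA0)
first = mmu.translate(0x10)   # miss: walks the page table and caches 0xA0
mmu.map_page(0x10, 0xB0)      # remap the page without invalidating
stale = mmu.translate(0x10)   # hit: the TLB still answers with 0xA0
mmu.invalidate(0x10)
fresh = mmu.translate(0x10)   # miss again: the new mapping 0xB0 is seen
```

The second lookup returns the old frame because nothing told the TLB that the mapping changed. That is the situation an invalidation algorithm has to prevent, and getting it right is the software's job on any processor with a TLB.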
On another note, a real TLB bug does exist, erratum AAJ42, which states:
Incorrect TLB Translation May Occur After Exit From C6
Under certain conditions when C6 and two logical processors on the same core are
enabled on a processor, an instruction fetch occurring after a logical processor exits
from C6 may incorrectly use the translation lookaside buffer (TLB) address mapping
belonging to the other logical processor in the processor core.
This isn't much of a problem, since it only occurs when the processor has entered the C6 sleep state, something very uncommon in desktops and even in laptops, where most don't go further than the C3 or C4 state and some stay as high as C2 (for wakeup delay reasons). The C6 state was introduced with Intel's "Penryn" core and, as far as I'm aware, it's not a requirement for Suspend-to-RAM, or S3, so it goes pretty much unused in everyday life.
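If you want to check which idle states your own machine actually uses, the following is a rough sketch for Linux only. It assumes the kernel exposes the cpuidle directory under sysfs, with a `name` and a `usage` file per state; the function name `idle_states` is made up for this example, and the state names you see depend on the idle driver in use.

```python
from pathlib import Path


def idle_states(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (state name, times entered) pairs for one CPU.

    Returns an empty list when the directory does not exist, for
    example on a non-Linux system or a kernel without cpuidle.
    """
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []
    # Directories are named state0, state1, ...; sort them numerically.
    dirs = sorted(base.glob("state*"), key=lambda p: int(p.name[5:]))
    result = []
    for state in dirs:
        name = (state / "name").read_text().strip()
        usage = int((state / "usage").read_text())
        result.append((name, usage))
    return result


if __name__ == "__main__":
    for name, usage in idle_states():
        print(f"{name}: entered {usage} times")
```

A deep state whose counter stays at zero, or which is not listed at all, is not being entered on that machine.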
Neither of these two problems poses any danger, performance-wise or feature-wise, the way the TLB bug in AMD's Barcelona (Phenom) B2 stepping did. I leave the rest of this paragraph open for discussion but, if I'm not mistaken, the fix for that bug required disabling at least one level of the TLB, probably related to the L3 cache, which brought some very significant performance drops. That is nowhere near what either of these two "bugs" poses.
Intel's Core i7 is fast, fast, very fast and expensive. If you have the funds, go ahead; it's one hell of a CPU. Let no FUD about one or two bugs, which are completely avoidable at little or no cost, scare you away from the greatest desktop CPU currently available. This isn't a Barcelona B2 scenario.
7 comments:
so... "Tiago Marques"? after reading your article i was like, "wow, another intel fanboy thinking he knows something of the market"
just a quote of your text:
"In rare instances, improper TLB invalidation may result in unpredictable system"
erm. that is just like the TLB erratum of Barcelona/Agena B2. but i haven't ever heard of somebody who really was able to reproduce the situation in which this erratum takes effect. i'm also using an Agena B2 chip since January 2008, but didn't ever get a system hang out of the TLB bug.
so, why does the AMD bug matter, but the intel bug does not?
one other thing was the statement "Core i7 is the greatest desktop CPU currently available"
In my opinion, its performance is not that much higher than penryn. but the price difference is still higher. I think that the Nehalem architecture has very much potential, but at this moment it's just a small performance gain for a much higher price.
So, if you try to write another article about CPU related topics, please, try to get some infos about the market, and stop flaming companies because you like intel better. Maybe you'll grow up sometime...
So, "RìggníX", you are aware that the segment you're quoting from the post is citing Intel's official documentation, right?
Pretty quick on passing judgement, there. Take your own advice.
Hi Rìggíx,
"In rare instances, improper TLB invalidation may result in unpredictable system"
The key words here are TLB invalidation, or more precisely the invalidation algorithm. An algorithm is a way of solving a problem, in this case invalidating the whole TLB or individual entries in it.
Only an Intel engineer can provide further details on this but from what Intel states, this is completely avoidable without a performance penalty - the person designing the algorithm, maybe for an OS, just has to take into account the changes made to the architecture. This happens all the time and stays far away from the sight of end-users.
Theo de Raadt, the lead OpenBSD developer, had stated that the Core 2 was a CPU full of bugs, which had to be fixed in software and he was very disturbed about that. Still, you probably never heard of that, as should be the case now. They are not show stopping bugs.
The AMD TLB bug was another story. The details are a bit scarce, but AMD seems to have disabled the ability of the TLB to look for page table entries in the cache, hence increasing latency for memory accesses, which is a performance killer. Some apps lost more than 30% performance.
This is consistent, however: if you look at The Tech Report's benchmark of the fix, you will see that memory performance has indeed dropped.
The fix for the B3 revision also means a little performance is lost but the TLB can still access the cache, it just evicts the L2 page table entries to the L3 cache when there could be a problem.
There was also a rumour that this only happened in CPUs clocked at 2.4GHz, which is in line with what AMD states in the erratum about a small time window. Since the problem involves both the L2 and L3 cache, which have different clocks, it's possible that a higher core clock shortens this time window, thereby exposing the problem.
If you're running at a clock lower than 2.4GHz, you probably won't ever experience the bug. You also probably don't run all cores at 100% for a long time, and this is a rather sporadic occurrence.
You are right when you say that Penryn's performance is not that much different but you still aren't seeing highly multithreaded apps around to play with. When you do, in case of video encoding and such, things are way brighter for Nehalem.
If you're just a gamer I would say an E8x00 CPU is the perfect buy for now, but if Intel built a CPU like Penryn in the next 2-3 years, you wouldn't want it if you also had Nehalem. Just wait :)
Like the Phenom, the Core i7 is a very forward thinking CPU. I do some research in HPC and computer architecture and believe me when I say that the Phenom is a dream for this market, as is the new i7, or its Xeon counterpart, when it comes. They both blow Penryn out of the water.
What you must understand is that, in highly threaded applications, the Core i7 mitigates all the problems Penryn had, mainly the shared cache and FSB. The shared cache was nice for single threaded apps but is no good for multithreading, you will also see why the L3 is important when AMD launches the L3 cacheless Phenoms - again, especially in multithreaded apps, not the dual-threaded stuff we see today.
Since Intel has better performance per core than the latest AMD CPUs, the i7 is now a far superior CPU to what AMD has to offer - albeit also more expensive.
I don't like Intel more, I like AMD more. I have 4 PCs, all AMD, and an Intel laptop which I intend to exchange for a Turion Ultra when I can.
I had a Core 2 Duo and Intel's platform is no good. It gave me too many problems to do anything other than running stock clocks, and even so...
My currently recommended system uses AMD's Phenom X3 or X4 processors and nothing from Intel, mostly due to platform architecture, cool and quiet and other stuff, like cost/performance ratio.
I could go on and on, all day, why I prefer AMD's offers right now, like I do since the K7 came out, but that would be quite boring :)
After all this, I still must say that the i7 isn't the most cost effective CPU for the desktop, not my choice of CPU, but it's the top performer CPU and there's no denying that, even if I don't like it that way.
at first i have to apologize for my first post, it ended up much more critical than i intended. it just looked like a typical fanboy post at first glance, but after i read your comment i see that you know what you're talking about.
thank you for your detailed answer. i know about the theories of the AMD TLB bug. i also clocked my Phenom 9600BE to 2.6 GHz for quite some time and didn't get a problem. but i may just have been lucky.
I also saw lots of reviews trying to get a system hang out of the bug, but there wasn't one "successful" try i've seen, if there were some, please tell me.
So, i understand the difference between this bug and the intel bug. The Intel bug is not really that "bad". But the point i was trying to make is that the AMD bug wasn't that much of a problem either. So I'm sorry if you got me wrong.
About "Core i7 is the greatest desktop CPU currently available":
To get to a point here we need to define "greatest desktop CPU currently available". In things like per core performance, nehalem leads the market, that's a fact.
But IMO to be the greatest CPU available, there are more factors to look at. First of all, the "bang for the buck" (price/performance) is pretty important to me. I'm sure Nehalem will get better in this respect, but the "currently available" Nehalem is pretty expensive. Another thing is the improvement over older architectures. Core 2 was one of the greatest architectures intel ever had when it comes to performance per core/clock. the improvements over Core and P4 were pretty big. as for now, nehalem hasn't shown very many benefits. one of the major problems in this area certainly is software. Nehalem IS designed for multithreaded software. But at this moment that generally is not a very big pro for the customer. It will be, but it's not now. So i think there will be really "great desktop CPUs" out of the nehalem architecture.
And i have something to add to the AMD TLB bug. I know about the performance decreases if the patch was applied. My point is that the patch isn't really necessary, so the performance decrease isn't present.
When I say "greatest" I do mean the usual "latest and greatest" processor available, as in "state of the art", which it is.
If you look at the proper applications, you will see that the leap from Core 2 to Nehalem is about the same as the one Intel had from the P4 to Core 2, although I'm saying this without looking at benchmarks again - so I stand to be corrected if not.
However, I do agree with you and I wouldn't define it as a great desktop CPU because it isn't. There really are better alternatives.
From what Intel's roadmaps state, you will have the great desktop CPUs derived from Nehalem in the form of Lynnfield, without QPI and with embedded PCI-Express controllers; that will be something to really watch out for if you're a gamer. The lower latencies are sure to bring more performance to the table. With that integrated in the CPU, you don't really need QPI. To use integrated graphics QPI would be nice but Intel also has a Nehalem variant with an integrated core, so what will happen is still unclear.
About the TLB bug being exposed... that's a bit harder. I would try something that puts a big burden on all cores (that would be Linpack) but the coherency thing puts me off a bit. I would say probably Linpack with a matrix size that fits the L3 cache, run over and over again. Although I'm not sure about which algorithm is used, so I can't really say if it would stress the cache coherence mechanisms of the CPU.
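As a rough sizing sketch for that idea: Linpack works on a dense square matrix of double-precision numbers, so the largest dimension that fits a given cache can be estimated with a little arithmetic. The helper name `largest_square_matrix` is made up for this example, and the 2 MB figure is an assumption for the Phenom's shared L3, so substitute your own CPU's cache size.

```python
import math


def largest_square_matrix(cache_bytes, element_bytes=8):
    """Largest n such that an n x n matrix of element_bytes-sized
    values occupies at most cache_bytes."""
    return math.isqrt(cache_bytes // element_bytes)


l3_bytes = 2 * 1024 * 1024  # assumed: 2 MB shared L3
n = largest_square_matrix(l3_bytes)  # 512 for 8-byte doubles
```

In practice you would pick something somewhat below that limit, since the cache also has to hold the right-hand side, the code's other data and whatever else the system is doing.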
The trouble is that this problem, even if rare, is bound to happen, and in critical applications you can't have your system crashing randomly for no reason, hence the concern from AMD's engineers.
Also, some conspiracy theories state that the 2.4GHz figure and the X3 come from problems like one core not clocking right and not a really big problem with the TLB. I believe it's a mixture of both.
Read more about the 3rd core stuff here:
http://www.tomshardware.co.uk/forum/248265-10-phenom-exposed-shipping-flaky-cores
Still, the problem with the TLB fix is that some motherboards don't allow you to disable it. While it's not an issue to you, I personally wouldn't buy one because of those possible constraints. AOD isn't a proper fix for that one either; if I can't have it in the BIOS I say screw it.
Fortunately there are now tons of B3 cores, so that's again a non-issue.
If my PC crashed because the fix wasn't applied, it would happen so rarely that it wouldn't matter but if it happened in a critical server in a company, for which I am responsible, you can bet I would just take it out of there, if I couldn't live with the performance loss.
Intel had a problem with some 180nm P3 1.13GHz processors, if I'm not mistaken, and they pulled them all. One month you were reading reviews about them, where NO ONE noticed problems whatsoever, the next they were pulling them and granting AMD the lead.
Things like these affect the whole market, even if they only pop up in 0.5% of it. Same reason why B2 chips weren't pulled from HPC clusters: from talking to AMD, they surely know that the problem won't affect them.
Best regards,
Tiago Marques
i agree with you in terms of QuickPath. certainly a good step forward, even if such an interface has been used by amd for years, cause the clocks QPI is achieving at the moment are pretty impressive for the first generation.
Integrating the PCI Express controller: i haven't heard of that yet. pretty interesting, but i guess it's a step towards a CPU/GPU combination like AMD Fusion.
The thing you said about the TLB bug in big server clusters makes sense. and i also have to add, i have the problem with the "defective" third core, but after raising the voltages of the cores and the northbridge it looks better.
I hope I can get myself a Phenom II, cause i'm very confident when it comes to overclocking on air. And if the prices will be like some popping up on the web lately, it's gonna be a "great desktop CPU" for customers ;-)
It's nice that you were able to fix that 3rd core problem. Recently it also seems to have gone away on B3 steppings.
Indeed, the Phenom II seems to be a good proposal from AMD, to which Intel will answer with 65W quads. Since Intel is pricing the Core i7 on the very high side, plus the $250 motherboard and some extra for DDR3, AMD might be able to regain a good foothold in the desktop market.