High-End x86: The Nehalem EX Xeon 7500 and Dell R810
by Johan De Gelas on April 12, 2010 6:00 PM EST - Posted in IT Computing, Intel, Nehalem EX
SAP S&D 2-Tier
SAP S&D 2-Tier | |
Operating System | Windows 2008 Enterprise Edition |
Software | SAP ERP 6.0 Enhancement package 4 |
Benchmark software | Industry Standard benchmark version 2009 |
Typical error margin | Very low |
The SAP SD (Sales and Distribution, 2-tier internet configuration) benchmark is interesting because it is a real-world client-server application. We decided to look at SAP's benchmark database. The results below were all run on Windows 2003 Enterprise Edition and an MS SQL Server 2005 database (both 64-bit). Every "2-tier Sales & Distribution" benchmark was performed with SAP's latest ERP 6 enhancement package 4. These results are not comparable with any benchmark performed before 2009: the new "2009" version of the benchmark obtains scores that are 25% lower. We analyzed the SAP benchmark in depth in one of our previous server-oriented articles. The profile of the benchmark has remained the same:
- Very parallel resulting in excellent scaling
- Low to medium IPC, mostly due to "branchy" code
- Somewhat limited by memory bandwidth
- Likes large caches (memory latency!)
- Very sensitive to sync ("cache coherency") latency
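The 25% drop quoted above for the 2009 version of the benchmark can be turned into a ballpark rescaling of an older score. The sketch below is only that: it assumes the 25% figure applies as a flat factor, which is not an official SAP conversion, and the function name and the sample score are hypothetical and chosen purely for illustration.

```python
# Rough rescaling of a pre-2009 SAP SD score onto the 2009 benchmark's scale.
# ASSUMPTION: the "25% lower" figure from the text is applied as a flat factor;
# this is NOT an official SAP conversion, only a ballpark.
SCORE_DROP_2009 = 0.25  # 2009 version scores about 25% lower, per the text


def approx_2009_saps(pre_2009_saps: float, drop: float = SCORE_DROP_2009) -> float:
    """Return a ballpark 2009-version score for a pre-2009 result."""
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in [0, 1)")
    return pre_2009_saps * (1.0 - drop)


if __name__ == "__main__":
    # Hypothetical older result, used only to show the call.
    print(approx_2009_saps(40_000))
```

Because real workloads do not shift uniformly between benchmark versions, a number produced this way is useful for a sanity check at most, not for ranking systems.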
Since we gather the benchmark data from the SAP site, we have to work with the results published so far. A quad Xeon X7560 outperforms an octal (eight-socket) Opteron 8435 at 2.6GHz by a small margin (3%). A quad Opteron 6176 at 2.3GHz should score about 48k-50k. That is competitive performance, but this market will probably prefer the Xeon platform, as price is less of an issue and reliability features are at the top of the checklist. The Power 7 servers outperform the Nehalem EX CPUs, but the top models (3.55GHz) cost around $100k.
23 Comments
dastruch - Monday, April 12, 2010
Thanks AnandTech! I've been waiting for a year for this very moment and if only those 25nm Lyndonville SSDs were here too.. :)
thunng8 - Monday, April 12, 2010
For reference, IBM just released their octal-chip Power7 3.8GHz result for the SAP 2-tier benchmark. The result is 202180 SAPS, approximately 2.32x faster than the octal-chip Nehalem-EX.
Jammrock - Monday, April 12, 2010
The article cover on the front page mentions 1 TB maximum on the R810 and then 512 GB on page one. The R910 is the 1TB version, the R810 is "only" 512GB. You can also do a single processor in the R810. Though why you would drop the cash on an R810 and a single proc I don't know.
vol7ron - Tuesday, April 13, 2010
I wish I could afford something like this!
I'm also curious how good it would be at gaming :) I know in many cases these server setups under-perform high end gaming machines, but I'd settle :) Still, something like this would be nice for my side business.
whatever1951 - Tuesday, April 13, 2010
None of the Nehalem-EX numbers are accurate, because Nehalem-EX kernel optimization isn't in Windows 2008 Enterprise. There are only 3 commercial OSes right now that have Nehalem-EX optimization: Windows Server 2008 R2 with SQL Server 2008 R2, RHEL 5.5, SLES 11, and the soon to be released CentOS 5.5 based on RHEL 5.5. Windows 2008 R1 has trouble scaling to 64 threads, and SQL Server 2008 R1 absolutely hates Nehalem-EX. You are cutting Nehalem-EX benchmarks short by 20% or so by using Windows 2008 R1.
The problem isn't as severe for Magny-Cours, because the OS sees 4 or 8 sockets of 6 cores each via the enumerator, and thus treats it with the same optimization as an 8 socket 8400 series CPU.
So, please rerun all the benchmarks.
JohanAnandtech - Tuesday, April 13, 2010
It is a small mistake in our table. We have been using R2 for months now. We do use Windows 2008 R2 Enterprise.
whatever1951 - Tuesday, April 13, 2010
Ok. Change the table to reflect Windows Server 2008 R2 and SQL Server 2008 R2 information please.
Any explanation for such poor memory bandwidth? Damn, those SMBs must really slow things down or there must be a software error.
whatever1951 - Tuesday, April 13, 2010
It is hard to imagine 4 channels of DDR3-1066 being 1/3 slower than even the Westmere-EPs. Can you remove half of the memory DIMMs to make sure that it isn't Dell's flex memory technology that's slowing things down intentionally to push sales toward the R910?
whatever1951 - Tuesday, April 13, 2010
As far as I know, when you only populate two sockets on the R810, the Dell R810 flex memory technology routes the 16 DIMMs that used to be connected to the 2 empty sockets over to the 2 center CPUs; there could be significant memory bandwidth penalties induced by that.
whatever1951 - Tuesday, April 13, 2010
"This should add a little bit of latency, but more importantly it means that in a four-CPU configuration, the R810 uses only one memory controller per CPU. The same is true for the M910, the blade server version. The result is that the quad-CPU configuration has only half the bandwidth of a server like the Dell R910 which gives each CPU two memory controllers."
Sorry, should have read a little slower. Damn, Dell cut half the memory channels from the R810!!!! That's a retarded design, no wonder the memory bandwidth is so low!!!!!