原帖由 Edison 于 2006-6-13 22:01 发表
也许是测试程序的问题,修改后可以缩小到1/4,不过conroe现在归还了。 我觉得Cho你的测试yonah架构图画错了。DP FMUL和DP FADD不应该画到同一个单元中,不该都在Port0

On Intel Core Solo and Intel Core Duo processors, the combination of
improved decoding and micro-op fusion allows instructions which were
formerly two, three, and four micro-ops to go through all decoders. As a
result, scalar SSE/SSE2 code can match the performance of x87 code
executing through two floating-point units. On Pentium M processors,
scalar SSE/SSE2 code can experience approximately 30% performance
degradation relative to x87 code executing through two floating-point
units.
In code sequences that have conversions from floating-point to integer,
divide single-precision instructions, or any precision change; x87 code
generation from a compiler typically writes data to memory in
single-precision and reads it again in order to reduce precision. Using
SSE/SSE2 scalar code instead of x87 code can generate a large
performance benefit using Intel NetBurst microarchitecture and a
modest benefit on Intel Core Solo and Intel Core Duo processors. |