P, I now have Intel's IA architecture manuals, but I still have a few questions

zf1666 · zf1666

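Originally posted by hopetoknow2 on 2006-6-13 21:42

True, it doesn't matter much, but if you read and analyze it carefully, some of Yonah's characteristics are quite interesting.

For example, Yonah's L1-to-L1 communication, from which Core's behavior can be inferred
L2 latency is 14 cycles

The order in which a dual-core Yonah looks for load data is: the write buffer, its own L1; if ...

On this point, Core and Yonah are different.

5.5 bus cycles is a very long time.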

wuhuataocn · wuhuataocn

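Originally posted by Prescott on 2006-6-13 21:45

On this point, Core and Yonah are different.

5.5 bus cycles is a very long time.

My mistake: that is the formula for Yonah's memory access. It comes to only about 86 cycles, which is really low.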
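The arithmetic behind the 86-cycle figure is not shown in the post. One reading that lands close to it, offered only as a guess, adds the 14-cycle L2 latency quoted earlier in the thread to 5.5 bus cycles converted into core cycles. The core-to-bus ratio of 13 below is an assumption (it is not stated anywhere in the thread), and `bus_to_core_cycles` is a hypothetical helper name.

```python
def bus_to_core_cycles(bus_cycles: float, core_to_bus_ratio: float) -> float:
    """Convert bus cycles to core cycles for a given core:bus clock ratio."""
    return bus_cycles * core_to_bus_ratio


L2_LATENCY_CYCLES = 14   # quoted in the thread
BUS_CYCLES = 5.5         # quoted in the thread
ASSUMED_RATIO = 13       # assumption: core clock / bus clock, not given in the thread

# Guess at the calculation: L2 latency plus the bus portion in core cycles.
estimate = L2_LATENCY_CYCLES + bus_to_core_cycles(BUS_CYCLES, ASSUMED_RATIO)
print(f"{estimate} core cycles")  # 85.5 core cycles
```

With a different ratio the result moves accordingly, so this only shows that a figure near 86 is plausible, not that this is how it was derived.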

sdlfll2 · sdlfll2

So 4B is just the volume numbering? Right now there are only 1, 2, 3A, 3B and 4A.

mujingling3 · mujingling3

In my test on the AOpen 975X board, the cache exchange time for Yonah @ 2.600 GHz is 13x ns per ping-pong.
In my test of Conroe at 2.67 GHz, the cache exchange time is 77 ns per ping-pong.
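To compare the two ping-pong figures with latencies quoted in core cycles, they can be converted using the stated clock speeds. This is a minimal sketch: `ns_to_cycles` is a hypothetical helper, and because the Yonah figure is given only as "13x ns", the code assumes that means somewhere between 130 and 139 ns and reports both bounds.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to core cycles (1 GHz = 1 cycle per ns)."""
    return latency_ns * clock_ghz


# Conroe: 77 ns per ping-pong at 2.67 GHz (both values from the post).
conroe_cycles = ns_to_cycles(77, 2.67)

# Yonah: "13x ns" at 2.600 GHz; the last digit is unknown, so bracket it.
# Assumption: "13x" means a value in the range 130..139 ns.
yonah_low_cycles = ns_to_cycles(130, 2.6)
yonah_high_cycles = ns_to_cycles(139, 2.6)

print(f"Conroe: about {conroe_cycles:.0f} cycles per ping-pong")
print(f"Yonah:  about {yonah_low_cycles:.0f} to {yonah_high_cycles:.0f} cycles per ping-pong")
```

Either way the measured exchange costs a few hundred core cycles, which is the scale the follow-up replies are questioning.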

liang19821127 · liang19821127

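Originally posted by Edison on 2006-6-13 21:53
In my test on the AOpen 975X board, the cache exchange time for Yonah @ 2.600 GHz is 13x ns.
In my test of Conroe at 2.67 GHz, the cache exchange time is 77 ns.

77 ns is far higher than the real value. How did you test it?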

80881

It may be a problem with the test program. After modifying it, the figure can be cut to 1/4, but the Conroe has already been returned.

lishaowei · lishaowei

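Originally posted by Edison on 2006-6-13 22:01
It may be a problem with the test program. After modifying it, the figure can be cut to 1/4, but the Conroe has already been returned.

Can Yonah's figure also be cut to 1/4?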

honets · honets

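Originally posted by Edison on 2006-6-13 22:01
It may be a problem with the test program. After modifying it, the figure can be cut to 1/4, but the Conroe has already been returned.

Cho, I think the Yonah architecture diagram in your review is drawn incorrectly. DP FMUL and DP FADD should not be drawn in the same unit, and they should not both be on Port 0.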

On Intel Core Solo and Intel Core Duo processors, the combination of
improved decoding and micro-op fusion allows instructions which were
formerly two, three, and four micro-ops to go through all decoders. As a
result, scalar SSE/SSE2 code can match the performance of x87 code
executing through two floating-point units. On Pentium M processors,
scalar SSE/SSE2 code can experience approximately 30% performance
degradation relative to x87 code executing through two floating-point
units.
In code sequences that have conversions from floating-point to integer,
divide single-precision instructions, or any precision change; x87 code
generation from a compiler typically writes data to memory in
single-precision and reads it again in order to reduce precision. Using
SSE/SSE2 scalar code instead of x87 code can generate a large
performance benefit using Intel NetBurst microarchitecture and a
modest benefit on Intel Core Solo and Intel Core Duo processors.

av6421165 · av6421165

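I also think the scalar SSE2 multiply instruction MULSD and the x87 DP fmul instruction share the same DP floating-point multiplier,
and that the scalar SSE2 add instruction ADDSD and the x87 DP fadd instruction share the same DP floating-point adder.

Of course, this also means that the packed SSE2 multiply instruction MULPD has to use that single DP floating-point multiplier twice, and the packed SSE2 add instruction ADDPD has to use the DP floating-point adder twice.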
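The consequence of that claim can be sketched as a simple count of passes through one execution unit. This is a toy model of the poster's hypothesis, not a description of the real Yonah pipeline: it assumes one DP element per pass, and `unit_passes` and `DP_ELEMENTS` are hypothetical names. The element counts themselves (one double for the scalar forms, two for the packed forms) follow from the SSE2 instruction definitions.

```python
# Number of double-precision elements each instruction operates on.
DP_ELEMENTS = {"MULSD": 1, "MULPD": 2, "ADDSD": 1, "ADDPD": 2}


def unit_passes(instruction: str, count: int, elements_per_pass: int = 1) -> int:
    """Passes through a single DP unit needed to execute `count` instructions.

    Assumption (from the post, not from Intel documentation): the unit
    handles `elements_per_pass` DP elements each time it is used.
    """
    elements = DP_ELEMENTS[instruction]
    passes_per_instruction = -(-elements // elements_per_pass)  # ceiling division
    return count * passes_per_instruction


# 100 packed multiplies cover 200 doubles and need 200 passes,
# exactly as many as 200 scalar multiplies would.
print(unit_passes("MULPD", 100))  # 200
print(unit_passes("MULSD", 200))  # 200
```

Under this model the packed forms save instruction slots but not multiplier or adder time, which is the point the post is making.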

cici0325 · cici0325

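FMAD/FADD refers to the x87 operations, which are not DP but long double.

The diagram already places SIMD FP ADD and SIMD DP MUL on different ports. Because it was adapted directly from the PIII architecture diagram, the xxxPD instructions are not labeled; they sit in the same positions as the corresponding xxxPS units.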
