P大, I've got Intel's IA architecture manuals now, but I still have some questions

zhoujianpo · zhoujianpo

P大, I've got Intel's IA architecture manuals now, but I still have some questions.
Does anyone know when 4B is coming out? Will 4B cover Yonah and Conroe?

hctgy · hctgy

What 4B? There's no such thing. What would 4A even be?

The Conroe optimization manual is still Confidential.

ljlwxb · ljlwxb

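Hi 牛奶 (Milk). I'd say Intel's manuals are enormous; reading all of them will wear you out. The optimization manual, on the other hand, is very useful for people who write assembly. The Core optimization manual is well worth reading and analyzing.

From my past reading, these manuals generally don't cover the "especially" interesting low-level details, and they aren't enough to understand the finer points of the CPU's internal structure. AMD's manuals have much the same flavor.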

shoujiyingjian · shoujiyingjian

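Also, Yonah seems to have started showing up in the manuals already. I can't remember whether any section covers it specifically; download the latest PDF and take a look. (Look under Core Duo.)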


haohao766124 · haohao766124

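Originally posted by hopetoknow2 on 2006-6-13 21:12
Also, Yonah seems to have started showing up in the manuals already. I can't remember whether any section covers it specifically; download the latest PDF and take a look. (Look under Core Duo.)

Latest optimization manual: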
http://download.intel.com/design/Pentium4/manuals/24896613.pdf

http://download.intel.com/design/Pentium4/manuals/25366519.pdf

Either way, there isn't much, and it isn't very detailed.

Intel Core Solo and Intel Core Duo processors incorporate a
microarchitecture that is similar to the Pentium M processor
microarchitecture, but provides additional enhancements for
performance and power efficiency. Enhancements include:
• Intel Smart Cache
This second-level cache is shared between the two cores in an Intel Core
Duo processor to minimize bus traffic when both cores access
a single copy of cached data. It allows an Intel Core Solo processor
(or an Intel Core Duo processor in which one of the two cores is idle)
to use the full cache capacity.
• Streaming SIMD Extensions 3
These extensions are supported in Intel Core Solo and Intel Core
Duo processors.
Improvement in decoder and micro-op fusion allows the front end to
see most instructions as single μop instructions. This increases the
throughput of the three decoders in the front end.
Throughput of SIMD instructions is improved and the out-of-order
engine is more robust in handling sequences of frequently-used
instructions. Enhanced internal buffering and prefetch mechanisms
also improve data bandwidth for execution.

Execution of SIMD instructions on Intel Core Solo and Intel Core Duo
processors is improved over Pentium M processors by the following
enhancements:
• Micro-op fusion
Scalar SIMD operations on register and memory have single
micro-op flows comparable to x87 flows. Many packed instructions
are fused to reduce their micro-op flow from four to two micro-ops.
• Eliminating decoder restrictions
Intel Core Solo and Intel Core Duo processors improve decoder
throughput with micro-fusion and macro-fusion, so that many more
SSE and SSE2 instructions can be decoded without restriction. On
Pentium M processors, many single micro-op SSE and SSE2
instructions must be decoded by the main decoder.
• Improved packed SIMD instruction decoding
On Intel Core Solo and Intel Core Duo processors, decoding of most
packed SSE instructions is done by all three decoders. As a result,
the front end can process up to three packed SSE instructions every
cycle. There are some exceptions to the above; some
shuffle/unpack/shift operations are not fused and require the main
decoder.
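A rough way to see why the "main decoder only" exceptions matter is to count decode cycles. The sketch below assumes a P6-style arrangement that the quoted text does not spell out: decoder 0 (the main decoder) accepts any instruction, decoders 1 and 2 accept only instructions that do not need the main decoder, and instructions go to decoders strictly in program order. The function name and the `needs_main` flags are hypothetical, for illustration only.

```python
from typing import List


def decode_cycles(needs_main: List[bool], width: int = 3) -> int:
    """Count front-end cycles to decode a sequence in program order.

    Assumed (hypothetical) model: each cycle, slot 0 is the main decoder and
    takes the next instruction whatever it is; the remaining width-1 slots
    take only instructions that do NOT need the main decoder. An instruction
    that needs the main decoder but lands in a later slot ends the cycle and
    is decoded in slot 0 of the next cycle.
    """
    cycles = 0
    i = 0
    n = len(needs_main)
    while i < n:
        cycles += 1
        i += 1  # slot 0 (main decoder) always consumes one instruction
        for _ in range(width - 1):
            if i < n and not needs_main[i]:
                i += 1  # a simple decoder takes this instruction
            else:
                break   # next instruction must wait for the main decoder
    return cycles
```

Under this model, six packed SSE instructions that any decoder can handle take `decode_cycles([False] * 6)` = 2 cycles, while the same six restricted to the main decoder (the Pentium M situation the manual describes) take 6.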
Data Prefetching
Intel Core Solo and Intel Core Duo processors provide hardware
mechanisms to prefetch data from memory to the second-level cache.
There are two techniques: one mechanism activates after the data access
pattern experiences two cache-reference misses within a trigger-distance
threshold (see Table 1-2). This mechanism is similar to that of the
Pentium M processor, but can track 16 forward data streams and 4
backward streams. The second mechanism fetches an adjacent cache
line of data after experiencing a cache miss. This effectively simulates
the prefetching capabilities of 128-byte sectors (similar to the sectoring
of two adjacent 64-byte cache lines available in Pentium 4 processors).
Hardware prefetch requests are queued up in the bus system at lower
priority than normal cache-miss requests. If the bus queue is in high
demand, hardware prefetch requests may be ignored or cancelled to
service bus traffic required by demand cache-misses and other bus
transactions.
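The two prefetch mechanisms described above can be sketched as a toy model. Only the 64-byte line size, the 128-byte pairing and the 16 forward / 4 backward stream limits come from the quoted text. Everything else is assumed: the real trigger distance is in Table 1-2 (not quoted here), so `trigger_distance` is a placeholder; a trained stream simply prefetches the next line in its direction; and the eviction policy when a stream table is full is invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

LINE = 64          # cache-line size in bytes (from the quoted text)
MAX_FORWARD = 16   # forward streams tracked (from the quoted text)
MAX_BACKWARD = 4   # backward streams tracked (from the quoted text)


def line_of(addr: int) -> int:
    """Round a byte address down to the start of its 64-byte line."""
    return addr & ~(LINE - 1)


def adjacent_line(addr: int) -> int:
    """The other line of the aligned 128-byte pair containing addr."""
    return line_of(addr) ^ LINE


@dataclass
class ToyStreamPrefetcher:
    trigger_distance: int = 256  # PLACEHOLDER: real threshold is in Table 1-2
    last_miss: Optional[int] = None
    forward: List[int] = field(default_factory=list)   # last line of each stream
    backward: List[int] = field(default_factory=list)

    def on_miss(self, addr: int) -> List[int]:
        """Return the sorted, de-duplicated lines this toy model would prefetch."""
        line = line_of(addr)
        wanted: Set[int] = {adjacent_line(addr)}  # mechanism 2: adjacent line
        if not self._advance(line, wanted):       # continue a trained stream...
            self._train(line, wanted)             # ...or maybe start a new one
        self.last_miss = line
        return sorted(wanted)

    def _advance(self, line: int, wanted: Set[int]) -> bool:
        for table, step in ((self.forward, LINE), (self.backward, -LINE)):
            for i, last in enumerate(table):
                delta = (line - last) if step > 0 else (last - line)
                if 0 < delta <= self.trigger_distance:
                    table[i] = line               # stream moves on
                    wanted.add(line + step)
                    return True
        return False

    def _train(self, line: int, wanted: Set[int]) -> None:
        # mechanism 1: two misses within the trigger distance start a stream
        if self.last_miss is None:
            return
        delta = line - self.last_miss
        if not 0 < abs(delta) <= self.trigger_distance:
            return
        forward = delta > 0
        table = self.forward if forward else self.backward
        if len(table) >= (MAX_FORWARD if forward else MAX_BACKWARD):
            table.pop(0)                          # invented policy: evict oldest
        table.append(line)
        wanted.add(line + (LINE if forward else -LINE))
```

For example, misses at 0x1000 and then 0x1040 train a forward stream, and the second call returns the adjacent line 0x1000 plus the next line 0x1080.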
Hardware prefetch mechanisms are enhanced over those of the Pentium M
processor by:
• Data stores that are not in the second-level cache generate read for
ownership requests. These requests are treated as loads and can
trigger a prefetch stream.
• Software prefetch instructions are treated as loads; they can also
trigger a prefetch stream.
......
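The bus-queue rule quoted above (hardware prefetch requests sit at lower priority than demand cache misses and may be ignored or cancelled when the queue is busy) maps naturally onto a priority queue. This is a sketch under my own assumptions: the class name, the `capacity`, and the exact drop/cancel rules are hypothetical, and a real core would stall a demand request rather than refuse it.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

DEMAND, PREFETCH = 0, 1  # lower value = served first


class ToyBusQueue:
    """Toy bus queue in which demand misses outrank hardware prefetches."""

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity  # hypothetical queue depth
        self._heap: List[Tuple[int, int, str]] = []  # (priority, seq, name)
        self._seq = itertools.count()                # FIFO order within a priority
        self.dropped: List[str] = []

    def submit(self, kind: int, name: str) -> bool:
        """Enqueue a request; return False if it was rejected."""
        if len(self._heap) >= self.capacity:
            if kind == PREFETCH:
                self.dropped.append(name)  # prefetch ignored under pressure
                return False
            # Demand request: cancel the most recently queued prefetch, if any.
            victims = [e for e in self._heap if e[0] == PREFETCH]
            if not victims:
                return False  # queue full of demand requests (real core: stall)
            victim = max(victims, key=lambda e: e[1])
            self._heap.remove(victim)
            heapq.heapify(self._heap)
            self.dropped.append(victim[2])
        heapq.heappush(self._heap, (kind, next(self._seq), name))
        return True

    def issue(self) -> Optional[str]:
        """Send the highest-priority (then oldest) request to the bus."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With `capacity=2`, queuing a prefetch and then two demand misses cancels the prefetch, and the demand misses are issued in arrival order.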

shaneshane · shaneshane

This is all old material.

fikiaqn · fikiaqn

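I have Intel's IA-32 architecture manuals, but they're written for programmers. I've used them to look up how some instructions work, but I didn't see anything in them that "CPU fans" would care about.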

zcz1234 · zcz1234

Originally posted by Edison on 2006-6-13 21:27
This is all old material.

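True, there's not much point. But if you read and analyze it carefully, some of Yonah's features are still quite interesting.

For example, Yonah's two L1 caches can exchange data with each other, and from that you can infer how Core behaves.
L2 latency is 14 cycles.

On dual-core Yonah, a load looks for data in this order: the write buffer, then the core's own L1; if the data isn't found there, it checks the L2 and the other core's L1, and finally goes to memory.
The typical latency for loading data from the other core's L1 is 14 cycles + 5.5 bus cycles. The manual says the bus connecting the L1s runs at the core clock, so I work it out to about 20 cycles.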
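The lookup order and the latency arithmetic in the post can be written out as a small calculation. Only two numbers come from the post: 14 cycles for an L2 hit, and 14 cycles + 5.5 bus cycles for a hit in the other core's L1, with that link running at the core clock (one core cycle per bus cycle). The level names and the `bus_ratio` parameter are my own; the post gives no latency for the write buffer, the own L1 or memory, so the sketch returns None for those rather than guessing.

```python
from typing import Dict, List, Optional, Tuple

# Search order for a load on dual-core Yonah, as described in the post.
LOOKUP_ORDER: List[str] = ["write_buffer", "own_L1", "L2", "other_L1", "memory"]

L2_LATENCY = 14             # core cycles (from the post)
REMOTE_L1_BUS_CYCLES = 5.5  # extra bus cycles for the other core's L1 (from the post)


def remote_l1_latency(bus_ratio: float = 1.0) -> float:
    """Core cycles for a hit in the other core's L1.

    bus_ratio = core cycles per bus cycle; the post reads the manual as saying
    this link runs at the core clock, i.e. bus_ratio = 1.0.
    """
    return L2_LATENCY + REMOTE_L1_BUS_CYCLES * bus_ratio


def find(location: str) -> Tuple[List[str], Optional[float]]:
    """Levels probed until the data is found at `location`, plus a latency
    estimate in core cycles when the post gives one (otherwise None)."""
    idx = LOOKUP_ORDER.index(location)
    known: Dict[str, float] = {"L2": L2_LATENCY, "other_L1": remote_l1_latency()}
    return LOOKUP_ORDER[: idx + 1], known.get(location)
```

`remote_l1_latency()` evaluates to 19.5 core cycles, which is the "about 20 cycles" figure in the post.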

wkp5883135 · wkp5883135

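Not much in the IA-32 manuals? That depends on how you look at it.

But if you want to get a quick grasp of the microarchitecture, read the optimization manual. It's just that since the Pentium M, the information Intel provides has become very vague. Then again, some of those details, such as the ROB (reorder buffer) and RS (reservation station), don't matter to programmers anyway.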
