热点科技

Title: Stream Processors are making a splash at ISSCC.

Author: wdarklord    Time: 2007-2-14 01:03
Let me repost an article too.
Stream Processors, Inc. Announces Breakthrough Digital Signal Processor Architecture at ISSCC 2007


Building on more than eight years of research at Stanford and MIT, startup reveals a new class of DSPs that makes parallel processing simple

ISSCC 2007, SAN FRANCISCO – Feb. 12, 2007 – Emerging from two years of commercial development and more than eight years of university research, Stream Processors, Inc. (SPI) today unveiled a breakthrough digital signal processor (DSP) architecture that removes the barriers to programming high-performance, massively parallel processors.

Detailed in a paper, “A 512 GOPS Stream Processor for Signal, Image and Video Processing,” being presented at this year’s International Solid State Circuits Conference, SPI’s Stream Processor™ Architecture combines unmatched levels of DSP performance with a simple and efficient C-programming model. The approach has resulted in the development of the industry’s highest-performance family of DSPs, capable of delivering greater than an order of magnitude (more than 10 times) higher performance than current commercially available DSP solutions.

To place this level of processing performance in context, a single fully software-programmable SPI Stream Processor is capable of encoding H.264 high-definition 1080p video in real time with enough processing power to perform customer-specific video enhancements, image tuning, and video content analysis. Achieving that level of performance using traditional DSPs could require as many as 15 chips, significantly increasing engineering effort, development time and overall project risk.

Making Parallelism Work
Once confined to the realm of supercomputers, the concept of parallel or multi-core processing – using more than one central processing unit (CPU) or processor core to increase computation speed – has long been seen as a way to achieve higher levels of performance. Recently, multi-core solutions such as the AMD® dual-core Opteron™ and the Intel® Core™2 Duo have been shown to be effective when running large independent tasks at the operating system level. However, multi-core architectures have not been successful at accelerating individual embedded DSP applications.

“The key problem has been that writing software to take full advantage of the increased processing power offered by parallelism has always been time-consuming and difficult,” said Will Strauss, president of the market research firm Forward Concepts. “While Intel and AMD have started to solve this problem in the personal computing and server markets with multi-core processors, the problem remains in embedded markets. These markets require an energy-efficient, programmable digital signal processor with the computational capacity of tens to hundreds of cores applied to individual tasks. By re-thinking the roles of the architecture, programming model and compiler tools, SPI has created a new class of DSPs that makes parallel processing practical.”

Prof. Bill Dally, co-founder, chairman and chief science officer for SPI added, “When we began our research 12 years ago, we quickly realized that traditional architectures were running out of steam. A new approach was needed. Simply putting more cores on a chip doesn’t address the real issues of bandwidth, data locality, and ease of programming. Today’s demanding embedded applications like H.264 HD encoding and analytics, image processing, video surveillance, wireless communication, search, and encryption, all benefit from the performance gain and programming simplicity offered by SPI’s Stream Processor Architecture.”

The Stream Processor Architecture
At the heart of SPI’s Stream Processor Architecture (Figure 1) is a high-performance data-parallel unit (DPU), which is able to sustain hundreds of billions of operations per second (GOPS). Two industry-standard CPU cores are included to support the DPU: a system CPU runs Linux and handles I/O; another core runs main DSP threads and offloads processing of compute-intensive kernel functions to the DPU.


Figure 1: SPI’s Stream Processor Architecture as implemented in the SP16-G160

A key feature of the architecture is its compiler-managed memory hierarchy that leverages the data-parallelism and locality characteristics of signal processing applications. A simple C programming model allows specification of compute-intensive kernel functions that process streams of data records, enabling the compiler and hardware to efficiently manage on-chip memory and synchronize runtime direct-memory access (DMA). This approach eliminates the need for a cache and greatly increases predictability of throughput, simplifying the overall programming task.
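The programming model described above can be illustrated with a small sketch. This is not SPI's API: `brighten_kernel`, `run_stream`, and `strip_len` are hypothetical names, and the list slicing only stands in for the DMA transfers that the release says the compiler and hardware schedule. It shows a kernel function applied to a stream of records that is staged through a fixed-size on-chip buffer instead of a cache.

```python
from typing import Callable, List, Sequence

Record = int  # hypothetical record type: one 8-bit sample per record


def brighten_kernel(strip: Sequence[Record], gain: int) -> List[Record]:
    """Hypothetical kernel: scale each record in a strip, saturating at 255."""
    return [min(255, r * gain) for r in strip]


def run_stream(
    src: Sequence[Record],
    kernel: Callable[..., List[Record]],
    strip_len: int,
    **params: int,
) -> List[Record]:
    """Feed `src` to `kernel` one strip at a time.

    Each slice stands in for a DMA load into on-chip memory, and each
    extend() for a DMA store back to external memory. The kernel only
    ever sees data that is already "on chip".
    """
    out: List[Record] = []
    for start in range(0, len(src), strip_len):
        strip = list(src[start:start + strip_len])  # stand-in for DMA load
        out.extend(kernel(strip, **params))         # stand-in for DMA store
    return out


pixels = [10, 100, 200, 50, 128]
result = run_stream(pixels, brighten_kernel, strip_len=2, gain=2)
```

Because the strip size is known ahead of time, the working set is explicit, which is the property the release credits for predictable throughput.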

The architecture exploits multiple levels of parallelism:
•    task-level parallelism between the system processor, DSP processor and DPU
•    data-level parallelism (DLP) with multiple lanes executing the same instructions on different data in parallel
•    instruction-level parallelism (ILP) via very long instruction word (VLIW) driving multiple arithmetic logic units (ALUs) per lane
•    sub-word single instruction multiple data (SIMD) in which each ALU can operate on multiple operands
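Sub-word SIMD, the last level in the list above, can be emulated in software to show the idea. The sketch below is only an illustration and assumes four 8-bit operands per 32-bit word, which the release does not state; `pack_u8x4`, `unpack_u8x4`, and `add_u8x4` are hypothetical helper names. A real ALU would do this in hardware in a single operation.

```python
def pack_u8x4(values):
    """Pack four 8-bit values into one 32-bit word (values[0] in the low byte)."""
    assert len(values) == 4
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack_u8x4(word):
    """Split a 32-bit word back into its four 8-bit values."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def add_u8x4(a, b):
    """Add four packed 8-bit values at once, each wrapping modulo 256.

    The low 7 bits of every byte are added first, so no carry can cross
    a byte boundary; the top bit of each byte is then fixed up with XOR.
    """
    low = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)
    return low ^ ((a ^ b) & 0x80808080)


a = pack_u8x4([255, 3, 2, 1])
b = pack_u8x4([1, 1, 1, 1])
summed = unpack_u8x4(add_u8x4(a, b))
```

One word-wide addition handles all four operands, which is why sub-word SIMD multiplies throughput for narrow data such as pixels.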

On the DPU, a kernel function runs identically on every lane processing different data. Built-in support for conditionals and high-speed inter-lane communications provides more versatility than conventional SIMD architectures. The single-threaded execution model provides inherent load-balancing, eliminating the need for code partitioning across multiple cores. Another advantage to SPI’s architecture is the ability to easily scale to higher levels of performance by adding more lanes without the need to restructure software.
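A toy model can make the lane behaviour described above concrete. It is a sketch under assumptions, not a description of the DPU: `run_on_lanes`, `select`, and `clamp_kernel` are hypothetical names, and the lane counts are arbitrary. Every lane runs the same kernel on a different element, the conditional is written as a per-lane select rather than a branch, and the lane count can change without touching the kernel.

```python
def select(mask, if_true, if_false):
    """Per-lane select: every lane evaluates it, the mask picks the result."""
    return if_true if mask else if_false


def clamp_kernel(x):
    """Same instructions on every lane: clamp x into the range 0..255."""
    return select(x > 255, 255, select(x < 0, 0, x))


def run_on_lanes(kernel, data, num_lanes):
    """Process `data` in groups of `num_lanes`, one element per lane per step."""
    out = []
    for base in range(0, len(data), num_lanes):
        group = data[base:base + num_lanes]    # one element for each lane
        out.extend(kernel(x) for x in group)   # all lanes run the same kernel
    return out


samples = [-5, 0, 17, 300, 255, 256, 42, -1, 128]
on_4_lanes = run_on_lanes(clamp_kernel, samples, num_lanes=4)
on_16_lanes = run_on_lanes(clamp_kernel, samples, num_lanes=16)
```

The two runs give the same output from the same kernel source, which mirrors the claim that more lanes can be added without restructuring software.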

Development Tools
SPI’s RapiDev™ tool suite supports a standard development and debug flow using C language tools running on a Windows or Linux platform. RapiDev leverages the predictability of SPI’s Stream Processor Architecture to provide a linear path to performance-optimized code. The tool suite enables application source code compatibility across devices with different numbers of lanes and ALUs, providing greater scalability and portability.

About Stream Processors, Inc.
Stream Processors, Inc. (SPI) is a privately held fabless semiconductor company delivering an innovative stream processing architecture that helps consumer and industrial companies accelerate product development cycles and dramatically reduce system development costs. SPI was founded in 2004 to address the new era of compute-intensive applications requiring radically increased levels of processor performance and power efficiency. The company's technology and products improve application productivity by making parallel processing easier to program and use. Additional information can be found at http://www.streamprocessors.com/.

Author: rockzhao0522    Time: 2007-2-14 01:25
Now HD H.264 encoding should finally stand a real chance.
Author: sayid1026    Time: 2007-2-14 11:16
It's all in English.
Author: wagawskt    Time: 2007-2-15 13:20
Thumbs up for a technical post~
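Author: buyi21    Time: 2007-2-15 13:28
The more general-purpose a design is, the lower its performance. With AMD pushing GPGPU this hard, efficiency will probably drop a lot,

but it should still outperform an all-purpose CPU either way.

Then again, Intel has already said Kentsfield is no weaker than Stream, so now we just have to see who turns out to be right~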
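Author: lirongde1    Time: 2007-2-15 14:21
Originally posted by the_god_of_pig on 2007-2-15 13:28
Then again, Intel has already said Kentsfield is no weaker than Stream, so now we just have to see who turns out to be right~
Intel's PDFs from the last two years all admit that GP performance growth lags behind SP.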
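Author: woodsky    Time: 2007-2-15 17:16
Real-world results decide everything~

GPUs were said to be capable of physics acceleration; nearly a year on, there is still no finished product.

Folding@home was said to run 20 times faster, yet the PPD is low.

They can hardware-decode video, but with a good CPU the drop in CPU usage you see is only 20%.

The PPU that was supposed to bring a revolution has flopped on performance~~~

You can't help wondering whether GPGPU is just hot air.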
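Author: dragonlin    Time: 2007-2-15 17:34
Also, let me indulge in a speculative prediction:

Stream's performance in 3D will be on par with Kentsfield.

Heh, pure speculation: assume Stream's efficiency at GPGPU is the same as the PPU's efficiency at in-game physics acceleration,

a 68 Gflops PPU = a 14 Gflops Conroe, so an 85 Gflops Kentsfield = a 400 Gflops PPU = Stream in 3D

Feel free to dig this thread up later.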
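Author: pp99pp    Time: 2007-2-15 17:36
Originally posted by the_god_of_pig on 2007-2-15 17:34
Also, let me indulge in a speculative prediction:

Stream's performance in 3D will be on par with Kentsfield.

Heh, pure speculation: assume Stream's efficiency at GPGPU is the same as the PPU's efficiency at in-game physics acceleration,

a 68 Gflops PPU = a 14 Gflops Conroe, so an 85 Gflops Kentsfield = 40 ...
You really can't read which way the wind is blowing. A lot of this has nothing to do with whether anyone digs the thread up.........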
Author: pangxie    Time: 2007-2-15 17:36
Tofu for sale~~~~~~~
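Author: BOSSYI    Time: 2007-2-15 18:00
Originally posted by 嘉蓝 on 2007-2-15 17:36

You really can't read which way the wind is blowing. A lot of this has nothing to do with whether anyone digs the thread up.........
Trends are trends; reality is reality.

You can't claim China is richer than the USA today just because China's GDP will overtake the USA's within the next 50 years.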
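Author: zhonggq    Time: 2007-2-16 14:04
I just remembered that ATI said the X1900 XTX's physics performance is 9 times that of the PPU.

The PPU doesn't even match a 14.93 Gflops Conroe,

14.93 x 9 = 134.4 Gflops,

Kentsfield is 85.33 Gflops,

so Stream is 134.4 / 85.33 = 1.575 times Kentsfield.

Given how often AMD exaggerates, it's no surprise that Intel dares to say the C2Q is no weaker than Stream.

This is only physics performance, but physics computation is already highly parallel, so other GPGPU workloads will probably perform about the same~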
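The arithmetic in the post above can be checked with a few lines. This sketch takes the post's own premises at face value: the PPU is treated as worth at most one 14.93 Gflops CPU, and ATI's claimed 9x factor is applied to that, so the result is an upper bound under those assumptions, not a measurement. The function and parameter names are hypothetical.

```python
def stream_vs_kentsfield(ppu_equiv_gflops, gpu_over_ppu, kentsfield_gflops):
    """Return (estimated GPU Gflops, ratio of that estimate to Kentsfield)."""
    gpu_gflops = ppu_equiv_gflops * gpu_over_ppu
    return gpu_gflops, gpu_gflops / kentsfield_gflops


gpu_gflops, ratio = stream_vs_kentsfield(
    ppu_equiv_gflops=14.93,   # the post's CPU-equivalent ceiling for the PPU
    gpu_over_ppu=9,           # ATI's claimed advantage over the PPU
    kentsfield_gflops=85.33,  # the post's figure for Kentsfield
)
print(round(gpu_gflops, 1), round(ratio, 3))
```

After rounding, this reproduces the post's 134.4 Gflops and 1.575x figures.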
Author: zmy5418896    Time: 2007-2-16 14:32
The poster above is basically joking.
Please read carefully what the original AnandTech article actually says.
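Author: apple983    Time: 2007-2-16 15:25
VR-ZONE also said Larrabee's performance is 16 times that of G80, and its core count happens to be exactly 16, so by that logic you could conclude that the graphics performance of one x86 core = one G80, haha.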
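Author: 896338    Time: 2007-2-16 15:34
Originally posted by 来不及思考 on 2007-2-16 14:32
The poster above is basically joking.
Please read carefully what the original AnandTech article actually says.
You probably didn't know that lousy AnandTech article has an updated version; these 2 charts are from the update.

The updated tests use a newer version of City of Villains, with the physics effects set to the same level for the PPU and software runs. It also adds SMP support on the CPU, so the CPU's second thread can be used to compute physics.

Other than that, there's nothing special in the original AnandTech article, is there~

Heh, I also think this kind of guesswork is pretty fanciful. 思考 and E both have NDAs in hand, so I won't dare say any more.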
Author: ps1987    Time: 2007-2-16 15:35
I didn't know about the updated version...
Looks like I'm behind the times... CPUs are still pretty strong after all.



