Stream Processors take the spotlight at ISSCC.

wdarklord · wdarklord

I'm reposting an article too.

Stream Processors, Inc. Announces Breakthrough Digital Signal Processor Architecture at ISSCC 2007

Building on more than eight years of research at Stanford and MIT, startup reveals a new class of DSPs that makes parallel processing simple

ISSCC 2007, SAN FRANCISCO – Feb. 12, 2007 – Emerging from two years of commercial development and more than eight years of university research, Stream Processors, Inc. (SPI) today unveiled a breakthrough digital signal processor (DSP) architecture that removes the barriers to programming high-performance, massively parallel processors.

Detailed in a paper, “A 512 GOPS Stream Processor for Signal, Image and Video Processing,” being presented at this year’s International Solid State Circuits Conference, SPI’s Stream Processor™ Architecture combines unmatched levels of DSP performance with a simple and efficient C-programming model. The approach has resulted in the development of the industry’s highest-performance family of DSPs, capable of delivering greater than an order of magnitude (more than 10 times) higher performance than current commercially available DSP solutions.

To place this level of processing performance in context, a single fully software-programmable SPI Stream Processor is capable of encoding H.264 high-definition 1080p video in real-time with enough processing power to perform customer-specific video enhancements, image tuning, and video content analysis. Achieving that level of performance using traditional DSPs could require as many as 15 chips, significantly increasing engineering effort, development time and overall project risk.

Making Parallelism Work
Once confined to the realm of supercomputers, the concept of parallel or multi-core processing – using more than one central processing unit (CPU) or processor core to increase computation speed – has long been seen as a way to achieve higher levels of performance. Recently, multi-core solutions such as the AMD® dual-core Opteron™ and the Intel® Core™2 Duo have been shown to be effective when running large independent tasks at the operating system level. However, multi-core architectures have not been successful at accelerating individual embedded DSP applications.

“The key problem has been that writing software to take full advantage of the increased processing power offered by parallelism has always been time-consuming and difficult,” said Will Strauss, president of the market research firm Forward Concepts. “While Intel and AMD have started to solve this problem in the personal computing and server markets with multi-core processors, the problem remains in embedded markets. These markets require an energy-efficient, programmable digital signal processor with the computational capacity of tens to hundreds of cores applied to individual tasks. By re-thinking the roles of the architecture, programming model and compiler tools, SPI has created a new class of DSPs that makes parallel processing practical.”

Prof. Bill Dally, co-founder, chairman and chief science officer for SPI added, “When we began our research 12 years ago, we quickly realized that traditional architectures were running out of steam. A new approach was needed. Simply putting more cores on a chip doesn’t address the real issues of bandwidth, data locality, and ease of programming. Today’s demanding embedded applications like H.264 HD encoding and analytics, image processing, video surveillance, wireless communication, search, and encryption, all benefit from the performance gain and programming simplicity offered by SPI’s Stream Processor Architecture.”

The Stream Processor Architecture
At the heart of SPI’s Stream Processor Architecture (Figure 1) is a high-performance data-parallel unit (DPU), which is able to sustain hundreds of billions of operations per second (GOPS). Two industry-standard CPU cores are included to support the DPU: a system CPU runs Linux and handles I/O; another core runs main DSP threads and offloads processing of compute-intensive kernel functions to the DPU.

Figure 1: SPI’s Stream Processor Architecture as implemented in the SP16-G160

A key feature of the architecture is its compiler-managed memory hierarchy that leverages the data-parallelism and locality characteristics of signal processing applications. A simple C programming model allows specification of compute-intensive kernel functions that process streams of data records, enabling the compiler and hardware to efficiently manage on-chip memory and synchronize runtime direct-memory access (DMA). This approach eliminates the need for a cache and greatly increases predictability of throughput, simplifying the overall programming task.
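To make the "kernel over a stream of records" idea concrete, here is a minimal sketch in plain C. It is not SPI's API: the record type `rgb_t`, the kernel `luma_kernel`, the driver `luma_stream`, and the strip size `STRIP_LEN` are all hypothetical. The kernel is a pure function of one record, and the driver explicitly stages the stream through a small local buffer in fixed-size strips, standing in (by assumption) for the compiler-managed on-chip memory and DMA the press release describes, rather than relying on a cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stream record: one RGB pixel. */
typedef struct { uint8_t r, g, b; } rgb_t;

/* Hypothetical size of the on-chip staging buffer, in records. */
#define STRIP_LEN 64

/* Kernel: a pure function of one input record with no pointers into
   global memory, so a compiler could spread it across lanes and ALUs.
   Integer approximation of BT.601 luma; the weights sum to 256. */
static uint8_t luma_kernel(rgb_t p) {
    return (uint8_t)((77u * p.r + 150u * p.g + 29u * p.b) >> 8);
}

/* Driver: moves the stream strip by strip through local buffers,
   imitating explicit DMA in and out instead of cached loads/stores. */
void luma_stream(const rgb_t *in, uint8_t *out, size_t n) {
    assert(n == 0 || (in != NULL && out != NULL));
    rgb_t local_in[STRIP_LEN];
    uint8_t local_out[STRIP_LEN];
    for (size_t base = 0; base < n; base += STRIP_LEN) {
        size_t len = (n - base < STRIP_LEN) ? n - base : STRIP_LEN;
        memcpy(local_in, in + base, len * sizeof(rgb_t));     /* "DMA in"  */
        for (size_t i = 0; i < len; i++)
            local_out[i] = luma_kernel(local_in[i]);         /* kernel    */
        memcpy(out + base, local_out, len * sizeof(uint8_t)); /* "DMA out" */
    }
}
```

Because every memory transfer is explicit and happens in fixed-size strips, the amount of data moved per strip is known ahead of time, which is the kind of predictability the press release attributes to dropping the cache.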

The architecture exploits multiple levels of parallelism:
• task-level parallelism between the system processor, DSP processor and DPU
• data-level parallelism (DLP) with multiple lanes executing the same instructions on different data in parallel
• instruction-level parallelism (ILP) via very long instruction word (VLIW) driving multiple arithmetic logic units (ALUs) per lane
• sub-word single instruction multiple data (SIMD) in which each ALU can operate on multiple operands
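The last item, sub-word SIMD, can be illustrated in software with the classic "SIMD within a register" trick: four 8-bit operands packed into one 32-bit word are added with a single 32-bit add, with masking so carries never cross byte boundaries. This is only an analogy; SPI's ALUs do this in hardware, and the function names `add_u8x4` and `pack_u8x4` are hypothetical.

```c
#include <stdint.h>

/* Adds four packed unsigned 8-bit values lane by lane, wrapping modulo
   256 within each byte; no carry spills into the neighbouring byte. */
uint32_t add_u8x4(uint32_t a, uint32_t b) {
    /* Add the low 7 bits of every byte. The per-byte maximum is
       0x7F + 0x7F = 0xFE, so no carry can leave its byte. */
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Bit 7 of each result byte is a7 XOR b7 XOR (carry into bit 7);
       the carry is already in low7, so XOR in a7 ^ b7. The carry out
       of bit 7 is simply discarded, giving wrap-around per byte. */
    return low7 ^ ((a ^ b) & 0x80808080u);
}

/* Packs four bytes into one word, lane 0 in the least significant byte. */
uint32_t pack_u8x4(uint8_t l0, uint8_t l1, uint8_t l2, uint8_t l3) {
    return (uint32_t)l0 | ((uint32_t)l1 << 8) |
           ((uint32_t)l2 << 16) | ((uint32_t)l3 << 24);
}
```

For example, `add_u8x4(pack_u8x4(200, 1, 2, 3), pack_u8x4(100, 1, 1, 1))` yields the packed bytes 44, 2, 3, 4: lane 0 wraps (300 mod 256) without disturbing lane 1.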

On the DPU, a kernel function runs identically on every lane processing different data. Built-in support for conditionals and high-speed inter-lane communications provides more versatility than conventional SIMD architectures. The single-threaded execution model provides inherent load-balancing, eliminating the need for code partitioning across multiple cores. Another advantage to SPI’s architecture is the ability to easily scale to higher levels of performance by adding more lanes without the need to restructure software.
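A rough way to picture this execution model is to simulate the lanes in ordinary C. In the sketch below (all names hypothetical, not SPI's tools), lane `l` handles elements `l`, `l + lanes`, `l + 2*lanes`, and so on; every lane runs the same kernel, the data-dependent conditional is written as a per-lane select, and the per-lane partial sums are combined with a tree reduction that stands in for inter-lane communication. Because it uses integer arithmetic, the result is identical for any lane count, which mirrors the claim that adding lanes does not require restructuring the software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES 64 /* hypothetical upper bound for the simulation */

/* Per-element kernel with a data-dependent conditional, written as a
   select on a per-lane predicate so all lanes follow the same code. */
static int32_t clamp_kernel(int32_t x, int32_t limit) {
    int keep = x < limit;     /* per-lane predicate */
    return keep ? x : limit;  /* select between the two results */
}

/* Applies clamp_kernel to n elements on `lanes` simulated lanes and
   returns the sum of the clamped values. */
int64_t clamp_and_sum(const int32_t *data, size_t n, int32_t limit,
                      size_t lanes) {
    assert(lanes > 0 && lanes <= MAX_LANES);
    int64_t partial[MAX_LANES] = {0};
    /* Each outer step is one "instruction" issued to all lanes at once. */
    for (size_t base = 0; base < n; base += lanes)
        for (size_t l = 0; l < lanes && base + l < n; l++)
            partial[l] += clamp_kernel(data[base + l], limit);
    /* Tree reduction across lanes: lane l receives lane l + stride. */
    for (size_t stride = 1; stride < lanes; stride *= 2)
        for (size_t l = 0; l + stride < lanes; l += 2 * stride)
            partial[l] += partial[l + stride];
    return partial[0];
}
```

With floating-point data the reduction order would change with the lane count and results could differ in the last bits; integer data sidesteps that in this sketch.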

Development Tools
SPI’s RapiDev™ tool suite supports a standard development and debug flow using C language tools running on a Windows or Linux platform. RapiDev leverages the predictability of SPI’s Stream Processor Architecture to provide a linear path to performance-optimized code. The tool suite enables application source code compatibility across devices with different numbers of lanes and ALUs, providing greater scalability and portability.

About Stream Processors, Inc.
Stream Processors, Inc. (SPI) is a privately held fabless semiconductor company delivering an innovative stream processing architecture that helps consumer and industrial companies accelerate product development cycles and dramatically reduce system development costs. SPI was founded in 2004 to address the new era of compute-intensive applications requiring radically increased levels of processor performance and power efficiency. The company's technology and products improve application productivity by making parallel processing easier to program and use. Additional information can be found at http://www.streamprocessors.com/.

rockzhao0522 · rockzhao0522

Now HD H.264 encoding might actually stand a chance.

sayid1026 · sayid1026

It's all in English.

wagawskt · wagawskt

Thumbs up for a technical post~

buyi21 · buyi21

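The more general-purpose a chip is, the lower its performance. With AMD going all-out on GPGPU, I expect its efficiency to drop a lot,

but I'd still guess it will outperform an all-purpose CPU either way.

That said, Intel has already stated that Kentsfield is no weaker than stream, so now we'll see who turns out to be right~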

lirongde1 · lirongde1

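Originally posted by the_god_of_pig on 2007-2-15 13:28
That said, Intel has already stated that Kentsfield is no weaker than stream, so now we'll see who turns out to be right~

Intel's own PDFs from the past two years all concede that general-purpose (GP) performance is growing more slowly than stream processing (SP) performance.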

woodsky · woodsky

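Real-world results are all that matters~

GPUs were supposed to accelerate physics; it's been almost a year and there is still no shipping product.

Folding@home claimed a 20x speedup, yet the PPD (points per day) is very low.

GPUs can decode video in hardware, but if your CPU is already good, the observed drop in CPU usage is only 20%.

The PPU that was supposed to bring a revolution flopped on performance~~~

You can't help wondering whether GPGPU is just hype.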

dragonlin · dragonlin

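Also, let me make a wild-guess prediction:

stream's performance in 3D will be on par with Kentsfield.

Heh, just daydreaming here: assume stream's efficiency when doing GPGPU is the same as the PPU's efficiency when accelerating physics in games.

a 68 GFLOPS PPU = 14 GFLOPS of actual muscle; an 85 GFLOPS Kentsfield = a 400 GFLOPS PPU = stream in 3D

Feel free to dig this thread up later and check.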
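The back-of-the-envelope arithmetic in the post works out as follows, assuming "14 GFLOPS of actual muscle" means the PPU's delivered throughput against its 68 GFLOPS peak (my reading of the original wording), and that the same efficiency η applies to stream:

```latex
\eta = \frac{14\ \text{GFLOPS (delivered)}}{68\ \text{GFLOPS (peak)}} \approx 0.206,
\qquad
P_{\text{peak}} = \frac{85\ \text{GFLOPS}}{\eta} = 85 \times \frac{68}{14} \approx 413\ \text{GFLOPS}
```

So matching Kentsfield's 85 GFLOPS at that efficiency would take roughly 413 GFLOPS of peak throughput, which the post rounds to 400 GFLOPS.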

pp99pp · pp99pp

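Originally posted by the_god_of_pig on 2007-2-15 17:34
Also, let me make a wild-guess prediction:

stream's performance in 3D will be on par with Kentsfield.

Heh, just daydreaming here: assume stream's efficiency when doing GPGPU is the same as the PPU's efficiency when accelerating physics in games.

a 68 GFLOPS PPU = 14 GFLOPS of actual muscle; an 85 GFLOPS Kentsfield = 40 ...

You really can't read which way the wind is blowing. A lot of things aren't about whether the thread gets dug up later.........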

pangxie · pangxie

Tofu for sale~~~~~~~
