SAN FRANCISCO, Calif. — Toshiba described Wednesday (Feb 10) a novel mobile media processor it is now sampling using stacked custom DRAM, one of a handful of such devices discussed at the International Solid State Circuits Conference (ISSCC).
Renesas Technology also showed a media processor for digital TVs. In addition, Intel Labs and academics showed various highly parallel research chips for handling media or object recognition.
All the chips underscored a move toward using multiple kinds of cores to handle an increasingly varied set of media applications. Some presenters called for reconfigurable processors or parallel arrays to handle jobs such as face and object recognition.
One mantra all the engineers shared was the need to optimize designs for the best performance per watt while keeping die size to a minimum.
"Mobile image processors [for example] are limited to less than 500 mW, and their price has become less than $5 so area efficiency is required," said Takashi Kurafuji, author of one of two papers from Renesas.
Toshiba showed a media processor sandwiched between a mobile DDR and a custom DRAM using novel packaging technology. Toshiba is sampling the device now as a merchant chip. Its cost was not immediately available but is presumably high given the use of a custom DRAM and packaging technology.
The 40nm device consumes just 222 mW to decode H.264 video at 30 frames/second. It can decode VGA-class video at 15 frames/s in software, consuming just 71 mW. It has leakage power of just 15 microW in standby and 1.7 mW in sleep mode.
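As a rough guide to what those figures mean per frame, the arithmetic can be sketched as below. This is a back-of-the-envelope calculation, not Toshiba's own metric: it assumes the quoted power is the average over the decode, and the function name is hypothetical. The article does not give the resolution of the H.264 stream, so the two results are not directly comparable.

```python
def energy_per_frame_mj(power_mw: float, frames_per_s: float) -> float:
    """Average energy per decoded frame in millijoules.

    mW divided by frames/s gives mJ per frame, assuming the quoted
    power is the average drawn while decoding (an assumption).
    """
    return power_mw / frames_per_s


# Figures quoted in the article for the Toshiba device.
h264_mj = energy_per_frame_mj(222.0, 30.0)    # H.264 decode at 30 frames/s
vga_sw_mj = energy_per_frame_mj(71.0, 15.0)   # VGA-class software decode at 15 frames/s

print(f"H.264 decode:        {h264_mj:.2f} mJ/frame")
print(f"VGA software decode: {vga_sw_mj:.2f} mJ/frame")
```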
A 6 x 6.2 mm logic chip at the heart of the package uses 14 cores including one ARM Cortex-A9 MPCore. It also includes separate video, 3-D graphics and audio/video multiprocessing blocks.
A mobile DDR DRAM is stacked on top of the logic die using micro-bumps and wire bonding. A custom DRAM at the bottom of the stack rides on a so-called re-distribution layer as a substrate, uses micro-bumps and wire bonding and sports a 10.6 Gbytes/s memory bandwidth.
Toshiba would not identify the third party that supplies the DRAM. Toshiba creates the stack in-house in an assembly stage. The custom DRAM interface uses 85 percent less power in some operations than a traditional DDR interface, Toshiba said.
For its part Renesas described its MX-2 image processing core, part of a larger 45nm media processor for digital TVs. That chip is capable of up to 37.3 Giga Operations/s per W.
The Renesas chip, co-developed with Hitachi, uses three types of cores depending on the application's need for parallelism. They include a 648 MHz SH4A core, two MX-2 cores and four reconfigurable processing elements.
Renesas engineers echoed comments from others who said they are finding that a mix of engines, ranging from single-core hosts to parallel arrays, is best for handling today's variety of media applications at the lowest power.
Intel Labs gave a nod to the need for highly parallel architectures for media processing in a paper describing a 32nm reconfigurable device optimized for lowest area and power consumption.
The Intel device uses two arrays of processing blocks. One was built up from nodes using four three-input look up tables and three four-bit adders optimized for four-bit multiply operations. The other was a collection of 16 32-bit register file tiles.
The resulting device has a peak efficiency of 2.6 Tera Operations/s/W at 340 mV. Parts of the device can run at up to 6.7 GHz in some configurations.
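Efficiency figures quoted in operations/s/W can be read as operations per joule, so their inverse is the energy spent per operation. The sketch below applies that conversion to the Renesas and Intel numbers reported here. The function name is hypothetical, and the comparison is only indicative: the papers may count an "operation" differently, and the chips use different process nodes and operating voltages.

```python
def picojoules_per_op(ops_per_s_per_w: float) -> float:
    """Convert an efficiency in operations/s/W to picojoules per operation.

    1 W is 1 J/s, so operations/s/W equals operations per joule;
    the inverse, scaled by 1e12, is energy per operation in pJ.
    """
    return 1e12 / ops_per_s_per_w


# Figures quoted in the article.
renesas_pj = picojoules_per_op(37.3e9)   # 37.3 Giga Operations/s per W
intel_pj = picojoules_per_op(2.6e12)     # 2.6 Tera Operations/s/W (peak, at 340 mV)

print(f"Renesas media processor: {renesas_pj:.1f} pJ/op")
print(f"Intel research device:   {intel_pj:.2f} pJ/op")
```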
Other papers from academics painted a picture of the kinds of applications that may be on the horizon for media processors beyond decoding high def video and driving 3-D polygons.
Tzu-Der Chuang, a doctoral candidate at National Taiwan University, described a chip that can both decode and scale video to various resolutions, display types and bit rates. It supports multi-view video coding (MVC) and scalable video coding (SVC) standards as well as H.264. The chip also pushed beyond today's high def to handle video resolutions up to 4096 x 2160 pixels at 24 frames/s.
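To put that top resolution in perspective, the raw pixel rate can be worked out as in the sketch below. The 1920 x 1080 at 30 frames/s baseline is an assumption chosen here to stand in for "today's high def"; it does not come from the paper, and the function name is hypothetical.

```python
def pixels_per_second(width: int, height: int, frames_per_s: int) -> int:
    """Raw pixel throughput a decoder must sustain at a given format."""
    return width * height * frames_per_s


# Top format quoted in the article.
top_rate = pixels_per_second(4096, 2160, 24)

# Assumed baseline for conventional high definition (not from the paper).
hd_rate = pixels_per_second(1920, 1080, 30)

ratio = top_rate / hd_rate

print(f"4096 x 2160 at 24 frames/s: {top_rate:,} pixels/s")
print(f"1920 x 1080 at 30 frames/s: {hd_rate:,} pixels/s")
print(f"Ratio: {ratio:.2f}x")
```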
The paper, selected as one of the best of ISSCC, described a 2.92mm x 2.92mm device that can decode H.264 consuming 59 mW at 210 MHz and lower rates for MVC and SVC decode. The device uses separate processing pathways for H.264, MVC and SVC.
A paper from the Korea Advanced Institute of Science and Technology (KAIST) described an object recognition processor that consumed just 345 mW while processing video at 30 frames/s. The device packs 51 blocks on two layers of silicon linked on a star network-on-chip.
In a rare humorous moment at ISSCC, the author showed a video of the device scanning video of objects on a shelf in a grocery store. It found target objects including a package of diapers and a Hello Kitty lunch pail.
Tse-Wei Chen, a doctoral student at National Taiwan University, suggested tomorrow's media chips need to go beyond even image recognition. His Semantic Analysis SoC handles machine learning for applications such as retrieving a related image or recognizing a face.
The 90nm chip processes up to 671 Giga Operations/s/W and measures 28 mm². "This is a prototype to highlight the need for future machine-learning work," said Chen.