RMG and Associates

Insightful, timely, and accurate

Semiconductor Technology Consulting

Semiconductor & Patent Expert Consulting


(408) 446-3040



ISSCC 2010


1. Intel, Renesas, Toshiba show mobile media chips / EETimes

2. ARM to detail power-efficient design technique / EETimes

3. Inductive coupling packs flash drive in a chip / EETimes

4. IMEC, Renesas develop reconfigurable RF transceiver in 40nm CMOS / EETimes

5. Chip links hit 20 Gbits/s, power lows at ISSCC / EETimes



1. Intel, Renesas, Toshiba show mobile media chips

Toshiba describes novel stacked DRAM SoC at ISSCC
SAN FRANCISCO, Calif. — Toshiba described Wednesday (Feb 10) a novel mobile media processor it is now sampling using stacked custom DRAM, one of a handful of such devices discussed at the International Solid State Circuits Conference (ISSCC).

Renesas Technology also showed a media processor for digital TVs. In addition, Intel Labs and academics showed various highly parallel research chips for handling media or object recognition.

All the chips underscored a move toward using multiple kinds of cores to handle an increasingly varied set of media applications. Some raised a call for reconfigurable processors or parallel arrays to handle jobs such as face and object recognition.

One mantra all the engineers shared was the need to optimize designs for the best performance per Watt while keeping die size at a minimum.

"Mobile image processors [for example] are limited to less than 500 mW, and their price has become less than $5 so area efficiency is required," said Takashi Kurafuji, author of one of two papers from Renesas.

Toshiba showed a media processor sandwiched between a mobile DDR and a custom DRAM using novel packaging technology. Toshiba is sampling the device now as a merchant chip. Its cost was not immediately available but is presumably high given the use of a custom DRAM and packaging technology.

The 40nm device consumes just 222 mW to decode H.264 video at 30 frames/second. It can decode VGA-class video at 15 frames/s in software, consuming just 71 mW. It has leakage power of just 15 microW in standby and 1.7 mW in sleep mode.
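One way to read these figures is to convert power and frame rate into energy per decoded frame. The following is a back-of-the-envelope sketch, not a Toshiba calculation: the function name is illustrative, and it assumes the quoted power is the average drawn during decode.

```python
def energy_per_frame_mj(power_mw: float, frames_per_s: float) -> float:
    """Average energy per decoded frame in millijoules (mW / fps = mJ per frame)."""
    return power_mw / frames_per_s


h264_mj = energy_per_frame_mj(222, 30)    # H.264 decode at 30 frames/s
vga_sw_mj = energy_per_frame_mj(71, 15)   # VGA-class software decode at 15 frames/s

print(f"H.264 decode:        {h264_mj:.2f} mJ/frame")
print(f"Software VGA decode: {vga_sw_mj:.2f} mJ/frame")
```

The two results are not directly comparable, since the article does not give the resolution of the 222 mW H.264 case.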

A 6 x 6.2 mm logic chip at the heart of the package uses 14 cores including one ARM Cortex A9 MPCore. It also includes separate video, 3-D graphics and audio/video multiprocessing blocks.

A mobile DDR DRAM is stacked on top of the logic die using micro-bumps and wire bonding. A custom DRAM at the bottom of the stack rides on a so-called re-distribution layer as a substrate, uses micro-bumps and wire bonding and sports a 10.6 Gbytes/s memory bandwidth.

Toshiba would not identify the third party that supplies the DRAM. Toshiba creates the stack in-house in an assembly stage. The custom DRAM interface uses 85 percent less power in some operations than a traditional DDR interface, Toshiba said.

For its part Renesas described its MX-2 image processing core, part of a larger 45nm media processor for digital TVs. That chip is capable of up to 37.3 Giga Operations/s per W.

The Renesas chip, co-developed with Hitachi, uses three types of cores depending on the application's need for parallelism. They include a 648 MHz SH4A core, two MX-2 cores and four reconfigurable processing elements.

Renesas engineers echoed comments from others who said they are finding that a mix of engines, ranging from single-core hosts to parallel arrays, is best for handling today's variety of media applications at the lowest power.

Intel Labs gave a nod to the need for highly parallel architectures for media processing in a paper describing a 32nm reconfigurable device optimized for lowest area and power consumption.

The Intel device uses two arrays of processing blocks. One was built up from nodes using four three-input look up tables and three four-bit adders optimized for four-bit multiply operations. The other was a collection of 16 32-bit register file tiles.

The resulting device has peak efficiency of 2.6 Tera Operations/s/W at 340mV. Parts of the device can run up to 6.7 GHz in some configurations.

Other papers from academics painted a picture of the kinds of applications that may be on the horizon for media processors beyond decoding high def video and driving 3-D polygons.

Tzu-Der Chuang, a doctoral candidate at National Taiwan University, described a chip that can both decode and scale video to various resolutions, display types and bit rates. It supports multi-view video coding (MVC) and scalable video coding (SVC) standards as well as H.264. The chip also pushed beyond today's high def to handle video resolutions up to 4096 x 2160 pixels at 24 frames/s.
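To put the 4096 x 2160 figure in perspective, the pixel throughput can be worked out directly. This is a rough sketch; the 1920 x 1080 at 30 frames/s baseline is an assumption standing in for "today's high def" and does not come from the paper.

```python
def pixels_per_second(width: int, height: int, frames_per_s: int) -> int:
    """Raw pixel throughput a decoder must sustain."""
    return width * height * frames_per_s


chip_rate = pixels_per_second(4096, 2160, 24)   # figure quoted for the chip
hd_rate = pixels_per_second(1920, 1080, 30)     # assumed HD baseline, not from the paper

print(f"4096 x 2160 @ 24: {chip_rate / 1e6:.1f} Mpixels/s")
print(f"1920 x 1080 @ 30: {hd_rate / 1e6:.1f} Mpixels/s")
print(f"Ratio: {chip_rate / hd_rate:.2f}x")
```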

The paper, selected as one of the best of ISSCC, described a 2.92mm x 2.92mm device that can decode H.264 consuming 59 mW at 210 MHz and lower rates for MVC and SVC decode. The device uses separate processing pathways for H.264, MVC and SVC.

A paper from the Korea Advanced Institute of Science and Technology described an object recognition processor that consumed just 345 mW while processing video at 30 frames/s. The device packs 51 blocks on two layers of silicon linked on a star network-on-chip.

In a rare humorous moment at ISSCC, the author showed a video of the device scanning video of objects on a shelf in a grocery store. It found target objects including a package of diapers and a Hello Kitty lunch pail.

Tse-Wei Chen, a doctoral student at National Taiwan University, suggested tomorrow's media chips need to go beyond even image recognition. His Semantic Analysis SoC handles machine learning for applications such as retrieving a related image or recognizing a face.

The 90nm chip processes up to 671 Giga Operations/s per W and measures 28 mm². "This is a prototype to highlight the need for future machine-learning work," said Chen.
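The efficiency figures quoted in this article (operations per second per watt) can be inverted into energy per operation, which some readers find easier to compare. The sketch below is only a unit conversion with an illustrative helper name. Each paper defines an "operation" differently and the chips use different process nodes, so the results are not a like-for-like ranking.

```python
def pj_per_op(gops_per_watt: float) -> float:
    """Convert Giga Operations/s per W into picojoules per operation.

    1 GOPS/W is 1e9 operations per joule, i.e. 1000 pJ per operation.
    """
    return 1000.0 / gops_per_watt


quoted = {
    "Renesas media processor (45nm)": 37.3,       # Giga Operations/s per W
    "Intel reconfigurable device (32nm)": 2600.0, # 2.6 Tera Operations/s/W at 340mV
    "NTU Semantic Analysis SoC (90nm)": 671.0,
}

for name, gops_w in quoted.items():
    print(f"{name}: {pj_per_op(gops_w):.2f} pJ/op")
```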




2. ARM to detail power-efficient design technique

SAN FRANCISCO—Using a hybrid technique for dynamic detection and correction of timing errors, researchers from ARM Holdings plc and the University of Michigan have demonstrated a 52 percent reduction in power on a 65-nm ARM instruction set architecture (ISA) processor running at more than 1 GHz, according to a paper scheduled to be presented at the International Solid State Circuits Conference (ISSCC) here Tuesday (Feb. 9).

Researchers from Intel Corp., NTT Group, Qualcomm Inc. and others are also set to detail new technologies for power efficiency during the same ISSCC session, "Low-Power Processors and Communication," scheduled for Tuesday afternoon.

The ARM-University of Michigan paper details the use of the hybrid technique, known as Razor, applied to a processor with timing paths representative of an industrial design. The processor implements a subset of the ARM ISA in a design with balanced pipeline stages, resulting in critical memory access and clock-gating enable paths, according to the paper.

The paper describes Razor as a combination of timing-error detection circuits, error-recovery mechanisms and voltage-frequency tuning. According to the paper, Razor creates a system which is robust in the face of timing errors and can be tuned to an efficient operating point by eliminating unused guardbands.

Unlike canary or tracking circuits, a Razor system can survive fast-moving and transient events and adapt itself to the prevailing conditions, allowing excess margins to be reclaimed, the paper states. The savings from margin reclamation can translate into better power efficiency or a parametric yield improvement for a batch of devices, according to the paper.
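The idea of reclaiming guardband through error detection and voltage tuning can be illustrated with a toy control loop. This is a minimal sketch under stated assumptions, not the circuit or algorithm from the paper: the delay model, its constants, the 1200 mV starting point and the function names are all hypothetical. The loop simply stops at the first detected timing error.

```python
def path_delay_ns(vdd_mv: int) -> float:
    """Toy model (assumption): critical-path delay rises as supply voltage falls."""
    return 605.0 / (vdd_mv - 300)  # hypothetical constants, not from the paper


def tune_vdd_mv(clock_period_ns: float, start_mv: int = 1200,
                step_mv: int = 10, floor_mv: int = 500) -> int:
    """Lower Vdd one step at a time; return the last voltage with no timing error."""
    vdd = start_mv
    while vdd - step_mv >= floor_mv:
        candidate = vdd - step_mv
        # Stand-in for the error detector: data arrives after the clock edge.
        timing_error = path_delay_ns(candidate) > clock_period_ns
        if timing_error:
            break  # a real system would also recover the failed operation
        vdd = candidate
    return vdd


guardbanded_mv = 1200                         # hypothetical worst-case supply voltage
tuned_mv = tune_vdd_mv(clock_period_ns=1.0)   # 1 GHz clock
# Dynamic power scales roughly with V^2 at a fixed frequency.
dynamic_power_ratio = (tuned_mv / guardbanded_mv) ** 2

print(f"Tuned supply: {tuned_mv} mV")
print(f"Dynamic power vs. guardbanded: {dynamic_power_ratio:.0%} (toy numbers)")
```

The numbers printed here come entirely from the made-up delay model and are not meant to reproduce the 52 percent result reported in the paper.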

The processor design used in the experiments was created with industry standard EDA tools and a static timing analysis signoff frequency of 724 MHz, according to the paper. It was fabricated using a 65-nm CMOS process by Taiwan's United Microelectronics Corp. The paper states that silicon measurements on 63 samples show a 52 percent power reduction of the overall distribution for 1GHz operation.

Other papers of note scheduled for the session include one from researchers at Intel, which details throughput gains of 12 to 23 percent on a 45-nm, 1.3 GHz microprocessor core employing error-detection circuits, tunable replica circuits, and error-recovery circuits to mitigate dynamic variation guardbands.

Researchers from NTT are scheduled to describe a 90-nm CMOS network processor comprising dual CPUs that enable a residential gateway to forward packets at 2Gb/s with IP security and packet filtering. By offloading the packet-handling to a packet engine, the power consumption of this function is limited to 24mW, according to the paper.



3. Inductive coupling packs flash drive in a chip

ISSCC speaker says wireless link leads in cost, power
SAN FRANCISCO, Calif. — A researcher from Keio University in Japan showed a way to put an entire solid-state disk in the footprint of a single chip in an evening talk at the International Solid State Circuits Conference (ISSCC) here.

Keio researchers used inductive coupling to link a stack of 128 NAND flash die and a controller. The wireless interface grabbed the attention of engineers at a late-night ISSCC session on energy efficient interfaces. The session ranged across a wide territory, covering work in optical chip interfaces and data center interconnects.

As many as three papers at this year's ISSCC will show advances in inductive coupling, said Tadahiro Kuroda, a professor at Keio University. The solid-state drive in a chip-sized package uses inductive coupling to provide 2 Gbits/second throughput so that a single controller can talk to any of the flash chips in the 128-chip stack.

Another ISSCC paper will show inductive coupling to link a processor to its memory using one-thirtieth the power and a third the die area of a DDR connection, Kuroda said. Another ISSCC paper discusses using the wireless technique to create a memory card that is more secure than a conventional plug-in SD card, he said.

Inductive coupling compares favorably with through silicon vias in cost, reliability and energy dissipation, he said. Such interfaces typically cost 20 cents per chip less than through silicon vias, he added.

The session started with a presentation about the growing power needs of big Internet data centers. One of the largest data centers now under construction is a network peering center in Miami that will cover 750,000 square feet and consume about 80 MegaWatts.

"That's about the same amount of power the rest of Miami consumes," said Subodh Bapat of Sun Microsystems. "Florida Light and Power is building a power station just for this data center," he added.

The number of large data centers and the number of servers they use is growing rapidly, making them a very visible target of energy use for regulators. However, the client PCs and smartphones they link to still consume more energy, and the power plants that serve them are less efficient than the data centers, Bapat suggested.

Separately, Ian Young, a senior fellow in Intel Corp.'s research group, reported on advances in optical chip-to-chip interfaces. The company published at least two papers in January on its work in the field.

"This is looking pretty promising because optical links run from 10cm to 1 meter and losses in the fibre ribbon are negligible," said Young.

Nevertheless, he declined to say when optical chip interconnects could become a viable commercial alternative for a major microprocessor.

"Right now everyone is just asking, when do we go optical," said Young. "We've got to find a driving application to get people excited about it, perhaps high performance computing," he added.



4. IMEC, Renesas develop reconfigurable RF transceiver in 40nm CMOS

PARIS — IMEC (Leuven, Belgium) and its reconfigurable radio program partners Renesas Technology Corp. (Tokyo, Japan) and M4S (Leuven, Belgium) claimed they have developed a single-chip reconfigurable multi-standard wireless transceiver in 40nm low-power CMOS.

Partners indicated that the receiver is software configurable across all channels in the frequency bands between 100MHz and 6GHz, and the transmitter reaches low out-of-band noise, targeting SAW-less 3GPP-LTE operation.

The fully reconfigurable transceiver is compatible with various wireless standards and applications, including the upcoming mobile broadband 3GPP-LTE standard. It integrates multi-standard programmability in a 5mm² chip and targets recent single mode radios in mobile devices: handsets, smart phones, PDAs, PC cards and USB dongles.

In the next phase of the Green Radio research program, IMEC said it aims to continue to reduce the bill of materials and energy consumption by pursuing research on digitally-inspired SAW-less transceivers and power efficient transmitters.

Renesas joined IMEC's reconfigurable RF program in 2008 to perform research on 45-nm RF transceivers targeting cognitive radios capable of sending and receiving data at gigabits per second.



5. Chip links hit 20 Gbits/s, power lows at ISSCC

SAN FRANCISCO, Calif. — Researchers showed advances driving wider, faster and lower power chip-to-chip interconnects in a session at the International Solid State Circuits Conference (ISSCC) here.

An Intel Labs engineer described a 470 Gbit/second chip-to-chip link that consumes just 1.4 mW/Gb/s. Separately, a UCLA researcher showed an equalizer built in 90nm technology that can drive a 20 Gbit/s serial link.

News from physical-layer chip designer Aquantia underlined the drive to faster wired links. Aquantia announced Tuesday (Feb. 9) it has started shipping 10GBase-T PHY chips to drive 10 Gbit/s links over copper cables.

Aquantia said Cisco Systems has started shipping 10GBase-T links on its Catalyst 4900M, Catalyst 6500 and Nexus 7000 switches. Intel is now supporting dual 10GBase-T interfaces on its Ethernet Server Adapter X520-T2, said Aquantia, which posted a video of interviews with Cisco and Intel execs.

As networks speed up, so must the chip-to-chip links in systems that drive them. Frank O'Mahony of Intel Labs used 47 parallel 10 Gbit/s links to hit what he claimed was a new low of 1.4 mW/Gbit/s. Interfaces such as Intel's QuickPath Interconnect and AMD's HyperTransport typically use 10 to 20 parallel links today.
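The link figures above tie together with simple arithmetic. The sketch below assumes all 47 lanes run at the full 10 Gbit/s and that the 1.4 mW/Gbit/s figure covers the whole link. The variable names are illustrative.

```python
lanes = 47
rate_per_lane_gbps = 10          # Gbit/s per lane
efficiency_mw_per_gbps = 1.4     # quoted link efficiency

aggregate_gbps = lanes * rate_per_lane_gbps                 # total link bandwidth
total_power_mw = aggregate_gbps * efficiency_mw_per_gbps    # implied link power
# 1 mW per Gbit/s is 1e-3 J/s divided by 1e9 bit/s, i.e. 1 pJ per bit.
energy_per_bit_pj = efficiency_mw_per_gbps

print(f"Aggregate bandwidth: {aggregate_gbps} Gbit/s")
print(f"Implied link power:  {total_power_mw:.0f} mW")
print(f"Energy per bit:      {energy_per_bit_pj} pJ")
```

The aggregate matches the 470 Gbit/s headline figure, and the implied power for the full link is well under one watt.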

The Intel researcher's target was future CPU-to-CPU and CPU-to-memory interconnects that demand higher bandwidth at low power. "Emerging CPU and memory applications will require terabytes of socket-to-socket bandwidth in the next decade," O'Mahony said.

The Intel researcher showed a method of sharing clock signals over bundles of parallel lanes using fewer de-skewing elements to save power. A new receiver design enabled wake up signaling in less than five nanoseconds to support aggressive power management. The resulting two- and five-inch links used ten times less power than similar high bandwidth links while also reducing silicon area over prior work, he said.

Robert Reutemann, co-founder of analog and mixed signal design company Miromico (Zurich), provided a comparison with CPU interfaces in current systems. He described a 6.4 Gbit/s receiver core that consumes 4.5 mW/Gbit/s in a 65nm process.

The core is currently being used in computer servers. The paper was co-authored by researchers at IBM.

Finally, papers from NEC, STMicroelectronics and UCLA showed equalization techniques that could drive speeds beyond 10 Gbits/s over a single lane.

The UCLA paper was the most aggressive of the three, describing a 40 mW decision-feedback equalizer made in 90nm technology that could compensate for 20dB of signal loss to drive a 20 Gbit/s data rate.
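Two quick conversions help put the equalizer numbers in context: power divided by data rate gives energy per bit, and the loss in dB maps to an amplitude ratio. This sketch assumes the 20dB figure is a voltage (amplitude) loss, which is the usual convention for channel loss but is not stated in the article.

```python
power_mw = 40.0          # equalizer power
data_rate_gbps = 20.0    # serial data rate
loss_db = 20.0           # channel loss compensated

# mW divided by Gbit/s gives pJ per bit.
energy_per_bit_pj = power_mw / data_rate_gbps
# Assumption: dB refers to voltage amplitude, so ratio = 10^(-dB/20).
amplitude_ratio = 10 ** (-loss_db / 20)

print(f"Equalizer energy per bit: {energy_per_bit_pj:.1f} pJ")
print(f"Signal amplitude after channel: {amplitude_ratio:.0%} of original")
```

The energy figure covers the equalizer alone, so it should not be compared directly with the mW/Gbit/s numbers quoted earlier for complete links and receiver cores.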

The author compared his part to three other published works, showing it achieved a new level of power per bit over channel loss.

"We believe our equalizer is the first using non-return to zero signaling to achieve 20 Gbits/s with reasonable bit-error rate and a horizontal open eye," said Sameh A. Ibrahim, a former UCLA researcher now with Marvell.