Anyone who thinks computer memory is a dull topic would have had that opinion changed at the DesignCon 2014 keynote speech by Thomas Pawlowski, Fellow and Chief Technologist at Micron Technology. In a fast-moving talk on Jan. 30, Pawlowski discussed three revolutions in the making—the Micron Automata Processor, the dismantling of the "memory wall," and new technologies with memory abstraction.
Pawlowski first turned his attention to the Micron Automata Processor, which was announced Nov. 18, 2013. This processor offers a fundamentally new computing architecture that leverages the intrinsic parallelism of DRAM. "It is absolutely a different way of doing computing and we call it the dawn of a new era in data processing," he said.
The Micron Automata Processor is a massively parallel computing solution. It is based on memory rather than logic. It has no ALU, doesn't run op codes, and is essentially a symbol processor. To build it, Pawlowski said, Micron implemented a non-deterministic finite state machine in silicon. He said the processor offers "very fast execution with a light memory footprint."
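For readers who have not met a non-deterministic finite state machine, a minimal software sketch may help. It is only an illustration: the `run_nfa` function, the state numbers, and the toy pattern are assumptions made up for this example, not Micron's design or its SDK. The point is that the machine keeps a whole set of states active at once, and every input symbol advances all of them together.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical transition table: (state, symbol) -> set of next states.
Transitions = Dict[Tuple[int, str], Set[int]]


def run_nfa(transitions: Transitions, start: Set[int],
            accepting: Set[int], symbols: str) -> List[int]:
    """Return the input positions at which an accepting state is active."""
    active = set(start)
    hits = []
    for i, sym in enumerate(symbols):
        # Re-arm the start states so a match may begin at any position.
        nxt = set(start)
        # Every active state consumes the same symbol in the same step.
        for state in active:
            nxt |= transitions.get((state, sym), set())
        active = nxt
        if active & accepting:
            hits.append(i)
    return hits


# Toy pattern (illustrative only): report where the input ends in "ACG" or "AG".
TRANSITIONS: Transitions = {
    (0, "A"): {1},
    (1, "C"): {2},
    (1, "G"): {3},
    (2, "G"): {3},
}

hits = run_nfa(TRANSITIONS, start={0}, accepting={3}, symbols="TACGAGA")
print(hits)  # [3, 5]
```

In software the inner loop visits the active states one by one; the appeal of a hardware implementation is that those updates could all happen in a single step.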
Tackling NP Hard Problems
Pawlowski said the Micron Automata Processor can tackle NP complete and NP hard problems that other computer architectures cannot handle. (As he reminded the audience, "NP" means "non-deterministic polynomial." NP complete problems are very difficult to solve, but a proposed solution can at least be verified efficiently. NP hard problems are at least as difficult to solve, and their solutions may not be efficiently verifiable at all.)
Initial applications of the processor include "anything you can describe symbolically" as a graph of relationships between entities. Examples include video and image analytics, data mining, bio-informatics, medical diagnostics, social networking, and network security. In his keynote, Pawlowski focused on computational biology problems.
Pawlowski observed that there are about 2 million proteins in the human body, and the longest is 27,000 amino acids long. Searching all the proteins would demand a search space of 10 to the power of 65,052. "We don't have calculators that let you represent a number that big," Pawlowski said, "but within that space may lie the cures for everything we'll ever care about. We're giving people a tool to do that kind of thing."
Pawlowski also compared the Micron Automata Processor to a 48-CPU cluster on a planted motif search problem (which attempts to find common genomic sequences in noisy data). The 48-CPU cluster drew 2,500W while the Micron processor ran at 250W. One problem set that took 13.96 minutes on the Micron processor took 46.9 hours on the 48-CPU cluster. There were other problem sets that the 48-CPU cluster could not solve at all.
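The figures above can be turned into a rough speedup and energy comparison. This is a back-of-the-envelope sketch, not a result from the keynote: it assumes the quoted wattages are average power over the whole run and that they apply to the same problem set as the quoted run times. The variable names are illustrative.

```python
# Figures quoted in the keynote report.
cluster_hours = 46.9      # run time on the 48-CPU cluster
cluster_watts = 2500      # cluster power (assumed to be average draw)
ap_minutes = 13.96        # run time on the Automata Processor
ap_watts = 250            # Automata Processor power (assumed average draw)

ap_hours = ap_minutes / 60

# Speedup is the ratio of run times.
speedup = cluster_hours / ap_hours

# Energy is power multiplied by time, in watt-hours.
cluster_energy_wh = cluster_watts * cluster_hours
ap_energy_wh = ap_watts * ap_hours
energy_ratio = cluster_energy_wh / ap_energy_wh

print(f"speedup: about {speedup:.0f}x")
print(f"cluster energy: {cluster_energy_wh / 1000:.1f} kWh")
print(f"Automata Processor energy: {ap_energy_wh:.1f} Wh")
print(f"energy ratio: about {energy_ratio:.0f}x")
```

Under those assumptions the run is roughly 200 times faster, and because the power draw is also ten times lower, the energy used is roughly 2,000 times lower.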
Micron is introducing the processor as an accelerator. Pawlowski said the company has silicon in debug and has a PCIe development board that accommodates 48 processors. The company also provides a software development kit. Further information is at the Micron website.
Is the Memory Wall Still Standing?
Much has been said about a so-called "memory wall" in which the throughput needs of the system outstrip the performance of the memory. Pawlowski acknowledged the need to overcome limits on throughput and latency, but he argued that current conceptions of the "memory wall" are misleading.
"Everyone has a mistaken belief that things improve after the so-called Moore's law," he said. "If you had made a prediction ten years ago about what [processor] performance would be right now, you would be off by a factor of ten." This is because single-core performance increases have dramatically slowed. However, modern systems typically have multiple processor cores, and many are multi-threaded as well.
When people complain about the memory wall, Pawlowski said, they usually cite single-channel, single-rank memory performance compared to multi-core CPUs. But a multi-core system probably has multiple DRAM DIMM channels and multiple ranks per DIMM. To get a fair comparison, one should consider multi-core processors along with multi-channel, multi-rank memory.
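A small calculation illustrates why the choice of baseline matters. The sketch below is an assumption-laden example rather than anything shown in the keynote: the 8-core, 4-channel system is hypothetical, and DDR3-1600 (1,600 mega-transfers per second on a 64-bit bus) is chosen only because its peak rate is easy to check. Ranks are left out of the arithmetic because they share a channel's data bus; they add opportunities to overlap accesses rather than raise the channel's peak bandwidth.

```python
def peak_channel_bandwidth_gbs(mega_transfers_per_s: float,
                               bus_bytes: int = 8) -> float:
    """Peak bandwidth of one memory channel in GB/s (64-bit bus by default)."""
    return mega_transfers_per_s * bus_bytes / 1000


def per_core_bandwidth_gbs(cores: int, channels: int,
                           mega_transfers_per_s: float) -> float:
    """Peak memory bandwidth available per core if it is shared evenly."""
    total = channels * peak_channel_bandwidth_gbs(mega_transfers_per_s)
    return total / cores


# Hypothetical 8-core system with DDR3-1600 memory.
single_channel_view = per_core_bandwidth_gbs(cores=8, channels=1,
                                             mega_transfers_per_s=1600)
multi_channel_view = per_core_bandwidth_gbs(cores=8, channels=4,
                                            mega_transfers_per_s=1600)

print(f"one channel:   {single_channel_view:.1f} GB/s per core")
print(f"four channels: {multi_channel_view:.1f} GB/s per core")
```

Judged against a single channel, each core in this made-up system appears to get 1.6 GB/s; counting all four channels gives 6.4 GB/s, which is the like-for-like comparison Pawlowski argued for.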
Even so, Pawlowski acknowledged, memory is "at best keeping up." With this in mind, Micron developed the Hybrid Memory Cube (HMC), a 3D stacked die configuration that includes memory and logic. Comparing DIMMs against an HMC of equal memory capacity, he said, HMC throughput is 41 times that of conventional DRAM.
The result is a very different kind of wall. Said Pawlowski: "Existing CPUs can't hack it. They are the ones left behind."
Preparing for the Future
In the third part of his keynote speech, Pawlowski observed that memory is taking a growing percentage of silicon area. In the iPhone it may be 30% to 40%. In high-performance computing, 86% of the area may be memory. Micron feels a "great responsibility" for improving that situation, he said.
While the "window is really open" to replace DRAM, there is a lot you really don't want to know about underlying memory technology, Pawlowski said. And the answer to that is memory abstraction. "The HMC exposes nothing," he said. "You have a SERDES interface, a very simple command set, and all you do is say what you want done. The details are not your problem. That's the way things ought to be everywhere."
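One way to picture memory abstraction is as a device that accepts simple commands and keeps every placement decision to itself. The sketch below is purely hypothetical: the `AbstractedMemory` class, its `READ` and `WRITE` commands, and the vault mapping are invented for illustration and are not the HMC command set or interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class AbstractedMemory:
    """Hypothetical device: the host sends commands, placement stays internal."""
    vaults: int = 4
    _cells: Dict[Tuple[int, int], bytes] = field(default_factory=dict)

    def _locate(self, address: int) -> Tuple[int, int]:
        # Internal detail the host never sees: which vault holds the address.
        return address % self.vaults, address // self.vaults

    def handle(self, command: str, address: int, data: bytes = b"") -> bytes:
        """Carry out one command and return the response payload."""
        if command == "WRITE":
            self._cells[self._locate(address)] = data
            return b""
        if command == "READ":
            return self._cells.get(self._locate(address), b"")
        raise ValueError(f"unknown command: {command}")


mem = AbstractedMemory()
mem.handle("WRITE", 42, b"hello")
result = mem.handle("READ", 42)
print(result)  # b'hello'
```

The host code never mentions vaults, banks, or timing, so the device could change its internal layout without the caller noticing, which is the property the quote describes.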
It's thus important to transition away from direct control of memory, Pawlowski said. "We'll be introducing other new products that allow that transition to happen, and it won't be on the standard DDR3 and DDR4 kinds of buses." And with that tantalizing tidbit, Pawlowski concluded a very memorable keynote.