The semiconductor is the foundational technology of the digital age. It gave Silicon Valley its name. It sits at the heart of the computing revolution that has transformed every facet of society over the past half-century. The pace of improvement in computing capabilities has been breathtaking and relentless since Intel introduced the world’s first microprocessor in 1971. In line with Moore’s Law, computer chips today are many millions of times more powerful than they were fifty years ago.
Yet while processing power has skyrocketed over the decades, the basic architecture of the computer chip has until recently remained largely static. For the most part, innovation in silicon has entailed further miniaturizing transistors in order to squeeze more of them onto integrated circuits. Companies like Intel and AMD have thrived for decades by reliably improving CPU capabilities in a process that Clayton Christensen would identify as “sustaining innovation”.
Today, this is changing in dramatic fashion. AI has ushered in a new golden age of semiconductor innovation. The unique demands and limitless opportunities of machine learning have, for the first time in decades, spurred entrepreneurs to revisit and rethink even the most fundamental tenets of chip architecture. Their goal is to design a new type of chip, purpose-built for AI, that will power the next generation of computing. It is one of the largest market opportunities in all of hardware today.
A New Computing Paradigm
For most of the history of computing, the prevailing chip architecture has been the CPU, or central processing unit. CPUs are ubiquitous today: they power your laptop, your mobile device, and most data centers. The CPU’s basic architecture was conceived in 1945 by the legendary John von Neumann. Remarkably, its design has remained essentially unchanged since then: most computers produced today are still von Neumann machines. The CPU’s dominance across use cases is a result of its flexibility: CPUs are general-purpose machines, capable of carrying out effectively any computation required by software. But while CPUs’ key advantage is versatility, today’s leading AI techniques demand a very specific—and intensive—set of computations. Deep learning entails the iterative execution of millions or billions of relatively simple multiplication and addition steps. Grounded in linear algebra, deep learning is fundamentally trial-and-error-based: parameters are tweaked, matrices are multiplied, and figures are summed over and over again across the neural network as the model gradually optimizes itself.
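To make the "multiply, add, tweak, repeat" loop concrete, here is a minimal sketch in plain Python. It is not how any production framework trains a model: the single linear layer, the toy data, the learning rate and the function names (`matvec`, `train_step`, `train`) are all assumptions chosen for illustration.

```python
def matvec(weights, x):
    """Multiply a weight matrix (a list of rows) by a vector.

    Every weight contributes exactly one multiplication and one addition.
    """
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_step(weights, x, target, lr):
    """One gradient-descent step on squared error for a single linear layer."""
    prediction = matvec(weights, x)
    errors = [p - t for p, t in zip(prediction, target)]
    loss = 0.5 * sum(e * e for e in errors)
    # The gradient of the loss with respect to weights[i][j] is errors[i] * x[j],
    # so each weight is nudged a little in the direction that reduces the error.
    new_weights = [
        [w - lr * e * xj for w, xj in zip(row, x)]
        for row, e in zip(weights, errors)
    ]
    return new_weights, loss


def train(steps=20, lr=0.1):
    """Repeat the same multiply-add-tweak cycle on one toy example."""
    weights = [[0.0, 0.0], [0.0, 0.0]]  # hypothetical 2x2 layer, starting at zero
    x = [1.0, 2.0]                      # hypothetical input
    target = [3.0, -1.0]                # hypothetical desired output
    losses = []
    for _ in range(steps):
        weights, loss = train_step(weights, x, target, lr)
        losses.append(loss)
    return weights, losses


if __name__ == "__main__":
    final_weights, history = train()
    print("first loss:", history[0])
    print("last loss:", history[-1])
```

The toy layer has four weights. Real networks run the same arithmetic over millions or billions of weights, which is why the hardware's ability to perform multiply-add operations in bulk matters so much.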
This repetitive, computationally intensive workflow has a few important implications for hardware architecture. Parallelization (the ability of a processor to carry out many calculations at the same time, rather than one by one) becomes critical. Relatedly, because deep learning involves the continuous transformation of huge volumes of data, locating the chip’s memory and computational core as close together as possible enables massive speed and efficiency gains by reducing data movement.
CPUs are ill-equipped to support the distinctive demands of machine learning. CPUs are designed to process computations largely sequentially, with little capacity for parallel execution. Their computational core and memory are generally located on separate modules and connected via a communication system (a bus) with limited bandwidth. This creates a choke point in data movement known as the “von Neumann bottleneck”. The upshot: it is prohibitively inefficient to train a large neural network on a CPU.
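The two hardware implications above can be sketched in a few lines of Python. The first half shows that each output of a matrix-vector product depends only on its own row, so the rows can be handed to separate workers. The second half is a rough estimate in which a step takes as long as the slower of computing and moving data. All function names and figures are assumptions for illustration, and Python threads demonstrate only the independence of the work, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, x):
    """One output value: a chain of multiply-adds over a single row."""
    return sum(w * xi for w, xi in zip(row, x))


def matvec_sequential(weights, x):
    """Compute the outputs one after another."""
    out = []
    for row in weights:
        out.append(dot(row, x))
    return out


def matvec_parallel(weights, x, workers=4):
    """Compute the outputs independently; no row needs another row's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input rows.
        return list(pool.map(lambda row: dot(row, x), weights))


def estimated_time(operations, bytes_moved, ops_per_second, bytes_per_second):
    """Rough step time: whichever of compute or data transfer takes longer."""
    compute_time = operations / ops_per_second
    transfer_time = bytes_moved / bytes_per_second
    return max(compute_time, transfer_time)


if __name__ == "__main__":
    weights = [[1, 2], [3, 4], [5, 6]]
    x = [10, 1]
    print(matvec_sequential(weights, x))
    print(matvec_parallel(weights, x))
    # Hypothetical chip: 1e12 operations/s of compute, 1e11 bytes/s of bandwidth.
    print(estimated_time(1e9, 4e9, 1e12, 1e11))
```

With the hypothetical figures in the last line, the arithmetic would finish in a millisecond but the step takes about forty, because the processor spends most of its time waiting for data. That waiting is the bottleneck described above.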
The inability of conventional chips to handle modern AI algorithms is all the more significant given how prevalent machine learning applications are becoming throughout society. As AI great Yann LeCun recently said: “If you go five, ten years into the future, and you look at what do computers spend their time doing, mostly, I think they will be doing things like deep learning.”
To date, the chip that has powered the AI boom is the GPU (graphics processing unit). The GPU architecture was invented by Nvidia in the late 1990s for gaming applications. In order to render computer games’ detailed graphics at high frame rates, GPUs were purpose-built for continuous manipulation of large amounts of data. Unlike CPUs, GPUs can complete many thousands of calculations in parallel.
In the early 2010s, the AI community began to realize that Nvidia’s gaming chips were in fact well suited to handle the types of workloads that machine learning algorithms demanded. Through sheer good fortune, the GPU had found a massive new market. Nvidia capitalized on the opportunity, positioning itself as the market-leading provider of AI hardware. The company has reaped incredible gains as a result: Nvidia’s market capitalization jumped twenty-fold from 2013 to 2018. Yet as Gartner analyst Mark Hung put it, “Everyone agrees that GPUs are not optimized for an AI workload.” The GPU has been adopted by the AI community, but it was not born for AI.
In recent years, a new crop of entrepreneurs and technologists has set out to reimagine the computer chip, optimizing it from the ground up in order to unlock the limitless potential of AI. In the memorable words of Alan Kay: “People who are really serious about software should make their own hardware.” Five AI chip unicorns have emerged in the past 24 months. Several more upstarts have been snapped up at eye-popping valuations. As the legacy CPU incumbent seeking to avoid disruption, Intel alone has made two major acquisitions in this category: Nervana Systems (bought for $408M in April 2016) and Habana Labs (bought for $2B in December 2019). As this race plays out in the coming years, many hundreds of billions of dollars of enterprise value will be up for grabs.
The Next Intel?
The combination of a massive market opportunity and a blue-sky technology challenge has inspired a Cambrian explosion of creative (at times astonishing) approaches to designing the ideal AI chip. Perhaps the most attention-grabbing of the new crop of AI chip startups is Cerebras Systems. Cerebras’ audacious approach is, to put it simply, to build the largest chip ever. Recently valued at $1.7B, the company has raised $200M from top investors including Benchmark and Sequoia.
The specifications of Cerebras’ chip are mind-boggling. It is about 60 times larger than a typical microprocessor. It is the first chip in history to house over one trillion transistors (1.2 trillion, to be exact). It has 18 GB of on-chip memory, again the most ever. Packing all that computing power onto a single silicon substrate offers tantalizing benefits: dramatically more efficient data movement, memory co-located with processing, massive parallelization. But the engineering challenge is, to understate it, ludicrous. For decades, building a wafer-scale chip has been something of a holy grail in the semiconductor industry, dreamt about but never before achieved. “Every rule and every tool and every manufacturing device was designed [for] a normal-sized chocolate chip cookie, and we delivered something the size of the whole cookie sheet,” said Cerebras CEO Andrew Feldman. “Every single step of the way, we have to invent.”
Cerebras’ AI chips are already in commercial use: just last week, Argonne National Laboratory announced it is using Cerebras’ chip to help in the fight against coronavirus.
Another startup taking a radical new approach to chip design is Bay Area-based Groq. In contrast to Cerebras, Groq’s chips are focused on inference rather than on model training. The founding team has world-class domain expertise: Groq’s team includes eight of the ten original members of Google’s TPU project, one of the first and to date most successful AI chip efforts.
Turning the industry’s conventional wisdom on its head, Groq is building a chip with a batch size of one, meaning that it processes data samples one at a time. This architecture enables virtually instantaneous inference (critical for time-sensitive applications like autonomous vehicles) while, according to the company, requiring no sacrifice in performance. Groq’s chip is largely software-defined, making it uniquely flexible and future-proof. The company recently announced that its chip achieved speeds of one quadrillion operations per second. If true, this would make it the fastest single-die chip in history.
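A back-of-the-envelope sketch shows why batch size matters for response time. It assumes samples arrive at a steady rate and that the chip starts work only once a full batch has been collected. The function name and all figures are hypothetical, and the model deliberately ignores the fact that compute time usually grows with batch size.

```python
def worst_case_latency_ms(batch_size, arrival_interval_ms, compute_ms):
    """Latency seen by the first sample in a batch.

    That sample must wait for the remaining (batch_size - 1) samples to
    arrive before processing starts, and then for the processing itself.
    """
    wait_ms = (batch_size - 1) * arrival_interval_ms
    return wait_ms + compute_ms


if __name__ == "__main__":
    # Hypothetical workload: one sample every 10 ms, 5 ms of compute per batch.
    for batch_size in (1, 8, 32):
        print(batch_size, worst_case_latency_ms(batch_size, 10.0, 5.0))
```

Under these assumed numbers, a batch size of one answers in 5 ms, while a batch of 32 keeps its first sample waiting 315 ms. For an application such as an autonomous vehicle, that gap is the difference between a timely and a late decision.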
Perhaps no company has a more mind-bending technology vision than Lightmatter. Founded by photonics experts, Boston-based Lightmatter is seeking to build an AI microprocessor powered not by electrical signals, but by beams of light. The company has raised $33M from GV, Spark Capital and Matrix Partners to pursue this vision. According to the company, the unique properties of light will enable its chip to outperform existing solutions by a factor of ten.
There are a host of other players in this category worth keeping an eye on. Two China-based companies, Horizon Robotics and Cambricon Technologies, have each raised more money at higher valuations than any other competitor. SambaNova Systems in Palo Alto is well-funded and pedigreed. While details about SambaNova’s plans remain sparse, its technology appears to be particularly well suited for natural language processing. Other noteworthy startups include Graphcore, Wave Computing, Blaize, Mythic and Kneron.
Several tech behemoths have also launched their own internal efforts to develop purpose-built AI chips. The most mature of these programs is Google’s Tensor Processing Unit (TPU), mentioned above. Ahead of the technology curve as usual, Google had TPUs running in its data centers by 2015. More recently, Amazon announced the launch of its Inferentia AI chip to much fanfare in December 2019. Tesla, Facebook and Alibaba, among other technology giants, all have in-house AI chip programs.
The race is on to develop the hardware that will power the upcoming era of AI. More innovation is happening in the semiconductor industry today than at any time since Silicon Valley’s earliest days. Untold billions of dollars are in play. This next generation of chips will shape the contours and trajectory of the field of artificial intelligence in the years ahead. In the words of Yann LeCun: “Hardware capabilities ... motivate and limit the types of ideas that AI researchers will imagine and will allow themselves to pursue. The tools at our disposal fashion our thoughts more than we care to admit.”