Facebook open sources its artificial intelligence server

Facebook today is announcing that its researchers have developed hardware for a type of artificial intelligence called deep learning, which can be used inside several of the company’s applications. Facebook is publishing the hardware designs for anyone to explore through the Open Compute Project.

The servers, codenamed Big Sur, are packed with graphics processing units (GPUs), which have become the chip of choice for deep learning. The technique involves training artificial neural networks on lots of data — pictures, for instance — and then getting them to make inferences about new data. Facebook is investing more and more into this field, so it makes sense for the company to design custom hardware, just as it has general-purpose servers, storage, and networking equipment. And it also makes sense to share the designs.

“This is a way of saying, ‘Look, here is what we use, here is what we need. If you make hardware better than this, we’ll probably buy it from you,’” said Yann LeCun, head of the Facebook Artificial Intelligence Research lab, during a conference call on the news. Facebook prominently hired LeCun in 2013.

Deep learning, a domain in which LeCun is highly regarded, can be used for speech recognition, image recognition, and even natural language processing. Facebook does all of those. It’s a core area for Facebook, just as it is for Google and Microsoft. Facebook has previously open sourced some of its AI software, and now the openness has extended to hardware.

Each Big Sur server can pack in as many as eight GPUs, each of which can max out at 300 watts. Facebook designed Big Sur based on Nvidia’s Tesla M40 GPU, but it can accommodate other GPUs as well.
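
As a rough sketch of what those figures imply, the snippet below multiplies them out. It covers only the GPUs at their stated maximum draw; CPUs, fans, storage, and power-supply losses are not part of the figures above and are left out. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope peak GPU power for one server, using only the two
# figures quoted above (eight GPUs, up to 300 watts each). Everything else
# in the chassis is deliberately ignored because the article gives no numbers.

GPUS_PER_SERVER = 8
MAX_WATTS_PER_GPU = 300


def peak_gpu_power_watts(gpus=GPUS_PER_SERVER, watts_per_gpu=MAX_WATTS_PER_GPU):
    """Return the combined maximum GPU power draw in watts."""
    return gpus * watts_per_gpu


print(peak_gpu_power_watts())  # 2400
```

In other words, a fully populated server could draw up to about 2.4 kilowatts for its GPUs alone.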

Facebook has deployed these servers at its data centers both inside and outside the U.S., LeCun told reporters on the call.

Big Sur outperforms the hardware Facebook previously used for deep learning.

“Leveraging NVIDIA’s Tesla Accelerated Computing Platform, Big Sur is twice as fast as our previous generation, which means we can train twice as fast and explore networks twice as large,” Facebook researchers Kevin Lee and Serkan Piantino wrote in a blog post. “And distributing training across eight GPUs allows us to scale the size and speed of our networks by another factor of two.”

Check out the full blog post for more detail on the Big Sur server.

In the war over deep learning that’s currently taking place in Silicon Valley, Facebook just dropped a bomb.

The social network is one of the Valley’s most heavily invested companies when it comes to building artificial intelligence technology to help its products think and act like humans. It’s a competitive endeavor: Google, IBM, Uber and Baidu are just a few of the companies racing Facebook to scoop up deep learning experts, the rare minds capable of building this type of software.

You’d think, then, that Facebook would keep its AI advancements under wraps and away from the competition. Not so, apparently.

The company announced Thursday that it built new AI-specific servers, the physical hardware that runs the AI software its employees are creating, to do things like automate text conversations and understand what’s visible in a photograph. The new servers, called Big Sur, are twice as fast as the old ones Facebook used and hold twice as many graphics processing units (GPUs), chips originally designed to process the images and video shown on a screen, such as your smartphone’s.

  • FAIR has made notable advances in developing AI training hardware that is considered among the best in the world.
  • We have done this through a combination of hardware expertise, partner relationships with vendors, and a significant strategic investment in AI research.
  • FAIR is more than tripling its investment in GPU hardware as we focus even more on research and enable other teams across the company to use neural networks in our products and services.
  • As part of our ongoing commitment to open source and open standards, we plan to contribute our innovations in GPU hardware to the Open Compute Project so others can benefit from them.

Although machine learning (ML) and artificial intelligence (AI) have been around for decades, most of the recent advances in these fields have been enabled by two trends: larger publicly available research data sets and the availability of more powerful computers — specifically ones powered by GPUs. Most of the major advances in these areas move forward in lockstep with our computational ability, as faster hardware and software allow us to explore deeper and more complex systems.

At Facebook, we've made great progress thus far with off-the-shelf infrastructure components and design. We've developed software that can read stories, answer questions about scenes, play games, and even learn unspecified tasks through observing some examples. But we realized that truly tackling these problems at scale would require us to design our own systems. Today, we're unveiling our next-generation GPU-based systems for training neural networks, which we've code-named “Big Sur.”

Faster, more versatile, and efficient neural network training

Big Sur is our newest Open Rack-compatible hardware designed for AI computing at a large scale. In collaboration with partners, we've built Big Sur to incorporate eight high-performance GPUs of up to 300 watts each, with the flexibility to configure between multiple PCI-e topologies. Leveraging NVIDIA's Tesla Accelerated Computing Platform, Big Sur is twice as fast as our previous generation, which means we can train twice as fast and explore networks twice as large. And distributing training across eight GPUs allows us to scale the size and speed of our networks by another factor of two.
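
The post does not say how training is split across the eight GPUs. One common approach is synchronous data parallelism, sketched below on a toy one-parameter model: each worker computes a gradient on its own slice of the batch, and the partial results are combined before the weight is updated. This is an illustration under that assumption, not Facebook's training code; the "workers" are plain Python loops standing in for GPUs, and every function name here is hypothetical.

```python
# Illustrative only: synchronous data-parallel gradient descent for the toy
# model y = w * x. Workers stand in for GPUs; no real GPU code is involved.

NUM_WORKERS = 8  # mirrors the eight GPUs per server described above


def shard(data, num_workers):
    """Split the batch into num_workers interleaved shards."""
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w, samples):
    """Sum of d/dw (w*x - y)^2 over one worker's samples."""
    return sum(2.0 * (w * x - y) * x for x, y in samples)


def data_parallel_step(w, data, lr, num_workers=NUM_WORKERS):
    """One update: per-worker gradients, then a combined (averaged) step."""
    shards = shard(data, num_workers)
    # On real hardware each shard's gradient would be computed concurrently.
    partial = [local_gradient(w, s) for s in shards]
    # Combine the partial sums into the mean gradient over the full batch.
    grad = sum(partial) / len(data)
    return w - lr * grad


# Toy data generated from a true weight of 3.0.
data = [(float(x), 3.0 * x) for x in range(1, 17)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, data, lr=0.005)

print(round(w, 6))
```

Because the partial gradients are averaged over the whole batch, the update matches what a single worker would compute on all the data; the benefit on real hardware comes from the shards being processed at the same time.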

Open Rack V2 compatible 8-GPU server

In addition to the improved performance, Big Sur is far more versatile and efficient than the off-the-shelf solutions in our previous generation. While many high-performance computing systems require special cooling and other unique infrastructure to operate, we have optimized these new servers for thermal and power efficiency, allowing us to operate them even in our own free-air cooled, Open Compute standard data centers. Big Sur was built with the NVIDIA Tesla M40 in mind but is qualified to support a wide range of PCI-e cards. We also anticipate this will achieve efficiencies in production and manufacturing, meaning we'll get a lot more computational power per dollar we invest.

Servers can also require maintenance and hefty operational resources, so, like the other hardware in our data centers, Big Sur was designed around operational efficiency and serviceability. We've removed the components that don't get used very much, and components that fail relatively frequently — such as hard drives and DIMMs — can now be removed and replaced in a few seconds. Touch points for technicians are all Pantone 375 C green, the same touch-point color as all of Facebook’s custom data center hardware, which allows technicians to intuitively identify, access and remove parts. No special training or service guide is really needed. Even the motherboard can be removed within a minute, whereas on the original AI hardware platform it would take over an hour. In fact, Big Sur is almost entirely toolless — the CPU heat sinks are the only things you need a screwdriver for.

Collaboration through open source

We plan to open-source Big Sur and will submit the design materials to the Open Compute Project (OCP). Facebook has a culture of support for open source software and hardware, and FAIR has continued that commitment by open-sourcing our code and publishing our discoveries as academic papers freely available from open-access sites. We're very excited to add hardware designed for AI research and production to our list of contributions to the community.

We want to make it a lot easier for AI researchers to share techniques and technologies. As with all hardware systems that are released into the open, it's our hope that others will be able to work with us to improve it. We believe that this open collaboration helps foster innovation for future designs, putting us all one step closer to building complex AI systems that bring this kind of innovation to our users and, ultimately, help us build a more open and connected world.

