DHCP infrastructure evolution at Facebook and the importance of designing stateless services

Facebook is one of the largest sites in the world, with multiple datacenters (and POPs on multiple continents) hosting a very large number of machines. This talk is about the evolution of the DHCP production infrastructure at Facebook.

In the talk we will use the DHCP case as an example to discuss why it’s good to design your systems to be stateless, and the fine line between leveraging OSS projects where possible and taking a “Not Invented Here” approach instead. We will also talk about the challenges of driving large-scope projects from remote offices and the importance of having skills in both systems and software development.

We’ll look at DHCP at Facebook in both the IPv4 and IPv6 worlds. We will dive into the old architecture and its limitations, and then talk about how the Cluster Operations team in Dublin leveraged the ISC KEA open-source project to migrate from a stateful service to a stateless one, discussing the challenges we faced in the process and the benefits we gained.

Audience: infrastructure people and people interested in datacenter networking.

Outline of the talk:
• Who am I?
• What does my team do?
• Production Engineering at Facebook and what it has in common with SRE
• Describe the old DHCP architecture and its problems
• Describe our goals for the new architecture
• ISC KEA, what it is and why we decided to use it
• Description of the new architecture and its advantages
• Discuss the importance of designing your services to be stateless