DevOpsDays Zürich

Robert Lehmann

12 - 13 March 2025 | Alte Kaserne Winterthur

Staff SRE, Google

Robert has spent the past decade in Google SRE, building the production monitoring tools used in the majority of Google's incident response. In a previous life he was a Python aficionado, and he also enjoys teaching.

Talk

Planet-Scale Dashboards

Google runs hundreds of thousands of services globally, often interdependent and with shared concerns. At that scale, classical Federated Observability — a platform team providing foundations and/or building blocks for each team to assemble on their own — does not scale anymore.

In this talk, we will demonstrate how Google managed to cut toil dramatically while providing best-in-class monitoring out-of-the-box:

- What are the unique circumstances that contributed to Google’s scaling problem?
- A data model for re-usable dashboards
- Impact on both configuration overhead and incident response
- Looking beyond dashboards, how such re-use can be facilitated in the broader observability space

Robert will draw on Google’s research paper on Planet-Scale Dashboards (to be published in mid-2025) and more than a decade of experience in SRE.
