SREcon19 Americas
Monday, March 25 • 11:40am - 12:10pm
Lessons Learned in Black Box Monitoring 25,000 Endpoints and Proving the SRE Team's Value

How do you monitor systems that don't want to be monitored or ones that you don't have internal access to? Why monitor these systems at all? The United States Digital Service finds the truth and tells the truth, and fights fires across government, even when those fires don't want to be found. We put together a system to black box monitor all 25,000 .GOV domains and then expanded to perform more robust monitoring of important citizen-facing, government-provided services so we can go where the work is and restore services. In the process, we're hoping to change the culture and prove the value of SRE teams across government. This is how we're doing it.

Aaron Wieczorek

United States Digital Service
Aaron is a Site Reliability Engineer on the United States Digital Service Headquarters team. He works on hard technical problems and hard bureaucratic problems, from infrastructure to CI/CD pipelines to network engineering.

Monday March 25, 2019 11:40am - 12:10pm EDT
Grand Ballroom ABC