The SRE minstrel, singing his way to reliable systems

Andreas Grabner

July 13, 2021

Bart Enkelaar, Lead Site Reliability Engineer at bol.com, inspires and educates the community about SRE and SLO through music.

While attending and presenting at SLOconf just a couple of weeks ago, I came across one of the most engaging and entertaining presentations I’ve ever seen at a conference: “Game of SLOs: A 3 part reliability musical” by Bart Enkelaar. In just under 10 minutes, he sings and plays his guitar, with lyrics describing what SRE is and what the journey to it looks like.

With music being a big part of my life, I decided to reach out to Bart and invite him to our PurePerformance podcast to talk about SLOs and SRE. Luckily, he agreed, and I even convinced him to do a live rendition of his song!

Us having fun while recording our PurePerformance podcast singing songs about Site Reliability Engineering.

Here’s a summary of our discussion during the podcast.

If you’d like to listen to the full version, follow the link here: Making the case for SRE in a DevOps organization with Bart Enkelaar

Tell me a bit about yourself and how you got interested in SRE

I’m Bart Enkelaar, Lead Site Reliability Engineer at bol.com, the largest online retailing platform in the Netherlands and Belgium. I’ve been a backend engineer since 2008 and joined bol.com in 2015.

Over time, I became more interested in the operational side of things. Between 2016 and 2018, we did a DevOps transformation inside the company, and soon after we wrapped up the project, we noticed that our attention to operational standards was slowly declining. We started looking for another way to balance reliability with innovation, and that’s how we got to SRE. In 2018, we started moving to Google Cloud. With SRE, we could solve the classic struggle between developing new features for the clients and improving the reliability of the technical side.

How did you get the idea of making a song about SRE?

I was inspired by my colleague Margaret, Chief Platform Officer at bol.com. She recently gave a presentation on what diversity means to her: we should not split ourselves into a “work me” and a “free time me.” We are all whole people, and we should bring our whole selves to work. It’s what creates diversity in the workplace.

I am passionate about my work, but I am also passionate about making music with my friends. Yet I never brought this side of me to work. Then, when it came to finding ways to evangelize SRE and get people on board, I thought: Why not do this with music?

What does SRE look like at bol.com?

When starting an SRE project at your company, I think the most important thing to consider is which problems it solves in the company. It’s best not to think about overhauling every single process in your company; instead, start with some real-life examples and show how SRE can solve them.

“You need to shift your mindset from ‘we need to overhaul everything in the company’ to ‘these are concrete problems, and this is how we can solve them.’”

We had just finished our DevOps transition, so we already had experience with automation in the department that handled Prometheus clusters and metrics tools. There we did not need SRE. However, we were going into a cloud transition, going live with critical services in the cloud, and we had nobody to support them. The DevOps team, only 2–3 people, was managing five microservices. The operations team was managing everything else. So, we concluded that an SRE team could manage this.

We built virtual teams of software engineers, organized per value domain, that provide reliability as a service. They enable other teams by providing the tooling and showing them how to use the metrics, set up monitoring, etc. There is always one person on call, on rotation. At the same time, the team is facilitating the shift of all the products to using SLIs and SLOs. We still have a big challenge there!

After our insightful conversation on how SRE looks in real life, we finished our podcast with a live rendition of the song. Enjoy listening to it on YouTube: The SRE song.

Listen to the full podcast here: Making the case for SRE in a DevOps organization with Bart Enkelaar.

Thanks again, Bart, for joining us, and we look forward to seeing more musicals on SRE!


The SRE minstrel, singing his way to reliable systems was originally published in Dynatrace Engineering on Medium.
