You take over a new service and discover it has no monitoring. What monitoring would you put in place within the first week to ensure the service is working? Within the first month? How do you monitor failures which are local to a region?
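One way to approach the region-local part of this question is to break probe results down per region instead of watching only a global error rate. Below is a minimal sketch of that idea. It assumes hypothetical probe results shaped as (region, succeeded) pairs from synthetic checks. The 5% threshold and 20-sample minimum are illustrative values, not anything the source specifies.

```python
from collections import defaultdict


def regions_over_error_budget(results, threshold=0.05, min_samples=20):
    """Return the sorted regions whose probe failure ratio exceeds threshold.

    `results` is an iterable of hypothetical (region, succeeded) pairs.
    Regions with fewer than `min_samples` probes are skipped to avoid
    alerting on noise.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for region, ok in results:
        totals[region] += 1
        if not ok:
            failures[region] += 1
    return sorted(
        region
        for region, total in totals.items()
        if total >= min_samples and failures[region] / total > threshold
    )
```

When there are many healthy regions, a single global failure ratio can stay under the threshold while one region fails outright. Grouping by region surfaces that case.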
Site Reliability Engineer Interview Questions
2,547 site reliability engineer interview questions shared by candidates
You will be asked to do live troubleshooting of an Apache (httpd) web service. The recruiter will not give you many details, so it's easy to study the wrong thing. It turned out that you need to be familiar with the httpd config file and Alias directives. You need to know how to change Linux filesystem permissions, but you can ignore the fact that you are running on Red Hat; you won't need to touch SELinux permissions. Watch out for one problem where there are two nearly identical file names, except that one contains a hyphen and the other a Unicode dash character. They look very similar in many fonts. Make sure you know how to get a simple GDB backtrace. You will be asked to debug a segfault and work around it (via a simple file rename).
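For the lookalike file name problem, it helps to print the Unicode name of every non-ASCII character in a directory listing. Below is a minimal sketch of that check. The source doesn't say which dash character was used, so the en dash in the test is only an example.

```python
import os
import unicodedata


def non_ascii_names(directory):
    """Yield (filename, [(char, unicode_name), ...]) for each file name
    in `directory` that contains non-ASCII characters."""
    for name in sorted(os.listdir(directory)):
        odd = [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in name if ord(ch) > 127]
        if odd:
            yield name, odd
```

Running `for name, chars in non_ascii_names("/etc/httpd/conf.d"): print(name, chars)` would reveal, for example, that one "hyphen" is actually an EN DASH. The path is just an example location.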
You will have to review several pieces of code. Focus on logic errors, not stylistic issues. I don't remember all the code samples, but one performed file backups: it parsed file extensions by hand and copied ".1" files to ".2", and so on, without ensuring the order of the copies. The copies must run from the highest number down; otherwise a newer backup overwrites an older one before that older one has been shifted.
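A correct rotation shifts the numbered backups starting from the highest index. Below is a minimal sketch, assuming a hypothetical layout where the live file is `base` and its backups are `base.1` (newest) through `base.<keep>` (oldest). This is not the code from the interview, which the source doesn't reproduce.

```python
import os
import shutil


def rotate_backups(base, keep):
    """Discard base.<keep>, shift base.1 .. base.<keep-1> up by one,
    then copy the live file to base.1.

    Shifting runs from the highest index down. Doing base.1 -> base.2
    first would overwrite base.2 before it had been moved to base.3.
    """
    oldest = f"{base}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    for i in range(keep - 1, 0, -1):
        src = f"{base}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{base}.{i + 1}")
    if os.path.exists(base):
        # Copy rather than move so the live file stays in place.
        shutil.copy2(base, f"{base}.1")
```

Parsing the numeric suffix as an integer also avoids a second class of bug: sorting names as strings puts "data.10" before "data.2".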
Basic questions on recursion and division.
Parse a log file and print the count of messages at given timestamps.
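The log format isn't given. Below is a minimal sketch, assuming (hypothetically) that each line starts with a whitespace-free timestamp token such as "2023-05-01T12:00:00".

```python
from collections import Counter


def count_at_timestamps(lines, timestamps):
    """Count log lines per leading timestamp token and return the counts
    for the requested timestamps (0 if a timestamp never appears)."""
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=1)
        if parts:  # skip blank lines
            counts[parts[0]] += 1
    return {ts: counts[ts] for ts in timestamps}
```

Usage: `with open("app.log") as f: for ts, n in count_at_timestamps(f, wanted).items(): print(ts, n)`. Here "app.log" and `wanted` are placeholders. A single pass with a Counter keeps this O(lines), regardless of how many timestamps are queried.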
System design questions about implementing a file server system.
Implement a CLI tool that works with a PagerDuty API.
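The source doesn't say which PagerDuty API or operation was asked for. Below is a minimal sketch that lists incidents, assuming the public REST API v2 incidents endpoint (GET https://api.pagerduty.com/incidents, with an "Authorization: Token token=..." header). The endpoint, headers, and response fields (`id`, `status`, `title`) should be checked against PagerDuty's API reference. The sketch also ignores pagination.

```python
import argparse
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.pagerduty.com"  # assumed REST API v2 base URL


def build_incidents_request(token, statuses):
    """Build (but do not send) a GET /incidents request filtered by status."""
    query = urllib.parse.urlencode([("statuses[]", s) for s in statuses])
    return urllib.request.Request(
        f"{API_BASE}/incidents?{query}",
        headers={
            "Authorization": f"Token token={token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
    )


def main(argv=None):
    parser = argparse.ArgumentParser(description="List PagerDuty incidents")
    parser.add_argument("--token", required=True, help="PagerDuty REST API key")
    parser.add_argument(
        "--status",
        action="append",
        choices=["triggered", "acknowledged", "resolved"],
        help="repeatable; defaults to triggered",
    )
    args = parser.parse_args(argv)
    req = build_incidents_request(args.token, args.status or ["triggered"])
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    for incident in data.get("incidents", []):
        print(incident.get("id"), incident.get("status"), incident.get("title"))


if __name__ == "__main__":
    main()
```

Keeping request construction separate from `main` lets you test the URL and headers without network access. Leaving `--status` without a list default avoids argparse's pitfall where `append` adds to the default list instead of replacing it.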
Would you set an alert on CPU usage? Why FreshBooks? Are you familiar with Kubernetes?
A take-home assignment, plus multiple system design and culture-fit use cases.
Questions about Docker, Linux, Kubernetes, Cloud, etc.