Searching for a Senior SRE / Site Reliability Engineer to join a leading market research organisation, reporting directly into the Head of SRE. I am looking for a consultant who has specific experience in developing SRE principles within a team and has knowledge of containers, scaling, distributed systems, networking, cloud-native technologies and more. The ideal candidate will come from a software development background and understand DevOps ways of working.
- Partner with Product Owners to agree availability targets that our customers value the most.
- Document every action so your findings turn into repeatable actions, and then into automation.
- Make monitoring and alerting fire on symptoms and not on outages.
- Debug production issues across services and all levels of the stack, including real-user experience issues.
- Be on a standby rotation to respond to availability incidents and provide support for Product teams with customer incidents.
- Learn from your time on-call to prevent incidents from ever happening.
- Run our Infrastructure Platform with Terraform and Kubernetes.
- Use the Infrastructure Platform to run your product as a first resort and make suggestions to improve the platform as much as possible.
- Improve the deployment process to make it as boring as possible.
- Plan the growth of your product's infrastructure.
- Design, build and maintain paved-road modules that allow products to scale.
- Infrastructure as Code - IaC
- Kubernetes - EKS
Please get in touch with Sonny Allenby at La Fosse for further details.