Senior Site Reliability Engineer

  • Job reference: 43959
  • Location: City of London
  • Job type: Contract
  • Start date: Not specified
  • Contact: Not specified
  • Sector: Interim Management
  • Salary: £0 per annum

Sorry, this vacancy has now expired.

Role: Senior Site Reliability Engineer

Duration: 6 months

Location: London

Start date: ASAP

Our London client is currently looking for a Senior Site Reliability Engineer with strong AWS skills to join them on an initial 6-month contract. The Site Reliability Engineer will help design, build, deliver and run highly reliable and ultra-scalable software systems. The successful candidate will be a key part of this project, supporting all teams with their platform, tooling and process needs.

Key Responsibilities:

  • Defines, promotes and implements best practices around Continuous Delivery pipelines, automated change/release processes and release strategies
  • Provides boilerplate templates to the engineering teams to speed up the build and release of new services
  • Works closely with the other SRE team members to provide self-service and self-healing platforms to the engineering teams, applying modern software engineering practices to our infrastructure (continuous delivery, automated tests, everything as code, on-demand provisioning, chaos engineering)
  • Spends time embedded in the other engineering teams to:
    - Understand customer and business needs
    - Participate in every stage of engineering products and business capabilities, promoting scalability and reliability principles
    - Help the teams define, measure and enforce their SLOs and error budgets
    - Empower the teams from both an infrastructure and a release point of view
    - Liaise with the rest of the SRE team to share wider architecture or infrastructure needs, feeding into and refining the SRE backlog

  • Contributes to Site Reliability Operations (tickets, support, incident response, on-call rota, toil automation)
  • Collaborates with the wider organisation to produce blameless post-mortems that capture and share lessons learned from incidents

Essential Skills:

  • Hands-on experience with cloud-native software and infrastructure (AWS, GCP) in an enterprise environment
  • Experience building, running and monitoring containers in production, as well as container orchestration at scale (Kubernetes, Nomad, …)
  • Infrastructure as Code (Terraform / CloudFormation, Ansible)
  • Hands-on experience with Python and Java
  • Well versed in security best practices, including dynamic secrets management (HashiCorp Vault)
  • Implementation of CI/CD pipelines for software and infrastructure
  • Familiarity with test strategies for distributed systems
  • Experience monitoring distributed systems (logging, metrics, distributed tracing, alerting)

If this role is of interest and you would like further information, please respond with an up-to-date copy of your CV for consideration.
