Site Reliability Manager
Crown Agents Bank
- London
- Permanent
- Full-time
- Part of the Platform Operations team, with a heavy focus on platform and system operations in Production
- Work with the Architecture & Engineering, Product, Application Support, Service Management, Testing and Security teams to uphold good operational practices and ensure that appropriate attention is given to the reliability of production systems from the point of view of our customers.
- Collaborate with Client Services, Application Support, Product, Engineering and Business Operations teams to ensure that uptime, latency, response time and availability targets are met for key services.
- Put focus on operations automation, system currency, and simplification to allow CAB to scale its portfolio of services sustainably.
- Practice and improve incident management processes and provide on-call support.
- Take ownership of complex Problem Records related to performance, reliability, and scalability and lead the coordination across technology to resolve them; lead the SRE team during major incidents.
- Build and maintain a good understanding of CAB products and the platforms on which they are implemented.
- Build and line manage the SRE team, ensure appropriate technology and skill coverage, and manage the on-call schedule
- 5+ years of platform operations engineering, SRE, DevOps or similar relevant experience in a B2B environment
- Experience of, and passion for, configuring and using enterprise-grade application performance monitoring tooling (such as Dynatrace, DataDog, Prometheus/Grafana, ELK etc.)
- Experience of deployment, configuration and migration to cloud providers via IaC, ideally AWS and Terraform; good grasp of multi-AZ and cross-region resilience challenges
- Innovative and intuitive with a love of collaborative problem-solving.
- Demonstrable expertise in supporting a large-scale, heterogeneous technology stack consisting of a mixture of in-house developed monoliths, microservices and serverless functions, batch, external SaaS services and integration technology (MuleSoft Anypoint/BizTalk)
- Experience in supporting production messaging and streaming technology such as Kafka or MQ
- Multiple years of AWS usage (management or hands-on)
- Applies knowledge to tactical and strategic decisions
- Imparts technical knowledge to other team members
- Strong understanding of DevOps practices and highly available hosting design
- Strong interest in cloud computing, with experience of IaaS, PaaS and SaaS in multi-cloud environments and of IaC using Terraform and Ansible
- High level of understanding of Platform Security
- Good knowledge of network infrastructure design and public cloud architecture
- Hybrid working
- Contributory personal pension plan: minimum Employee 2% and Employer 7%. Employer matches contributions in 1% increments to a maximum of Employee 5% and Employer 10%
- Life Assurance – 4 times annual salary
- Group Income Protection
- Private Medical Insurance – this may include cover for partner and/or children at company cost. Cover includes Optical, Dental and Audiology
- Discretionary Bonus
- Competitive Annual Leave
- 2 Volunteering Days
- Benefit Hut