Senior SRE Engineer
Moonpig
- Manchester / London
- Permanent
- Full-time
- Optimising our cloud infrastructure to balance platform stability against cost.
- Coding infrastructure automation using Terraform, deployed via CI/CD.
- Supporting and patching our infrastructure as necessary to ensure we maintain high levels of system security.
- Improving our monitoring and logging stack so we can identify issues early.
- Developing relationships with our engineering teams, helping them define their SLAs and improve their overall system reliability.
- Good engineering comes first - You'll have a great technical knowledge base and the experience to know what works and what doesn't. We expect you to apply these skills in making the right decisions and applying best practices wherever possible.
- Technical mentoring and leadership - You'll be collaborative and inclusive, spreading knowledge wherever possible. People will look to you for technical guidance, and part of your role will be to help them on that journey. You will also be responsible for creating the right forums to drive engineering principles and practices across all of engineering. You have the autonomy to drive decisions, but it's your responsibility to ensure everyone is involved.
- Culture and advocacy - You will support a growth culture (e.g. by running lunch & learns and brown bags) and advocate for the organisation externally through meetups, blogging and hackathons. This is important to us as we are all in this together.
- Be part of a cross-functional team of SREs and Software Engineers implementing platform tooling and helping us maximise our uptime.
- Work in an environment that cares about operational concerns.
- Deliver value by jumping feet first into a wide range of problems to be solved. These could be internally facing within our technology organisation or for our external customers.
- Join an on-call PagerDuty rotation to respond to platform availability issues and support engineers during incidents.
- Be challenged to learn new skills and techniques; the range of problems you'll be solving means it's almost impossible not to learn something new.
- Work in a fun and social environment!
- Designing infrastructure as code with Terraform.
- You have worked with highly available, high-transaction websites and applications within a microservice architecture, and with clustered systems, automated deployments, disaster recovery and business continuity.
- Hands-on expertise in designing, analysing and troubleshooting large-scale distributed systems.
- A good understanding of network and operational security concepts.
- Experience with AWS and its associated services, including API Gateway, Lambda, EC2, S3, VPC, CloudWatch and ALB.
- Monitoring production systems using industry-standard tooling, including Grafana and OpenSearch.
- Good communication skills; you are able to share status updates clearly and to ask timely and relevant questions when working with your peers.
- AWS, Serverless, Terraform, TypeScript, C#, .NET, GraphQL and React.
- GitHub for SCM and CI/CD
- Robust and performant cloud/serverless applications, with a focus on user experience and business growth.
- Full-stack, cross-functional teams, working closely with people of different specialisms within your team and across the business.
- Kanban
- Jira / Confluence
- Grafana and AWS CloudWatch
- Google Analytics
- Clean Architecture
- TDD
- Pair Programming
- Focus on experimentation to validate our hypotheses