Site Reliability Manager
AKQA
- Auckland
- Permanent
- Full-time
- Design, implement and support the reliability, scalability, and performance of our platforms, systems and applications across all phases of the SDLC
- Provide support during operations such as deployments and general production and non-production testing
- Provide technical support and manage tasks from multiple agile teams across the region
- Support the Delivery team with processes, frameworks and tool sets that champion environment health engineering
- Be on the 24/7 service desk roster supporting key clients across the region
- Perform root cause analysis for environment and application performance and uptime issues
- Contribute to both Incident Reports and Monthly Operational Reports
- Identify opportunities for improvement and perform the analysis needed to support recommendations
- Contribute to the planning and coordination of platform environment updates
- Contribute to the definition of DevSecOps best practices and operational standards
- Collaborate with developers to ensure new environments meet client requirements and conform to defined standards and compliance requirements
- Build strong interpersonal relationships with key staff members across studios, clients, partners and teams
- Champion a vibrant and diverse engineering culture through internal presentations and knowledge transfer
- 3-5 years' experience in a similar role
- Sound experience with Azure, AWS, DevOps, AWS CloudFormation, Terraform, Helm, etc.
- Knowledge of and hands-on expertise with tools such as New Relic, Dynatrace, Splunk, etc.
- Exposure to Git, Bitbucket, Jira/Confluence, New Relic and Cloudflare
- Experience deploying resources using Python and PowerShell scripts
- Self-motivated and willing to do what is needed to get the job done efficiently and effectively