Lead Site Reliability Engineer
JPMorgan Chase
- Bournemouth
- Permanent
- Full-time
- Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team
- Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers
- Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses
- Documents and shares knowledge within your organization via internal forums and communities of practice
- Executes creative software solutions, design, development, and technical troubleshooting, with the ability to think beyond routine or conventional approaches to build solutions or break down technical problems
- Identifies opportunities to eliminate or automate remediation of recurring issues to improve overall operational stability of software applications and systems
- Optimizes workloads for production and manages performance and observability for these workloads
- Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform
- Fluency in at least one programming language (e.g., Python, Java Spring Boot, .NET)
- Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines
- Proficiency and experience in observability, including white-box and black-box monitoring, SLO alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)
- Drive to self-educate and evaluate new technology
- Ability to teach new programming languages to team members
- Ability to expand and collaborate across different levels and stakeholder groups
- Experience managing logs, metrics, and traces in microservices-based applications
- Experience adopting CI/CD practices and technologies for development projects
- Engage in coding, troubleshooting, and process automation
- Ability to anticipate, identify, and troubleshoot defects found during testing
- Experience in SRE or DevOps roles
- Understanding of and exposure to AWS Cloud Infrastructure
- Software engineering experience with Agile, team-based development following a structured lifecycle