Senior Site Reliability Engineer, FlashArray
Pure Storage
- Santa Clara, CA
- Permanent
- Full-time
- For ten straight years, Gartner has named Pure a leader in the Magic Quadrant
- Our customer-first culture and unwavering commitment to innovation have earned us a certified Net Promoter Score that is the highest in the industry
- Industry analysts and press applaud Pure’s leadership across these dimensions
- Our 6,000+ employees are emboldened to make Pure a faster, stronger, smarter company as we go
- Become part of our nascent SRE team across the US and Europe
- Responsible for uptime and reliability of our core services and infrastructure, including proactive monitoring and incident response and resolution
- Maintain a 24x7 production environment with a high level of service availability
- Manage operational issues and drive root cause analysis and resolution of production issues
- Explore and implement new cloud and high availability (HA) technologies and tools
- Partner with development teams in defining and implementing improvements in services architecture
- Implement automation and orchestration of manual processes required to operate and deploy cloud services
- Own service health monitoring, observability, metrics collection and reporting, and alerting
- Interface with engineering to establish a support structure, with runbooks to ensure uptime and customer success
- BS or higher in Computer Science, Computer Engineering or a related field, or equivalent practical experience
- 6+ years of experience as an SRE or DevOps engineer supporting globally distributed SaaS services
- Proven ability to design and operate commercially successful cloud services with high availability and well-defined SLAs
- Experience with IaC, automation & configuration management using tools such as Terraform, Ansible, Puppet, Chef, CloudFormation or ARM templates
- Experience with virtualization, containers and management systems such as Kubernetes
- Experience setting up monitoring of production services using ELK or a similar stack
- Practical experience setting up support processes using tools such as PagerDuty
- Deep understanding of the software delivery process and what it takes to “go live”
- In-depth knowledge of a public cloud platform such as AWS, Azure or GCP is a must