Senior Kubernetes Admin / Systems Engineer, EngProd
Arista Networks
- Vancouver, BC
- Permanent
- Full-time
- Work with the existing k8s admin team to own different aspects of managing a production k8s cluster (e.g., upgrades, monitoring, capacity planning, security, and developer experience).
- Proactively monitor, respond to, and enhance alerts and set up automated alert handling where applicable
- Create and maintain incident response runbooks in collaboration with the service dev teams
- Debug and resolve issues impacting developer user experience and infrastructure stability around the k8s platform
- Adopt current best practices in k8s cluster management. Evaluate and adopt OSS projects that simplify k8s cluster management.
- Set up guidelines and paved paths for service dev teams to improve developer experience around the k8s platform.
- Work with Arista’s software engineers to identify bottlenecks and limitations in our workflows, tooling, and infrastructure around k8s and provide fixes for those problems.
- Engage with third-party vendor support as part of triage
- At least a BSc in Computer Science or Engineering plus 7 years’ experience, an MS in Computer Science or Engineering plus 5 years’ experience, a Ph.D. in Computer Science, or equivalent work experience.
- Knowledge of one or more of Go, Python, and JavaScript. Enough shell scripting experience to implement medium-complexity automation workflows.
- Knowledge of Linux (or UNIX).
- Experience in operating software systems at scale.
- Strong understanding of the fundamentals of storage and networking.
- Comfortable with Ansible and GitOps.
- Strong expertise in managing on-prem/bare-metal Kubernetes clusters.
- Applied understanding of software engineering principles.
- Strong problem solving and software troubleshooting skills.
- Ability to design a solution and implement features independently. Ability to work in small teams.
- Comfortable with security principles; able to study the source code of OSS projects and conduct experiments as necessary to debug issues.
- Proven expertise in debugging complex issues that span the technology stack.
- Experience dealing with network proxies and containerized storage.