Site Reliability Engineer III
Cabify
- Madrid
- €52,000 per year
- Permanent
- Full-time
- Evolving our infrastructure platform by building self-service components that will be used by the entire engineering team and by millions of users around the world.
- Working closely with our Product and Infrastructure teams to architect and develop world-class infrastructure components.
- Designing and implementing tooling to improve the availability, scalability, observability and latency of our services, which are used by internal customers to deploy and operate their services.
- Increasing reliability awareness among other teams, helping them adopt reliability principles, and reviewing observability implementations or software architectures.
- Defining SLIs, SLOs and SLAs as part of the services' lifecycle.
- Sharing an on-call schedule for the platform services you own.
- Solving problems in our highly available platform together with other teams, then building automation to prevent incidents from happening again.
- Participating in our recruiting process to help grow our engineering team.
- Think Unix: you know the networking stack, the OSI model, and containers (and schedulers), and you know your way around monitoring, logging, and the CAP theorem (bonus!).
- Have strong programming skills in at least one language, and know your way around a few more or can learn them if the opportunity arises.
- Automate yourself out of everything by nature, making machines do the toil.
- Communicate effectively and asynchronously.
- Care about the things that affect the company, your team, and yourself.
- Embrace diversity and humility (and a bit of trolling).
- Prefer taking iterative action over waiting for things to happen or to be perfect.
- Strongly favor simplicity over complexity, i.e., adhere to the KISS principle.
- Have a sense for identifying, exploiting and elevating bottlenecks.
- Are not afraid of expressing yourself in English. We aren't expecting you to have the Queen's accent, but you'll be part of an international team and we communicate in English, so you should be comfortable with that.
- Enjoy herding cats and shaving yaks, i.e., being a great influence on other product teams and teaching them best practices, as well as analyzing and simplifying our setup.
- Helping us iterate on and improve our Kubernetes setup (AWS EKS).
- Iterating on our networking layer to implement network policies, a service mesh, and more.
- Evolving our time-series monitoring platform (Cortex), in order to provide a first-class service to all of our engineering teams.
- Helping grow our adoption of distributed tracing (OTLP + Tempo), with the goal of providing request latency observability across microservices (as a service).
- Scaling our ever-growing logging platform (Loki) to keep up with the business demands.
- Maintaining our company-wide code repository and continuous integration solution (GitLab).