Middle Incident Site Reliability Engineer (SRE) IRC222117
GlobalLogic
- București
- Permanent
- Full-time
- Master’s degree in Computer Science or Engineering (IT, Telecom) preferred.
- 2+ years of experience in a Site Reliability Engineer (SRE) role or similar position.
- Advanced proficiency with Microsoft Azure services such as AKS, NSG, and Storage.
- Strong practical knowledge of Kubernetes, Helm, and FluxCD.
- Hands-on experience in creating and maintaining Terraform configurations.
- Demonstrated troubleshooting skills for distributed cloud-native applications using tools like kubectl, k9s, Lens Pro, and metrics within the ELK stack.
- Solid understanding of DevOps principles and proficiency in GitOps automation.
- Proficiency in Bash and Python programming languages.
- Good familiarity with GitLab CI/CD processes.
- Effective interpersonal communication skills in a highly collaborative team environment.
- Advanced user-level proficiency in Jira.
- Adhere to incident response protocols and facilitate collaboration among teams to address issues promptly.
- Contribute to 24/7 on-call rotations for comprehensive incident response coverage.
- Perform detailed post-incident assessments to ascertain underlying causes of issues.
- Document findings, propose preventive measures, and actively contribute to refining incident response protocols.
- Evaluate automated remediation tools for addressing known issues or recurring incident scenarios.
- Deploy automation solutions to minimize manual intervention during incident response procedures.
- Maintain effective communication with stakeholders, including technical teams, management, and customers, ensuring timely updates on incident status and resolution progress.