Site Reliability Engineer
Razer
- Bangsar South, Kuala Lumpur
- Permanent
- Full-time
- Design, implement, and maintain Infrastructure as Code (IaC). Collaborate with development and operations teams to ensure IaC best practices are followed.
- Participate in architecture reviews to provide insights into system reliability, platform management, capacity planning, and performance.
- Implement monitoring solutions to proactively identify and address performance bottlenecks and system failures.
- Collaborate with cross-functional teams, including development, operations, and security, to achieve common goals.
- Provide support and guidance to other teams on best practices for IaC and build processes.
- Oversee the incident response process, ensuring timely and effective resolution.
- Conduct post-incident reviews and implement improvements.
- Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
- Partner with development teams to improve services through rigorous testing and release procedures.
- Create sustainable systems and services through automation and uplifts.
- Balance feature development speed and reliability with well-defined service-level objectives.
- Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or equivalent.
- 2 years of experience in software development, system administration, or a related field.
- Hands-on experience with AWS cloud services (Lambda, SQS, RDS, ElastiCache, SES, EC2, ECS, Auto Scaling, etc.), as well as microservices and containerization with Docker.
- Experienced in IaC (Terraform, CloudFormation). Able to plan, design, and deploy IaC to new environments.
- Proficient in programming languages; experience with Python, Ruby, or Node.js, and familiarity with JSON, will be an advantage.
- Knowledge of environments involving application and web servers, databases, network load balancers, firewalls, DNS, etc.
- Experienced in OS troubleshooting on both Windows and Linux platforms.