Production Reliability & Support Expert (SRE)

LanceSoft

  • Montreal, QC
  • Permanent
  • Full-time
  • 28 days ago
Job Title: Production Reliability & Support Expert (SRE)
Location: Montreal (office attendance from Day 1 – hybrid mode, 3x per week)
Years of experience: 3 to 5 years
Cyber Data Risk & Resilience – Identity & Access Management
Role and Responsibilities:
  • Ensure Production Management is closely aligned with and embedded in the Agile software development process, and that our code meets production standards
  • Incorporate System Reliability Engineering and DevOps implementations into the day-to-day role by developing automated solutions to long-standing problems, ensuring minimal downtime and manual effort
  • Configure application monitors using industry-standard monitoring tools, and develop customized monitoring solutions
  • Build the extensive business and application knowledge required to support client-facing applications
  • Review SRE metrics and confirm they align with firm and department goals
  • Implement tooling and create automations to help eliminate toil (manual or repetitive work)
  • Engage early in the SDLC with our Development teams to take an active role in creating resilient and reliable solutions
  • Prioritize project work based on critical incidents and key business stakeholders
  • Interface with clients and other technology teams to provide governance and control around the production environment.
Qualifications: You should apply to this requisition if you have, at minimum, the following profile:
  • Bachelor’s degree in Computer Science or related field
  • Experience with Service-Oriented Architecture, distributed systems, business intelligence reporting (such as Power BI), scripting (such as Python or shell), front-end development (HTML, JavaScript, AngularJS), cloud computing (such as MS Azure) and SaaS integrations
  • Clear understanding of Logging, Monitoring, and Knowledge Management practices such as Docs as Code
  • Ability, once trained, to manage an incident call and coordinate multiple teams towards the common goal of resolving a business-impacting outage
  • Strong knowledge of DevOps and SRE principles, with a grasp of the tools and approaches used to apply them
  • Strong infrastructure knowledge in Linux / Unix admin, Storage, Networking and Web Technologies
  • Advanced Unix Shell / Python scripting experience
  • Advanced knowledge of SQL; experience with databases such as Sybase, DB2, MongoDB and Snowflake preferred
EEO Employer: Minorities / Females / Disabled / Veterans / Gender Identity / Sexual Orientation
