Solutions Architect - Cloud and MLOps solutions

Nebius

  • Ottawa, ON
  • Permanent
  • Full-time
  • 2 months ago
The company
Nebius AI is an AI-centric public cloud platform specifically crafted to serve AI models for training and inference.
Our mission is to help ML practitioners concentrate on their core jobs, while DevOps, MLOps, and infrastructure-related tasks are handled by us. The idea is to build an ML-specific cloud platform covering the entire ML lifecycle from A to Z: from data preparation and labeling to ML training and inference.
We recognize the potential of ML and AI technologies and aim to provide our future users with the perfect environment to train and fine-tune their models. We are committed to delivering the best user experience and excellent customer support.

Four development hubs:
Nebius is headquartered in the Netherlands, with hubs in Finland, Serbia, and Israel.

Data center in Europe:
Our own data center in Finland features server racks designed in-house for ML-specific high load, with power-efficient solutions, including a free-cooling system.

500+ professionals:
Our mature team of engineers has a proven track record in developing sophisticated cloud and ML solutions and designing cutting-edge hardware.

The role
We are seeking a highly skilled and customer-focused professional to join our team as a Solutions Architect specializing in Cloud and MLOps. As a Solutions Architect, you will play a pivotal role in designing and implementing cutting-edge solutions for our clients, leveraging cloud technologies for ML/AI teams and becoming a trusted technical advisor for building their pipelines.
You’re welcome to work remotely from Canada.

In this position, your responsibility will be to:
  • Act as a trusted advisor to our clients, providing technical expertise and guidance throughout the engagement. Conduct PoCs, workshops, presentations, and training sessions to educate clients on GPU cloud technologies and best practices.
  • Collaborate with clients to understand their business requirements and develop solution architectures that align with their needs: design and document Infrastructure as Code solutions, and produce documentation and technical how-tos in collaboration with support engineers and technical writers.
  • Help customers to optimize pipeline performance and scalability to ensure efficient utilization of cloud resources and services powered by Nebius AI.
  • Act as a single point of expertise on customer scenarios for the product, technical support, and marketing teams.
We expect you to have:
  • 5+ years of experience as a cloud solutions architect, system/network engineer, developer or a similar technical role with a focus on cloud computing
  • Strong hands-on experience with IaC and configuration management tools (preferably Terraform/Ansible) and Kubernetes, plus Python coding skills
  • Solid understanding of GPU computing practices for ML training and inference workloads and of GPU software stack components, including drivers and libraries (e.g. CUDA, OpenCL)
  • Excellent communication skills
  • Customer-centric mindset
  • Fluent English
It would be an added bonus if you had:
  • Hands-on experience with HPC/ML orchestration frameworks (e.g. Slurm, Kubeflow)
  • Hands-on experience with deep learning frameworks (e.g. TensorFlow, PyTorch)
  • Solid understanding of the cloud ML tools landscape from industry leaders (NVIDIA, AWS, Azure, Google)
Does all that sound like your kind of challenge? Then join us!
