Artificial Intelligence Performance Engineer
Advanced Micro Devices
- Santa Clara, CA
- Permanent
- Full-time
- Define a performance suite and best practices for measuring GPU-accelerated workloads to assess the scalability and efficiency of AI models and algorithms
- Benchmark and analyze AI workloads in single-node and large multi-node configurations, comparing against previous generations and competitors' products
- Perform comprehensive performance analysis and report findings for the entire platform including GPU, CPU, interconnects, network, software stack, etc.
- Identify performance bottlenecks that affect data center GPU-accelerated workloads, tune those workloads, and collaborate with other software teams to improve performance
- Stay up to date with emerging technologies and trends in the AI field and explore ways to improve the performance of GPU-accelerated workloads at scale
- Solid knowledge of Artificial Intelligence (AI) and Machine Learning (ML) concepts and techniques, including deep learning, reinforcement learning, natural language processing, generative AI, and computer vision, plus practical experience applying them to real-world problems in research or professional work
- Experience with benchmarking methodologies, performance analysis, workload profiling, and performance monitoring and debugging tools
- Advanced skills with Linux, containers (e.g., Docker), and GitHub
- Programming skills in a variety of relevant languages such as Python or C/C++
- Expertise with deep learning frameworks like PyTorch and TensorFlow
- Knowledge of and interest in computer and GPU architecture
- In-depth knowledge of GPU acceleration with either AMD or NVIDIA GPU compute products
- Inquiring mind, excellent problem-solving skills, and an automation mindset
- B.S., M.S., or Ph.D. in Computer Science, Engineering, or a similar field