Datacenter GPU Platform Performance Engineer
Advanced Micro Devices
- Austin, TX
- Permanent
- Full-time
- Define a performance suite and best practices for measuring GPU-accelerated workloads to assess the scalability and efficiency of AI models and algorithms
- Benchmark and analyze AI workloads in single- and multi-node configurations, comparing results against previous generations and our competitors
- Perform comprehensive performance analysis of the entire platform, including GPU, CPU, interconnects, network, and software stack, and report findings
- Identify performance bottlenecks that impact data center GPU-accelerated workloads, then tune and collaborate with other software teams to improve performance
- Stay up to date with emerging technologies and trends and explore ways to improve the performance of GPU-accelerated workloads at scale
- Solid knowledge of Artificial Intelligence (AI) and Machine Learning (ML) concepts and techniques, including deep learning, reinforcement learning, natural language processing, generative AI, and computer vision, with practical experience applying these concepts to solve real-world problems through research or industry work
- Experience with benchmarking methodologies, performance analysis, workload profiling, and performance monitoring and debugging tools
- Advanced skills in Linux, containers (e.g., Docker), and GitHub
- Programming skills in relevant languages such as Python and C/C++
- Expertise with deep learning frameworks like PyTorch and TensorFlow
- Knowledge and interest in computer and GPU architecture
- In-depth knowledge of GPU acceleration with either AMD or Nvidia GPU compute products
- Inquiring mind, excellent problem-solving skills, and an automation mindset
- B.S., M.S., or PhD in Computer Science, Engineering, or a related field