HPC Infrastructure Administrator Job at NVIDIA, Santa Clara, CA

  • NVIDIA
  • Santa Clara, CA

Job Description

HPC Infrastructure Administrator
Location: Santa Clara, CA

We are now seeking an HPC Infrastructure Engineer! NVIDIA's Compute Architecture Group is growing our team of HPC Infrastructure Engineers who run our internal cluster for accelerated AI and HPC software development. As part of this team, you will help to manage a diverse cluster of GPU-accelerated systems. Your contributions will enable engineers to work efficiently with a wide variety of forward-looking hardware configurations as they vigilantly seek out opportunities for performance optimization and continuously deliver high-quality software.

Our ideal candidate is versatile enough to apply expertise from many domains: system administration, performance analysis, automation, and architecture. Your work will enable the groundbreaking experimentation that allows us to design the world's most powerful systems for the most demanding computing applications. You will have a meaningful impact at a fast-moving company that is spearheading the next wave in computing technology. Join our technically diverse team of GPU architects, software engineers, and infrastructure experts to unlock unprecedented performance in every domain!

What you'll be doing:
  • Administer an HPC cluster composed of Linux systems ranging from the world's most powerful servers to embedded systems
  • Maintain the configuration of our resource management system (Slurm) to keep resource allocation efficient and aligned with organizational priorities
  • Automate configuration management, software updates, and maintenance of system availability using modern DevOps tools (Ansible, GitLab, etc.)
  • Plan and maintain new systems that support the NVIDIA Software stack
  • Work directly with developers and hardware architects to debug issues, identify new requirements, and improve workflows
  • Actively communicate with users and management regarding resource planning and allocation
What we need to see:
    • 5+ years of previous experience deploying and administering HPC clusters
    • BA, BS, or MS in CS, EE, CE or equivalent experience
    • Deep knowledge of distributed resource scheduling systems (Slurm (preferred), LSF, etc.)
    • Demonstrated ability to script in Bash and at least one high-level language (Python preferred)
    • Experience with container technologies (Docker, Singularity, etc.)
    • Deep understanding of operating systems, computer networks, and high-performance hardware
    • Ability to work well with developers, hardware architects, & test engineers
    • Passionate dedication to providing quality support for users
Ways to stand out from the crowd:
    • Prior work experience managing high performance fabrics and parallel file systems
    • Familiarity with CUDA and managing GPU-accelerated computing systems
    • Basic knowledge of deep learning frameworks and algorithms
The base salary range is 118,400 USD - 224,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Job Tags

Full time, Work experience placement
