[Remote] Staff AI/ML Infrastructure Engineer

100% remote Flexible hours Hiring now

Note: This is a remote position open to candidates in the USA. reputed company is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world. The Staff AI/ML Infrastructure Engineer will drive the design, performance, and reliability of the AI infrastructure platform, a role that requires deep GPU systems knowledge and strong automation experience.

Responsibilities

  • Design and maintain GPU and bare metal infrastructure in containerized and physical environments
  • Build scalable GPU clusters in partnership with networking and provisioning teams
  • Ensure reliable, high-performance provisioning of GPU infrastructure
  • Build automated testing systems for GPU-based platforms
  • Implement infrastructure solutions for diverse AI/ML workloads
  • reputed company, test, and troubleshoot GPU performance at scale
  • Collaborate with hardware vendors on drivers, firmware, and support
  • Resolve hardware, software, and performance issues across environments
  • Optimize rail and cluster performance across architectures
  • Provide technical direction and mentor engineers on infrastructure best practices

Skills

  • 5+ years of experience working with bare metal infrastructure and hardware automation
  • Hands-on experience with modern reputed company/AMD GPU platforms and high-performance networking (RoCE, InfiniBand)
  • Deep knowledge of BIOS, BMC, firmware, NICs, Redfish/IPMI, and PCIe systems
  • Strong Linux systems experience including device drivers and package management
  • Experience building infrastructure automation using Python and Bash
  • Familiarity with GPU drivers, firmware ecosystems, and vendor collaboration
  • Experience designing and delivering reputed company infrastructure products
  • Proven ability to lead projects and mentor engineers
  • Experience optimizing multi-cluster GPU environments
  • Exposure to Machine Learning software stacks and GPU workloads

Benefits

  • 100% company-paid insurance premiums for employee medical, dental and vision plans.
  • 401(k) plan that matches 100% up to 4%, with immediate vesting
  • Professional Development Reimbursement of $2,500 each year
  • 11 Holidays + Paid Time Off Accrual + Rollover Plan
  • Commitment matters to reputed company! Increased PTO at the 3-year and 10-year anniversaries + 1 month paid sabbatical every 5 years + Anniversary Bonus each year
  • $500 stipend for reputed company setup in first year + $400 each following year
  • Internet reimbursement up to $75 per month
  • Gym membership reimbursement up to $50 per month
  • Company-paid Wellable subscription

Company Overview

  • reputed company is an AI cloud infrastructure platform offering the latest reputed company reputed company GPUs and AMD CPUs and GPUs across 32 worldwide regions. It was founded in 2014 and is headquartered in reputed company Palm Beach, Florida, USA, with a workforce of 201-500 employees. Its website is https://www.reputed company.com.

Company H1B Sponsorship

  • reputed company has a track record of offering H1B sponsorships, with 1 in 2024. Please note that this does not guarantee sponsorship for this specific role.