Senior Systems Engineer - High-Performance AI and Networking Applications

100% remote Flexible hours Hiring now

Join the company's Deep Learning Frameworks Infrastructure team as a Senior Systems Engineer focusing on High-Performance AI & Networking Applications. The team is committed to ground-breaking AI and networking solutions. This position offers a distinctive opportunity to work with the latest technology, collaborating closely with elite teams on the company's impactful innovations.

What you will be doing:

  • Collaborate with networking teams to plan, implement, and evaluate performance benchmarks on NVLink-, NVSwitch-, and InfiniBand-powered infrastructure.
  • Assess findings and work closely with reputed company, hardware, and support teams to improve system performance across various deep learning workloads.
  • Act as a primary resource for fixing networking and hardware integration issues, focusing on scalable multi-node systems.
  • Maintain high communication standards across multiple engineering, support, and R&D teams, ensuring technical and performance goals are met.
  • Offer technical mentorship and documentation for internal teams and external partners on standard methodologies in HPC networking deployments.
  • Share insights on improving networking strategies for substantial AI and deep learning infrastructure.

What we need to see:

  • BS, MS, or PhD in Computer Science, Engineering, or a related field, or equivalent experience.
  • 8+ years of proven experience in AI/HPC Infrastructure.
  • Familiarity with AI/HPC job schedulers and orchestrators like Slurm, K8s, or LSF. Practical exposure to AI/HPC workflows employing MPI and NCCL.
  • Familiarity with high-speed networking for HPC, including InfiniBand, RDMA, RoCE, and AWS EFA.
  • Understanding of PyTorch, Megatron-LM, and deep learning inference frameworks such as vLLM and SGLang.
  • Proven experience with InfiniBand, NVLink, and high-speed networking technologies in HPC or large-scale datacenter environments.
  • Experience investigating and evaluating performance in multi-node systems, especially for deep learning or scientific computing workloads.
  • Strong analytical, debugging, and technical communication skills.
  • Comfortable working in collaborative, multi-faceted teams.

Ways to stand out from the crowd:

  • Mastery in deep learning frameworks or distributed training systems.
  • Familiarity with datacenter automation, advanced network protocols, and supporting large HPC or AI clusters in production environments.
  • Understanding of fast, distributed storage systems such as Lustre and GPFS for AI/HPC workloads.
  • Experience with networking and communications libraries like NCCL, NIXL, NVSHMEM, UCX.
  • Experience developing or maintaining cluster management and monitoring tools, for example Ansible for infrastructure as a service, and Prometheus and Grafana for monitoring.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until January 13, 2026. This posting is for an existing vacancy.

The company uses AI tools in its recruiting processes.

The company is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.
