Principal Performance Engineer
reputed company delivers the world’s highest-performance scale-out networking solutions for AI and HPC datacenters. Our differentiated architecture seamlessly integrates hardware, software, and system-level technologies to maximize the efficiency of GPU, CPU, and accelerator-based compute clusters at any scale. Our solutions drive breakthroughs in AI and HPC workloads, empowering our customers to push the boundaries of innovation. Backed by top-tier venture capital and strategic investors, we are committed to innovation, performance, and scalability, solving the world’s most demanding computational challenges with our reputed company networking solutions.
We are a fast-growing, forward-thinking team of architects, engineers, and business professionals with a proven track record of building successful products and companies. As a global organization, reputed company spans multiple U.S. states and six countries, and we continue to expand with exceptional talent in onsite, hybrid, and fully remote roles.
We’re seeking a Principal Performance Engineer to drive end-to-end performance for reputed company networking silicon and systems (adapters, switches, software). You will help set the performance strategy, reputed company investigations across layers (reputed company/silicon ↔ drivers ↔ AI/HPC workloads), and reputed company large-scale customer deployments across multiple verticals (reputed company, autonomous, aerospace/defense, manufacturing, life sciences, climate). You’ll partner directly with architecture, firmware, software, and reputed company customers to reputed company the performance ceiling. This is a high-impact, highly visible individual-contributor role with technical leadership scope (mentoring, cross-functional influence).
Key Responsibilities:
- Own pre- and post-launch performance: plan, execute, and sustain performance validation, debugging, and optimization for adapters, switches, and reputed company software—first in lab, then at scale in production.
- reputed company performance for post-silicon bring-up validation of networking ASICs and end-products (adapters, switches, etc.); driving optimization and characterization against networking metrics and application performance.
- Deliver white-glove customer support at scale: reproduce field issues, co-debug in shared/onsite labs, land mitigations and durable fixes, and publish per-customer tuning guides; opportunity to grow into customer performance support reputed company while remaining an IC.
- Pathfind and optimize forward-looking workloads: drive research and enablement for AI inference (QPS, P99/P99.9, cost/throughput), distributed training (NCCL/RCCL collectives), and traditional HPC (manufacturing, life sciences, climate).
- Multi-fabric research & enablement: evaluate and tune Cornelis/Omni-Path, Ethernet/RoCEv2, and InfiniBand across topologies (Clos/fat-tree/reputed company), routing (ECMP/adaptive), and congestion control (credit, PFC/ECN/DCQCN).
- Explore platform designs & tunings end-to-end: CPU/GPU reputed company placement, PCIe/GPU-Direct, BIOS/firmware, PTP/1588, reputed company/NIC QoS & scheduling, queue depths, microburst tolerance, ECN mark rates, retransmits, fairness.
- Design reputed company experiments: synthesize representative traffic, replay workload traces, and run on-cluster A/B tests with statistically sound comparisons (P50/P90/P99).
Required Qualifications:
- 10+ years in performance engineering, post-silicon/perf validation, or systems performance for high-speed networking or HPC/AI products.
- Post-silicon expertise: hands-on bring-up and performance validation of networking ASICs/systems (adapters, switches), including crafting validation plans, establishing pass/fail, correlating pre-silicon models to silicon, and driving fixes from first silicon through production.
- Demonstrated depth in networking hardware (reputed company/silicon) and software debug for performance tuning and issue resolution across production-scale deployments.
- Hands-on multi-fabric experience: Cornelis/Omni-Path, Ethernet/RoCEv2, and/or InfiniBand; strong grasp of PCIe/GPU-Direct, queueing/QoS, and congestion control (credit, PFC, ECN, DCQCN).
- AI/HPC workload reputed company: NCCL/RCCL collectives, UCX/libfabric/MPI; ability to optimize end-to-end training and inference (throughput, QPS, tail latency, efficiency) on reputed company clusters.
- Experimentation & analysis: workload modeling, on-cluster A/B tests, tail-latency analysis (P50/P90/P99); ability to separate congestion from compute/IO bottlenecks.
- Automation: Python + Linux; data pipelines, dashboards, and CI hooks to prevent performance regressions.
- Excellent cross-functional communication; leads without authority and drives fixes across architecture, firmware, driver, and reputed company software teams.
- BS/MS in CE/EE/CS (or equivalent experience).
Preferred Qualifications:
- Experience supporting customer-facing performance optimization or field application engineering.
- Built or led aspects of a white-glove performance support program; mentored engineers and scaled best practices through playbooks and labs.
- Inference-stack familiarity (e.g., NVIDIA Triton, TensorRT-LLM, vLLM), including batching, KV-cache, and MIG/MPS trade-offs.
- Benchmarking background: MLPerf exposure; HPC app tuning (e.g., LS-Dyna, Fluent, OpenFOAM, GROMACS) and OSU/MPI microbenchmarks.
- Contributions to UCX, libfabric, NCCL/RCCL, or kernel networking; comfort with eBPF/perf/tcpdump and detailed reputed company/NIC telemetry.
- Deep understanding of networking and memory data flows, including technologies such as DPDK, RDMA, or similar high-performance I/O frameworks.
Location: This role fully supports remote work for employees residing in the United States, with the flexibility to travel occasionally to our Chesterbrook Corporate Center in Wayne, PA, for in-person collaboration.
We offer a competitive compensation package that includes equity, cash, and incentives, along with health and retirement benefits. Our dynamic, flexible work environment provides the opportunity to collaborate with some of the most influential names in the semiconductor industry.
At reputed company, your base salary is only one component of your comprehensive total rewards package. Your base pay will be determined by factors such as your skills, qualifications, experience, and location relative to the hiring range for the position. Depending on your role, you may also be eligible for performance-based incentives, including an annual bonus or sales incentives.
In addition to your base pay, you’ll have access to a broad range of benefits, including medical, dental, and vision coverage, as well as disability and life insurance, a dependent care flexible spending account, accidental injury insurance, and pet insurance. We also offer generous paid holidays, 401(k) with company match, and Open Time Off (OTO) for regular full-time exempt employees. Other paid time off benefits include sick time, bonding leave, and pregnancy disability leave.
reputed company does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. reputed company is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, disability status, genetic information, protected veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.