Senior Machine Learning Engineer - Training Platform (AU remote)
Company Description
Join the team redefining how the world experiences design.
Hey, g'day, mabuhay, kia ora, 你好, hallo, vítejte!
Thanks for stopping by. We know job hunting can be a little time consuming and you're probably keen to find out what's on offer, so we'll get straight to the point.
Where and how you can work
Our flagship campus is in Sydney, with a second campus in Melbourne and co-working spaces in Brisbane, Perth & Adelaide. You have flexibility in how and where you work, whether that's from one of our spaces, from home, or a mix of both. This role is remote-friendly within Australia, so you can choose the setup that empowers you and your team to do your best work.
Job Description
About the Group/Team
We’re part of the Training Platform team within Canva’s AI Platform group, which sits in the Generative AI supergroup. Our team is responsible for the systems that power model training at scale, building the foundations that enable teams across Canva to create, train, and deploy AI-powered experiences. Our focus is on building reliable, efficient, and developer-friendly training infrastructure, from orchestration and distributed training systems to experimentation and platform capabilities that support large-scale workloads. We enable teams across Canva to push the boundaries of what’s possible with AI.
About the Role/Specialty
As a Senior Machine Learning Engineer, you’ll focus on designing, scaling, and maturing the systems and infrastructure that support training workloads across Canva. You’ll work on a Kubernetes-based training platform that enables distributed AI workloads across a wide range of teams, frameworks, and use cases. You’ll also contribute to the surrounding platform capabilities that support the end-to-end training lifecycle, such as experiment management, artifact management, and other core systems needed to run AI workloads reliably and at scale. You’ll help evolve these capabilities over time, improving their reliability, scalability, usability, and overall platform maturity.
You’ll collaborate closely with research scientists, AI engineers, product teams, and cloud/infrastructure teams to ensure workloads can run efficiently, reproducibly, and reliably at scale. You’ll also help shape the roadmap for the platform by understanding user pain points, improving platform capabilities, and contributing to the long-term direction of Canva’s training infrastructure.
This role is ideal for someone who enjoys working on the systems behind AI, not just the models themselves, and wants to have broad impact across multiple teams.
What you’ll do (responsibilities)
- You’ll contribute to the evolution of Canva’s training platform for distributed AI workloads
- You’ll improve reliability, observability, debugging, and operational support for training systems
- You’ll design and build the platform capabilities that enable scheduling at scale, including resource allocation, queue management, and quota management for training workloads
- You’ll collaborate closely with research scientists, ML engineers, product teams, and cloud/infrastructure teams to improve training platform workflows and outcomes
- You’ll contribute to system design and architecture decisions across Canva’s AI Platform
- You’ll help shape platform roadmap and priorities based on user pain points, adoption needs, and long-term platform maturity
- You’ll mentor engineers and share best practices in AI systems and infrastructure
What we’re looking for
You’re an engineer who loves building the systems that power AI at scale. You have strong experience in training pipelines, distributed systems, or large-scale infrastructure, and you’re excited by the challenge of making training workloads more reliable, scalable, and efficient.
You bring strong experience working with Kubernetes and containerized workloads. Experience with training infrastructure or distributed frameworks such as Ray, PyTorch distributed training, or similar technologies will be highly valuable.
You’re also familiar with the modern cloud and infrastructure services that underpin high-performance AI workloads, for example high-performance storage, HPC environments, fast interconnects and networking capabilities, or services such as FSx, EFA, and other infrastructure commonly used in large-scale training environments.
You bring a strong sense of ownership and enjoy working on complex, cross-cutting problems that impact multiple teams. You’re comfortable collaborating with engineers, applied scientists, and infrastructure partners, and you care deeply about scalability, reliability, usability, and developer experience. Most importantly, you’re motivated by the opportunity to help Canva build the platform foundations that enable AI-powered creativity at scale.
What the candidate will learn and how they will grow at Canva:
- Deep expertise in large-scale training systems, Kubernetes-based workload orchestration and execution, and distributed infrastructure
- Hands-on experience with modern AI workloads at scale
- Exposure to the cloud, storage, and networking capabilities required for high-performance distributed training environments
- Opportunities to influence platform-wide architecture, roadmap, and AI Platform best practices
- Growth through collaboration with world-class ML engineers, applied scientists, and infrastructure specialists
- The ability to shape how AI is built and scaled across a global product
Additional Information
Don't tick all the boxes? Don't worry about that - nobody does! We’d still love to hear from you!
At Canva, we know that great engineers come from a variety of backgrounds, and we value passion, curiosity, and a willingness to learn just as much as specific experience. If you're excited about this role but don’t tick every box, we encourage you to apply. You might be a great fit in ways you didn’t expect!
What's in it for you?
Achieving our crazy big goals motivates us to work hard - and we do - but you'll experience lots of moments of magic, connectivity and fun woven throughout life at Canva, too. We also offer a stack of benefits to set you up for every success in and outside of work.
Here's a taste of what's on offer:
- Equity packages - we want our success to be yours too
- Inclusive parental leave policy that supports all parents & carers
- An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
- Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally
Check out lifeatcanva.com for more info.
Other stuff to know
We make hiring decisions based on your experience, skills and passion, as well as how you can enhance Canva and our culture. When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process.
Interviews are conducted virtually.
Recruitment type: Permanent