[Remote] Senior Infrastructure Software Engineer, Storage Core
Note: This is a remote position open to candidates in the USA. The company is a leading file hosting service that enables users to store and share files across devices. It is seeking a Senior Infrastructure Software Engineer for its Storage team to design, build, and operate large-scale storage systems that ensure high durability and scalability for millions of users. The role involves collaborating with engineers to improve reliability and performance while gaining exposure to the company’s distributed systems challenges.
Responsibilities
- Design, implement, and maintain large-scale distributed storage systems that ensure data durability, availability, and performance
- Collaborate with peers to evolve the architecture of the company’s core storage infrastructure for improved scalability and efficiency
- Contribute to the design of replication, erasure coding, and lifecycle management systems that balance cost, reliability, and performance
- Write high-quality, performant, and maintainable code in Go and Rust
- Participate in the on-call rotation, gaining firsthand experience operating the company’s production storage systems
- Investigate and resolve production issues, performing root cause analysis and driving reliability improvements
- Partner with cross-functional teams (Networking, Hardware, Planning) to deliver reliable, cost-efficient, end-to-end storage solutions
- Take ownership of scoped projects and demonstrate growth toward leading larger, cross-team technical initiatives
Skills
- 9+ years of experience and a strong understanding of distributed systems principles, including replication, consistency, and fault tolerance
- Experience developing and debugging production services in C++, Go, or Rust
- Familiarity with distributed storage systems, file systems, or data infrastructure at scale
- Demonstrated ability to write efficient, reliable, and maintainable code in mission-critical environments
- Experience troubleshooting systems and participating in on-call or operational rotations
- Solid communication and collaboration skills, with the ability to work across infrastructure and product teams
- Eagerness to learn, grow, and contribute to multi-year infrastructure evolution initiatives
- Experience building and operating large-scale object storage or distributed storage systems (e.g. S3, Ceph, GFS/Colossus)
- Deep interest in systems performance, profiling, and low-level optimization
- Familiarity with replication protocols, erasure coding, and data placement algorithms
- Experience with production monitoring, observability, and incident response workflows
- Contributions to infrastructure projects, open-source systems, or developer tooling that improved reliability and performance
Company Overview