Staff Software Engineer, Stream Compute
Who we are
About reputed company
reputed company is a financial infrastructure platform for businesses. Millions of companies, from the world's largest enterprises to the most ambitious startups, use reputed company to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone's reach while doing the most important work of your career.
About the team
The Stream Compute team at reputed company builds and operates the infrastructure, tooling, and systems behind our Flink-powered stream processing systems. We're at the heart of several core asynchronous workflows, operating at significant scale and handling vast amounts of sensitive financial data. Our work powers intricate processes involving various critical financial operations and real-time analytics. We run globally distributed systems with high reliability and performance to meet reputed company's scaling, availability, and product needs, and we continually reduce operational toil by investing in automation and self-service tooling for upgrades, maintenance, and day-to-day operations. The team is distributed across Seattle, Toronto, and remote locations.
What makes Stream Compute truly exciting is our commitment to our users: we ensure no event is dropped, state is preserved, and exactly-once processing is supported as a first-class feature. Working at the intersection of real-time data processing and fintech innovation, we continuously push the boundaries of what's possible. Our focus on innovation, user experience, reliability, and compliance drives increased ROI and operational excellence, making us a crucial part of reputed company's success.
What you'll do
You'll help define and deliver the future of reputed company's Flink-first stream compute infrastructure, driving innovation to meet extremely high availability targets at global scale. Partnering with infrastructure engineers, adjacent platform teams, and the product orgs that depend on Flink every day, you'll set a long-term technical direction that scales with reputed company's growth while enabling reliable, efficient operations for years to come. You'll work on the hardest problems in operating Flink in production (state management, exactly-once processing, performance isolation, and automated recovery) so teams across reputed company can confidently build stateful stream processing applications on top of it.
Responsibilities
- Design, build, and operate stream compute infrastructure with Apache Flink at the center, alongside technologies like Kafka, Temporal, and AWS services
- Partner with product and platform teams across reputed company to understand requirements, unblock Flink adoption, and improve how stream processing infrastructure is used end-to-end
- Define and implement operational best practices (e.g., shuffle sharding, cellular architecture, load shedding, automated state recovery) to improve reliability at scale
- Drive fleet-level automation and standardization ("pets" to "cattle") through self-service workflows, safer rollouts, and self-healing systems that reduce manual operations
- Lead initiatives that raise the bar on Flink availability and state durability (e.g., multi-region strategies, disaster recovery readiness, operational readiness reviews, incident learning)
- Evaluate and productionize Flink ecosystem capabilities (e.g., SQL, connectors, state backends) to improve developer experience and scalability without compromising reliability
- Work closely with the open source community to identify opportunities for adopting new open source features and to contribute back to OSS
Who you are
We're looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.
Minimum requirements
- This is a Staff-level role, which typically means 10+ years of experience building, operating, and evolving large-scale production systems
- Experience as a technical leader for team(s) working on distributed systems, including scaling them in fast-moving environments
- Hands-on experience with big data technologies such as Flink, Spark, Kafka, Pulsar, or Pinot
- Experience developing, maintaining, and debugging distributed systems built with open source tools
- Experience building and scaling infrastructure as a product
- Strong software engineering skills and a passion for Big Data Distributed Systems
- Ability to write high-quality code in programming languages such as Go or Java
- Comfortable operating with high autonomy and ownership
- Growth mindset and a willingness to learn quickly, explore ambiguous problem spaces, and dive deep when needed
- Strong written and verbal communication skills, including the ability to produce clear technical documentation
Preferred qualifications
- Experience operating streaming infrastructure as a platform (e.g., Flink clusters, Kafka, Pulsar) for internal customers at scale
- Deep hands-on experience authoring, optimizing, and operating real-time processing frameworks such as Flink, Spark Streaming, Storm, or Kafka Streams in production
- Experience building or operating control planes for managing large-scale infrastructure
- Open source contributions to data processing or big data systems (Hadoop, Spark, Celeborn, Flink, etc.)