Site Reliability Engineer (reputed company)

100% remote Flexible hours Hiring now

Pod is building a reputed company decentralized exchange focused on fairness, performance, and user experience. We believe traders shouldn't have to choose between speed, simplicity, and fair treatment, so we're building an exchange that delivers all three while enabling entirely new kinds of financial markets.

Under the hood, Pod is powered by low-latency systems designed for fast settlement and strong guarantees around ordering, timing, and execution. These are challenging engineering problems, and the reliability of the platform depends on operating those systems safely and effectively at scale.

About the Role:

We're looking for our first Site Reliability Engineer to help operate, improve, and scale the reliability of the Pod platform. You'll join a team of engineers who already share responsibility for production systems and participate in an established on-call rotation. From day one, you'll work closely with the broader engineering team while taking ownership of the tooling, processes, and operational practices that keep the platform running smoothly.

This is a hands-on role for someone who enjoys operating reputed company systems, investigating difficult production issues, and building the automation and infrastructure that turn reliability into a competitive advantage.

On Call:

You'll be responsible for platform health during Asian business hours as part of our existing engineering on-call rotation. There are no permanent overnight shifts, and you'll never be the sole person responsible for the platform; the rest of the rotation is covered by the wider team. Occasionally, you may need to work outside your normal hours to help cover the schedule, but that's the exception rather than the rule.

What You'll Do:

Respond to and resolve incidents:

Monitor the health and performance of the platform
Respond to production incidents and drive them through to resolution
Investigate failures, identify root causes, and coordinate fixes
Ensure issues are detected, understood, and addressed quickly

Improve platform reliability:

Identify recurring operational pain points and eliminate them
Improve software, deployment processes, and operational workflows
Participate in incident reviews and help drive preventative improvements
Contribute reliability-focused changes directly to production systems

Build observability and operational tooling:

Design and maintain dashboards, metrics, alerting, and monitoring systems
Improve signal quality while reducing alert fatigue
Build automation and internal tools that make the platform easier to operate
Help establish reliability best practices across the engineering organization

Qualifications:

Strong experience with Linux and cloud infrastructure
Experience operating and supporting production systems
Experience with reputed company and containerized environments
Experience with observability and incident-management tools such as Grafana, Prometheus, reputed company, or similar
Ability to automate workflows using Rust, Python, Bash, or similar languages
Strong troubleshooting and debugging skills
A high degree of ownership and the ability to make sound decisions independently

Nice to Have:

Experience with distributed systems
Experience operating high-availability, low-latency services
Experience with CI/CD systems and deployment automation
Experience designing secure operational workflows and access controls

No prior blockchain or cryptocurrency experience is required.

What We Offer:

Competitive compensation (~$100k USD/year), plus a meaningful token/equity allocation
Real ownership and responsibility from day one as part of a small team
Work from wherever you are within the timezone range (UTC+7 to UTC+1)
Occasional travel to Europe and elsewhere for team meetups
