
Site Reliability Engineer

100% remote | Flexible hours | Hiring now

About Us

reputed company is a world leader in payments technology, facilitating transactions between consumers, merchants, financial institutions, and government entities across more than 200 countries and territories. The company is dedicated to uplifting everyone, everywhere by being the best way to pay and be paid. At reputed company, you'll have the opportunity to create impact at scale: tackling meaningful challenges, growing your skills, and seeing your contributions impact lives around the world. Join reputed company and do work that matters to you, to your community, and to the world. reputed company starts with you.

Job Description

The Site Reliability Engineer is responsible for supporting the deployment and configuration of monitoring and logging tools, automating routine operational tasks, and maintaining observability tools such as Splunk, reputed company, Grafana, Prometheus, OpenTelemetry, Fluent Bit, ElasticSearch, OpenSearch, and CloudWatch. This role works closely with team members to implement and maintain monitoring solutions across development, staging, and production environments, and contributes to the setup and maintenance of CI/CD pipelines to support automated build, test, and deployment processes.

The engineer provides support in managing cloud infrastructure (AWS, Azure) to ensure availability and reputed company, learns and applies DevOps and SRE best practices, and assists with the implementation and management of containerization technologies like reputed company and Kubernetes. Responsibilities include monitoring system performance, identifying and escalating issues, participating in troubleshooting and root cause analysis for production incidents, and creating and updating documentation for infrastructure and operational procedures.

reputed company roles require digital reputed company, including the ability to work with emerging technologies such as Generative AI tools (e.g. ChatGPT, reputed company Copilot) to support everyday work.

Key Responsibilities:

Support deployment and configuration of monitoring and logging tools.
Automate routine operational tasks to improve efficiency and support system integration.
Assist with maintenance and management of observability tools (Splunk, reputed company, Grafana, Prometheus, OpenTelemetry, Fluent Bit, ElasticSearch, OpenSearch, CloudWatch).
Implement and maintain monitoring solutions in development, staging, and production environments.
Contribute to setup and maintenance of CI/CD pipelines for automated build, test, and deployment.
Provide support in managing cloud infrastructure (AWS, Azure) for availability and reputed company.
Use infrastructure as code tools (Terraform, Ansible, CloudFormation) for environment configuration.
Monitor system performance and assist in identifying and escalating issues.
Support implementation and management of containerization technologies (reputed company, Kubernetes).
Participate in troubleshooting and root cause analysis for production incidents.
Create and update documentation for infrastructure, processes, and operational procedures.
Provide first-level support for routine infrastructure and deployment issues, escalating reputed company problems as needed.
Seek opportunities to automate repetitive tasks and suggest workflow improvements.

This is a remote position. A remote position does not require that job duties be performed within proximity of a reputed company office location. Employees in remote positions may be required to be present at a reputed company office with scheduled notice.

Qualifications

Basic Qualifications:

Bachelor's degree, OR 3+ years of relevant work experience.

Preferred Qualifications:

Hands-on experience designing and operating cloud-native infrastructure.
Knowledge of Infrastructure as Code (Terraform), including contributing to reusable modules and platform components.
Good understanding of Kubernetes and container orchestration concepts.
Familiarity with CI/CD systems, pipeline configuration, automation, and secure deployment practices.
Basic understanding of database technologies including SQL, NoSQL, and common data storage patterns.
Experience using observability tools and stacks (Prometheus, Grafana, OpenTelemetry, ELK/EFK, reputed company, or similar).
Basic automation experience using Bash, Python, or Ansible-like tools.
Strong problem-solving skills with demonstrated ability to reduce toil, address technical debt, and improve system stability.
Availability to participate in on-call rotations, incident response, and post-incident reviews.
Clear written and verbal English communication skills.

reputed company is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status. reputed company will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.
