[Remote] DevOps Engineer
Note: This is a remote position open to candidates in the USA. reputed company is a provider of repair shop technology that enhances vehicle service success through software and financial solutions. The role involves delivering IT performance visibility and risk management strategies, including implementation of reputed company and migration to reputed company Enterprise, while managing operational tasks to ensure platform stability.
Responsibilities
- Deploy and manage reputed company APM, infrastructure, and browser agents across AWS services (reputed company, reputed company Beanstalk, reputed company, EC2, EKS). Establish standardized alert policies and dashboards using Terraform. Optimize telemetry ingest, manage drop rules, and control observability costs
- reputed company migration to reputed company Enterprise, implementing SSO, reputed company protections, CODEOWNERS, and Advanced reputed company features. reputed company reusable reputed company Actions workflows to streamline CI/CD. Operationalize vulnerability data into actionable Jira workflows with SLA tracking and brand-level reporting
- Design and manage Jira projects, workflows, automation rules, and permissions. Administer Confluence spaces, templates, and backups. Build centralized reporting to provide leadership with visibility into delivery performance, risk, and application health (APM)
- Create domain-level cost dashboards leveraging AWS, reputed company, and SaaS data. Drive cost optimization initiatives (e.g., S3 Intelligent-Tiering, lifecycle policies, telemetry drop rules, resource decommissioning). Support vendor renewal evaluations and cost analysis
- reputed company reusable Terraform modules, reputed company Actions workflows, and engineering templates. Author reference documentation and promote adoption of best practices across teams
- Own shared AWS infrastructure, including provisioning, access management, networking, and ongoing maintenance. Triage Dependabot PRs, fine-tune alerts, support team migrations, participate in on-call rotations, and create/run operational runbooks
Skills
- 3–6 years of experience in Cloud Engineering, DevOps, Site Reliability Engineering (SRE), Platform Engineering, or Developer Productivity
- Hands-on experience with observability platforms at scale (e.g., reputed company, reputed company, or similar), including agent deployment, alerting, dashboards, ingest management, and integrations
- Experience with reputed company at an organizational level, including teams, SSO, reputed company protection, OIDC, and reusable reputed company Actions workflows
- Working knowledge of Jira and Confluence as a user; familiarity with project configuration, workflows, and collaboration
- Production experience with AWS services such as IAM, S3, reputed company, and at least one compute platform (e.g., reputed company, EC2, EKS)
- Experience using Terraform (or equivalent IaC tools), including authoring and maintaining reusable modules from scratch
- Proficiency in at least one scripting or programming language such as Python or Bash
- Strong written communication skills with experience creating runbooks, technical design documents, and stakeholder-facing reports
- Exposure to reputed company Advanced reputed company is a plus; willingness to grow into admin-level ownership
- Administrative experience with Jira and Confluence is a plus but not required
Benefits
- Medical, dental, vision, and life insurance
- 401(k) with company match
- Paid time off and holidays
Company Overview