
Observability Engineer

100% remote Flexible hours Hiring now

reputed company is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with reputed company will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.

DevOps / Observability Engineer

(Monitoring, Alerting & Data‑Driven Operations)

Job Summary

We are seeking a DevOps / Observability Engineer with deep expertise in monitoring, alerting, metrics, and logging systems to help design, operate, and evolve our observability platforms across multiple environments, including M&A partner infrastructures.

This role is not a pure CI/CD or cloud automation position. Instead, it is focused on building robust, scalable, and intelligent monitoring and alerting systems, primarily using open‑source and custom-built (“home‑made”) stacks.

The ideal candidate is passionate about metrics, signals, and system behavior, enjoys working close to the data, and is interested in forecasting, anomaly detection, and algorithmic approaches to infrastructure monitoring. Experience with MLOps and deploying data-driven models in the cloud is a strong plus.

You will work closely with platform, operations, and data teams to ensure high reliability, actionable alerting, and long-term observability maturity.

Key Responsibilities

Observability & Monitoring (Primary Focus)

  • Design, implement, and operate monitoring and alerting platforms across multiple internal and M&A partner environments.
  • Build and maintain metrics pipelines using tools such as Prometheus, Alertmanager, Grafana, and VictoriaMetrics (or similar time-series databases).
  • Define high-quality alerting strategies (SLOs, SLIs, burn rates, anomaly detection) to reduce noise and improve signal quality.
  • Own logging architectures, including ingestion, retention, querying, and correlation with metrics and traces.
  • Work extensively with open-source observability tooling and contribute to or build “home‑made” solutions where off-the-shelf tools are insufficient.

Data, Forecasting & Intelligent Operations

  • Apply forecasting techniques and algorithms to capacity planning, trend analysis, and proactive alerting.
  • Collaborate with data scientists and ML engineers on data-driven monitoring, anomaly detection, or predictive reliability use cases.
  • Participate in MLOps workflows, including deploying, monitoring, and operating ML models in production environments.

Platform & Infrastructure (Supporting Focus)

  • Design and operate Kubernetes-based platforms, with a strong emphasis on observability, reliability, and performance.
  • Support infrastructure automation using Ansible and other configuration management tools.
  • Troubleshoot complex system issues across metrics, logs, Kubernetes, and underlying infrastructure.
  • Ensure security and operational best practices are applied across monitoring and infrastructure stacks.
  • Document architectures, operational practices, and observability standards.

Required Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
  • 7+ years of experience in DevOps, SRE, Platform Engineering, or Observability-focused roles.
  • Strong, hands-on expertise in monitoring and alerting systems, including:
      • Prometheus (or compatible ecosystems)
      • Grafana
      • Alertmanager
      • Time-series databases (VictoriaMetrics strongly preferred)
  • Solid experience with logging systems and log/metric correlation.
  • Deep familiarity with open-source tooling and building/customizing internal platforms.
  • Strong Kubernetes experience, including troubleshooting production clusters.
  • Experience with automation tools such as Ansible.
  • Ability to reason about systems using metrics, data, and trends, not just dashboards.
  • Excellent problem-solving, communication, and collaboration skills.

Preferred Qualifications

  • Experience with containerization technologies such as Docker/Podman
  • Experience with "Infrastructure as Code" (IaC) tools such as Terraform
  • Familiarity with monitoring and logging tools such as Prometheus, Grafana, or ELK stack
  • Knowledge of scripting languages such as Python, Bash, PowerShell or similar

At reputed company, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone - of every race, gender, sexuality, age, location and income - deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes - an enterprise priority reflected in our mission.

