Splunk Engineer

100% remote | Flexible hours | Hiring now

Responsibilities:

  • Observability Engineering and Governance
  • Architect and maintain enterprise SIEM solutions aligned with operational resilience mandates (e.g., MAS TRM, DORA, APRA CPS 230).
  • Lead the deployment, configuration, and optimization of Splunk for full-stack visibility across infrastructure, applications, networks, and user experience.
  • Define and enforce telemetry data governance standards for metrics, logs, and traces, ensuring consistency and retention compliance.
  • Integrate Splunk with incident management, ITSM, and AIOps systems to enable predictive alerting and anomaly detection.
  • Act as the SIEM/Splunk subject matter expert (SME) for architecture reviews, platform upgrades, and performance tuning.
  • Reliability Engineering and Automation
  • Implement and champion SRE frameworks and reliability practices for mission-critical systems.
  • Design and automate runbooks, alerts, and self-healing workflows using Python, Ansible, and Terraform.
  • Collaborate with Application, Infrastructure, and Cyber teams to embed reliability principles into the delivery lifecycle.
  • Conduct chaos and related testing aligned with business continuity and disaster recovery standards.
  • Define and track error budgets, reliability scorecards, and service health indicators for production workloads.
  • Cloud & Platform Integration
  • Engineer SIEM for cloud-native workloads in AWS and Azure, ensuring visibility across compute, storage, and network layers.
  • Integrate Splunk and cloud observability tools into CI/CD pipelines and landing zones to ensure compliance.
  • Implement infrastructure-as-code (IaC) models using Terraform and Ansible for consistent, auditable provisioning.
  • Collaborate with Cloud, DevOps, and other teams to ensure telemetry aligns with audit, compliance, and operational risk requirements.
  • Operational Excellence and Collaboration
  • Drive reduction in incident recurrence, MTTR, and manual effort through observability-led automation.
  • Partner with Service Delivery, Cyber, and Application teams to enable predictive incident prevention and root cause transparency.
  • Build and maintain executive dashboards and reports showcasing availability, reliability KPIs, and operational risk indicators.
  • Provide technical leadership during major incidents, post-incident reviews, and audits, ensuring lessons learned are codified into automation and process improvements.

Skillset (Must have)

Minimum 8 years of experience in Infrastructure, Cloud, or Site Reliability Engineering roles, with at least 5 years specializing in SIEM/Splunk engineering or observability in financial or regulated environments.

Proven hands-on expertise in the following technical areas:

o SIEM Platforms: Splunk (must), EL/reputed company

o Automation/IaC: Terraform, Ansible, Python, CI/CD tools

o Cloud and other platforms and integrations: AWS (CloudWatch, X-Ray, CloudTrail), Azure (Monitor, Log Analytics, App Insights)

Deep understanding of SRE principles, service health modelling, error budgets, and auto-remediation design.

Strong analytical and troubleshooting skills, with the ability to conduct deep-dive investigations and deliver long-term preventive solutions.

Familiarity with financial sector operational resilience frameworks, regulatory compliance, and incident governance.