Site Reliability Engineer Job at Saransh Inc, Deerfield, IL

  • Saransh Inc
  • Deerfield, IL

Job Description

Must Have Technical/Functional Skills

  • 7+ years of experience in SRE, platform engineering, or cloud infrastructure engineering in large-scale enterprise environments (10,000+ employees or equivalent complexity).
  • Deep, hands-on expertise with Microsoft Azure — minimum 4 years in a primary Azure cloud engineering role.
  • Expert-level proficiency with AKS: cluster lifecycle management, RBAC, network policies, pod security standards, cluster autoscaler, and Workload Identity.
  • Strong infrastructure-as-code skills: Terraform (required) and/or Bicep; experience managing Azure Landing Zones or Enterprise-Scale architecture.
  • Proficiency in at least one systems programming/scripting language: Python (preferred), Go, or PowerShell.
  • Experience designing and operating enterprise observability platforms using Azure Monitor, Log Analytics and Application Insights at scale.
  • Demonstrable track record of owning SLOs/SLIs and delivering measurable reliability improvements in production.
  • Strong knowledge of enterprise networking in Azure: Hub-and-Spoke/Virtual WAN, ExpressRoute, Azure Firewall, NSGs, Private Endpoints, and DNS Private Zones.

Required/Preferred Certifications:

  • AZ-104 | AZ-305 (Preferred) | AZ-400 (Preferred) | CKA | ITIL v4 Foundation

Roles & Responsibilities

Reliability & Availability Engineering

  • Define, own, and enforce enterprise-wide SLOs, SLIs, and Error Budgets across all Tier-0 and Tier-1 Azure-hosted services; report SLA compliance to executive stakeholders monthly.
  • Lead architectural reviews for new services and ensure reliability non-functionals (availability targets, RTO/RPO) are embedded from design through to production.
  • Champion and implement chaos engineering practices using Azure Chaos Studio and custom fault injection frameworks to proactively surface reliability risks.
  • Drive Disaster Recovery (DR) design and conduct quarterly DR drills across Azure paired regions.

Incident Management & On-Call

  • Serve as Incident Commander for P1/P2 major incidents; own the end-to-end incident lifecycle from detection through resolution and Post-Incident Review (PIR).
  • Participate in a structured On-Call rotation with follow-the-sun global coverage; maintain response SLAs of <5 minutes for Tier-0 services.
  • Drive a blameless post-mortem culture and ensure all action items from PIRs are tracked and delivered within the agreed SLA.

Observability & Platform Engineering

  • Design and operate the enterprise observability stack: Azure Monitor, Log Analytics Workspaces, Application Insights, and Azure Managed Grafana; ensure full MELT (Metrics, Events, Logs, Traces) coverage.
  • Build and maintain alerting frameworks using Azure Monitor Alert Rules and Azure Action Groups integrated with PagerDuty and ServiceNow.
  • Develop and operate platform automation, runbooks, and self-healing capabilities using Azure Automation, Logic Apps, and Python/PowerShell scripting.

CI/CD & Infrastructure Reliability

  • Collaborate with DevOps and development teams to embed reliability gates into Azure DevOps pipelines: automated performance testing, synthetic monitoring, and progressive deployment (canary/blue-green) strategies.
  • Manage reliability of AKS clusters across multiple Azure regions; own node pool scaling, upgrade strategy, and cluster hardening in alignment with CIS Benchmarks.
  • Contribute to infrastructure-as-code reliability reviews using Terraform/Bicep to enforce standards across Azure Landing Zones.
