About me


Site Reliability Engineer

I’m an SRE with a background in pure mathematics who enjoys debugging distributed systems, understanding how things work under the hood, and building reliable, predictable platforms. I focus on Kubernetes, automation, observability, and simplifying systems without losing technical depth.


Experience

Senior Site Reliability Engineer — Headforwards

2020 – Present

  • Owned the design of a multi-region AKS platform for authentication workloads.
  • Built GitOps deployment with Terraform + FluxCD across all environments.
  • Implemented full observability with OpenTelemetry, Prometheus, and Grafana.
  • Reduced incident load by 80% via instrumentation and alerting redesign.
  • Delivered a zero-downtime migration from on-prem to Azure.

Site Reliability Engineer — Mydrive Solutions

2018 – 2020

  • Migrated a legacy Ruby on Rails app to EKS, improving scalability and resilience.
  • Rebuilt monitoring & alerting using Splunk; reduced false positives to near zero.
  • Automated infrastructure for ML workloads using Terraform and review workflows.

DevOps Engineer — Cloud66

2015 – 2018

  • Designed a custom container orchestration system before Kubernetes became mainstream.
  • Helped build a tool that automated Kubernetes deployments for customer systems.
  • Worked directly with clients, turning requirements into dependable infrastructure.
  • Mentored junior engineers in Linux, automation, and platform fundamentals.

Skills

  • Kubernetes (AKS, EKS, Operators, Networking)
  • Terraform, Helm, GitOps (FluxCD)
  • Observability: OpenTelemetry, Prometheus, Grafana
  • Distributed systems debugging
  • Go, Python, Bash
  • Linux administration, L3/L4/L7 networking
  • Incident response & reliability engineering

Tools & Technologies

Cloud & Platforms

Azure, AWS, on-prem Kubernetes

IaC & Automation

Terraform, Ansible, Helm, FluxCD

Observability

Prometheus, Loki, Mimir, OpenTelemetry

Systems & Networking

Linux internals, routing, load balancing, DNS, HTTP/2