I’m an SRE with a background in pure mathematics who enjoys debugging distributed systems, understanding how things work under the hood, and building reliable, predictable platforms. I focus on Kubernetes, automation, observability, and simplifying systems without losing technical depth.

Experience

Senior Site Reliability Engineer — Headforwards

2020 – Present

Owned the design of a multi-region AKS platform for authentication workloads.
Built GitOps deployment pipelines with Terraform and FluxCD across all environments.
Implemented full observability with OpenTelemetry, Prometheus, and Grafana.
Reduced incident load by 80% via instrumentation and alerting redesign.
Delivered a zero-downtime migration from on-prem to Azure.

Site Reliability Engineer — Mydrive Solutions

2018 – 2020

Migrated a legacy Ruby on Rails app to EKS, improving scalability and resilience.
Rebuilt monitoring and alerting on Splunk, reducing false positives to near zero.
Automated infrastructure provisioning for ML workloads using Terraform and review workflows.

DevOps Engineer — Cloud66

2015 – 2018

Designed a custom container orchestration system before Kubernetes became mainstream.
Helped build a tool that automated Kubernetes deployments for customer systems.
Worked directly with clients, converting requirements into dependable infrastructure.
Mentored junior engineers in Linux, automation, and platform fundamentals.

Skills

Kubernetes (AKS, EKS, Operators, Networking)
Terraform, Helm, GitOps (FluxCD)
Observability: OpenTelemetry, Prometheus, Grafana
Distributed systems debugging
Go, Python, Bash
Linux administration, L3/L4/L7 networking
Incident response & reliability engineering

Tools & Technologies

Cloud & Platforms

Azure, AWS, on-prem Kubernetes

IaC & Automation

Terraform, Ansible, Helm, FluxCD

Observability

Prometheus, Loki, Mimir, OpenTelemetry

Systems & Networking

Linux internals, routing, load balancing, DNS, HTTP/2