I’m an SRE with a background in pure mathematics who enjoys debugging distributed systems, understanding how things work under the hood, and building reliable, predictable platforms. I focus on Kubernetes, automation, observability, and simplifying systems without losing technical depth.
Experience
Senior Site Reliability Engineer — Headforwards
2020 – Present
- Owned the design of a multi-region AKS platform for authentication workloads.
- Built a GitOps deployment workflow with Terraform and FluxCD across all environments.
- Implemented full observability with OpenTelemetry, Prometheus, and Grafana.
- Reduced incident load by 80% via instrumentation and alerting redesign.
- Delivered a zero-downtime migration from on-prem to Azure.
Site Reliability Engineer — Mydrive Solutions
2018 – 2020
- Migrated a legacy Ruby on Rails app to EKS, improving scalability and resilience.
- Rebuilt monitoring & alerting using Splunk; reduced false positives to near zero.
- Automated infrastructure for ML workloads using Terraform and review workflows.
DevOps Engineer — Cloud66
2015 – 2018
- Designed a custom container orchestration system before Kubernetes became mainstream.
- Helped build a tool that automated Kubernetes deployments for customer systems.
- Worked directly with clients, translating requirements into dependable infrastructure.
- Mentored junior engineers in Linux, automation, and platform fundamentals.
Skills
- Kubernetes (AKS, EKS, Operators, Networking)
- Terraform, Helm, GitOps (FluxCD)
- Observability: OTel, Prometheus, Grafana
- Distributed systems debugging
- Go, Python, Bash
- Linux administration, L3/L4/L7 networking
- Incident response & reliability engineering
Tools & Technologies
Cloud & Platforms
Azure, AWS, on-prem Kubernetes
IaC & Automation
Terraform, Ansible, Helm, FluxCD
Observability
Prometheus, Loki, Mimir, OpenTelemetry
Systems & Networking
Linux internals, routing, load balancing, DNS, HTTP/2