Alex Staff Agency logo

Senior DevOps Engineer - Highload, Cloud & Data-Intensive Systems (EU / Remote)

Alex Staff Agency
27 days ago
Remote
Spain
CRM Management
Description

About the project
The team develops and maintains distributed services around analytics, APIs, and transaction monitoring. The systems process very large volumes of data — terabytes of storage, trillions of records, continuously growing load.

Infrastructure:

~100 servers (bare metal + VPS)
active use of IaC
Kubernetes clusters in production
focus on stability, observability, and automation

The project is long-term — not a hype startup, but a mature product with real users.

What the work looks like
This is a hands-on role with a clear time allocation:

60% — operations and incidents (including helping teams)
20% — infrastructure automation
20% — prototyping, improvements, technical initiatives

There is on-call responsibility, but normally after-hours incidents happen 2–3 times a year, not every week.

Responsibilities
Operation of production services and infrastructure (server provisioning/decommissioning, updates, replacements, performance troubleshooting)
Support and development of Infrastructure as Code (Terraform / Ansible: modules, roles, standards, reviews)
Monitoring, alerting, backups, and regular recovery checks
Development of service and infrastructure automation
Development of CI/CD and release procedures
Incident diagnosis and resolution, support for product teams
Traffic analytics, bot and attack protection tools
Responsibility for 24/7 platform stability



Requirements

What’s important
4+ years of experience operating Linux/Ubuntu infrastructure and production services
Strong understanding of networking and troubleshooting
Kubernetes (cluster operations), Rancher, Docker / containerd
Hands-on experience with Ansible and Terraform
Monitoring: Prometheus / Thanos / Telegraf / Grafana / Sentry
CI/CD: Jenkins
Automation: Bash, Python
Experience working with LVM

Nice to have
Experience working with blockchain nodes
Diagnosis and tuning of ClickHouse and MongoDB in high-load clusters
Providers: Hetzner / OVHcloud
Cloudflare (edge, DDoS), experience with AWS
Handling abuse tickets with hosting providers

Technology stack
VPN: WireGuard, OpenVPN
Databases: ClickHouse, MongoDB, Redis, PostgreSQL
Applications: Node.js (pm2), php-fpm, Lua, Tarantool
Supporting services: Go (operatorSDK), Ruby, Node.js, PHP



Benefits

5,000 – 8,000 € net

Format: office / hybrid / remote

Location: Spain (Barcelona and suburbs) or remote (CET ±2)

Full-time

Opportunity to genuinely influence architecture and processes

Mature engineering team and reasonable expectations