Here’s How You Make an Impact:
Build and Scale Observability-as-Code
-
Design and maintain Python tooling and Terraform modules that standardize Datadog configuration across services.
-
Eliminate manual setup by codifying monitors, dashboards, SLOs, and alerting patterns.
-
Improve consistency, repeatability, and reliability of observability across the organization.
Establish Reliable & Standardized Instrumentation
-
Define and implement observability blueprints that integrate high-fidelity metrics, logs, and traces into the development lifecycle.
-
Codify best practices so teams get out-of-the-box visibility without needing deep observability expertise.
-
Raise the baseline for service health, debuggability, and operational readiness.
Optimize Datadog Usage and Cost
-
Own critical parts of the Datadog platform configuration.
-
Improve data quality, signal-to-noise ratio, and alert reliability.
-
Partner with teams to adopt telemetry effectively while managing ingestion and alerting costs.
Maintain and Evolve Platform Components
-
Upgrade and maintain tracers, agents, and shared observability libraries.
-
Ensure upgrades are automated, backwards-compatible, and minimally disruptive to product teams.
-
Reduce operational risk by improving rollout and validation processes.
Integrate Observability Across Infrastructure
-
Collaborate with Platform and Infrastructure teams to embed monitoring into systems such as Kafka, gRPC services, Kubernetes, and AWS-managed services.
-
Improve production visibility and reduce mean time to detect (MTTD) and mean time to resolve (MTTR) incidents.
Deliver High-Quality, Production-Ready Code
-
Write clean, well-tested, and maintainable Python code and Terraform modules.
-
Participate in architecture and design reviews; provide thoughtful feedback in code reviews.
-
Take ownership of projects end-to-end, from design and implementation through production rollout and support.
Mentorship & Collaboration
-
Help team members solve problems and develop their skills.
-
Foster a collaborative mindset within the team.
You Thrive Here By Possessing the Following:
-
Degree in Computer Science or a related field.
-
7+ years of experience in application development, platform engineering, or developer tooling.
-
High proficiency in Python; solid experience with Terraform.
-
Hands-on experience using Datadog for metrics, logging, tracing, dashboards, monitors, and alerts.
-
Experience with containerized and cloud-native environments (e.g., Kubernetes, Kafka, AWS, gRPC, Lambda).
-
Proven ability to independently drive medium-to-large initiatives from design to delivery.
-
Comfortable making pragmatic tradeoffs to deliver reliable, scalable solutions.
-
A strong product mindset for internal tools.
-
Passion for reducing cognitive load, eliminating toil, and making observability easy to adopt by default.
-
Solid understanding of modern web applications and distributed systems.
-
Knowledge of how observability applies to high-throughput, highly available systems.
-
Clear written and verbal communication skills.
-
Ability to influence technical direction through design discussions, documentation, and hands-on implementation.
-
Comfortable partnering with product, platform, and infrastructure teams.