Site Reliability Engineering Manager

CAE Inc.


Posted: 4 hours ago
City: Montevideo, Montevideo
Contract type: Full-time

Behind every success is a team of dedicated experts driving us forward. Our corporate functions don’t just support — they lead, shaping the company’s path and preparing clients and employees for the moments that matter. Be part of a team where your work makes a difference, with opportunities to grow, collaborate, and thrive.


Site Reliability Engineering Manager (SRE)

Roles & Responsibilities

We are seeking a Site Reliability Engineering Manager with a strong mix of Site Reliability Engineering (SRE) and database experience to lead the reliability, scalability, performance, observability, and operational excellence of critical production platforms. The role combines deep database expertise with broader ownership of service reliability, ensuring that databases and platform services are engineered as highly available, measurable, automated, and resilient systems aligned with business and customer needs.


The Role We Are Offering You


Leadership & Team Management:
  • Lead and develop a team of reliability engineers across database and platform domains
  • Define the reliability strategy, operating model, and roadmap for critical production services
  • Build a culture of automation, ownership, resilience, and continuous improvement

Service & Database Reliability:
  • Own the availability, performance, scalability, and resilience of production databases and dependent platform services
  • Define and enforce SLIs, SLOs, and error budgets for critical services and database platforms
  • Lead incident management, postmortems, and remediation efforts to reduce recurrence and improve recovery

Observability & Monitoring:
  • Define observability strategy across databases, services, infrastructure, and dependencies
  • Ensure effective metrics, logs, traces, dashboards, and alerting for proactive detection and response

Automation & Platform Engineering:
  • Drive Infrastructure as Code, operational automation, and self-healing mechanisms
  • Reduce manual toil through standardized workflows for provisioning, scaling, backup, recovery, and routine operations

Performance, Capacity & Cost Efficiency:
  • Improve system and database performance through tuning, capacity planning, and architecture reviews
  • Partner on cost optimization initiatives across cloud infrastructure, database services, storage, and licensing

Disaster Recovery & Operational Readiness:
  • Own alignment with recovery time and recovery point objectives (RTO/RPO), resilience planning, backup strategy, and disaster recovery exercises
  • Improve production readiness for releases, migrations, and major operational events

Cross-Functional Collaboration:
  • Partner with Development, Platform Engineering, Infrastructure, Security, and Product teams to improve service reliability end to end
  • Act as a technical leader and escalation point for reliability concerns across the stack


Minimum Qualifications


To succeed in this role, you bring:

  • 7+ years of experience in Site Reliability Engineering, Database Engineering, or related production engineering roles
  • 2+ years of leadership experience managing engineers or technical teams
  • Strong expertise in relational database platforms such as PostgreSQL, Oracle, or SQL Server
  • Experience operating reliable services in cloud or hybrid environments, including AWS-managed or self-managed platforms
  • Deep analytical, operational, and performance-focused mindset
  • Ability to balance service reliability, engineering velocity, and cost efficiency
  • Strong background in automation, observability, incident response, performance tuning, and high availability and disaster recovery (HA/DR) practices
  • Knowledge of Kubernetes, containerized workloads, and platform engineering practices
  • Experience defining SLOs, SLIs, alerting strategies, and operational readiness standards
  • Strong communication and cross-functional collaboration skills across engineering and business stakeholders


Preferred Qualifications

  • Experience with distributed systems, large-scale production environments, or customer-facing platforms
  • Experience with automation frameworks, Infrastructure as Code, and CI/CD pipelines
  • Exposure to FinOps, cloud cost optimization, or capacity modeling initiatives
  • Familiarity with modern observability platforms and reliability review processes

About CAE

At CAE, our mission is clear: to help make the world a safer place. For nearly 80 years, we’ve driven innovation in simulation, training, and mission readiness to support critical operations worldwide. By leveraging advanced technologies, we empower our customers to operate smarter, faster, and more sustainably. Join a purpose-driven organization where bold ideas are encouraged, collaboration drives progress, and your growth fuels our shared success.


Position Type

Regular

Equal Opportunity & Accommodations

CAE is committed to providing equal opportunities to all applicants, regardless of race, nationality, color, religion, sex, gender identity or expression, sexual orientation, disability, neurodiversity, veteran status, age, or other characteristics protected by law. We encourage applicants who may not meet every qualification to apply. Reasonable accommodations are available—contact your recruiter or email [email protected] if needed.

Data Privacy

Privacy Statement | CAE

As part of our process, we may use AI‑supported tools to help review applications, with human decision‑making at every step. CAE thanks all applicants for their interest. However, only those whose background and experience match the requirements of the role will be contacted.

How to Apply

To apply for this position, you must sign in on our website. If you do not have an account yet, register first.
