Principal Site Reliability Engineer

MA, MA

Your Impact

The Site Reliability Engineering (SRE) team is foundational to the growth and scale of our platform. You and your team will help advance several initiatives tied to automation, SRE culture, and cloud architecture. You will ensure reliability and build trust for the more than 38 million learners on our platform, enabling them to advance their careers and their lives.

This is a unique opportunity to technically lead the engineering organization in standardizing our automated operations infrastructure, service provisioning, and orchestration. You will bring deep practical knowledge and guide team members through forward-looking planning, the execution of technical advancements, and re-platforming efforts. Your work will empower our product engineers to build features more autonomously.

Your Team

You will be part of a US-based team, with additional support from offshore engineers. The team is the organization's center of operational expertise and is currently leading a transformation to containerized Kubernetes clusters and modern configuration and deployment pipelines. In addition to providing these platform-level services, the team enables product delivery teams through training, automation, and operational support.

You Will

  • Lead definition of a technical roadmap for the engineering organization to utilize fully automated, self-service, highly scalable, observable, and reliable infrastructure 
  • Drive the execution of this roadmap, collaborating with SREs and senior engineers across the organization, while performing hands-on work on the most critical challenges
  • Provide expert technical guidance and engineering design review to teams planning and implementing broad architectural shifts and capacity growth
  • Help our open source community stand up their own Open platforms and make contributions to our code bases
  • Rapidly diagnose and resolve service faults as a member of an on-call rotation, investing in actionable alerting and automation to reduce alert fatigue
  • Contribute to the company in multiple areas, constantly pushing yourself to be a better engineer and to level up peers within your team and across the organization

You Have

  • 10+ years of professional experience, including hands-on technical leadership and strong systems engineering skills
  • At least 2 years of experience working with Kubernetes in a production environment
  • A working knowledge of Linux both as an end-user and as an administrator
  • Leadership skills that promote diversity of opinions and inclusive discussions when solving challenging problems
  • A track record of clear decision-making on technical design trade-offs in complex situations
  • Strong business communication skills for interfacing with technical and non-technical business stakeholders

Preferred

  • Nginx, MySQL, MongoDB, Django, Splunk, Git, and Jenkins knowledge
  • A robust understanding of automation tools, continuous integration pipelines, CI/CD systems, and configuration management

Why You’ll Like It Here

  • We are collaborative at our core. You’ll work within your team and across the organization, allowing for continuous learning and discovery.
  • We’re on a mission to unlock our learners’ potential on a global level, seeking to create a more diverse, equitable, and inclusive world. 
  • We set outcomes that matter and provide value in all that we do, from building meaningful products to serving our community.

We understand that applying for a job can be intimidating. Applicants rarely meet every single job requirement, and we know there are many skills and backgrounds that will contribute to success in this role. 

That’s why we provide new employees with:

  • Employee on-boarding and training sessions
  • Personalized 30/60/90+ day plans
  • Individual quarterly and annual goals 
  • Career pathways 

And much more! If this role looks like a great next step for you, please apply… even if you can’t “check every box.” We’d love to hear from you! 

We are the education movement for restless learners. Together with our founding partners Harvard and MIT, we’ve brought together more than 38 million learners, the majority of top-ranked universities in the world, and industry-leading companies onto one online learning platform that supports learners at every stage. And we’re not stopping there—as a global nonprofit, we’re relentlessly pursuing our vision of a world where every learner can access education to unlock their potential, without the barriers of cost or location.
