Lead Site Reliability Engineer
We’re looking for a top-notch, hands-on SRE to lead our small and talented infrastructure engineering team and help us elevate our game when it comes to designing, building and operating high-performance, highly available systems. We’re backed by Insight Venture Partners and Iconiq Capital, and we’re on a path to $1B in 2019. We’ll get there, even more surely if you come help us.
Every engineer is responsible for the software they build, and SREs play a critical part in providing the tools, practices, and expertise to help them succeed.
Our production systems are hosted on AWS and run a large Ruby on Rails web application and a handful of smaller services in Ruby, Node.js, and Java. We currently deploy 3-5 times a day. Our systems are stable and fire drills are rare. Technologies we’re currently using include:
- Amazon Web Services (EC2, ELB, S3, RDS, ElastiCache) and Ubuntu Linux
- Postgres, Redis, Memcached, Elasticsearch
- Chef, Serverspec, Terraform, New Relic, Datadog, Sumo Logic and Test Kitchen
In this mission-critical role, you would:
- Design, build, and maintain the core infrastructure of our product
- Actively manage the backlog for our infrastructure team and work closely with other SREs on the team to provide coaching and mentorship
- Help us increase developer productivity and get to true continuous delivery
- Develop operational and security standards and champion operational excellence and secure coding practices
- Partner with engineering teams closely to educate and consult
- Participate in solution design for new features, products, systems and tooling
- Debug complex problems across the whole stack
- Continually monitor application/system performance and costs, generate actionable insights and either implement or advocate for them
- Participate in on-call rotations, along with every member of the engineering team
- Ruthlessly eliminate repetitive manual tasks and recurring errors
- Ensure we are always employing best-of-breed tooling for all our infrastructure and automation needs
- Collaboratively plot the course for the maturation and growth of our infrastructure
- Participate in (and sometimes run point on) handling production incidents
- Work closely with engineering teams to conduct root cause analysis for production incidents, and evolve infrastructure and tooling
This role might be that rare opportunity if you:
- Thrive in a highly collaborative, no red-tape, rapid-growth environment
- Love building tooling and infrastructure to help developers be more productive
- Love eliminating repetitive manual tasks through automation
- Have a healthy appreciation of what it means to work in production
- Have solid Unix command line and systems chops
- Have experience with substantial, distributed SaaS or eCommerce systems
- Can point to a solid track record of success leading small-to-medium infrastructure teams
- Have vision and well-informed opinions about how to build infrastructure for a high-growth, technology-driven company that’s headed towards the $1B mark