Lead Site Reliability Engineer

Job ID:
2414315
Location:
Boston, MA
Category:
Engineering
Salary:
$160,000.00 per year
Employment Type:
Full time
Posted:
06.17.2017

Job Description:

Salary: $130,000 - $160,000

Our client is a well-funded, growing private company that builds software for business and consumer services and has an awesome culture. They are looking for an experienced, hands-on Site Reliability Engineer to lead a small infrastructure engineering team. They want to up their game by designing, building, and operating high-performance, highly available systems.
 
Site Reliability Engineers play a critical part in providing the tools, practices, and expertise to support the engineers who are building software. The client's production systems are hosted on AWS and run a large Ruby on Rails web application and a handful of smaller services in Ruby, Node.js, and Java. They deploy 3-5 times a day. Candidates should have vision and well-informed opinions about how to build infrastructure for a high-growth, technology-driven company.

Technologies being used are:
  • Amazon Web Services (EC2, ELB, S3, RDS, ElastiCache) and Ubuntu Linux
  • Postgres, Redis, Memcached, Elasticsearch
  • Chef, Serverspec, Terraform, New Relic, Datadog, Sumo Logic, and Test Kitchen
Responsibilities:
  • Design, build, and maintain the core infrastructure
  • Actively manage the backlog and work closely with others on the team to provide coaching and mentorship
  • Help increase developer productivity and get to true continuous delivery
  • Develop operational and security standards and champion operational excellence and secure coding practices
  • Partner with engineering teams closely to educate and consult
  • Participate in solution design for new features, products, systems and tooling
  • Debug complex problems across the whole stack
  • Continually monitor application/system performance and costs, generate actionable insights and either implement or advocate for them
  • Participate in on-call rotations, along with every member of the engineering team
  • Eliminate repetitive manual tasks and recurring errors
  • Ensure the team always employs best-of-breed tooling for all infrastructure and automation needs
  • Collaboratively plot the course for maturing and growing the infrastructure
  • Participate (and sometimes run point) in handling production incidents
  • Work closely with engineering teams to conduct root cause analysis for production incidents, and evolve infrastructure and tooling
Requirements:
  • Work well in a highly collaborative, no red-tape, rapid-growth environment
  • Enjoy building tooling and infrastructure to help developers be more productive
  • Love eliminating repetitive manual tasks through automation
  • Have solid Unix command line and systems chops
  • Have experience with substantial, distributed SaaS or eCommerce systems
Company Info
The Job Jobber