Incident Commander, Cloud Infrastructure

United States (Remote)

About HashiCorp

HashiCorp is a fast-growing company that solves development, operations, and security challenges in infrastructure so organizations can focus on business-critical tasks. We build products to give organizations a consistent way to manage their move to cloud-based IT infrastructures for running their applications. Our products enable companies large and small to mix and match AWS, Microsoft Azure, Google Cloud, and other clouds as well as on-premises environments, easing their ability to deliver new applications for their business.

About the role...

HashiCorp is looking for an Incident Commander for our Global Support Engineering Organization. This highly visible position will be an integral part of the Support Engineering management team and will initially report to the SVP of Global Support. You are a fit if you thrive in a fast-paced environment that values crucial communication, alignment with our company's core principles, collaboration, and results.

This is a senior role at HashiCorp requiring an individual who can take charge in escalated code-red situations and give direction to both customer personnel and HashiCorp engineers to drive the resolution of critical incidents (catastrophic failures). We are looking for a natural leader and a confident decision-maker who has strong problem-solving skills in cloud environments and is experienced in SRE best practices.

As a member of our global Incident Response Team, this individual will be responsible for all aspects of our emergency response to critical outages occurring in our customer environments. This includes quickly developing incident objectives, managing all incident operations, allocating resources, and taking responsibility for everyone involved.

In this role, you can expect to...

  • Take command of incidents by setting up or taking over a cross-functional technical investigation with internal and external stakeholders. HashiCorp conducts its critical investigations both in asynchronous communication tools and via videoconference (e.g., Zoom). This role leads the Zoom call and coordinates the various Zoom rooms, working with highly technical subject matter experts internally and on the customer side.
  • Lead the effort to bring impacted systems back online by coordinating investigation and resolution of technical issues, from hands-on investigations with product engineering teams to directing workarounds and failovers for complex environments.
  • Work with HashiCorp SMEs (Support & Engineering) and with the customer Platform/DevOps teams to build an incident action plan and a restoration plan if needed
  • Provide direction and time management to keep the resolution effort on track and moving forward
  • Draft and send regular communications to keep all stakeholders, both internal and external, aware of the latest status, progress made thus far, and action items
  • Own the technical incident retrospective process by assembling the correct technical teams and working with HashiCorp Customer Success teams for permanent remediation and recommendations to the customer
  • Work closely with Engineering to improve our products' monitoring and observability capabilities and their debuggability to decrease the Time-To-Detection (TTD) and Time-To-Restoration (TTR)
  • Work closely with our Customer Success team to drive changes in customer environments to improve their robustness and scale, ideally following our products' best practices
  • Develop and continuously update our Incident Playbooks 
  • Ensure internal readiness at all times by leading training sessions, simulations, and drills
  • Be part of the Incident Response Team on-call rotation and ensure flawless handover of critical issues to other regions 
  • Travel (<5%)

You may be a good fit for our team if you have...

  • 10 years of overall experience with 5+ years of proven experience within SRE, Operations, DevOps, Engineering, or Technical Support teams
  • 3+ years of experience as an Incident Commander or Escalation Manager
  • Strong leadership skills, able to take command in a highly escalated situation
  • Executive-level communication skills, able to conduct high-level retrospectives and RCA discussions internally and with customers, able to communicate clearly and effectively to technical and business audiences, able to collaborate with various partners including senior leadership and multi-functional teams
  • Excellent problem-solving, analytical, and troubleshooting skills, especially in multi-cloud environments (AWS, Azure, GCP) with complex deployment architectures (multiple-cluster, HA, DR)
  • Strong influencing, negotiation, and mediation skills to be able to steer the customer towards the optimal solution
  • Demonstrable knowledge of incident management frameworks (e.g., ITIL) and best practices
  • Experience with major cloud platforms (AWS, Azure, GCP), distributed systems, microservice architecture, and containers
  • Experience with scripting tools (for example, Bash, Python), REST APIs, and command-line tools
  • Bachelor’s degree in Computer Science, IT, or equivalent professional experience

HashiCorp embraces diversity and equal opportunity. We are committed to building a team that represents a variety of backgrounds, perspectives, and skills. We believe the more inclusive we are, the better our company will be.

 

For more information regarding how HashiCorp collects, uses, and manages personal information, please review our Privacy Policy.

 

Benefits at HashiCorp

Note: some benefits may differ from one country to another.

Medical, dental & vision

HashiCorp offers your choice of medical plans as well as dental and vision coverage for you and any dependents, including spouses, domestic partners, and children. Coverage begins upon your first day of hire.

Life & disability insurance

HashiCorp provides life insurance coverage in an amount equal to your annual salary at no cost to you. If you would like additional coverage, you have the option to enroll in voluntary life insurance for yourself or your dependents. You will also be covered under our short-term and long-term disability policies in the event that you are unable to work for an extended period of time due to a health condition.

Flexible spending account (FSA)

You can set aside pretax money to go towards the purchase or payment of approved health care and dependent care expenses. These can include copays, birth control, day care for children or elderly adults, acupuncture, and more.

Vacation and Other Leaves

We believe in giving our employees the opportunity to recharge and refresh, and our vacation policy reflects that. Our Paid Vacation Policy offers employees 4 weeks of vacation per year. So, whether you’d like to vacation on a beach or relax at home, it’s up to you! Additionally, we offer 10 days of paid sick leave per year, bereavement leave, miscarriage leave, and extended personal leave. We value your health and well-being and empower you to take ownership of your earned and well-deserved time away.

401(k)

Our 401(k) plan provides a variety of investment options to help you fund your retirement. The plan allows you to contribute a designated amount of your pre-tax income from each paycheck, thereby lowering your taxable annual income. The plan also offers employees the opportunity to make Roth and after-tax contributions.

Family Expansion Benefit

We are dedicated to supporting the needs of our employees and their families in a way that is inclusive of all family structures. That is why we’re proud to offer a Family Expansion Benefit through Carrot, designed to support a variety of family expansion methods that range from adoption to fertility treatments and that can be customized to the needs and preferences of each individual employee.

Maternity and Parental Leave

So that they can bond with their newborn, birthing parents receive up to 16 weeks of paid maternity leave via short-term disability and HashiCorp’s parental leave policy. Non-birthing parents (including adoptive parents) receive 8 weeks of paid parental leave.

Expanded Mental Health Support

We understand the importance of supporting our employees’ mental health and are committed to doing this through a variety of resources. In addition to offering an Employee Assistance Program (EAP), we provide employees access to an on-demand behavioral healthcare benefit through Ginger.