About the role...
HashiCorp is looking for an Incident Commander for our Global Support Engineering Organization. This highly visible position will be an integral part of the Support Engineering management team and would initially report to the SVP of Global Support. You are a fit if you thrive in a fast-paced environment that values crucial communication, alignment with our company's core principles, collaboration, and results.
This is a senior role at HashiCorp requiring an individual who can take charge in escalated code-red situations and give direction to both customer personnel and HashiCorp engineers to drive the resolution of critical incidents (catastrophic failures). We are looking for a natural leader and a confident decision-maker who has strong problem-solving skills in cloud environments and is experienced in SRE best practices.
As a member of our global Incident Response Team, this individual will be responsible for all aspects of our emergency response to critical outages occurring in our customer environments. This includes quickly developing incident objectives, managing all incident operations, allocating resources, and taking responsibility for all persons involved.
**We are currently looking for candidates in Austin, TX; San Francisco, CA; and surrounding areas (remote role)**
In this role, you can expect to...
- Take command of incidents by setting up or taking over a cross-functional technical investigation with internal and external stakeholders. HashiCorp conducts critical investigations both in asynchronous communication tools and via videoconference (e.g., Zoom). This role leads the Zoom call and coordinates the various Zoom rooms, working with highly technical subject matter experts internally and on the customer side
- Lead the effort to bring impacted systems back online by coordinating investigation and resolution of technical issues, from hands-on investigations with product engineering teams to directing workarounds and failovers for complex environments
- Work with HashiCorp SMEs (Support & Engineering) and with the customer Platform/DevOps teams to build an incident action plan and a restoration plan if needed
- Provide direction and time management to keep the resolution effort on track and moving forward
- Draft and send regular communications to keep all stakeholders, both internal and external, aware of the latest status, progress made thus far, and action items
- Own the technical incident retrospective process by assembling the correct technical teams and working with HashiCorp Customer Success teams for permanent remediation and recommendations to the customer
- Work closely with Engineering to improve our products' monitoring, observability, and debuggability to decrease Time-To-Detection (TTD) and Time-To-Restoration (TTR)
- Work closely with our Customer Success team to drive changes in customer environments that improve their robustness and scale, ideally following our products' best practices
- Develop and continuously update our Incident Playbooks
- Ensure internal readiness at all times by leading training sessions, simulations, and drills
- Be part of the Incident Response Team on-call rotation and ensure flawless handover of critical issues to other regions
You may be a good fit for our team if you have...
- 10 years of overall experience with 5+ years of proven experience within SRE, Operations, DevOps, Engineering, or Technical Support teams
- 3+ years of experience as an Incident Commander or Escalation Manager
- Strong leadership skills, able to take command in a highly escalated situation
- Executive-level communication skills, able to conduct high-level retrospectives and RCA discussions internally and with customers, able to communicate clearly and effectively to technical and business audiences, able to collaborate with various partners including senior leadership and multi-functional teams
- Excellent problem-solving, analytical, and troubleshooting skills, especially in multi-cloud environments (AWS, Azure, GCP) with complex deployment architectures (multi-cluster, HA, DR)
- Strong influencing, negotiation, and mediation skills to be able to steer the customer towards the optimal solution
- Demonstrable knowledge of incident management frameworks (e.g., ITIL) and best practices
- Experience with major cloud platforms (AWS, Azure, GCP), distributed systems, microservice architecture, and containers
- Experience with scripting tools (for example, Bash, Python), REST APIs, and command-line tools
- Bachelor’s degree in Computer Science, IT, or equivalent professional experience
Life at HashiCorp
HashiCorp is driven by our people and our principles which have been the foundation of everything we do since the company was founded in 2012. Join us on our journey as we work to support the world's most innovative companies as they transition to cloud and multi-cloud infrastructure through simple yet powerful workflows and automation.
At HashiCorp, we build the infrastructure that enables innovation. Our suite of multi-cloud infrastructure automation products underpins the largest enterprises in the world, which rely on our solutions to provision, secure, connect, and run their critical applications to deliver crucial services, communications tools, and entertainment platforms to the world. We're building a once-in-a-generation infrastructure company with a unique approach: rather than focusing on specific technologies, we build products and solutions that support real-world workflows spanning the multiple cloud environments that nearly every organization worldwide is using today.
HashiCorp is proud to be an Equal Employment Opportunity employer. We are committed to providing equal employment opportunities to qualified applicants and do not discriminate on the basis of race, color, ancestry, religion, sex, pregnancy, gender, gender identity, gender expression, sexual orientation, national origin, age, marital status, genetic information, disability, protected veteran status or any other characteristic protected by federal, state, or local laws. We also consider qualified applicants with arrest and conviction records consistent with the San Francisco Fair Chance Ordinance, the Los Angeles Fair Chance Ordinance, and other applicable state or local laws.
HashiCorp is committed to providing reasonable accommodations to qualified individuals with disabilities in our job application procedures. If you need assistance or an accommodation due to a disability, please reach out to email@example.com
We comply with all laws and regulations set forth in the following posters: