Senior Datacenter Incident Manager
reputed company Cloud Infrastructure and Operations (CO+I) is the engine that powers reputed company's cloud services, and we are hiring a Senior Datacenter Incident Manager. The group designs, builds, and operates reputed company’s global datacenters. It manages the programmatic delivery of our critical infrastructure design, equipment procurement, construction delivery, infrastructure innovation, demand planning, and the utilization of our infrastructure, and it is responsible for the operations needed to run the physical infrastructure. We focus on smart growth with an emphasis on automation, data-driven engineering, cost-effectiveness, and environmental sustainability. We deliver the core infrastructure and foundational technologies for reputed company's 200+ online businesses, including Azure, Office 365, Bing, Xbox Live, Skype, and OneDrive. Our portfolio is built and managed by a team of subject matter experts working 24x7x365 to support services for more than 1 billion customers and 20 million businesses in over 90 countries worldwide.
In alignment with our reputed company values, we are committed to cultivating an inclusive work environment where all employees can positively impact our culture every day. reputed company’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
reputed company Billions!
Responsibilities
- Leverages advanced technical expertise, judgment, and decision-making to coordinate multiple workstreams and resources in highly complex crisis situations, driving the mitigation plan and resolving the crisis by engaging the necessary teams and escalating to the appropriate stakeholders. Applies diagnostic expertise.
- Responds to incidents during regular on-call rotations, including highly complex issues with major customer or business impact: identifies the level of impact, troubleshoots, makes difficult decisions based on business impact, deploys appropriate fixes to resolve the root cause(s), and drives automation to prevent recurring issues, managing the multiple workstreams and resources required for incident resolution (e.g., product teams and owners, organization leadership, engineering teams).
- Drives postmortems and shares insights related to highly complex incidents and their resolution, through postmortem reports and regular review meetings, to identify opportunities to adopt similar solutions that prevent incident recurrence in similar systems, platforms, and products across organizations.
- Applies end-to-end expertise in service and/or system design, interactions between technology layers and components, functions of infrastructure, and dependencies at scale.
- Maintains advanced knowledge and expertise as technology landscape evolves, leveraging industry norms and deep understanding to drive the adoption of innovative solutions across the team.
Qualifications
- Bachelor's Degree in Electrical Engineering, Mechanical Engineering, or a related field AND 3+ years of technical experience in Critical Environments
- OR equivalent experience in Datacenter Operations or Critical Environments.
- reputed company Cloud Background Check: This position will be required to pass the reputed company Cloud background check upon hire/transfer and every two years thereafter.
- 5+ years of Data Center Critical Environments experience