Job Overview:
Arm IT is building a next-generation Service Management capability that combines the ITIL v4 framework, SRE methodology, and AI-enabled automation to deliver engineering velocity with enterprise reliability.
The Senior Service Operations & Assurance Specialist acts as the operational integrity authority across Enterprise IT services. The position governs the end-to-end effectiveness of ITIL Service Operations processes through a structured Service Assurance framework.
This is a data-driven, risk-aware role focused on ensuring that detection, response, analysis, and risk reduction operate as a unified reliability model, not as disconnected processes.
Responsibilities:
- Own and evolve Incident, Major Incident, Problem, Event, and Availability Management processes.
- Govern operational integrity, performance standards, and reliability compliance across ITSM practices.
- Present service reliability risk posture and trigger corrective action when thresholds or error budgets are breached.
- Ensure operational processes function as an integrated, end-to-end reliability system.
- Govern performance against the 15-minute P1 response SLA and monitor MTTR, response quality, and the effectiveness of critical issue handling.
- Drive structured improvements through incident trend analysis and repeat incident reduction, identifying patterns in incident response performance.
- Ensure root cause investigations (PIRs) are actionable, and govern the Known Error lifecycle through to permanent resolution.
- Identify and address systemic architectural, process, and change-related risks.
- Optimise monitoring and alerting to improve signal-to-noise ratio and Mean Time to Detect (MTTD).
- Embed AI-assisted triage, correlation, and automation into detection and response workflows.
- Monitor SLA and SLO performance, availability trends, reliability, and error budget consumption, ensuring they align with and contribute to IT's overall service health goals.
- Align reliability insights with engineering backlogs and platform roadmaps.
- Ensure resilience controls (failover, redundancy, disaster recovery) are visible and governed.
- Use data and trend analysis to predict risk, prevent instability, and shift operations from reactive recovery to predictive prevention.
Required Skills and Experience:
- ITIL v4 certification (Foundation as a minimum), with strong knowledge of ITIL v4 across Incident, Major Incident, Problem, Event, Monitoring, Availability, and Continual Improvement.
- Experience governing operational performance in high-availability engineering environments.
- Experience implementing and governing a Service Assurance framework.
- Familiarity with SRE principles.
- Experience with observability platforms, monitoring, and alerting tooling.
- Strong data analysis capability, including trend interpretation and risk modelling.
- Experience reducing repeat incidents and systemic operational risk.
- Understanding of CI/CD and change governance integration.
- Structured decision-making capability in high-pressure environments.
- Good communication and influencing skills across technical and leadership audiences.
“Nice To Have” Skills and Experience:
- Experience implementing AIOps or event correlation tooling.
- Experience designing predictive reliability and performance dashboards.
- Exposure to SRE operating models in mature engineering organisations.
- Experience within semiconductor, SaaS, or complex global technology environments.
- Experience with ServiceNow.
In Return:
With Arm’s growth trajectory, you’ll have clear opportunities to develop your career, take on new challenges, and make a real impact on our continued success.
Accommodations at Arm
At Arm, we want to build extraordinary teams. If you need an adjustment or an accommodation during the recruitment process, please email accommodations@arm.com. To note, by sending us the requested information, you consent to its use by Arm to arrange for appropriate accommodations. All accommodation or adjustment requests will be treated with confidentiality, and information concerning these requests will only be disclosed as necessary to provide the accommodation. Although this is not an exhaustive list, examples of support include breaks between interviews, having documents read aloud, or office accessibility. Please email us about anything we can do to accommodate you during the recruitment process.
Hybrid Working at Arm
Arm’s approach to hybrid working is designed to create a working environment that supports both high performance and personal wellbeing. We believe in bringing people together face to face to enable us to work at pace, whilst recognizing the value of flexibility. Within that framework, we empower groups/teams to determine their own hybrid working patterns, depending on the work and the team’s needs. Details of what this means for each role will be shared upon application. In some cases, the flexibility we can offer is limited by local legal, regulatory, tax, or other considerations, and where this is the case, we will collaborate with you to find the best solution. Please talk to us to find out more about what this could look like for you.
Equal Opportunities at Arm
Arm is an equal opportunity employer, committed to providing an environment of mutual respect where equal opportunities are available to all applicants and colleagues. We are a diverse organization of dedicated and innovative individuals, and don’t discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.