We are building a cloud-native compute orchestration platform supporting large-scale, compute-intensive engineering workloads across simulation, verification, and backend pipelines. The platform underpins engineering productivity and runs at scale, with resilience and efficiency at its core.
This role sits within the Productivity Engineering - Technical Capacity and Analytics team, a cloud-focused team enabling engineering users to migrate to and operate effectively in the cloud.
Overview:
We are seeking an experienced Senior Site Reliability Engineer (SRE) to strengthen availability, scalability, and operational maturity across the platform. This is an opportunity for someone passionate about Cloud, SRE, DevOps, and Software Engineering to make a meaningful impact across Arm.
You will design and operate distributed systems while advancing observability across AWS, Azure, and GCP. The role includes defining SLOs, evolving reliability standards, and using data-driven insights to continuously improve platform performance and stability.
Working closely with platform and engineering teams, you will embed sound operational practices into system design, improve deployment practices, and champion automation to reduce toil and improve incident response.
Responsibilities:
- Evolve observability across multi-cloud environments, enhancing monitoring, logging, alerting, tracing, and signal quality.
- Own performance and scalability validation through automated performance testing, ensuring systems operate reliably under both expected and peak demand.
- Improve deployment safety and release confidence through automated validation, reliability gates, and rollback controls in CI/CD.
- Validate resilience through structured failover testing to ensure graceful recovery.
- Lead incident response and root cause analysis, translating findings into preventative improvements.
- Define and uphold SLOs/SLAs while improving operational standards and documentation.
- Support security, audit, and compliance requirements from an operational perspective.
Required Skills and Experience:
- Experience in an SRE, DevOps, or Platform Engineering role operating production systems at scale.
- Hands-on experience with AWS, GCP, or Azure.
- Proficiency in Python, Go, Rust, or a comparable language.
- Experience defining and operating against SLIs/SLOs.
- Practical knowledge of observability across metrics, logs, tracing, and meaningful alerting.
- Experience supporting live environments using automation and data to reduce incident impact.
- Meaningful understanding of distributed systems and cloud-native architectures (e.g., Kubernetes, containers, serverless).
- Solid grasp of networking fundamentals, CI/CD practices, and modern deployment patterns.
- Familiarity with performance and load testing frameworks and profiling tools.
Nice to Have:
- Exposure to compute-intensive or high-throughput workloads.
- Experience with performance optimization, cloud cost efficiency, and capacity planning.
- Experience designing multi-cloud architectures.
- Familiarity with infrastructure as code (e.g., Terraform).
- Exposure to regulated or compliance-driven environments.
Salary Range:
$156,500-$211,700 per year
We value people as individuals and our dedication is to reward people competitively and equitably for the work they do and the skills and experience they bring to Arm. Salary is only one component of Arm's offering. The total reward package will be shared with candidates during the recruitment and selection process.
Accommodations at Arm
At Arm, we want to build extraordinary teams. If you need an adjustment or an accommodation during the recruitment process, please email accommodations@arm.com. Please note that by sending us the requested information, you consent to its use by Arm to arrange for appropriate accommodations. All accommodation or adjustment requests will be treated with confidentiality, and information concerning these requests will only be disclosed as necessary to provide the accommodation. Although this is not an exhaustive list, examples of support include breaks between interviews, having documents read aloud, or office accessibility. Please email us about anything we can do to accommodate you during the recruitment process.
Hybrid Working at Arm
Arm’s approach to hybrid working is designed to create a working environment that supports both high performance and personal wellbeing. We believe in bringing people together face to face to enable us to work at pace, whilst recognizing the value of flexibility. Within that framework, we empower groups/teams to determine their own hybrid working patterns, depending on the work and the team’s needs. Details of what this means for each role will be shared upon application. In some cases, the flexibility we can offer is limited by local legal, regulatory, tax, or other considerations, and where this is the case, we will collaborate with you to find the best solution. Please talk to us to find out more about what this could look like for you.
Equal Opportunities at Arm
Arm is an equal opportunity employer, committed to providing an environment of mutual respect where equal opportunities are available to all applicants and colleagues. We are a diverse organization of dedicated and innovative individuals, and don’t discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.