Senior Site Reliability Engineer
Eden Prairie, Minnesota
Caring. Connecting. Growing together.
With these values to guide us, our people are committed to making a meaningful difference in the lives of those we are honored to serve.
Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, and data they need to feel their best. Here, you will find a culture guided by diversity and inclusion, talented peers, comprehensive benefits, and career development opportunities. Come make an impact on the communities we serve as you help us advance health equity on a global scale. Join us to start Caring. Connecting. Growing together.
The Site Reliability Engineering (SRE) team at Optum Financial ensures world-class reliability, scalability, security, compliance, and performance of a scalable infrastructure platform that powers diverse financial products. We exist so our customers, partners, and engineers can trust and innovate financial products without fear and with velocity. As a Senior SRE, you will lead our mission to own the tools, platforms, and processes that enable success. Our team is driving modern observability practices with OpenTelemetry and the adoption of SLOs as reliability measures. You will be instrumental in automating our environment and building AI-enhanced platforms to support the next generation of financial technology.
You will enjoy the flexibility to telecommute* from anywhere within the U.S. as you take on some tough challenges.
Primary Responsibilities:
- Design, develop, and deploy AI-powered solutions to address complex infrastructure and reliability challenges with an emphasis on the responsible use of AI
- Implement and support observability and monitoring solutions using tools such as OpenTelemetry, Datadog, Splunk, and Dynatrace to improve system visibility and reliability
- Define, implement, and maintain service level indicators (SLIs), service level objectives (SLOs), and actionable alerting strategies in partnership with engineering teams
- Use and evaluate enterprise-approved AI tools to streamline workflows, automate tasks, and drive continuous improvement across the platform
- Develop and maintain automation to improve operational efficiency, including alerting, incident analysis, and recovery workflows
- Support incident response processes, including troubleshooting, root cause analysis (RCA), and implementation of corrective actions to prevent recurrence
- Support cloud-based infrastructure (Azure or AWS) and containerized environments (Kubernetes, Docker) to enhance scalability, stability, and efficiency
- Evaluate emerging technology trends to inform solution design and strategic innovation for the SRE platform
- Contribute to the development of SRE platform capabilities, including self-healing systems and automated operational processes
- Partner with cross-functional teams to promote adoption of SRE best practices and improve overall system reliability
You'll be rewarded and recognized for your performance in an environment that will challenge you and give you clear directions on what it takes to succeed in your role as well as provide development for other roles you may be interested in.
Required Qualifications:
- 5+ years of experience in software engineering, DevOps, or Site Reliability Engineering (SRE) roles
- 2+ years of experience implementing and supporting observability and monitoring tools (e.g., OpenTelemetry, Datadog, Splunk, Dynatrace)
- 2+ years of experience defining and maintaining SLIs, SLOs, and production alerting strategies
- 2+ years of experience working in cloud environments (Azure or AWS)
- 1+ years of experience supporting containerized applications (e.g., Kubernetes, Docker)
Preferred Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field
- 2+ years of experience with CI/CD tools (e.g., Jenkins, GitHub Actions, ArgoCD)
- 1+ years of experience with infrastructure as code tools (e.g., Terraform, Pulumi)
- 1+ years of experience participating in incident response and root cause analysis (RCA) processes
- Direct experience developing automation for operational workflows or reliability engineering tasks
- Exposure to AI/ML concepts or practical experience applying automation to improve operational efficiency
*All Telecommuters will be required to adhere to UnitedHealth Group's Telecommuter Policy.
Pay is based on several factors including but not limited to local labor markets, education, work experience, certifications, etc. In addition to your salary, we offer benefits such as a comprehensive benefits package, incentive and recognition programs, equity stock purchase, and 401k contribution (all benefits are subject to eligibility requirements). No matter where or when you begin a career with us, you'll find a far-reaching choice of benefits and incentives. The salary for this role will range from $91,700 to $163,700 annually based on full-time employment. We comply with all minimum wage laws as applicable.
Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
Application Deadline: This will be posted for a minimum of 2 business days or until a sufficient candidate pool has been collected. Job posting may come down early due to volume of applicants.
At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone - of every race, gender, sexuality, age, location, and income - deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups, and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes - an enterprise priority reflected in our mission.
UnitedHealth Group is an Equal Employment Opportunity employer under applicable law and qualified applicants will receive consideration for employment without regard to race, national origin, religion, age, color, sex, sexual orientation, gender identity, disability, or protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations.
UnitedHealth Group is a drug-free workplace. Candidates are required to pass a drug test before beginning employment.
Benefits
Our mission of helping people live healthier lives extends to our team members. Learn more about our range of benefits designed to help you live well.
Life
Resources and support to focus on what matters most to you, in every facet of your life.
Emotional
Education, tools and resources to help you reduce and manage stress, build resilience and more.
Physical
Health plans and other coverage to support wellness for you and your loved ones.
Financial
Benefits for today and to help you plan for the future, including your retirement.