Lead Site Reliability Engineer (CDSS Advanced URL Filtering)

Location: Santa Clara, California
Posted: 7 Jan 2025

Our Mission

At Palo Alto Networks®, our mission is clear: to be the cybersecurity partner of choice, safeguarding our digital way of life. Our vision is a world where every day is safer and more secure than the last. We're committed to challenging conventions and seeking innovators who are eager to help shape the future of cybersecurity.

Who We Are

We take our mission seriously, relentlessly defending our customers' interests. Every team member's unique ideas contribute to our shared success. Our values, developed collaboratively by employees, drive our daily actions through disruptive innovation, collaboration, and integrity. As part of our team, you'll have a chance to help define the future of cybersecurity while receiving support for your development and wellbeing through various programs, including FLEXBenefits and mental health resources.

We value collaboration and in-person interactions, resulting in an environment where casual conversations enhance problem-solving and relationships. Join us in creating an atmosphere where everyone wins.

Your Career

Palo Alto Networks runs a vast hybrid infrastructure and is one of the largest customers of Google Cloud Platform (GCP). As the Lead Site Reliability Engineer on the CDSS Advanced URL Filtering team, you'll play a pivotal role in shaping the reliability and scalability of our systems. You will work with cutting-edge technologies and take on complex challenges, contributing to solutions that protect our customers.

This role is based on-site at our headquarters in Santa Clara, California. Remote work is not available for this position.

Your Impact

  • Optimize infrastructure costs by monitoring resource utilization, rightsizing instances, and minimizing waste.
  • Define and manage service-level objectives (SLOs) and supporting metrics to maintain service reliability and keep it aligned with business goals.
  • Design and uphold secure cloud infrastructure focusing on reliability, scalability, and efficiency.
  • Develop expertise in new technologies to advance infrastructure and operations.
  • Collaborate with cross-functional teams to ensure production-ready, highly available applications.
  • Automate deployments, monitoring, and alerting to streamline operations and boost reliability.
  • Diagnose and address critical issues, driving continuous improvement.
  • Participate in on-call rotations to keep services running smoothly.
  • Contribute to design reviews to enhance system performance and scalability.

Qualifications

Your Experience

  • Demonstrated creativity and collaboration with strong communication skills and a drive to make a meaningful impact.
  • Cloud and Infrastructure: Proficient in managing cloud infrastructure in public or private clouds (GCP, AWS, or Azure preferred) and skilled with tools such as Kubernetes, Terraform, and Ansible.
  • Database Operations: Experience managing and optimizing SQL and NoSQL databases, including provisioning, scaling, and troubleshooting (preferred platforms include BigQuery, MongoDB, Cloud SQL, Firestore, Bigtable, and MySQL).
  • System Reliability: Comprehensive understanding of distributed systems and high-availability strategies for optimizing performance.
  • Service-Level Management: Proven ability to define SLAs, SLOs, and SLIs to ensure reliability and business alignment.
  • Cost Optimization: Expertise in monitoring cloud costs, resource allocation, and implementing efficient practices.
  • Load Balancing and Networking: Hands-on experience with load balancing technologies such as Envoy, along with strong Linux system administration skills.
  • Automation and Development: Advanced programming skills in Python, Golang, or shell scripting, applied to improving system reliability.
  • Production Deployment: Proven record of managing production deployments, ensuring stability, and upholding DevOps best practices.
  • Monitoring and CI/CD: Familiarity with CI/CD practices (preferably GitLab CI) and experience designing robust monitoring and alerting systems.
  • Collaboration and Communication: Exceptional capability to work with diverse teams and provide technical leadership.
  • Mindset and Motivation: Self-disciplined and self-motivated with a strong sense of ownership and urgency.
  • Education and Experience: BS/MS in Computer Science, Computer Engineering, or a related field, plus 8+ years of hands-on experience in Site Reliability Engineering or similar roles managing complex systems at scale.

Additional Information

The Team

Our engineering team fuels our mission of preventing cyberattacks through innovative problem-solving. We continuously explore new ideas and challenge industry norms, seeking individuals who thrive in ambiguity and are excited by challenges.

Compensation Disclosure

The starting base salary for this position ranges from $147,000 to $210,000 annually, depending on qualifications and experience. This role may also offer additional benefits, including stock options and bonuses.

Our Commitment

We are committed to fostering diverse teams that drive innovation. We provide reasonable accommodations for individuals with disabilities. If you need assistance due to a disability or special need, please reach out to us.

Palo Alto Networks is an equal opportunity employer. We celebrate workplace diversity and consider all applicants without regard to legally protected characteristics. All information will be kept confidential in accordance with EEO guidelines.

Details

  • Job Reference: 1542351555-2
  • Date Posted: 7 January 2025
  • Recruiter: Palo Alto Networks
  • Location: Santa Clara, California
  • Salary: On Application