About the role
At Google Canada in Waterloo, we are seeking a highly experienced Staff Site Reliability Developer to join our Protected Data SRE team. In this role you will drive the strategy for reducing ecosystem-wide complexity, focusing on solution and component reuse to mitigate new production risks before they arise. You will partner closely with executive engineering stakeholders and cross-functional programmes, balancing critical product reliability requirements against stringent regulatory deadlines. You will also apply your technical expertise to designing company-wide capabilities for change safety, distributed observability, large-scale data repair, and control plane safety.
Details
As a Staff Site Reliability Developer, you will serve as a technical anchor, providing direction and mentorship to developers in Waterloo. The role fosters a collaborative culture and encourages work across diverse infrastructure stacks, from Spanner to the Google Front End (GFE). Site Reliability Engineering (SRE) combines software and systems engineering principles to build and operate massively distributed, fault-tolerant systems at scale. The SRE team ensures that Google Cloud's services, both internally critical and externally visible systems, maintain the reliability and uptime our customers need, while sustaining a rapid rate of improvement. The team also keeps a close watch on our systems' capacity and performance, drawing on extensive experience in this domain.
Much of our software development effort is dedicated to optimising existing systems, building resilient infrastructure, and eliminating manual work through automation. Joining the SRE team gives you the opportunity to manage the complex challenges of scale that are distinctive to Google Cloud, applying your expertise in coding, algorithms, complexity analysis, and large-scale system design. The SRE culture values intellectual curiosity, problem-solving, and openness, and it brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage collaboration, bold thinking, and calculated risk-taking within a supportive, blame-free environment. You will have opportunities for self-direction on meaningful projects, with support and mentorship to help you keep learning and growing at a highly technical and collaborative site.