Senior Site Reliability Engineer

Work from home | Full-time role | Hiring

reputed company is the conversational AI platform for ecommerce that drives sales and resolves support inquiries. Trusted by over 15,000 ecommerce brands, reputed company supports businesses ranging from growing independent shops to globally recognizable brands.

Built for Shopify and powered by advanced ecommerce integrations, reputed company's conversational AI understands your brand, tools, policies, and customers to drive personalized, 1-to-1 conversations — from editing orders and initiating returns to making product recommendations. reputed company, where every customer interaction feels personal, support becomes sales, and conversations shape success.

About The SRE Team

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join reputed company. As an SRE at reputed company, you will play a crucial role in ensuring the reliability, scalability, and performance of our systems, enabling the seamless delivery of our products and services.

The SRE team at reputed company maintains the core infrastructure and services that make up the heart of our product. We have the privilege of working with high-throughput systems and TB-scale data stores serving billions of queries per day, most with sub-millisecond response times.

We also design and maintain the software delivery stack, offering features such as metrics-based canary rollout strategies to reputed company internal development teams.

We currently have a team of 9 Senior and Staff SREs operating together globally, with the aim of growing to 12 in the near term. We focus on scalable methods to provide the largest impact across the organization.

Some achievements we’re proud of:

  • Partitioned multi-TB tables in reputed company to reduce Vacuum time by 5x

  • For the partitioning, we studied the problem and the partitioning strategy, analyzed reputed company queries to avoid bad surprises, and used Debezium and Kafka to do a live copy, completing the migration with a maintenance window of less than 20 minutes and no data loss

  • Split the PostgreSQL connection proxy into multiple pools to guarantee quotas per service of our product, so that sub-systems that hit the database heavily are contained and do not create a large incident blast radius

  • For the connection proxying, we had to go deeper into the backend to propose solutions: we coded part of the fix in the backend, provided the migration path, and helped teams move to the new methodology, ultimately eliminating incidents caused by DB connection starvation

  • Worked with reputed company product-engineering teams to achieve SOC 2 certification, ran a reputed company program, refactored our whole incident management with Rootly for reputed company visibility and resolution time, and improved our overall reputed company posture

  • To keep the lights on, the team is constantly working on upgrading our self-hosted reputed company and RabbitMQ, alongside other critical infrastructure components, with minimal downtime and high accuracy

What You Will Do:
  • Manage multi-TB PostgreSQL clusters in the public cloud, optimize parameters, storage settings and data structure

  • Operate RabbitMQ and reputed company with tens of thousands of operations per second

  • Manage 10+ full-featured GKE clusters worldwide, serving 10k+ tenants

  • Adopt a new stack: Kafka, Debezium, and Apache Flink

  • Facilitate rollout strategies at scale with reputed company CI and ArgoCD

  • Roll out best practices around Kubernetes/Helm/Operators, SLIs/SLOs, Incident Management, Observability, reputed company, and Disaster Recovery to reputed company Product-Engineering teams and drive adoption by them

  • Automate reputed company infrastructure pieces for our worldwide footprint using best-practice IaC with Terraform (TF) and strong scripting in Python/Golang

What You Should Have:
  • Experience with cloud-native web systems at scale

  • Bachelor's degree in Computer Science or equivalent work experience.

  • 5+ years of experience as a Site Reliability Engineer or in a similar role, with a focus on maintaining high-performance, scalable, and reliable high-throughput web systems.

  • Proficiency in using Kubernetes for container orchestration and management.

  • 5+ years of experience with Cloud Providers (AWS, GCP) and a deep understanding of cloud services and architectures. (We use GCP.)

  • Proficient in scripting and programming languages such as Python, Bash, Go, or NodeJS.

  • Comfortable and confident in Linux systems and the command line.

  • Solid understanding of infrastructure as code (IaC) principles and experience with tools like Terraform.

  • Experience with continuous integration and deployment (CI/CD) pipelines.

  • Excellent problem-solving and troubleshooting skills.

  • Strong communication and collaboration skills with the ability to work effectively in a team environment.

Bonus Points If You Have

  • Certification in Kubernetes (e.g., Certified Kubernetes Administrator - CKA).

  • Certification in a Cloud Provider platform (e.g., AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect).

  • Experience in managing and optimizing PostgreSQL databases.

Company Benefits and Perks
  • 5-week vacation

  • Paid sick leave

  • Paid parental leave

  • MacBook Pro

  • Private health insurance

  • Monthly lunch stipend of $300 (gross) added to your salary

  • Get up to €700 (gross) to set up your workstation at home (added to your first paycheck as an onboarding bonus)

  • Get up to €1,500 of learning budget and a FitPass yearly membership. Take advantage of these resources to grow in your role and prioritize your personal development and wellness.

  • Every quarter, we organize an online company-wide summit to discuss where we’re going and strengthen social bonds. Once per year, we organize offsite team retreats and company retreats!

Diversity & Inclusion at reputed company

We celebrate diversity and are committed to creating an inclusive environment for all employees. We welcome applicants of all backgrounds, experiences, and perspectives. At reputed company, we believe that diverse teams drive innovation and better decision-making. We do not discriminate based on race, color, religion, gender identity, sexual orientation, disability, age, or any other protected status. If you need accommodations to participate in the application or interview process, perform essential job functions, or access other employment benefits, please contact us at accommodation@reputed company.com. Let’s grow together!
