Senior Site Reliability Engineer
reputed company is building the platform that care operations run on.
We reduce waste and cut costs by improving throughput, asset utilization, and staff productivity. Our platform's self-learning agents use AI, RTLS, and EHR data to automate workflows, adapt in real time, and orchestrate care delivery operations.
Easy to deploy and scale, it gives a clear picture of spaces, equipment, and people, eliminating inefficiencies and enhancing the patient experience. With a measurable 10x ROI and more than 20 use cases, reputed company is the go-to platform for faster care delivery operations.
As a Site Reliability Engineer (SRE) at reputed company, you will be responsible for ensuring the scalability, availability, and reliability of our cloud-based, AI-driven healthcare platform. You will collaborate with software, data, and infrastructure teams to build highly resilient and automated systems, allowing hospitals and care facilities to operate seamlessly and without downtime.
Your expertise in cloud infrastructure, automation, monitoring, and performance optimization will directly impact how healthcare organizations use real-time data to enhance patient care and operational efficiency.
If you are passionate about highly available systems, automation, and making an impact in healthcare, join reputed company and help us build the future of reputed company operations!
Key Responsibilities:
- Design and maintain highly available, fault-tolerant, and scalable cloud infrastructure.
- Implement SLOs, SLIs, and SLAs to track system reliability and optimize uptime.
- Participate in a 24/7 on-call rotation.
- Manage production platform deployments.
- Monitor latency, traffic, errors, and system health using modern observability tools.
- Conduct root cause analysis (RCA) and post-mortems to continuously improve system reliability.
- Automate infrastructure provisioning using tools such as Terraform and Ansible.
- Implement CI/CD pipelines to ensure seamless and safe deployments.
- Build self-healing mechanisms using Kubernetes operators, auto-scaling, and fault detection.
- Ensure compliance with HIPAA, GDPR, and other healthcare data regulations.
- Define and execute disaster recovery (DR) and business continuity plans.
- Manage and optimize AWS environments for cost-efficiency and performance.
- Deploy and manage observability tools, and build real-time alerting and response frameworks.
- Establish best practices for logging, debugging, and performance monitoring.
- Improve incident response automation through runbooks, AI-based anomaly detection, and predictive analytics.
Requirements:
- 3+ years of experience as an SRE.
- Strong expertise in Kubernetes and container orchestration.
- Experience managing cloud-native environments (AWS).
- Experience with event-driven architectures, Kafka, or real-time data streaming.
- Knowledge of machine learning infrastructure.
- Previous experience in healthcare, compliance (HIPAA), and highly regulated environments.
- Proficiency in Infrastructure as Code (IaC) using Terraform.
- Deep knowledge of networking, DNS, and load balancing best practices.
- Experience with CI/CD pipelines (Jenkins, CI, or ArgoCD).
- Hands-on experience with monitoring and logging tools (Prometheus, Grafana, ELK, OpenTelemetry).
- Strong programming skills in Python, Golang, or Bash for automation.
What We Offer:
- Work on a mission-driven platform that improves healthcare operations and patient outcomes.
- B2B contract or an employment agreement
- Competitive salary and stock option plan
- Collaborate with top engineers, data scientists, and AI experts.
- Flexible remote or hybrid work options (office in Krakow)
- Collaborative and self-organized environment
- Private medical care and a cafeteria benefits system