This job posting has expired
Expired on 4 April 2026
Site Reliability Engineer
Job Description
Our low-code platform is preparing for an immediate scale-up to 3,000,000 concurrent users. We currently operate on a GKE-based architecture with 78 microservices and a MongoDB Atlas backend. We need a Lead Site Reliability Engineer who can transform our current synchronous system into a high-concurrency, asynchronous engine capable of surviving massive traffic spikes without database or compute failure.
Responsibilities
- Transition synchronous API flows to Google Cloud Pub/Sub
- Implement and own the 'Speed Limit' for the database
- Configure subscriber-side flow control in Node.js and the Kubernetes Horizontal Pod Autoscaler (HPA)
- Isolate heavy Puppeteer/Chrome workloads using Cloud Run or dedicated Spot VM node pools
- Build a 'Nerve Center' using Cloud Monitoring
- Optimize container footprints using Vertical Pod Autoscaling (VPA)
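
For candidates wondering what the database 'Speed Limit' might look like in practice, here is a minimal sketch. It assumes the term means capping the rate of operations sent to MongoDB Atlas, which the posting does not spell out. The `TokenBucket` class, its parameters, and the numbers used are hypothetical and are not taken from our codebase.

```typescript
// Hypothetical token-bucket limiter: an illustration only, not the platform's code.
// The bucket holds up to `capacity` tokens and refills at `refillPerSecond`.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefillMs = nowMs;
  }

  // Returns true if one database operation may be issued now, false otherwise.
  // The clock is injectable so the behaviour can be checked deterministically.
  tryAcquire(nowMs: number = Date.now()): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A worker would call `tryAcquire()` before each write and, on `false`, leave the message unacknowledged or delay it rather than hit the database. In a multi-pod deployment the limit would need to be shared or divided across pods, which this single-process sketch does not address.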
Qualifications
- Deep experience with GKE, Pub/Sub, and Cloud Run
- Knowledge of how to request and manage high-scale CPU quotas
- Advanced Node.js knowledge
- Experience with MongoDB Atlas M60/M80 tiers
- Experience implementing Backpressure and Circuit Breakers
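
To give a sense of the circuit-breaker experience mentioned in the last qualification, the following is a minimal sketch of the pattern. The `CircuitBreaker` class, its method names, and the thresholds are hypothetical choices made for illustration; production code would more likely rely on an established library and add metrics.

```typescript
// Hypothetical circuit breaker: an illustration of the pattern only.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAtMs = 0;

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // After the cooldown, an open breaker moves to half-open to allow trial calls.
  currentState(nowMs: number = Date.now()): BreakerState {
    if (this.state === "open" && nowMs - this.openedAtMs >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  // Callers check this before contacting the downstream dependency.
  // Simplification: every call is allowed while half-open, not just one probe.
  allowRequest(nowMs: number = Date.now()): boolean {
    return this.currentState(nowMs) !== "open";
  }

  // A success closes the breaker and resets the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // A failure while half-open, or too many failures in a row, opens the breaker.
  recordFailure(nowMs: number = Date.now()): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAtMs = nowMs;
    }
  }
}
```

While the breaker is open, callers fail fast instead of queuing more work against a struggling dependency, which complements backpressure on the subscriber side.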