Site Reliability Engineer

Posted by LockedIn AI

Full-Time | $145k – $200k | Remote

Engineering & Technology

LockedIn AI is the #1 real-time AI interview and meeting copilot, trusted by over 1 million users worldwide. We are building the most advanced AI-powered career preparation platform that helps users succeed in interviews, coding assessments, and professional communication.

Our platform delivers real-time AI assistance during live conversations — where reliability, speed, and uptime are mission-critical.

Role Overview

We are looking for a proactive, systems-minded Site Reliability Engineer (SRE) to ensure that LockedIn AI’s production systems are highly reliable, scalable, and performant.

This is a high-impact engineering role where system stability directly defines user experience. When users are in live interviews, latency and uptime are the product.

You will own the reliability of real-time AI infrastructure serving over 1 million users globally.

Key Responsibilities

1. System Reliability & Performance

Own uptime, reliability, and performance across production systems
Define and manage SLIs, SLOs, and error budgets
Build fault-tolerant and self-healing architectures
Optimize latency, throughput, and system efficiency

2. Infrastructure as Code & Cloud Systems

Design and manage cloud infrastructure using Terraform, Pulumi, or CloudFormation
Operate production environments on AWS, GCP, or Azure
Manage Kubernetes clusters and microservices infrastructure
Optimize cloud costs while maintaining performance and reliability

3. Observability & Monitoring

Build monitoring systems using Prometheus, Grafana, Datadog, or similar tools
Design alerting systems with low noise and high accuracy
Implement distributed tracing and centralized logging
Monitor AI-specific metrics (inference latency, GPU utilization, inference throughput)

4. Incident Response & Reliability Engineering

Lead incident response for outages and production issues
Participate in on-call rotations
Conduct postmortems and root cause analysis
Build runbooks and improve system resilience over time

5. CI/CD & Release Engineering

Build and maintain CI/CD pipelines for fast and safe deployments
Implement canary, blue-green, and rollback strategies
Ensure safe deployment of application and AI model updates
Improve deployment velocity without compromising stability

6. Security & Infrastructure Best Practices

Implement secure infrastructure design (IAM, encryption, secrets management)
Maintain compliance with privacy and security standards
Manage vulnerability scanning and system hardening
Ensure secure handling of user data across systems

Required Qualifications

Experience

3+ years in SRE, DevOps, or infrastructure engineering
Experience managing production systems at scale
Strong background in incident response and system reliability
Experience working in fast-paced startup environments

Education

Bachelor’s degree in Computer Science, Engineering, or related field
Equivalent hands-on experience will be strongly considered

Technical Skills

Strong programming skills (Python, Go, or similar)
Experience with AWS, GCP, or Azure
Kubernetes and Docker expertise
Infrastructure as Code (Terraform, Pulumi, CloudFormation)
CI/CD systems (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
Observability tools (Prometheus, Grafana, Datadog, ELK, etc.)

Soft Skills

Strong reliability-first engineering mindset
Calm and effective during production incidents
Excellent communication and documentation skills
Strong ownership and proactive problem-solving

Preferred Qualifications

Experience with real-time AI or ML infrastructure
Knowledge of GPU-based or inference-heavy systems
Experience with streaming, WebSockets, or low-latency systems
Familiarity with chaos engineering practices
Multi-cloud or hybrid-cloud experience
Experience in SaaS, edtech, or AI startups
Open-source infrastructure contributions

What We Offer

Equity

Meaningful early-stage ownership in a fast-growing AI company

Impact

Your work directly supports over 1 million active users

Team

Join a lean, high-performance engineering team

Flexibility

Remote-first with optional hybrid work in New York

Growth

Fast-paced startup environment with high ownership

Culture

User-focused, feedback-driven, and execution-oriented

Why Join LockedIn AI?

Category-defining AI interview copilot platform
Massive and fast-growing AI career tech market
Reliability directly impacts real-time user experience
Work on cutting-edge AI infrastructure at scale
High ownership and real production responsibility

How to Apply

Please submit:

Resume / CV
Short note covering:
- Why you want to join LockedIn AI
- Whether you’ve used the product
- Ideas for improving reliability or performance
Optional: GitHub, projects, or technical writing

Equal Opportunity Statement

LockedIn AI is committed to building a diverse and inclusive team. We welcome applicants from all backgrounds. Hiring decisions are based on merit, skills, and business needs.

Job Details

Type: Full-Time

Salary: $145k – $200k

Location: Remote

Posted: May 24, 2026

About the Employer

LockedIn AI

LockedIn AI™ is an AI interview assistant that listens to your interview, analyzes questions, and provides real-time answers, code solutions, and live coaching automatically.
