Site Reliability Engineer

Posted by LockedIn AI
Full-Time | $145k – $200k | 📍 Remote
Engineering & Technology

LockedIn AI is the #1 real-time AI interview and meeting copilot, trusted by over 1 million users worldwide. We are building the most advanced AI-powered career preparation platform that helps users succeed in interviews, coding assessments, and professional communication.

Our platform delivers real-time AI assistance during live conversations, where reliability, speed, and uptime are mission-critical.

Role Overview

We are looking for a proactive, systems-minded Site Reliability Engineer (SRE) to ensure that LockedIn AI’s production systems are highly reliable, scalable, and performant.

This is a high-impact engineering role where system stability directly defines user experience. When users are in live interviews, latency and uptime are the product.

You will own the reliability of real-time AI infrastructure serving over 1 million users globally.

Key Responsibilities

1. System Reliability & Performance

  • Own uptime, reliability, and performance across production systems
  • Define and manage SLIs, SLOs, and error budgets
  • Build fault-tolerant and self-healing architectures
  • Optimize latency, throughput, and system efficiency

2. Infrastructure as Code & Cloud Systems

  • Design and manage cloud infrastructure using Terraform, Pulumi, or CloudFormation
  • Operate AWS, GCP, or Azure-based production environments
  • Manage Kubernetes clusters and microservices infrastructure
  • Optimize cloud costs while maintaining performance and reliability

3. Observability & Monitoring

  • Build monitoring systems using Prometheus, Grafana, Datadog, or similar tools
  • Design alerting systems with low noise and high accuracy
  • Implement distributed tracing and centralized logging
  • Monitor AI-specific metrics (latency, GPU usage, inference throughput)

4. Incident Response & Reliability Engineering

  • Lead incident response for outages and production issues
  • Participate in on-call rotations
  • Conduct postmortems and root cause analysis
  • Build runbooks and improve system resilience over time

5. CI/CD & Release Engineering

  • Build and maintain CI/CD pipelines for fast and safe deployments
  • Implement canary, blue-green, and rollback strategies
  • Ensure safe deployment of application and AI model updates
  • Improve deployment velocity without compromising stability

6. Security & Infrastructure Best Practices

  • Implement secure infrastructure design (IAM, encryption, secrets management)
  • Maintain compliance with privacy and security standards
  • Manage vulnerability scanning and system hardening
  • Ensure secure handling of user data across systems

Required Qualifications

Experience

  • 3+ years in SRE, DevOps, or infrastructure engineering
  • Experience managing production systems at scale
  • Strong background in incident response and system reliability
  • Experience working in fast-paced startup environments

Education

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • Equivalent hands-on experience strongly considered

Technical Skills

  • Strong programming skills (Python, Go, or similar)
  • Experience with AWS, GCP, or Azure
  • Kubernetes and Docker expertise
  • Infrastructure as Code (Terraform, Pulumi, CloudFormation)
  • CI/CD systems (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
  • Observability tools (Prometheus, Grafana, Datadog, ELK, etc.)

Soft Skills

  • Strong reliability-first engineering mindset
  • Calm and effective during production incidents
  • Excellent communication and documentation skills
  • Strong ownership and proactive problem-solving

Preferred Qualifications

  • Experience with real-time AI or ML infrastructure
  • Knowledge of GPU-based or inference-heavy systems
  • Experience with streaming, WebSockets, or low-latency systems
  • Familiarity with chaos engineering practices
  • Multi-cloud or hybrid-cloud experience
  • Experience in SaaS, edtech, or AI startups
  • Open-source infrastructure contributions

What We Offer

  • Equity: Meaningful early-stage ownership in a fast-growing AI company
  • Impact: Your work directly supports over 1 million active users
  • Team: Join a lean, high-performance engineering team
  • Flexibility: Remote-first with optional hybrid work in New York
  • Growth: Fast-paced startup environment with high ownership
  • Culture: User-focused, feedback-driven, and execution-oriented

Why Join LockedIn AI?

  • Category-defining AI interview copilot platform
  • Massive and fast-growing AI career tech market
  • Reliability directly impacts real-time user experience
  • Work on cutting-edge AI infrastructure at scale
  • High ownership and real production responsibility

How to Apply

Please submit:

  • Resume / CV
  • Short note covering:
    • Why you want to join LockedIn AI
    • Whether you’ve used the product
    • Ideas for improving reliability or performance
  • Optional: GitHub, projects, or technical writing

Equal Opportunity Statement

LockedIn AI is committed to building a diverse and inclusive team. We welcome applicants from all backgrounds. Hiring decisions are based on merit, skills, and business needs.

Job Details

Type: Full-Time
Salary: $145k – $200k
Location: Remote
Posted: May 24, 2026

About the Employer

LockedIn AI
LockedIn AI™ is an AI interview assistant that listens to your interview, analyzes questions, and provides real-time answers, code solutions, and live coaching automatically.