RKIVE AI

Systems Engineer

Rkive AI

Own the runtime: GPU-accelerated rendering, custom media pipelines, low-level systems engineering, and the infrastructure that makes it all reliable at scale.

Remote
Posted: 2026-02-24
Full-Time
Systems
We are hiring a Systems Engineer to own the runtime and infrastructure layer beneath our AI and product stack. This is not a DevOps role managing CI/CD and cloud dashboards. Our production systems include custom GPU-accelerated rendering engines running CUDA and NVENC, proprietary media processing pipelines, and self-healing GPU orchestration. You will operate at the level where software meets hardware. Compensation and level tailored to your experience.
  • Fully Remote
  • Competitive Compensation
  • GPU & Low-Level Systems

TL;DR

The bad

  • We hire deliberately and hold a high bar
  • No perks theatre — no merch, no retreats
  • Demanding workload
  • English or Spanish required
  • We treat you like an adult, not a child

The good

  • GPU systems | CUDA | custom runtimes | low-level infrastructure | media pipelines
  • Fully remote
  • Above-market compensation
  • Role can scale to Head of Systems / Infrastructure
  • Significant performance-based cash bonuses
  • Rolling interviews

Who we are

Rkive is an AI lab focused on multimodal reasoning across time and complexity.
We develop novel architectures and products. We aim to create intelligent environments that work alongside you: proactive, reliable, and responsive to intent.

  • Meaningful work: What we are building is genuinely unprecedented. The problems are hard and the opportunity is enormous.
  • Autonomy: Fully remote. Manage your own time. Take time off when you need it.
  • Zero politics: No bureaucracy, no posturing, no performative culture. Just the work.
  • Mutual respect: We back our people, but we expect the same in return.
  • Honest environment: Not a family, not a pressure cooker. A high-trust, high-performance team.

The role

You will own the systems layer — runtime, GPU infrastructure, and the low-level engineering that everything else depends on.
We have a PhD researcher designing novel architectures and an applied AI engineer productionising them. You are the foundation they both deploy onto. This is not a DevOps role where you manage CI pipelines and cloud dashboards. Our production stack includes custom GPU-accelerated rendering engines, hardware-level media encoding, proprietary processing pipelines, and self-healing GPU orchestration — built in-house, not assembled from managed services. Much of this infrastructure was designed and shipped directly by the founder; you are taking ownership of it, and the bar is set by what already exists.

  • Custom runtimes: We run proprietary rendering engines on NVIDIA GPUs using CUDA and NVENC. You will maintain, extend, and scale these systems.
  • GPU orchestration: Autoscaling, auto-recovery, and deployment of GPU workloads that are not standard container deployments. The workloads are custom and the orchestration must be too.
  • Low-level systems work: Memory management, hardware profiling, driver-level debugging, kernel-level performance tuning. The problems you solve live below the application layer.
  • Reliability engineering: These systems must self-heal, scale under variable load, and maintain deterministic output quality. You own that guarantee.

What you will do

Keep the engine running — and make it faster.
The systems you manage are not off-the-shelf, and neither is the engineering required to operate them.

  • GPU runtime operations: Deploy, monitor, and scale custom GPU workloads — rendering engines, inference services, media encoding pipelines — across cloud GPU infrastructure.
  • Systems-level debugging: Profile and resolve issues at the hardware, driver, kernel, and application level. Memory leaks, GPU utilisation bottlenecks, encoding failures, race conditions in concurrent pipelines.
  • Custom orchestration: Build and maintain deployment, autoscaling, and recovery systems for workloads that do not fit standard container orchestration patterns.
  • Infrastructure as code: Manage cloud infrastructure programmatically — but the interesting work is in the custom layers above the cloud primitives.
  • Performance engineering: Continuously profile and optimise system performance — latency, throughput, GPU utilisation, cost per operation.
  • Security and hardening: Own infrastructure security, access controls, and audit posture. This is a baseline expectation, not the focus of the role.
  • Cross-team support: Provide the runtime foundation for the applied AI engineer shipping model pipelines and the research scientist running training workloads.
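To give a flavour of the "custom orchestration" and self-healing work above, here is a minimal sketch of a supervision loop that restarts a failed worker with exponential backoff. The command, restart budget, and delays are illustrative assumptions, not a description of our actual stack:

```python
import subprocess
import time

def supervise(cmd, max_restarts=5, base_delay=1.0):
    """Run `cmd`, restarting it with exponential backoff on failure.

    Returns the number of restarts performed before a clean exit, or
    raises RuntimeError once the restart budget is exhausted.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return restarts  # clean exit: nothing to heal
        restarts += 1
        if restarts > max_restarts:
            raise RuntimeError(f"{cmd!r} failed {restarts} times, giving up")
        # Back off before the next attempt: 1s, 2s, 4s, ...
        time.sleep(base_delay * 2 ** (restarts - 1))
```

In production this loop would also capture logs, health-check the GPU before relaunching, and report to alerting — the skeleton above is only the recovery core.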

How you will do it

Build what does not exist. Fix what no one else can.
When the documentation does not cover it, you figure it out.

  • Low-level fluency: C/C++, CUDA, Linux systems programming. You are comfortable reading driver logs and GPU memory traces.
  • Scripting and automation: Python, Bash — for orchestration, monitoring, and operational tooling.
  • Cloud infrastructure: AWS (EC2, ECS, GPU instances, networking, IAM, S3). You know the primitives, but your value is in what you build on top of them.
  • Containerisation and beyond: Docker, but also bare-metal GPU deployment where containers add overhead or constraints.
  • Monitoring and observability: Build dashboards and alerting for custom GPU workloads — not just CPU and memory metrics.
  • Collaborative: Work directly with the founder, applied AI engineer, and research scientist. In a team this size, the systems engineer is not a support function — you are a core builder.
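The monitoring point above typically means sampling GPU-specific counters rather than generic host metrics. A minimal sketch, assuming `nvidia-smi`'s documented CSV query interface (the field set and dict keys are illustrative choices):

```python
import subprocess

QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total"

def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`
    output into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, mem_used, mem_total = (v.strip() for v in line.split(","))
        stats.append({
            "index": int(index),
            "util_pct": int(util),
            "mem_used_mib": int(mem_used),
            "mem_total_mib": int(mem_total),
        })
    return stats

def sample_gpus():
    """Shell out to nvidia-smi; only works on a host with NVIDIA drivers."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_gpu_stats(out)
```

Feeding these samples into time-series alerting is what turns "CPU and memory dashboards" into GPU-aware observability.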

Who you are

We need someone who operates at the level where software meets hardware.
This role is for engineers who have gone deeper than cloud consoles and YAML files.

  • Systems engineer, not a DevOps generalist: You have worked with GPU infrastructure, custom runtimes, or low-level systems — not just managed cloud services and CI/CD pipelines.
  • GPU-literate: You understand GPU architecture, memory models, and hardware-accelerated workloads. Experience with CUDA, NVENC, or similar hardware interfaces is strongly preferred.
  • Debugger: You can diagnose problems that span the full stack — from application code down to driver behaviour and hardware state.
  • Builder: When existing tools do not solve the problem, you build the tool. When the orchestration pattern does not exist, you design it.
  • Reliable: You build systems that run without you watching them. Self-healing, self-scaling, deterministic.
  • Resourceful: You learn what you do not know. You do not stop at the boundary of your current expertise.
  • Ambitious: You see infrastructure as a first-class engineering discipline, not a support function. You want to build systems that do not exist anywhere else.

When

Join within the next 90 days. Stay for the long term.
We are building for years, not quarters.

  • Rolling interviews: We interview and hire as applications arrive. First come, first served.
  • Start date: Between April 1st and June 30th, 2026.
  • Bonus: Performance-based cash bonuses.

What to send

Show us your depth, not your tool list.
We care about what you have built and what you have debugged — not which certifications you hold.

  • CV: Focused on systems-level impact — custom infrastructure, GPU work, low-level engineering.
  • Portfolio: Repositories, architecture documents, performance analyses, custom tooling — anything that shows you build at the systems level.
  • Recommendations: From engineers, CTOs, or technical leads who have seen your work up close.
  • Cover letter: Tell us about a system you built or fixed that was genuinely hard — where the answer was not in the documentation.

Apply

If this is the kind of systems work you want to do, we want to hear from you.
Send your CV and portfolio to careers@rkiveai.com

Systems Engineer | Rkive AI