Applied AI Engineer
Bridge research and production: build custom inference pipelines, optimise GPU workloads, and ship multimodal AI systems that run at scale.
- Fully Remote
- Competitive Compensation
- High Technical Bar
TL;DR
The bad
- We hire deliberately and hold a high bar
- No perks theatre — no merch, no retreats
- Demanding workload
- English or Spanish required
- We treat you like an adult, not a child
The good
- Applied AI | GPU systems | custom inference | multimodal pipelines
- Fully remote
- Above-market compensation
- Clear path to senior technical leadership
- Significant performance-based cash bonuses
- Rolling interviews
Who we are
Rkive is an AI lab focused on multimodal reasoning across time and complexity.
We develop novel architectures and products. We aim to create intelligent environments that work alongside you — proactive, reliable, and responsive to intent.
- Meaningful work: What we are building is genuinely unprecedented. The problems are hard and the opportunity is enormous.
- Autonomy: Fully remote. Manage your own time. Take time off when you need it.
- Zero politics: No bureaucracy, no posturing, no performative culture. Just the work.
- Mutual respect: We back our people, but we expect the same in return.
- Honest environment: Not a family, not a pressure cooker. A high-trust, high-performance team.
The role
You will own the path from research prototype to production system.
We have a PhD researcher designing temporal event representations and novel multimodal architectures. We have a systems engineer owning runtime infrastructure and GPU orchestration. You sit between them — taking research outputs and building them into reliable, high-performance production services. The founder sets technical direction, makes architectural decisions, and is hands-on in the codebase. You are joining a founder-led engineering team, not an autonomous department — expect direct collaboration, high standards, and no buffer layers.
This is not standard MLOps. Our stack includes custom GPU-accelerated rendering engines, proprietary multimodal fusion layers, structured output schemas, and a unified model interface that enforces standardised contracts across all model interactions. You will work directly with this infrastructure — extending it, optimising it, and shipping research into it.
- Custom systems: You will work with proprietary pipelines, not just off-the-shelf model serving frameworks.
- GPU-level work: CUDA kernels, NVENC, hardware-accelerated encoding and inference — not just container orchestration.
- Research integration: Take novel architectures from the research scientist and make them run reliably at production latency and cost targets.
- End-to-end ownership: From training pipeline to inference endpoint to monitoring — you own the full lifecycle.
What you will do
Build and ship AI systems that do not exist anywhere else.
The interesting problems here do not have Stack Overflow answers.
- Model productionisation: Take research prototypes — temporal event models, multimodal fusion architectures — and build them into robust, low-latency production services.
- Custom inference pipelines: Design and optimise inference paths that integrate with our rendering engine, structured output schema, and model interface. This is not wrapping an API.
- GPU optimisation: Profile, optimise, and where necessary write custom CUDA code for training and inference workloads. Quantisation, kernel fusion, memory management, hardware-specific tuning.
- Training infrastructure: Build and maintain distributed training pipelines for research models, including data ingestion from production signals.
- Benchmarking and evaluation: Instrument systems for rigorous performance measurement — latency, throughput, cost, output quality — across model variants running through the same execution environment.
- Cross-team integration: Work directly with the research scientist on architecture constraints and with the systems engineer on deployment and runtime requirements.
How you will do it
Engineering rigour applied to novel problems.
Standard tooling where it works. Custom solutions where it does not.
- First principles: When existing tools do not solve the problem, build the tool. When the documentation does not exist, write it.
- Production-aware research: Understand the deployment target before writing the training loop. Latency, memory, and cost constraints are design inputs, not afterthoughts.
- Closed-loop iteration: Use production metrics — not just offline benchmarks — to evaluate and improve systems.
- Collaborative: Work directly with the founder, research scientist, and systems engineer. No handoffs over the wall.
- Core tools: Python, PyTorch (JAX a plus), CUDA/C++ where needed, Hugging Face ecosystem, TensorRT/ONNX Runtime, Docker, GPU profiling tools (Nsight, nvprof).
Who you are
We need someone who can go deep — not just wide.
You have shipped ML systems to production, but you have also gone below the abstraction layer when you had to.
- Systems-level ML engineer: You understand what happens between model.forward() and the GPU executing a kernel. You have optimised at that level.
- GPU-literate: You have worked with CUDA or other GPU-accelerated workloads at a level beyond just calling library functions. If you have written custom kernels, even better.
- Builder: You have shipped at least one non-trivial ML system to production — not a demo, not a notebook, a real system handling real traffic.
- Resourceful: When the problem is novel, you research, prototype, and solve. You do not wait for someone to tell you how.
- Quality-driven: You write code that others can maintain. Tests, documentation, clear interfaces.
- Research-adjacent: You can read a paper, understand the architecture, and implement it. You do not need the research scientist to hand you a script.
- Ambitious: You want to build systems that do not exist yet, not maintain systems that already do.
When
Join within the next 90 days. Stay for the long term.
We are building for years, not quarters.
- Rolling interviews: We interview and hire as applications arrive. First come, first served.
- Start date: Between April 1st and June 30th, 2026.
- Bonus: Performance-based cash bonuses.
What to send
Show us your work and your depth.
A strong CV gets you a look. Evidence of solving hard systems problems gets you an interview.
- CV: Focused on impact and technical depth — not tool lists.
- Portfolio: Repositories, deployed systems, performance benchmarks, GPU profiling results, custom pipelines — anything that shows you build real things.
- Recommendations: From engineers, researchers, or technical leads you have worked with.
- Cover letter: Tell us about the hardest production ML problem you have solved — and how.
Apply
If this is the kind of engineering you want to do, we want to hear from you.
Send your resume, portfolio, and GitHub links to careers@rkiveai.com