Applied AI Engineer
Bridge research and production: build custom inference pipelines, optimise GPU workloads, and ship multimodal AI systems that run at scale.
- Fully Remote
- Competitive Compensation
- High Technical Bar
TL;DR
The bad
- We hire deliberately and hold a high bar
- No perks theatre — no merch, no retreats
- Demanding workload
- English or Spanish required
- We treat you like an adult, not a child
The good
- Applied AI | GPU systems | custom inference | multimodal pipelines
- Fully remote
- Above-market compensation
- Clear path to senior technical leadership
- Significant performance-based cash bonuses
- Rolling interviews
Who we are
Rkive is an AI lab focused on multimodal reasoning across time and complexity.
We develop novel architectures and products. We aim to create intelligent environments that work alongside you — proactive, reliable, and responsive to intent.
- Meaningful work: What we are building is genuinely unprecedented. The problems are hard and the opportunity is enormous.
- Autonomy: Fully remote. Manage your own time. Take time off when you need it.
- Zero politics: No bureaucracy, no posturing, no performative culture. Just the work.
- Mutual respect: We back our people, but we expect the same in return.
- Honest environment: Not a family, not a pressure cooker. A high-trust, high-performance team.
The role
You will own the path from research prototype to production system.
We have a PhD researcher designing temporal event representations and novel multimodal architectures. We have a systems engineer owning runtime infrastructure and GPU orchestration. You sit between them — taking research outputs and building them into reliable, high-performance production services. The founder sets technical direction, makes architectural decisions, and is hands-on in the codebase. You are joining a founder-led engineering team, not an autonomous department — expect direct collaboration, high standards, and no buffer layers.
This is not standard MLOps. Our stack includes custom GPU-accelerated rendering engines, proprietary multimodal fusion layers, structured output schemas, and a unified model interface that enforces standardised contracts across all model interactions. You will work directly with this infrastructure — extending it, optimising it, and shipping research into it.
- Custom systems: You will work with proprietary pipelines, not just off-the-shelf model serving frameworks.
- GPU-level work: CUDA kernels, NVENC, hardware-accelerated encoding and inference — not just container orchestration.
- Research integration: Take novel architectures from the research scientist and make them run reliably at production latency and cost targets.
- End-to-end ownership: From training pipeline to inference endpoint to monitoring — you own the full lifecycle.
What you will do
Build and ship AI systems that do not exist anywhere else.
The interesting problems here do not have Stack Overflow answers.
- Model productionisation: Take research prototypes — temporal event models, multimodal fusion architectures — and build them into robust, low-latency production services.
- Custom inference pipelines: Design and optimise inference paths that integrate with our rendering engine, structured output schema, and model interface. This is not wrapping an API.
- GPU optimisation: Profile, optimise, and where necessary write custom CUDA code for training and inference workloads. Quantisation, kernel fusion, memory management, hardware-specific tuning.
- Training infrastructure: Build and maintain distributed training pipelines for research models, including data ingestion from production signals.
- Benchmarking and evaluation: Instrument systems for rigorous performance measurement — latency, throughput, cost, output quality — across model variants running through the same execution environment.
- Cross-team integration: Work directly with the research scientist on architecture constraints and with the systems engineer on deployment and runtime requirements.
How you will do it
Engineering rigour applied to novel problems.
Standard tooling where it works. Custom solutions where it does not.
- First principles: When existing tools do not solve the problem, build the tool. When the documentation does not exist, write it.
- Production-aware research: Understand the deployment target before writing the training loop. Latency, memory, and cost constraints are design inputs, not afterthoughts.
- Closed-loop iteration: Use production metrics — not just offline benchmarks — to evaluate and improve systems.
- Collaborative: Work directly with the founder, research scientist, and systems engineer. No handoffs over the wall.
- Core tools: Python, PyTorch (JAX a plus), CUDA/C++ where needed, Hugging Face ecosystem, TensorRT/ONNX Runtime, Docker, GPU profiling tools (Nsight, nvprof).
Who you are
We need someone who can go deep — not just wide.
You have shipped ML systems to production, but you have also gone below the abstraction layer when you had to.
- Systems-level ML engineer: You understand what happens between model.forward() and the GPU executing a kernel. You have optimised at that level.
- GPU-literate: You have worked with CUDA or other GPU-accelerated workloads at a level beyond just calling library functions. If you have written custom kernels, even better.
- Builder: You have shipped at least one non-trivial ML system to production — not a demo, not a notebook, a real system handling real traffic.
- Resourceful: When the problem is novel, you research, prototype, and solve. You do not wait for someone to tell you how.
- Quality-driven: You write code that others can maintain. Tests, documentation, clear interfaces.
- Research-adjacent: You can read a paper, understand the architecture, and implement it. You do not need the research scientist to hand you a script.
- Ambitious: You want to build systems that do not exist yet, not maintain systems that already do.
When
Join within the next 90 days. Stay for the long term.
We are building for years, not quarters.
- Rolling interviews: We interview and hire as applications arrive. First come, first served.
- Start date: Between April 1st and June 30th, 2026.
- Bonus: Performance-based cash bonuses.
What to send
Show us your work and your depth.
A strong CV gets you a look. Evidence of solving hard systems problems gets you an interview.
- CV: Focused on impact and technical depth — not tool lists.
- Portfolio: Repositories, deployed systems, performance benchmarks, GPU profiling results, custom pipelines — anything that shows you build real things.
- Recommendations: From engineers, researchers, or technical leads you have worked with.
- Cover letter: Tell us about the hardest production ML problem you have solved — and how.
Apply
If this is the kind of engineering you want to do, we want to hear from you.
Send your resume, portfolio, and GitHub links to careers@rkiveai.com