
AI Inference Engineer

Teton.ai Aps



  • Software Development
  • Full-time
  • Copenhagen, DK

About Us
At Teton, we are redefining the role of healthcare workers through cutting-edge AI technology. Facing a global nursing shortage, our solutions offer vital support to overburdened health systems. We distinguish ourselves by focusing relentlessly on product excellence and user experience, rapidly deploying solutions that make a real difference.

At this stage of our company, we require a physical presence in our office in Copenhagen, Denmark. We believe this enables the fastest and most efficient iteration cycles for building an impactful product that users love.

The Job
We're looking for a highly specialized AI Inference Engineer who thrives on optimizing AI models for real-world deployment at scale. You'll be the technical force behind making our healthcare AI systems blazingly fast, efficient, and production-ready. This role demands deep technical expertise in model optimization, CUDA programming, and cutting-edge inference frameworks.

You will be responsible for:

  • Model Optimization & Quantization: Implementing advanced quantization techniques, pruning, and distillation to maximize inference speed while maintaining accuracy
  • CUDA & Low-Level Optimization: Writing and optimizing CUDA kernels, leveraging TensorRT, and pushing the boundaries of GPU utilization
  • DeepStream Integration: Building robust inference pipelines using NVIDIA DeepStream, JetPack, and edge deployment frameworks
  • Transformer Optimization: Specializing in transformer model inference optimization, including attention mechanisms, KV-cache optimization, and memory management
  • Infrastructure Scaling: Designing and implementing scalable inference infrastructure that can handle healthcare's demanding real-time requirements
  • Performance Engineering: Profiling, benchmarking, and continuously improving model serving latency and throughput

What You Bring

  • Deep AI Optimization Expertise: 3+ years of hands-on experience optimizing deep learning models for production inference
  • CUDA Mastery: Strong proficiency in CUDA programming, kernel optimization, and GPU memory management
  • Inference Frameworks: Extensive experience with TensorRT, DeepStream, Triton Inference Server, or similar high-performance serving frameworks
  • Transformer Specialization: Deep understanding of transformer architectures and their optimization challenges (attention mechanisms, memory patterns, sequence handling)
  • Systems Programming: Proficiency in Python, C++, and PyTorch with a focus on performance-critical code
  • Edge Deployment: Experience with NVIDIA JetPack, edge computing, and resource-constrained environments
  • Performance Mindset: Obsessed with benchmarking, profiling, and squeezing every ounce of performance from hardware

Bonus Points

  • Experience with custom CUDA kernel development
  • Knowledge of mixed-precision training and inference
  • Familiarity with distributed inference and model parallelism
  • Experience with healthcare or safety-critical AI applications
  • Contributions to open-source inference optimization projects

What We Offer

  • Participation in our warrant program (stock options)
  • Work with state-of-the-art AI optimization technology in a pioneering field
  • Access to cutting-edge hardware and compute resources
  • A vibrant, learning-focused work environment with fellow optimization enthusiasts
  • Direct impact on healthcare delivery through performance-critical AI systems

Join Our Team
We're looking for engineers who get excited about shaving milliseconds off inference time and making AI models run faster than anyone thought possible. If you're passionate about the intersection of AI, systems programming, and real-world impact, come help us transform healthcare through optimized AI inference.

Ready to push the boundaries of what's possible with AI optimization? Join us in Copenhagen and be part of our mission to revolutionize healthcare.


This job posting is collected from company pages and is shown only as a short summary. The full ad is available from Teton.ai Aps.



Application deadline: as soon as possible
Equity payment
NewTech & AI



Please specify in your application that you found this ad on Akademikernes Jobbank.



Teton.ai Aps

Main office: Uplandsgade 56, 2, 2300 København S

Teton.ai is building the future of patient monitoring and workflow management in hospitals. Utilising advances in deep learning and computer vision, we are working towards creating intelligent autonomous monitoring systems.




https://jobbank.dk/en/job/2826413//