Senior AI Engineer – New York, Manhattan / Hybrid – Fast Growing Fintech – $175,000 – $250,000 + Competitive Equity
Richard Manso
Managing Director
Overview
Our client is a fast-growing financial technology company that builds AI agents that synthesize information across heterogeneous sources and deliver structured, reasoned answers in real time. The product only works if the agents are fast, reliable, and correct, not approximately correct.
Why This Role
The problems you will solve with our client do not yet have blog posts about them. Parallel agent DAG execution under hard latency budgets, streaming synthesis across partial sub-agent results, and eval harnesses for non-deterministic multi-step systems are genuinely unsolved at production quality, and every engineer's decisions ship to production.
The Role
Inference Optimization
- Drive TTFT below 400ms for multi-step agent pipelines
- Streaming optimization: first token to user while sub-agents are still running
- KV cache strategy, prompt compression, dynamic context window management
- Multi-provider routing: model selection by latency, cost, and task type across OpenAI, Anthropic, Gemini, and open-weight models
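To give a flavour of the multi-provider routing problem, here is a minimal sketch of a router that picks the cheapest model meeting a latency budget for a given task type. All model names, latency figures, and costs are illustrative placeholders, not our client's actual catalogue:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelOption:
    name: str
    provider: str
    est_latency_ms: int       # rough time-to-first-token estimate
    cost_per_1k_tokens: float
    good_for: frozenset       # task types this model handles well

# Illustrative catalogue; numbers are placeholders, not benchmarks.
CATALOGUE = [
    ModelOption("fast-small", "open-weight", 120, 0.0002, frozenset({"classify", "extract"})),
    ModelOption("balanced", "openai", 300, 0.0030, frozenset({"extract", "synthesize"})),
    ModelOption("frontier", "anthropic", 600, 0.0150, frozenset({"synthesize", "reason"})),
]

def route(task_type: str, latency_budget_ms: int) -> ModelOption:
    """Pick the cheapest model that handles the task within the latency budget."""
    candidates = [
        m for m in CATALOGUE
        if task_type in m.good_for and m.est_latency_ms <= latency_budget_ms
    ]
    if not candidates:
        # Fall back to the fastest capable model and accept the budget miss.
        candidates = [m for m in CATALOGUE if task_type in m.good_for]
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)
```

In production the static estimates would be replaced by live latency and error-rate telemetry per provider.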
Agent Architecture
- Design and implement Plan-Execute-Synthesize pipelines that run sub-agents in parallel DAGs, not sequential chains
- Build reliable orchestration on top of Temporal: retries, timeouts, partial failure recovery, idempotency
- Structured output enforcement: JSON schema validation, retry loops on malformed LLM output, graceful degradation
- Tool call design: schema design that LLMs actually follow reliably across providers
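As a concrete example of structured output enforcement, here is a sketch of a validate-and-retry loop that repairs malformed LLM output and degrades gracefully. `call_llm` is a hypothetical callable standing in for any provider SDK:

```python
import json

def call_with_schema(call_llm, prompt, required_keys, max_retries=3):
    """Call an LLM, retrying until the output parses as JSON with the
    required keys. `call_llm` is any callable returning raw text."""
    last_error = None
    for attempt in range(max_retries):
        # On retries, feed the validation error back to the model.
        raw = call_llm(prompt if attempt == 0
                       else f"{prompt}\n\nPrevious output was invalid ({last_error}). "
                            f"Return only valid JSON with keys: {sorted(required_keys)}.")
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as e:
            last_error = f"invalid JSON: {e}"
            continue
        missing = required_keys - parsed.keys()
        if missing:
            last_error = f"missing keys: {sorted(missing)}"
            continue
        return parsed
    # Graceful degradation: return a typed failure instead of crashing the pipeline.
    return {"error": last_error, **{k: None for k in required_keys}}
```

The same pattern extends to full JSON Schema validation; keys-only checking keeps the sketch short.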
Evaluation & Harness
- Own the eval framework end to end: ground truth datasets, automated scoring pipelines, regression detection on every PR
- LLM-as-judge pipelines for qualitative output assessment
- Latency regression testing: p50/p95/p99 tracked across every deployment
- Adversarial test case design: ambiguous queries, missing data, conflicting sources, malformed tool responses
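To illustrate the latency regression gate, here is a minimal sketch of a percentile comparison that could run on every deployment; the tolerance and percentile are illustrative defaults, not our client's thresholds:

```python
import statistics

def latency_regressed(baseline_ms, candidate_ms, p=0.95, tolerance=1.10):
    """Flag a deploy if the candidate's p-th percentile latency exceeds the
    baseline's by more than `tolerance` (e.g. 10%). Sketch only; real gates
    would also need enough samples for the tail estimate to be stable."""
    def percentile(samples, frac):
        cuts = statistics.quantiles(sorted(samples), n=100)
        return cuts[min(round(frac * 100), 99) - 1]
    return percentile(candidate_ms, p) > tolerance * percentile(baseline_ms, p)
```

The same check applied at p50, p95, and p99 catches both median drift and tail blowups.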
Infrastructure
- Model serving and cold start optimization
- Async worker architecture for parallel sub-agent execution
- Observability: trace every token, every tool call, every synthesis step
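The parallel sub-agent execution above can be sketched with plain asyncio: run independent sub-agents concurrently, then separate successes from failures so synthesis can proceed on partial results. The sub-agent names and `fetch` helper are hypothetical:

```python
import asyncio

async def run_subagents(tasks):
    """Run independent sub-agents concurrently; collect partial results so
    the synthesizer can proceed even if some fail. `tasks` maps name -> coroutine."""
    results = await asyncio.gather(*tasks.values(), return_exceptions=True)
    ok, failed = {}, {}
    for name, result in zip(tasks, results):
        (failed if isinstance(result, Exception) else ok)[name] = result
    return ok, failed

async def demo():
    async def fetch(source, delay, fail=False):
        await asyncio.sleep(delay)          # stand-in for a real sub-agent call
        if fail:
            raise RuntimeError(f"{source} unavailable")
        return f"data from {source}"
    return await run_subagents({
        "news": fetch("news", 0.01),
        "filings": fetch("filings", 0.02, fail=True),
        "prices": fetch("prices", 0.01),
    })
```

A production version would add per-task timeouts and retries (e.g. via Temporal activities) rather than bare coroutines.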
So what is our client looking for?
- 3-7 years of software engineering experience shipping AI applications in production.
- Experience integrating LLMs (Claude, GPT, Gemini) into production applications, with depth in at least one or two of: inference optimization, agent architecture, or evals.
- Proficiency in Python for AI/ML development.
- Experience with LLM orchestration frameworks and production AI infrastructure.
- Someone who has built something that runs in production at a meaningful scale and understands why it's fast (or why it isn't).
The Ideal Profile
- You've worked on inference pipelines where TTFT was the primary metric and you moved it meaningfully
- You've built multi-step agent systems and you know where they break, not from reading papers but from watching them fail in production
- You've written eval harnesses from scratch and you have opinions about what makes a ground truth dataset actually useful
- You've debugged LLM non-determinism in production and built systems resilient to it
- You've worked with streaming LLM responses and built infrastructure around partial output handling
The Potential Profile
- You've fine-tuned models but haven't shipped inference systems
- You've used LangChain/LlamaIndex but haven't built the layer underneath
- Strong ML research background without systems exposure
- Stack familiarity (we care more about depth than match): Go, Python, Temporal, Kafka, PostgreSQL, Docker
So in a nutshell we would love to talk to candidates who have shipped inference systems at:
- A real-time AI product (search, coding assistant, chat at scale)
- A model serving infrastructure company
- An agent platform (any domain)
We would also love to hear from you if you've built eval/harness infrastructure that a team of 10+ engineers actually trusted to catch regressions.
How do you apply?
If you are interested in applying for the Senior AI Engineer role, please do so via the link on this page, or contact Digital Republic by phone or email.
Get in contact with Digital Republic Talent by emailing [email protected], or check out the website at www.digitalrepublictalent.com. You can also find out more on LinkedIn, Instagram or Facebook.