March 29, 2026
Why comparing local models to frontier APIs misses the point — and how system constraints, KV-cache memory, and work like TurboQuant change what “good enough” means on your own hardware.
Writing, notes, and technical references I keep coming back to.
March 24, 2026
Building a banking GenAI assistant: why tool calling stopped being a classification problem and became a question of evaluating controlled decision-making
March 22, 2026
On-device inference is shifting from a niche constraint to a genuine deployment choice — and the implications for privacy, latency, cost, and product design are structural, not incremental.
March 16, 2026
What happens when the model isn't just generating code inside a workflow, but actually running the experiment loop itself — while you define the rules it has to live within.
March 13, 2026
Unclear boundaries of performance can create a false sense of success with LLMs
February 15, 2026
The landscape of AI is shifting fast. Here's what I'm seeing and why it matters.