I apply AI as an engineer: integration, evaluation, cost/latency tradeoffs, and shipping systems that organisations actually depend on. Not demos.
The hardest part of production AI is not picking a model; it's knowing when the answer is wrong. My approach starts with evaluation before architecture: define the failure modes first, then design the system to catch them.
I hold IBM's RAG & Agentic AI and Generative AI specialisations, and I've applied these methods on real industrial data at WEISS GmbH, not on benchmark datasets.
For most retrieval problems, RAG over a well-chunked index outperforms fine-tuning and is far cheaper to update when data changes. Fine-tune only when latency or domain vocabulary makes it necessary.
Define the failure modes, ground-truth examples, and acceptance criteria before writing pipeline code. Retrofitting evals is three times the work.
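A minimal sketch of what "evals before pipeline code" could look like in practice. Everything here is an assumption for illustration: the `EvalCase` fields, the substring-match check, and the `min_pass_rate` acceptance threshold are hypothetical and would be replaced by whatever ground truth and criteria a given project defines.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class EvalCase:
    """One ground-truth example, tagged with the failure mode it guards against."""
    question: str
    expected_substring: str  # hypothetical check: answer must contain this text
    failure_mode: str        # e.g. "wrong unit", "hallucinated part number"


def run_evals(
    system: Callable[[str], str],
    cases: Iterable[EvalCase],
    min_pass_rate: float,
) -> Tuple[bool, List[EvalCase]]:
    """Run every case through `system` and compare against the acceptance criterion.

    Returns (accepted, failed_cases). An empty case list is never accepted.
    """
    case_list = list(cases)
    if not case_list:
        return False, []
    failures = [
        case
        for case in case_list
        if case.expected_substring.lower() not in system(case.question).lower()
    ]
    pass_rate = 1 - len(failures) / len(case_list)
    return pass_rate >= min_pass_rate, failures
```

Because `system` is just a callable, the same cases can be written before any pipeline exists and later run unchanged against a stub, a prototype, and the shipped system.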
A system that costs €4,000/month in inference or takes 8 seconds per query is not production-ready. I model total cost of ownership from day one.
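The inference part of such a cost model can be sketched as simple arithmetic over token volumes. This is an illustrative sketch only: the per-token prices, query volumes, and budget thresholds are hypothetical inputs, not real vendor pricing, and a full total-cost-of-ownership model would also include hosting, storage, and maintenance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceCostModel:
    """Back-of-the-envelope inference cost; all prices are hypothetical inputs."""
    queries_per_day: int
    input_tokens_per_query: int
    output_tokens_per_query: int
    eur_per_1k_input_tokens: float
    eur_per_1k_output_tokens: float

    def cost_per_query(self) -> float:
        """Cost of one query in EUR, from token counts and per-1k-token prices."""
        input_cost = self.input_tokens_per_query / 1000 * self.eur_per_1k_input_tokens
        output_cost = self.output_tokens_per_query / 1000 * self.eur_per_1k_output_tokens
        return input_cost + output_cost

    def monthly_cost(self, days: int = 30) -> float:
        """Projected inference spend in EUR over `days` days."""
        return self.cost_per_query() * self.queries_per_day * days


def within_budget(
    model: InferenceCostModel,
    monthly_budget_eur: float,
    p95_latency_s: float,
    latency_budget_s: float,
) -> bool:
    """True only if both the cost budget and the latency budget are met."""
    return (
        model.monthly_cost() <= monthly_budget_eur
        and p95_latency_s <= latency_budget_s
    )
```

With the made-up figures of 2,000 queries per day, 3,000 input and 500 output tokens per query, and prices of €0.01 and €0.03 per 1k tokens, the projection comes to roughly €2,700 per month, which can then be checked against a budget alongside measured latency.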
Every LLM integration I ship has output validation, fallback paths, and logging. Hallucinations in industrial contexts are not an acceptable UX failure.
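One way the validation, fallback, and logging pattern could be wired together is sketched below. The JSON schema (`part_id`, `answer`), the `call_model` callable, and the `fallback` callable are all hypothetical placeholders; a real integration would validate against its own output contract and call its own model client.

```python
import json
import logging
from typing import Callable, Optional

logger = logging.getLogger("llm_guard")

# Hypothetical output contract: the model must return a JSON object with these keys.
REQUIRED_KEYS = {"part_id", "answer"}


def validate(raw: str) -> Optional[dict]:
    """Return the parsed output if it is a JSON object with all required keys, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def answer_with_fallback(
    prompt: str,
    call_model: Callable[[str], str],
    fallback: Callable[[str], dict],
) -> dict:
    """Call the model, validate its output, and fall back on any failure.

    Every fallback is logged so that failure rates can be monitored.
    """
    try:
        raw = call_model(prompt)
    except Exception:
        logger.exception("model call failed; using fallback")
        return fallback(prompt)

    parsed = validate(raw)
    if parsed is None:
        logger.warning("invalid model output; using fallback: %r", raw[:200])
        return fallback(prompt)
    return parsed
```

The fallback here could be a deterministic lookup or an explicit "no answer available" response; the point of the sketch is that unvalidated model text never reaches the caller.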
A deterministic rule or a SQL query is better than an LLM call for structured lookups. I recommend AI where it genuinely outperforms the simpler option, not as a default.
Embedding, retrieval, prompt, inference, output parsing, API exposure: I build and own the complete pipeline, not just the "AI part".
Each project used the Problem → Role → Approach → Stack → Outcome structure.