The firehose of all talks and workshops from the AI Engineer World's Fair 2025! For more info, see https://ai.engineer/llms
Rishabh Garg, Tesla Optimus — Challenges in High Performance Robotics Systems
Building an Agentic Platform — Ben Kus, CTO Box
Five hard earned lessons about Evals — Ankur Goyal, Braintrust
Perceptual Evaluations: Evals for Aesthetics — Diego Rodriguez, Krea.ai
How BlackRock Builds Custom Knowledge Apps at Scale — Vaibhav Page & Infant Vasanth, BlackRock
Form factors for your new AI coworkers — Craig Wattrus, Flatfile
Fuzzing in the GenAI Era — Leonard Tang, Haize Labs
Multi Agent AI and Network Knowledge Graphs for Change — Ola Mabadeje, Cisco
Wisdom-Driven Knowledge Augmented Generation at Scale - Chin Keong Lam, Patho AI
The Next Unicorns: 7 Top AI startups from the HF0 Residency
#define AI Engineer - Greg Brockman, OpenAI (ft. Jensen Huang)
The Future of Evals - Ankur Goyal, Braintrust
Designing AI-Intensive Applications - swyx
How to look at your data — Jeff Huber (Chroma) + Jason Liu (567)
On Engineering AI Systems that Endure The Bitter Lesson - Omar Khattab, DSPy & Databricks
Evals Are Not Unit Tests — Ido Pesok, Vercel v0
2025 is the Year of Evals! Just like 2024, and 2023, and … — John Dickerson, CEO Mozilla AI
Vibe Coding with Confidence — Itamar Friedman, Qodo
AI Automation that actually works: $100M, messy data, zero surprises - Tanmai Gopal, Hasura/PromptQL
Full Workshop: Realtime Voice AI — Mark Backman, Daily
Vision AI in 2025 — Peter Robicheaux, Roboflow
Practical tactics to build reliable AI apps — Dmitry Kuchin, Multinear
How to Improve your Vibe Coding — Ian Butler
Vibes won't cut it — Chris Kelly, Augment Code
Real World Development with GitHub Copilot and VS Code — Harald Kirschner, Christopher Harrison
Building Agents at Cloud Scale — Antje Barth, AWS
State of Startups and AI 2025 - Sarah Guo, Conviction
Useful General Intelligence — Danielle Perszyk, Amazon AGI
The 2025 AI Engineering Report — Barr Yaron, Amplify
Agents vs Workflows: Why Not Both? — Sam Bhagwat, Mastra.ai
Why We Don’t Need More Data Centers - Dr. Jasper Zhang, Hyperbolic
Infrastructure for the Singularity — Jesse Han, Morph
Hacking the Inference Pareto Frontier - Kyle Kranen, NVIDIA
Pipecat Cloud: Enterprise Voice Agents Built On Open Source - Kwindla Hultman Kramer, Daily
[Full Workshop] Building Conversational AI Agents - Thor Schaeff, ElevenLabs
From Self-driving to Autonomous Voice Agents — Brooke Hopkins, Coval
Why ChatGPT Keeps Interrupting You — Dr. Tom Shapland, LiveKit
Your realtime AI is ngmi — Sean DuBois (OpenAI), Kwindla Kramer (Daily)
Serving Voice AI at $1/hr: Open-source, LoRAs, Latency, Load Balancing - Neil Dwyer, Gabber
How to defend your sites from AI bots — David Mytton, Arcjet
The Unofficial Guide to Apple’s Private Cloud Compute - Jmo, CONFSEC
How to Secure Agents using OAuth — Jared Hanson (Keycard, Passport.js)
How we hacked YC Spring 2025 batch’s AI agents — Rene Brandel, Casco
OpenAI on Securing Code-Executing AI Agents — Fouad Matin (Codex, Agent Robustness)
Evaluating AI Search: A Practical Framework for Augmented AI Systems — Quotient AI + Tavily
Scaling Enterprise-Grade RAG: Lessons from Legal Frontier - Calvin Qi (Harvey), Chang She (Lance)
Building Alice’s Brain: an AI Sales Rep that Learns Like a Human - Sherwood & Satwik, 11x
Layering every technique in RAG, one query at a time - David Karam, Pi Labs (fmr. Google Search)
Building a Smarter AI Agent with Neural RAG - Will Bryk, Exa.ai
[Full Workshop] Building Metrics that actually work — David Karam, Pi Labs (fmr Google Search)
Make your LLM app a Domain Expert: How to Build an Expert System — Christopher Lovejoy, Anterior
Shipping Products When You Don't Know What they Can Do — Ben Stein, Teammates
Shipping something to someone always wins — Kenneth Auchenberg (ex. Stripe, VSCode)
Why your product needs an AI product manager, and why it should be you — James Lowe, i.AI
Everything is ugly, so go build something that isn't — Raiza Martin, Huxe (ex NotebookLM)
Building the platform for agent coordination — Tom Moor, Linear
What Is a Humanoid Foundation Model? An Introduction to GR00T N1 - Annika & Aastha
Real-time Experiments with an AI Co-Scientist - Stefania Druga, fmr. Google Deepmind
Scaling AI Agents Without Breaking Reliability — Preeti Somal, Temporal
Government Agents: AI Agents vs Tough Regulations — Mark Myshatyn, Los Alamos National Laboratory
Ship Agents that Ship: A Hands-On Workshop - Kyle Penfound, Jeremy Adams, Dagger
The AI Engineer’s Guide to Raising VC — Dani Grant (Jam), Chelcie Taylor (Notable)
Strategies for LLM Evals (GuideLLM, lm-eval-harness, OpenAI Evals Workshop) — Taylor Jordan Smith
Why you should care about AI interpretability - Mark Bissell, Goodfire AI
Information Retrieval from the Ground Up - Philipp Krenn, Elastic
Introduction to LLM serving with SGLang - Philip Kiely and Yineng Zhang, Baseten
Robotics: why now? - Quan Vuong and Jost Tobias Springenberg, Physical Intelligence
Waymo's EMMA: Teaching Cars to Think - Jyh Jing Hwang, Waymo
A2A & MCP Workshop: Automating Business Processes with LLMs — Damien Murphy, Bench
Ship Production Software in Minutes, Not Months — Eno Reyes, Factory
Beyond the Prototype: Using AI to Write High-Quality Code - Josh Albrecht, Imbue
Software Development Agents: What Works and What Doesn't - Robert Brennan, AllHands/OpenHands
Devin 2.0 and the Future of SWE - Scott Wu, Cognition
Your Coding Agent Just Got Cloned And Your Brain Isn't Ready - Rustin Banks, Google Jules
Latent Space Paper Club: AIEWF Special Edition (Test of Time, DeepSeek R1/V3) — Vibhu Sapra
Human seeded Evals — Samuel Colvin, Pydantic
Building AI Products That Actually Work — Ben Hylak (Raindrop), Sid Bendre (Oleve)
AI That Pays: Lessons from Revenue Cycle — Nathan Wan, Ensemble Health
Structuring a modern AI team — Denys Linkov, Wisedocs
The Rise of Open Models in the Enterprise — Amir Haghighat, Baseten
Mentoring the Machine — Eric Hou, Augment Code
Building Applications with AI Agents — Michael Albada, Microsoft
AX is the only Experience that Matters - Ivan Burazin, Daytona
How to build Enterprise Aware Agents - Chau Tran, Glean
Monetizing AI — Alvaro Morales, Orb
Does AI Actually Boost Developer Productivity? (100k Devs Study) - Yegor Denisov-Blanch, Stanford
How agents will unlock the $500B promise of AI - Donald Hruska, Retool
How Intuit uses LLMs to explain taxes to millions of taxpayers - Jaspreet Singh, Intuit
3 ingredients for building reliable enterprise agents - Harrison Chase, LangChain/LangGraph
From Hype to Habit: How We’re Building an AI-First SaaS Company—While Still Shipping the Roadmap
Machines of Buying and Selling Grace - Adam Behrens, New Generation
How to Build Planning Agents without losing control - Yogendra Miraje, Factset
Building Agents (the hard parts!) - Rita Kozlov, Cloudflare
POC to PROD: Hard Lessons from 200+ Enterprise GenAI Deployments - Randall Hunt, Caylent
Build Dynamic Products, and Stop the AI Sideshow — Eliza Cabrera (Workday) + Jeremy Silva (Freeplay)
The Billable Hour is Dead; Long Live the Billable Hour — Kevin Madura + Mo Bhasin, Alix Partners
From Copilot to Colleague: Trustworthy Agents for High-Stakes - Joel Hron, CTO Thomson Reuters
How to Hire AI Engineers when EVERYONE is cheating with AI — Beth Glenfield, DevDay
Stateful environments for vertical agents — Josh Purtell, Synth Labs
Books reimagined: AI to create new experiences for things you know — Lukasz Gandecki, TheBrain.pro
AI powered entomology: Lessons from millions of AI code reviews — Tomas Reimers, Graphite
Critical AI Inference your CIO can Trust — Sahil Yadav, Hariharan Ganesan, Telemetrak
How to run Evals at Scale: Thinking beyond Accuracy or Similarity — Muktesh Mishra, Adobe
Continuous Profiling for GPUs — Matthias Loibl, Polar Signals
Top Ten Challenges to Reach AGI — Stephen Chin, Andreas Kollegger
Practical GraphRAG: Making LLMs smarter with Knowledge Graphs — Michael, Jesus, and Stephen, Neo4j
Knowledge Graphs in Litigation Agents — Tom Smoker, WhyHow
When Vectors Break Down: Graph-Based RAG for Dense Enterprise Knowledge - Sam Julien, Writer
Stop Using RAG as Memory — Daniel Chalef, Zep
HybridRAG: A Fusion of Graph and Vector Retrieval - Mitesh Patel, NVIDIA
tldraw.computer - Steve Ruiz, tldraw
Excalidraw: AI and Human Whiteboarding Partnership - Christopher Chedeau
The Bitter Layout or: How I Learned to Love the Model Picker — Maximillian Piras, Yutori
CIAM for AI: Authn/Authz for Agents — Michael Grinich, CEO of WorkOS
Good design hasn’t changed with AI — John Pham, SF Compute
Building Effective Voice Agents — Toki Sherbakov + Anoop Kotha, OpenAI
Robots as professional Chefs - Nikhil Abraham, CloudChef
[Full Workshop] Reinforcement Learning, Kernels, Reasoning, Quantization & Agents — Daniel Han
Google Photos Magic Editor: GenAI Under the Hood of a Billion-User App - Kelvin Ma, Google Photos
Dream Machine: Scaling to 1m users in 4 days — Keegan McCallum, Luma AI
ComfyUI Full Workshop — first workshop from ComfyAnonymous himself!
Design like Karpathy is watching — Zeke Sikelianos, Replicate
On Curiosity — Sharif Shameem, Lexica
Real world MCPs in GitHub Copilot Agent Mode — Jon Peck, Microsoft
The rise of the agentic economy on the shoulders of MCP — Jan Curn, Apify
Full Spec MCP: Hidden Capabilities of the MCP spec — Harald Kirschner, Microsoft/VSCode
Shipping an Enterprise Voice AI Agent in 100 Days - Peter Bar, Intercom Fin
The State of Generative Media - Gorkem Yurtseven, FAL
Teaching Gemini to Speak YouTube: Adapting LLMs for Video Recommendations to 2B+DAU - Devansh Tandon
Transforming search and discovery using LLMs — Tejaswi & Vinesh, Instacart
Netflix's Big Bet: One model to rule recommendations - Yesu Feng, Netflix
360Brew: LLM-based Personalized Ranking and Recommendation - Hamed and Maziar, LinkedIn AI
What We Learned from Using LLMs in Pinterest — Mukuntha Narayanan, Han Wang, Pinterest
Measuring AGI: Interactive Reasoning Benchmarks for ARC-AGI-3 — Greg Kamradt, ARC Prize Foundation
RL for Autonomous Coding — Aakanksha Chowdhery, Reflection.ai
Recsys Keynote: Improving Recommendation Systems & Search in the Age of LLMs - Eugene Yan, Amazon
Benchmarks Are Memes: How What We Measure Shapes AI—and Us - Alex Duffy, Every.to
Small AI Teams with Huge Impact — Vik Paruchuri, Datalab
Rethinking Team Building: how a 30-person Startup serves 50 Million Users — Grant Lee, Gamma
Building a 10 person unicorn - Max Brodeur-Urbas, Gumloop
Using OSS models to build AI apps with millions of users — Hassan El Mghari
Bolt.new: How we scaled $0-20m ARR in 60 days, with 15 people — Eric Simons, Bolt
Prompt Engineering and AI Red Teaming — Sander Schulhoff, HackAPrompt/LearnPrompting
Survive the AI Knife Fight: Building Products That Win — Brian Balfour, Reforge
Automating Escrow with USDC and AI - Corey Cooper, Circle
How LLMs work for Web Devs: GPT in 600 lines of Vanilla JS - Ishan Anand
[Workshop] AI Pipelines and Agents in Pure TypeScript with Mastra.ai — Nick Nisi, Zack Proser
AI Engineering with the Google Gemini 2.5 Model Family - Philipp Schmid, Google DeepMind
The New Code — Sean Grove, OpenAI
Production software keeps breaking and it will only get worse — Anish Agarwal, Traversal.ai
Thinking Deeper in Gemini — Jack Rae, Google DeepMind
A year of Gemini progress + what comes next — Logan Kilpatrick, Google DeepMind
2025 in LLMs so far, illustrated by Pelicans on Bicycles — Simon Willison
Trends Across the AI Frontier — George Cameron, ArtificialAnalysis.ai
Training Agentic Reasoners — Will Brown, Prime Intellect
New York Times' Connections: A Case Study on NLP in Word Games — Shafik Quoraishee, NYT Games
Claude Code & the evolution of agentic coding — Boris Cherny, Anthropic
12-Factor Agents: Patterns of reliable LLM applications — Dex Horthy, HumanLayer
MCP Is Not Good Yet — David Cramer, Sentry
Your Personal Open-Source Humanoid Robot for $8,999 — JX Mo, K-Scale Labs
The Build-Operate Divide: Bridging Product Vision and AI Operational Reality
The New Lean Startup — Sid Bendre, Oleve
Optimizing inference for voice models in production - Philip Kiely, Baseten
Conquering Agent Chaos — Rick Blalock, Agentuity
[Evals Workshop] Mastering AI Evaluation: From Playground to Production
Intro to GraphRAG — Zach Blumenfeld
Securing Agents with Open Standards — Bobby Tiernay and Kam Sween, Auth0
The emerging skillset of wielding coding agents — Beyang Liu, Sourcegraph / Amp
Turning Fails into Features: Zapier’s Hard-Won Eval Lessons — Rafal Willinski, Vitor Balocco, Zapier
Building voice agents with OpenAI — Dominik Kundel, OpenAI
Containing Agent Chaos — Solomon Hykes, Dagger
Evals 101 — Doug Guthrie, Braintrust
Why should anyone care about Evals? — Manu Goyal, Braintrust
Engineering Better Evals: Scalable LLM Evaluation Pipelines That Work — Dat Ngo, Aman Khan, Arize
To the moon! Navigating deep context in legacy code with Augment Agent — Forrest Brazeal, Matt Ball
Serving Voice AI at Scale — Arjun Desai (Cartesia) & Rohit Talluri (AWS)
Ship it! Building Production Ready Agents — Mike Chambers, AWS
Introducing Strands Agents, an Open Source AI Agents SDK — Suman Debnath, AWS
Data is Your Differentiator: Building Secure and Tailored AI Systems — Mani Khanuja, AWS
How to build world-class AI products — Sarah Sachs (AI lead @ Notion) & Carlos Esteban (Braintrust)
From Mixture of Experts to Mixture of Agents with Super Fast Inference - Daniel Kim & Daria Soboleva
Forget RAG Pipelines—Build Production Ready Agents in 15 Mins: Nina Lopatina, Rajiv Shah, Contextual
Milliseconds to Magic: Real‑Time Workflows using the Gemini Live API and Pipecat
Realtime Conversational Video with Pipecat and Tavus — Chad Bailey and Brian Johnson, Daily & Tavus
Vector Search Benchmark[eting] - Philipp Krenn, Elastic
Taming Rogue AI Agents with Observability-Driven Evaluation — Jim Bennett, Galileo
Building agent fleet architectures your CISO doesn't hate — Lou Bichard, Gitpod
Don’t get one-shotted: Use AI to test, review, merge, and deploy code — Tomas Reimers, Graphite
Effective agent design patterns in production — Laurie Voss, LlamaIndex
Foundry Local: Cutting-Edge AI experiences on device with ONNX Runtime/Olive — Emma Ning, Microsoft
[Full Workshop] Vibe Coding at Scale: Customizing AI Assistants for Enterprise Environments
Unlocking AI Powered DevOps Within Your Organization — Jon Peck, GitHub
Vibe Coding at Scale: Customizing AI Assistants for Enterprise Environments - Harald Kirschner,
The Agent Awakens: Collaborative Development with Copilot - Christopher Harrison, GitHub
AI Red Teaming Agent: Azure AI Foundry — Nagkumar Arkalgud & Keiji Kanazawa, Microsoft
Collaborating with Agents in your Software Dev Workflow - Jon Peck & Christopher Harrison, Microsoft
Agentic Excellence: Mastering AI Agent Evals w/ Azure AI Evaluation SDK — Cedric Vidal, Microsoft
Building Code First AI Agents with Azure AI Agent Service — Cedric Vidal, Microsoft
How fast are LLM inference engines anyway? — Charles Frye, Modal
RAG in 2025: State of the Art and the Road Forward — Tengyu Ma, MongoDB (acq. Voyage AI)
The State of AI Powered Search and Retrieval — Frank Liu, MongoDB (prev Voyage AI)
Architecting Agent Memory: Principles, Patterns, and Best Practices — Richmond Alake, MongoDB
Building Multimodal AI Agents From Scratch — Apoorva Joshi, MongoDB
Why Your Agent’s Brain Needs a Playbook: Practical Wins from Using Ontologies - Jesús Barrasa, Neo4j
Memory Masterclass: Make Your AI Agents Remember What They Do! — Mark Bain, AIUS
Graph Intelligence: Enhance Reasoning and Retrieval Using Graph Analytics - Alison & Andreas, Neo4j
GraphRAG methods to create optimized LLM context windows for Retrieval — Jonathan Larson, Microsoft
Agentic GraphRAG: Simplifying Retrieval Across Structured & Unstructured Data — Zach Blumenfeld
Revenue Engineering: How to Price (and Reprice) Your AI Product — Kshitij Grover, Orb
"Data readiness" is a Myth: Reliable AI with an Agentic Semantic Layer — Anushrut Gupta, PromptQL
Building Agentic Applications w/ Heroku Managed Inference and Agents — Julián Duque & Anush Dsouza
Events are the Wrong Abstraction for Your AI Agents - Mason Egger, Temporal.io
Prompt Engineering is Dead — Nir Gazit, Traceloop
The Eyes Are The (Context) Window to The Soul: How Windsurf Gets to Know You — Sam Fertig, Windsurf
Mastering Engineering Flow with Windsurf - Eashan Sinha, Windsurf
(possible dupe but better sound) What does Enterprise Ready MCP mean? — Tobin South, WorkOS
CI in the Era of AI: From Unit Tests to Stochastic Evals — Nathan Sobo, Zed
Fun stories from building OpenRouter and where all this is going - Alex Atallah, OpenRouter
Building AI Agents that actually automate Knowledge Work - Jerry Liu, LlamaIndex
RFT, DPO, SFT: Fine-tuning with OpenAI — Ilan Bigio, OpenAI
Windsurf everywhere, doing everything, all at once - Kevin Hou, Windsurf
Case Study + Deep Dive: Telemedicine Support Agents with LangGraph/MCP - Dan Mason
Veo 3 for Developers — Paige Bailey, Google DeepMind
Building Agents with Amazon Nova Act and MCP - Du'An Lightfoot, Amazon (Full Workshop)
Large Scale AI on Apple Silicon — Alex Cheema, EXO Labs
The Web Browser Is All You Need - Paul Klein IV, Browserbase
The State of MCP observability: Observable.tools — Alex Volkov and Benjamin Eckel, W&B and Dylibso
Building Protected MCP Servers — Den Delimarsky and Julia Kasper, MCP Steering Committee & Microsoft
The Geopolitics of AI Infrastructure - Dylan Patel, SemiAnalysis
Remote MCPs: What we learned from shipping — John Welsh, Anthropic
AI Engineer World's Fair 2025 Hackathon Presentations
MCP: Origins and Requests For Startups — Theodora Chu, Model Context Protocol PM, Anthropic
Spark to System: Building the Open Agentic Web - Asha Sharma, Microsoft