To Scale Our RAG Agent (5,000 Files per Hour)



Date: 09/22/2025

Watch the Video

Okay, this video is gold for any developer like me who’s diving headfirst into the world of AI-powered workflows. It’s all about scaling RAG (Retrieval-Augmented Generation) systems built with n8n, a low-code workflow automation platform. The creator shares their experience of boosting processing speed from 100 files/hour to a whopping 5,000 files/hour. They didn’t just wave a magic wand; they went through the trenches, broke things, and learned a ton about optimizing n8n and Supabase, and about dealing with Google Drive limitations at scale. Sounds familiar, right?

What makes this video a must-watch is its pragmatic approach. It’s not just theoretical fluff; it’s a deep dive into real-world challenges like bottlenecks, server crashes, and painfully slow data imports. The video lays out a systematic process for benchmarking, tuning, and scaling complex n8n workflows, covering everything from setting up n8n workers and Redis queuing for parallel processing to building a robust orchestrator with retry logic. Plus, there’s a valuable lesson about knowing when to bypass APIs and go directly to the database (hello, Supabase!).
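The video talks through these ideas rather than showing code, but here’s a minimal sketch of what “skip the API, write to the database” could look like: batching embedded chunks into a single INSERT against Supabase’s underlying Postgres using node-postgres, wrapped in the kind of exponential-backoff retry an orchestrator might use. The SUPABASE_DB_URL variable, the document_chunks table, and its columns are assumptions for illustration only, not anything taken from the video.

```typescript
import { Pool } from "pg";

// Assumed: a direct Postgres connection string to the Supabase project.
const pool = new Pool({ connectionString: process.env.SUPABASE_DB_URL });

interface Chunk {
  fileId: string;
  content: string;
  embedding: number[]; // assumed pgvector column on the target table
}

// Retry a flaky step with exponential backoff (1s, 2s, 4s, ...),
// roughly what an orchestrator's retry logic would do.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 2 ** i * 1000));
    }
  }
  throw new Error("unreachable");
}

// Bulk-insert a batch of chunks in one round trip instead of
// one REST API call per row.
async function insertChunks(chunks: Chunk[]): Promise<void> {
  const values: string[] = [];
  const rows = chunks.map((c, i) => {
    // "[0.1,0.2,...]" is valid input text for a pgvector column.
    values.push(c.fileId, c.content, JSON.stringify(c.embedding));
    const o = i * 3;
    return `($${o + 1}, $${o + 2}, $${o + 3})`;
  });
  await pool.query(
    `INSERT INTO document_chunks (file_id, content, embedding) VALUES ${rows.join(", ")}`,
    values
  );
}

// Usage: wrap the batched insert in the retry helper.
// await withRetry(() => insertChunks(batch));
```

The point of the sketch is fewer round trips: one batched statement per group of files instead of a request per row, plus a retry wrapper so a transient failure doesn’t sink the whole batch.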

For me, the most inspiring part is the tangible impact this kind of optimization can have. Imagine automating document processing, content analysis, or even code generation at this scale. By understanding these scaling techniques, we can build more robust and efficient AI-driven solutions for clients. I can see this being super useful for automating the ingestion and processing of our documentation for the AI code-generation tools we’re building; implementing it would be both a time saver and a great learning experience. I’m definitely eager to experiment with the concepts from the video and see how they can transform my own AI workflow integrations.