The way developers built GenAI apps changed dramatically. Gone were the days when a clever prompt and a single API call to an LLM sufficed. Instead, full-stack generative AI systems began to take shape, mirroring conventional app development with frameworks, layers, observability, and best practices.
At the centre of this shift were tools like LangChain, LlamaIndex (formerly GPT Index), Dust, CrewAI, and Flowise. These frameworks enabled developers to string together complex chains of logic, integrate vector databases, call external APIs, and manage user context across sessions.
LangChain stood out as the industry’s default scaffolding tool. With its abstraction layers for prompt templates, agents, tools, and memory, LangChain allowed engineers to build apps that (see the sketch after this list):
- Performed structured data retrieval (via retrieval-augmented generation, or RAG)
- Accessed and combined information from real-time APIs
- Took actions (e.g., sending emails, summarising documents, or updating CRMs)
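In practice, a chain of this kind can be only a few lines. The sketch below assumes the langchain-core and langchain-openai packages and an OPENAI_API_KEY in the environment; the model name is illustrative, and exact import paths vary between LangChain releases.

```python
# A minimal LangChain sketch: prompt template -> LLM -> string output.
# Assumes langchain-core and langchain-openai are installed and
# OPENAI_API_KEY is set; import paths differ across LangChain versions.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You answer questions using only the provided context."),
    ("human", "Context:\n{context}\n\nQuestion: {question}"),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model name is illustrative

# LangChain Expression Language (LCEL): compose the steps with the | operator.
chain = prompt | llm | StrOutputParser()

answer = chain.invoke({
    "context": "Refunds are processed within 5 business days.",
    "question": "How long do refunds take?",
})
print(answer)
```

A real app would replace the hard-coded context with a retriever and add memory, but the composition pattern stays the same.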
LlamaIndex, on the other hand, specialised in document indexing and retrieval. It offered powerful support for building semantic search layers across PDFs, Notion pages, internal wikis, and customer support data. Enterprises began adopting LlamaIndex to feed their LLMs highly curated, context-rich data without retraining models.
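Indexing and querying a folder of documents with LlamaIndex follows a similar few-line pattern. The sketch below assumes the llama-index package with its default OpenAI embeddings and LLM; the directory name and question are made up, and newer releases expose these classes under the llama_index.core namespace.

```python
# Minimal LlamaIndex sketch: index a folder of documents, then query it.
# Assumes the llama-index package, default OpenAI embeddings/LLM, and
# OPENAI_API_KEY set; the directory and question are illustrative.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./company_docs").load_data()  # PDFs, text, etc.
index = VectorStoreIndex.from_documents(documents)               # chunk + embed

query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What is our parental leave policy?")
print(response)
```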
Dust and CrewAI pushed forward the idea of multi-agent collaboration. These systems let developers deploy collections of AI agents—each with specialised roles—who could reason together, plan complex workflows, and escalate queries to humans when necessary. For instance, one Dust implementation included a planning agent, a finance agent, and a policy-checking agent working together to generate board reports from multiple data streams.
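A multi-agent setup in this spirit can be sketched with CrewAI. The agents, goals, and tasks below are illustrative rather than taken from the Dust example, and the constructor arguments reflect recent CrewAI releases, which may differ from yours.

```python
# Illustrative CrewAI sketch: two specialised agents cooperating on one job.
# Roles, goals, and tasks are made up for the example; CrewAI's API may
# differ between versions.
from crewai import Agent, Task, Crew

planner = Agent(
    role="Planning agent",
    goal="Break a board-report request into concrete research steps",
    backstory="Coordinates the other agents and assembles the final outline.",
)
analyst = Agent(
    role="Finance agent",
    goal="Summarise quarterly figures relevant to each research step",
    backstory="Reads financial data sources and reports key numbers.",
)

plan = Task(
    description="Draft an outline for the Q3 board report.",
    expected_output="A bullet-point outline with data requirements per section.",
    agent=planner,
)
numbers = Task(
    description="Fill the outline's data requirements with Q3 figures.",
    expected_output="The outline annotated with figures and sources.",
    agent=analyst,
)

crew = Crew(agents=[planner, analyst], tasks=[plan, numbers])
result = crew.kickoff()
print(result)
```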
On the frontend, visual LLM builders like Flowise, alongside open-source agent platforms like OpenDevin, introduced no-code and low-code workflows for building GenAI apps. Marketing teams, analysts, and even product managers could now create AI-powered experiences without writing code, automating newsletter generation, customer onboarding scripts, and competitive analysis reports.
This evolution led to the formalisation of the GenAI stack, typically consisting of the following layers (a hand-rolled sketch follows the list):
- LLM orchestration layer (LangChain, Dust)
- Data indexing + retrieval (LlamaIndex, Weaviate, Pinecone)
- Embedding + chunking strategy (OpenAI, Cohere, Hugging Face)
- Prompt templates + memory
- Tool integrations + APIs (Zapier, REST, GraphQL)
- Observability + logging (Traceloop, PromptLayer)
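Wiring a few of these layers together by hand makes the stack concrete. The sketch below uses only the OpenAI Python client plus NumPy, with an in-memory list standing in for a vector database such as Weaviate or Pinecone; the file name, chunk size, and model names are illustrative assumptions.

```python
# Hand-rolled sketch of the core GenAI stack layers: chunking, embedding,
# retrieval, and generation. An in-memory list stands in for a vector DB;
# model names and handbook.txt are illustrative. Assumes openai>=1.x.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size chunking; real pipelines split on document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# "Index": embed the chunks once and keep the vectors alongside the text.
corpus = chunk(open("handbook.txt").read())  # stand-in for your source docs
vectors = embed(corpus)

def retrieve(question: str, k: int = 3) -> list[str]:
    """Cosine-similarity search over the in-memory vectors."""
    q = embed([question])[0]
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

def answer(question: str) -> str:
    context = "\n---\n".join(retrieve(question))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQ: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("How many vacation days do new hires get?"))
```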
Cloud vendors took notice. AWS, GCP, and Azure launched managed services for vector databases, LLM caching layers, and function calling orchestration. Google’s Vertex AI and Microsoft’s Azure AI Studio gained traction among teams wanting managed pipelines with built-in governance.
The result? GenAI apps now handled:
- RAG-based search interfaces across millions of enterprise docs
- Sales intelligence assistants with real-time LinkedIn and CRM scraping
- Compliance tools that scanned internal policies and responded in plain language
- Support bots that fetched and summarised tickets, policy docs, and user activity
But complexity brought new concerns:
- Latency: Chaining multiple calls to embeddings, vector search, and LLMs introduced user-perceivable lag (see the per-stage timing sketch after this list).
- Cost: Poorly optimised RAG strategies could cost thousands per month in API usage.
- Security: API chaining meant more endpoints, more secrets, and higher data exposure risk.
- Prompt drift: As prompt templates evolved, app behaviour sometimes changed subtly or unpredictably.
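Because latency and cost hide inside individual stages, a common first step was simply timing each call in the chain. The decorator below is a generic sketch not tied to any particular observability vendor; the stage names and stubbed bodies are placeholders for real embedding, retrieval, and generation calls.

```python
# Minimal per-stage instrumentation sketch: time each pipeline stage and
# emit a log line that can be shipped to whatever observability tool the
# team uses. Stage names, sleeps, and return values are placeholders.
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("genai.pipeline")

def traced(stage: str):
    """Wrap a pipeline stage and log its wall-clock latency."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                log.info("stage=%s latency_ms=%.1f", stage, elapsed_ms)
        return wrapper
    return decorator

@traced("embed_query")
def embed_query(question: str) -> list[float]:
    time.sleep(0.05)  # stand-in for an embeddings API call
    return [0.0] * 1536

@traced("vector_search")
def vector_search(query_vector: list[float], k: int = 3) -> list[str]:
    time.sleep(0.02)  # stand-in for a vector database query
    return ["chunk-1", "chunk-2", "chunk-3"][:k]

@traced("generate")
def generate(prompt: str) -> str:
    time.sleep(0.30)  # stand-in for the LLM call, usually the slowest stage
    return "stubbed answer"

if __name__ == "__main__":
    chunks = vector_search(embed_query("How long do refunds take?"))
    print(generate("Context: " + " ".join(chunks)))
```

Instrumentation like this only surfaces the problem; addressing it required process changes as well.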
To manage this, companies began adopting best practices borrowed from traditional software engineering:
- Version control for prompts and chains
- CI/CD pipelines for prompt testing
- Logging and audit trails for LLM usage
- Synthetic test data for regression testing RAG queries (a test sketch follows this list)
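One concrete form of these practices is a prompt regression test that runs a versioned prompt template against a small synthetic dataset in CI. The sketch below uses pytest; the prompt text, test cases, and call_llm stub are assumptions for illustration, not an established standard.

```python
# Sketch of a CI prompt-regression test: a versioned prompt template is
# rendered against synthetic cases and checked for required phrases.
# The prompt, cases, and call_llm stub are illustrative assumptions.
import pytest

PROMPT_V2 = (
    "Answer the question using only the context.\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

# Synthetic regression cases: (context, question, phrase the answer must contain)
CASES = [
    ("Refunds are processed within 5 business days.",
     "How long do refunds take?", "5 business days"),
    ("Support hours are 9am-5pm CET on weekdays.",
     "When is support available?", "9am-5pm"),
]

def call_llm(prompt: str) -> str:
    """Stub for the real LLM call; CI would hit a model or a recorded fixture."""
    # Echo the context back so this sketch stays deterministic.
    return prompt.split("Context:\n")[1].split("\n\nQuestion:")[0]

@pytest.mark.parametrize("context,question,expected", CASES)
def test_prompt_answers_contain_key_facts(context, question, expected):
    answer = call_llm(PROMPT_V2.format(context=context, question=question))
    assert expected in answer, f"Prompt v2 regressed on: {question}"
```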
By August, LLMOps (Large Language Model Operations) had become a recognised sub-discipline. Startups like Reworkd, Traceloop, and Unstructured.io emerged to offer observability, prompt evaluation, and content parsing infrastructure.
Developers now viewed GenAI apps not as magic scripts, but as production systems. Monitoring, QA, load testing, and observability became baseline expectations. A bug in a RAG pipeline was no longer a novelty—it was a P0 incident.
The bottom line: August 2024 marked the arrival of GenAI as real software. The community moved from prompts to pipelines, from curiosity to capability. And with it came an industry-wide realisation: Generative AI isn’t just a model—it’s a stack.