
๐Ÿ” Building a RAG Application with AWS Knowledgebase

In our previous post, we walked through the essential building blocks of a Retrieval-Augmented Generation (RAG) application:

  1. Data preparation
  2. Chunking strategy
  3. Embedding generation
  4. Vector database storage
  5. Retrieval of relevant context
  6. Augmentation of LLM prompts
  7. Response generation

What if I told you that all of these steps can now be abstracted into a fully managed service, and spun up in minutes?

🚀 Enter AWS Knowledge Bases.

โš™๏ธ What is AWS Knowledgebase?โ€‹

A Knowledge Base in Amazon Bedrock is a fully managed RAG orchestration service that lets you:

  • Ingest and chunk your data
  • Generate embeddings
  • Store embeddings in a vector store
  • Retrieve relevant content
  • Feed it into an LLM (like Claude 4 or Cohere)
  • Get a grounded response, all via a single API (a minimal sketch follows this list)
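
To give a sense of what "a single API" means in practice, here is a minimal sketch of the call behind it. The knowledge base ID and model ARN are placeholders you would substitute with your own values; the full configuration used in the chatbot appears later in this post.

import boto3

# Runtime client for Bedrock Knowledge Base queries
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# One call handles retrieval from the vector store and answer generation
response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What are the key considerations for adopting AI in large organizations?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",          # placeholder
            "modelArn": "YOUR_FOUNDATION_MODEL_ARN",  # placeholder
        },
    },
)

print(response["output"]["text"])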

✅ Key Benefits

  • Serverless setup
  • Managed embeddings (pick from built-in options)
  • Support for major vector DBs: OpenSearch Serverless, Weaviate, pgVector
  • Built-in sync and refresh support
  • Instant testing UI

In short: it abstracts away the entire RAG backend so you can focus on building the front-end experience and domain logic.

๐Ÿ› ๏ธ Services Requiredโ€‹

To get started, you'll need:

| AWS Service | Purpose |
| --- | --- |
| S3 | To store your source documents |
| AWS Bedrock | To access foundation models like Claude |
| AWS Knowledge Bases | The main service to manage the RAG workflow |
| OpenSearch Serverless | Vector storage for embeddings |
| IAM | Permissions and policies for access |

Make sure your IAM user has permissions for all of the above.
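
As a rough starting point, the snippet below sketches the kinds of actions such a policy might include. It is illustrative rather than exhaustive; the exact actions and resource ARNs depend on your setup, and the bucket name is a placeholder.

import json

# Illustrative (not exhaustive) permissions for working with Knowledge Bases
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:Retrieve",
                "bedrock:RetrieveAndGenerate",
            ],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": ["aoss:APIAccessAll"],  # OpenSearch Serverless data access
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::your-source-bucket",    # placeholder bucket
                "arn:aws:s3:::your-source-bucket/*",
            ],
        },
    ],
}

print(json.dumps(policy, indent=2))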

💡 Setting Up Your Knowledge Base

You can create a Knowledge Base via the AWS Console or programmatically through the Bedrock API; this walkthrough uses the console.

During setup, you'll:

  1. Upload PDFs or documents to S3
  2. Define chunking and embedding strategies
  3. Choose your vector DB (I used OpenSearch Serverless)
  4. Pick your LLM from Bedrock (e.g., Claude 3.5 Sonnet v2)
  5. Deploy and test right from the console

📌 Tip: Keeping your vector index synced is one of the most overlooked, but critical, steps in building production-scale RAG systems. AWS Knowledge Bases makes this a one-click action in the console, and the same sync can be scripted, as sketched below.
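
If you would rather automate that sync (for example, after new documents land in S3), a minimal sketch with the bedrock-agent control-plane client might look like this; the knowledge base and data source IDs are placeholders.

import time
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# Kick off an ingestion job to re-chunk, re-embed, and sync new or changed documents
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="YOUR_KB_ID",        # placeholder
    dataSourceId="YOUR_DATA_SOURCE_ID",  # placeholder
)["ingestionJob"]

# Poll until the sync finishes
while job["status"] not in ("COMPLETE", "FAILED"):
    time.sleep(10)
    job = bedrock_agent.get_ingestion_job(
        knowledgeBaseId="YOUR_KB_ID",
        dataSourceId="YOUR_DATA_SOURCE_ID",
        ingestionJobId=job["ingestionJobId"],
    )["ingestionJob"]

print("Sync status:", job["status"])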

🧪 Quick Test Before Integration

Once your Knowledge Base is live, AWS lets you test it from the console to verify:

  • Documents were chunked correctly
  • Embeddings are being retrieved
  • The LLM is producing grounded answers

Before writing a single line of code, you can simulate a real RAG interaction.
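
If you do want to run the same sanity check from code, the retrieve API returns only the matched chunks (no generation), which makes it easy to see exactly what the vector search is pulling back. A rough sketch, again with a placeholder knowledge base ID:

import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Retrieval-only call: returns the top-matching chunks, no LLM generation
results = client.retrieve(
    knowledgeBaseId="YOUR_KB_ID",  # placeholder
    retrievalQuery={"text": "What are the key considerations for adopting AI in large organizations?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)

# Print each chunk's relevance score and a preview of its text
for item in results["retrievalResults"]:
    print(round(item["score"], 3), item["content"]["text"][:120])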

💬 Building a RAG Chatbot with Streamlit

To demonstrate this, I built a quick chatbot UI using Python and Streamlit. It connects to the retrieve_and_generate() API from the Bedrock Agent Runtime and leverages our Knowledge Base to answer questions from a PDF on AI in the enterprise.

🔧 Technical Implementation Details

The chatbot uses the following architecture:

  • Frontend: Streamlit for the UI
  • Backend: AWS Bedrock Agent Runtime API
  • Model: Claude 3.5 Sonnet v2
  • Vector Store: OpenSearch Serverless
  • Document Source: S3 bucket with enterprise AI PDFs


Fig 1: RAG Architecture with AWS Knowledge Base

The codebase is organized into three key components:

  • System Prompt: Provides instructions to guide the bot's behavior
  • API Integration: Connects to AWS Bedrock to retrieve the relevant document chunks
  • Response Generation: Uses the retrieved context to craft accurate, grounded answers

🧠 Code Snippets

🔸 System Prompt Template

        
# Default knowledge base prompt
default_prompt = """
Act as a question-answering agent for the AI Social Journal Q&A Bot to help users with their questions.
Your role is to:
- Provide accurate information based on the knowledge base
- Be helpful, friendly, and professional
- If information is not available, suggest alternative resources or clarify the question

Guidelines:
1. Answering Questions:
- Answer the user's question strictly based on search results
- Correct any grammatical or typographical errors in the user's question
- If search results don't contain relevant information, state: "I could not find an exact answer to your question. Could you please provide more information or rephrase your question?"

2. Validating Information:
- Double-check search results to validate any assertions

3. Clarification:
- Ask follow-up questions to clarify if the user doesn't provide enough information

Here are the search results in numbered order:
$search_results$

$output_format_instructions$
"""

🔸 AWS Bedrock API Call

response = self.bedrock_agent_runtime_client.retrieve_and_generate(
    input={"text": query},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": os.getenv("KB_ID"),
            "modelArn": os.getenv("FM_ARN"),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {}
            },
            "generationConfiguration": {
                "promptTemplate": {"textPromptTemplate": default_prompt},
                "inferenceConfig": {
                    "textInferenceConfig": {
                        "maxTokens": 2000,
                        "temperature": 0.7,
                        "topP": 0.9,
                    }
                },
            },
        },
    },
)
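
The response object carries both the generated answer and the citations that ground it. A small sketch of how you might unpack it (field names follow the RetrieveAndGenerate response shape):

# Generated, grounded answer text
answer = response["output"]["text"]

# Citations: each one links a span of the answer back to retrieved chunks
sources = []
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        sources.append({
            "snippet": ref["content"]["text"][:120],
            "s3_uri": ref.get("location", {}).get("s3Location", {}).get("uri"),
        })

print(answer)
print("Sources:", sources)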

🎯 Chatbot Application Demo

To simulate the chatbot experience, I created a quick Streamlit application that showcases real RAG interactions:

Fig 2: AI Social Journal Chatbot App
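
The full app source isn't reproduced here, but a minimal sketch of how the Streamlit front end can be wired to the Knowledge Base looks roughly like this; ask_knowledge_base is a hypothetical helper that mirrors the retrieve_and_generate call above, and KB_ID / FM_ARN are read from environment variables as in that snippet.

import os

import boto3
import streamlit as st

# Runtime client for Knowledge Base queries
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")


def ask_knowledge_base(question: str) -> str:
    """Hypothetical wrapper that mirrors the retrieve_and_generate call shown above."""
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": os.getenv("KB_ID"),
                "modelArn": os.getenv("FM_ARN"),
            },
        },
    )
    return response["output"]["text"]


st.title("AI Social Journal Q&A Bot")

# Keep the conversation across Streamlit reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the chat history
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Handle a new user question
if question := st.chat_input("Ask a question about the document..."):
    st.session_state.messages.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.markdown(question)

    answer = ask_knowledge_base(question)
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)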

Example Document:

I used the OpenAI "AI in the Enterprise" document, which discusses the adoption of AI in large organizations, including its benefits and challenges.

📄 openai-ai-in-the-enterprise.pdf

Sample Prompt

"What are the key considerations for adopting AI in large organizations?"

🧠 Claude 3.5 retrieved the correct chunk from the Knowledge Base and provided a concise, contextualized answer, just like a smart analyst would.

📸 Sample Output

Fig 3: RAG Response Generation - Contextual AI Answer

Fig 4: Conversation Flow - Multi-turn RAG Interactions

💰 Cost Considerations & Best Practices

AWS Pricing Alert

AWS Knowledge Bases and its supporting services are not part of the free tier. Key cost factors:

  • OpenSearch Serverless: Vector storage and search operations
  • Amazon Bedrock: Model inference costs (pay-per-token)
  • S3: Document storage (minimal cost)
  • Data ingestion: Processing and embedding generation

๐Ÿ›ก๏ธ Cost Optimization Tips:โ€‹

  • Clean up resources when not in use
  • Use CloudFormation templates or Infrastructure as Code (IaC) for easy provisioning/de-provisioning
  • Monitor usage with CloudWatch and set up billing alerts
  • Consider development vs. production resource sizing

Pro Tip

I created everything manually for this demo, but I strongly recommend automating deployment as a CloudFormation stack for easier cleanup and repeatability.
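
For manual cleanup in the meantime, the main cost drivers can be torn down with a few control-plane calls. A rough sketch (the IDs are placeholders, and the OpenSearch Serverless collection is the piece most often forgotten):

import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")
aoss = boto3.client("opensearchserverless", region_name="us-east-1")

# Remove the data source and the Knowledge Base itself
bedrock_agent.delete_data_source(
    knowledgeBaseId="YOUR_KB_ID",        # placeholder
    dataSourceId="YOUR_DATA_SOURCE_ID",  # placeholder
)
bedrock_agent.delete_knowledge_base(knowledgeBaseId="YOUR_KB_ID")

# The OpenSearch Serverless collection keeps billing while it exists, so delete it too
aoss.delete_collection(id="YOUR_COLLECTION_ID")  # placeholder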

🧭 Final Thoughts

AWS Knowledge Bases simplifies the RAG development workflow dramatically:

  • No need to manage chunking, embeddings, or retrieval logic
  • Direct integration with Bedrock and serverless vector stores
  • API or console-based orchestration

If you're building AI copilots, knowledge agents, or internal AI search tools, this is a game-changer for cutting down time-to-market and focusing on your app's UX and logic.

🔗 Resources