# Vibe Discord Bot with RAG Chat History

A Discord bot that stores long-term chat history in a SQLite database, with RAG (Retrieval-Augmented Generation) capabilities powered by a custom embedding model.
## Quick Start: Available Commands

### Pre-built Bots
| Command | Description | Example Usage |
|---|---|---|
| `!doodlebob` | Generate images from text | `!doodlebob a cat sitting on a moon` |
| `!retcon` | Edit images with text prompts | `!retcon <image attachment> Make it sunny` |
### Custom Bot Management

| Command | Description | Example Usage |
|---|---|---|
| `!custom <name> <personality>` | Create a custom bot with a specific personality | `!custom alfred you are a proper british butler` |
| `!list-custom-bots` | List all available custom bots | `!list-custom-bots` |
| `!delete-custom-bot <name>` | Delete your custom bot | `!delete-custom-bot alfred` |
### Using Custom Bots

Once you create a custom bot, you can interact with it directly by prefixing your message with the bot name:

```
!<bot_name> <your message>
```

Example:

1. Create a bot: `!custom alfred you are a proper british butler`
2. Use the bot: `!alfred Could you fetch me some tea?`
3. The bot responds in character as a British butler.
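The prefix routing above can be sketched roughly as follows. This is illustrative only: `custom_bots`, `route_message`, and the stored `alfred` entry are assumed names, not the project's actual identifiers.

```python
# Hypothetical in-memory registry of custom bots (name -> personality).
custom_bots = {"alfred": "you are a proper british butler"}

def route_message(content: str):
    """Return (bot_name, user_message) if the message targets a custom bot,
    else None. Messages must start with '!' followed by a known bot name."""
    if not content.startswith("!"):
        return None
    name, _, message = content[1:].partition(" ")
    if name in custom_bots:
        return name, message.strip()
    return None
```

A message like `!alfred Could you fetch me some tea?` routes to the `alfred` bot with the remainder of the line as the user prompt.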
## Features

- **Long-term chat history storage**: Persistent storage of all bot interactions
- **RAG-based context retrieval**: Smart retrieval of relevant conversation history using vector embeddings
- **Custom embedding model**: Uses qwen3-embed-4b for semantic search capabilities
- **Efficient message management**: Automatic cleanup of old messages based on configurable limits
## Setup

### Prerequisites
- Python 3.10 or higher
- uv package manager
- Embedding API key
- Discord bot token
### Environment Variables

Create a `.env` file or export the following variables:

```shell
# Discord Bot Token
export DISCORD_TOKEN=your_discord_bot_token

# Embedding API Configuration
export OPENAI_API_KEY=your_embedding_api_key
export OPENAI_API_ENDPOINT=https://llama-embed.reeselink.com/embedding

# Image Generation (optional)
export IMAGE_GEN_ENDPOINT=http://toybox.reeselink.com:1234/v1
export IMAGE_EDIT_ENDPOINT=http://toybox.reeselink.com:1235/v1

# Database Configuration (optional)
export CHAT_DB_PATH=chat_history.db
export EMBEDDING_MODEL=qwen3-embed-4b
export EMBEDDING_DIMENSION=2048
export MAX_HISTORY_MESSAGES=1000
export SIMILARITY_THRESHOLD=0.7
export TOP_K_RESULTS=5
```
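The optional settings could be read in code roughly like this, falling back to their documented defaults when unset. The variable names match the README; the exact loading code in `main.py`/`database.py` is an assumption.

```python
import os

# Read optional configuration from the environment, with the defaults
# documented in this README. Numeric values are cast explicitly since
# environment variables are always strings.
DB_PATH = os.environ.get("CHAT_DB_PATH", "chat_history.db")
EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "qwen3-embed-4b")
EMBEDDING_DIMENSION = int(os.environ.get("EMBEDDING_DIMENSION", "2048"))
MAX_HISTORY_MESSAGES = int(os.environ.get("MAX_HISTORY_MESSAGES", "1000"))
SIMILARITY_THRESHOLD = float(os.environ.get("SIMILARITY_THRESHOLD", "0.7"))
TOP_K_RESULTS = int(os.environ.get("TOP_K_RESULTS", "5"))
```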
### Installation

1. Sync dependencies with uv:

   ```shell
   uv sync
   ```

2. Run the bot:

   ```shell
   uv run main.py
   ```
## How It Works

### Database Structure

The system uses two SQLite tables:

- `chat_messages`: Stores message metadata
  - `message_id`, `user_id`, `username`, `content`, `timestamp`, `channel_id`, `guild_id`
- `message_embeddings`: Stores vector embeddings for RAG
  - `message_id`, `embedding` (as a binary blob)
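A plausible schema matching the columns listed above; the actual DDL in `database.py` may differ (column types, indexes, constraints).

```python
import sqlite3

# Sketch of the two-table layout: message metadata plus a one-to-one
# embeddings table keyed by message_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IF NOT EXISTS chat_messages (
    message_id TEXT PRIMARY KEY,
    user_id    TEXT,
    username   TEXT,
    content    TEXT,
    timestamp  TEXT,
    channel_id TEXT,
    guild_id   TEXT
);
CREATE TABLE IF NOT EXISTS message_embeddings (
    message_id TEXT PRIMARY KEY REFERENCES chat_messages(message_id),
    embedding  BLOB  -- raw vector bytes
);
""")
```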
### RAG Process

1. When a message is received, it is stored in the database.
2. An embedding is generated via the OpenAI-compatible embedding API.
3. The embedding is stored alongside the message.
4. When a new message is sent to the bot:
   - The system searches for similar messages using vector similarity.
   - Relevant context is retrieved and added to the prompt.
   - The LLM generates a response with awareness of past conversations.
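The retrieval step above can be sketched as cosine-similarity ranking with a threshold and a top-k cutoff. This is a simplified illustration: the real bot stores vectors as BLOBs in SQLite, while plain Python lists are used here for clarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, stored, threshold=0.7, top_k=5):
    """Return up to top_k stored messages whose embeddings score at or
    above the similarity threshold, best matches first."""
    scored = [(cosine(query_vec, vec), msg) for msg, vec in stored]
    scored = [s for s in scored if s[0] >= threshold]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [msg for _, msg in scored[:top_k]]
```

The retrieved messages are then prepended to the LLM prompt as conversational context.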
### Configuration Options

- `MAX_HISTORY_MESSAGES`: Maximum number of messages to keep (default: 1000)
- `SIMILARITY_THRESHOLD`: Minimum similarity score for context retrieval (default: 0.7)
- `TOP_K_RESULTS`: Number of similar messages to retrieve (default: 5)
- `EMBEDDING_MODEL`: Embedding model to use (the setup above sets `qwen3-embed-4b`; default: `text-embedding-3-small`)
## Usage

The bot maintains conversation context automatically. When you ask a question, it will:

1. Search for similar past conversations
2. Include relevant context in the prompt
3. Generate responses that are aware of the conversation history
## File Structure

```
vibe_discord_bots/
├── main.py           # Main bot application
├── database.py       # SQLite database with RAG support
├── pyproject.toml    # Project dependencies (uv)
├── .env              # Environment variables
├── .venv/            # Virtual environment (created by uv)
└── README.md         # This file
```
## Build

### Using uv

```shell
# Set environment variables
export DISCORD_TOKEN=$(cat .token)
export OPENAI_API_KEY=your_api_key
export OPENAI_API_ENDPOINT="https://llama-cpp.reeselink.com"
export IMAGE_GEN_ENDPOINT="http://toybox.reeselink.com:1234/v1"
export IMAGE_EDIT_ENDPOINT="http://toybox.reeselink.com:1235/v1"

# Run with uv
uv run main.py
```
### Container

```shell
# Build
podman build -t vibe-bot:latest .

# Run
podman run --env-file .env localhost/vibe-bot:latest
```
## Docs

### OpenAI

- Chat: https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create
- Images: https://developers.openai.com/api/reference/python/resources/images/methods/edit
## Models

### Qwen3.5

We recommend the following sampling parameters for generation:

- Non-thinking mode, text tasks: `temperature=1.0, top_p=1.00, top_k=20, min_p=0.0, presence_penalty=2.0, repetition_penalty=1.0`
- Non-thinking mode, VL tasks: `temperature=0.7, top_p=0.80, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0`
- Thinking mode, text tasks: `temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0`
- Thinking mode, VL or precise coding (e.g. WebDev) tasks: `temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0`

Note that support for these sampling parameters varies by inference framework.
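A sketch of assembling the recommended parameters into an OpenAI-compatible request body. Note that `top_k`, `min_p`, and `repetition_penalty` are inference-framework extensions, not part of the official OpenAI schema, so a client would typically pass them as extra fields and the server may ignore them.

```python
def build_sampling_params(thinking=False):
    """Return the recommended text-task sampling parameters as a dict
    suitable for merging into an OpenAI-compatible request body."""
    if thinking:
        return {
            "temperature": 1.0, "top_p": 0.95, "top_k": 20,
            "min_p": 0.0, "presence_penalty": 1.5, "repetition_penalty": 1.0,
        }
    return {
        "temperature": 1.0, "top_p": 1.0, "top_k": 20,
        "min_p": 0.0, "presence_penalty": 2.0, "repetition_penalty": 1.0,
    }

# Example request body for a non-thinking text task:
request = {"model": "qwen3.5",
           "messages": [{"role": "user", "content": "Hello"}],
           **build_sampling_params(thinking=False)}
```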