# Vibe Discord Bot with RAG Chat History

A Discord bot that stores long-term chat history in a SQLite database, with RAG (Retrieval-Augmented Generation) capabilities powered by custom embedding models.

- [Vibe Discord Bot with RAG Chat History](#vibe-discord-bot-with-rag-chat-history)
  - [Quick Start - Available Commands](#quick-start---available-commands)
    - [Pre-built Bots](#pre-built-bots)
    - [Custom Bot Management](#custom-bot-management)
    - [Using Custom Bots](#using-custom-bots)
  - [Features](#features)
  - [Setup](#setup)
    - [Prerequisites](#prerequisites)
    - [Environment Variables](#environment-variables)
    - [Installation](#installation)
  - [How It Works](#how-it-works)
    - [Database Structure](#database-structure)
    - [RAG Process](#rag-process)
    - [Configuration Options](#configuration-options)
  - [Usage](#usage)
  - [File Structure](#file-structure)
  - [Build](#build)
    - [Using uv](#using-uv)
    - [Container](#container)
  - [Docs](#docs)
    - [Open AI](#open-ai)
  - [Models](#models)
    - [Qwen3.5](#qwen35)

## Quick Start - Available Commands

### Pre-built Bots

| Command      | Description                   | Example Usage                        |
| ------------ | ----------------------------- | ------------------------------------ |
| `!doodlebob` | Generate images from text     | `!doodlebob a cat sitting on a moon` |
| `!retcon`    | Edit images with text prompts | `!retcon Make it sunny`              |

### Custom Bot Management

| Command                        | Description                                     | Example Usage                                    |
| ------------------------------ | ----------------------------------------------- | ------------------------------------------------ |
| `!custom <name> <personality>` | Create a custom bot with a specific personality | `!custom alfred you are a proper british butler` |
| `!list-custom-bots`            | List all available custom bots                  | `!list-custom-bots`                              |
| `!delete-custom-bot <name>`    | Delete your custom bot                          | `!delete-custom-bot alfred`                      |

### Using Custom Bots

Once you create a custom bot, you can interact with it directly by prefixing your message with the bot name:

```bash
<bot_name> <message>
```

**Example:**
1. Create a bot: `!custom alfred you are a proper british butler`
2. Use the bot: `alfred Could you fetch me some tea?`
3. The bot will respond in character as a British butler.

## Features

- **Long-term chat history storage**: Persistent storage of all bot interactions
- **RAG-based context retrieval**: Smart retrieval of relevant conversation history using vector embeddings
- **Custom embedding model**: Uses qwen3-embed-4b for semantic search capabilities
- **Efficient message management**: Automatic cleanup of old messages based on configurable limits

## Setup

### Prerequisites

- Python 3.10 or higher
- [uv](https://docs.astral.sh/uv/) package manager
- Embedding API key
- Discord bot token

### Environment Variables

Create a `.env` file or export the following variables:

```bash
# Discord Bot Token
export DISCORD_TOKEN=your_discord_bot_token

# Embedding API Configuration
export OPENAI_API_KEY=your_embedding_api_key
export OPENAI_API_ENDPOINT=https://llama-embed.reeselink.com/embedding

# Image Generation (optional)
export IMAGE_GEN_ENDPOINT=http://toybox.reeselink.com:1234/v1
export IMAGE_EDIT_ENDPOINT=http://toybox.reeselink.com:1235/v1

# Database Configuration (optional)
export CHAT_DB_PATH=chat_history.db
export EMBEDDING_MODEL=qwen3-embed-4b
export EMBEDDING_DIMENSION=2048
export MAX_HISTORY_MESSAGES=1000
export SIMILARITY_THRESHOLD=0.7
export TOP_K_RESULTS=5
```

### Installation

1. Sync dependencies with uv:

   ```bash
   uv sync
   ```

2. Run the bot:

   ```bash
   uv run main.py
   ```

## How It Works

### Database Structure

The system uses two SQLite tables:
1. **chat_messages**: Stores message metadata
   - message_id, user_id, username, content, timestamp, channel_id, guild_id
2. **message_embeddings**: Stores vector embeddings for RAG
   - message_id, embedding (as binary blob)

### RAG Process

1. When a message is received, it is stored in the database
2. An embedding is generated via the OpenAI-compatible embedding API
3. The embedding is stored alongside the message
4. When a new message is sent to the bot:
   - The system searches for similar messages using vector similarity
   - Relevant context is retrieved and added to the prompt
   - The LLM generates a response with awareness of past conversations

### Configuration Options

- **MAX_HISTORY_MESSAGES**: Maximum number of messages to keep (default: 1000)
- **SIMILARITY_THRESHOLD**: Minimum similarity score for context retrieval (default: 0.7)
- **TOP_K_RESULTS**: Number of similar messages to retrieve (default: 5)
- **EMBEDDING_MODEL**: Embedding model to use (default: qwen3-embed-4b)

## Usage

The bot maintains conversation context automatically. When you ask a question, it will:

1. Search for similar past conversations
2. Include relevant context in the prompt
3. Generate responses that are aware of the conversation history

## File Structure

```text
vibe_discord_bots/
├── main.py          # Main bot application
├── database.py      # SQLite database with RAG support
├── pyproject.toml   # Project dependencies (uv)
├── .env             # Environment variables
├── .venv/           # Virtual environment (created by uv)
└── README.md        # This file
```

## Build

### Using uv

```bash
# Set environment variables
export DISCORD_TOKEN=$(cat .token)
export OPENAI_API_KEY=your_api_key
export OPENAI_API_ENDPOINT="https://llama-cpp.reeselink.com"
export IMAGE_GEN_ENDPOINT="http://toybox.reeselink.com:1234/v1"
export IMAGE_EDIT_ENDPOINT="http://toybox.reeselink.com:1235/v1"

# Run with uv
uv run main.py
```

### Container

```bash
# Build
podman build -t vibe-bot:latest .
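# Note: `-e DISCORD_TOKEN` with no `=value` passes the variable through from
# the host environment into the container; add further `-e NAME` flags for any
# other settings from the "Environment Variables" section
# (e.g. OPENAI_API_KEY, CHAT_DB_PATH).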
# Run
export DISCORD_TOKEN=$(cat .token)
podman run -e DISCORD_TOKEN localhost/vibe-bot:latest
```

## Docs

### Open AI

- Chat
- Images

## Models

### Qwen3.5

> We recommend using the following set of sampling parameters for generation:
>
> - Non-thinking mode for text tasks: temperature=1.0, top_p=1.00, top_k=20, min_p=0.0, presence_penalty=2.0, repetition_penalty=1.0
> - Non-thinking mode for VL tasks: temperature=0.7, top_p=0.80, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
> - Thinking mode for text tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
> - Thinking mode for VL or precise coding (e.g. WebDev) tasks: temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
>
> Please note that support for sampling parameters varies across inference frameworks.
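The storage-and-retrieval flow described under "How It Works" can be sketched in a few lines of Python. This is an illustrative sketch, not the bot's actual code: `embed` is a deterministic stand-in for the real call to the embedding endpoint, the helper names (`store_message`, `retrieve_context`) are hypothetical, and the tables mirror the "Database Structure" section.

```python
import sqlite3
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in for the real embedding API call (the bot posts to
    OPENAI_API_ENDPOINT); returns a deterministic unit vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chat_messages (
    message_id TEXT PRIMARY KEY,
    user_id TEXT, username TEXT, content TEXT,
    timestamp TEXT, channel_id TEXT, guild_id TEXT
);
CREATE TABLE message_embeddings (
    message_id TEXT PRIMARY KEY REFERENCES chat_messages(message_id),
    embedding BLOB  -- embedding stored as a binary blob of float32 bytes
);
""")

def store_message(message_id: str, content: str) -> None:
    # RAG steps 1-3: store the message, then its embedding alongside it.
    conn.execute(
        "INSERT INTO chat_messages (message_id, content) VALUES (?, ?)",
        (message_id, content),
    )
    blob = embed(content).astype(np.float32).tobytes()
    conn.execute(
        "INSERT INTO message_embeddings (message_id, embedding) VALUES (?, ?)",
        (message_id, blob),
    )

def retrieve_context(query: str, top_k: int = 5,
                     threshold: float = 0.7) -> list[str]:
    # RAG step 4: cosine-score every stored embedding against the query,
    # keep those above SIMILARITY_THRESHOLD, and return up to
    # TOP_K_RESULTS message contents, best match first.
    q = embed(query)
    scored = []
    rows = conn.execute(
        "SELECT m.content, e.embedding FROM chat_messages m "
        "JOIN message_embeddings e ON m.message_id = e.message_id"
    )
    for content, blob in rows:
        v = np.frombuffer(blob, dtype=np.float32)
        score = float(q @ v) / float(np.linalg.norm(q) * np.linalg.norm(v))
        if score >= threshold:
            scored.append((score, content))
    scored.sort(reverse=True)
    return [content for _, content in scored[:top_k]]
```

Swapping `embed` for a real call to the configured endpoint (and filling in the remaining metadata columns) yields the persistence layer the README describes; the retrieved contents are what gets prepended to the LLM prompt.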