<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Model Types - Cheat Sheet</title>
<link rel="stylesheet" href="../css/style.css">
</head>
<body>
<nav>
<div class="nav-inner">
<a href="../index.html" class="nav-brand">AI Cheat Sheet</a>
<div class="nav-links">
<a href="terminology.html">Terminology</a>
<a href="techniques.html">Techniques</a>
<a href="use-cases.html">Use Cases</a>
<a href="model-types.html" class="active">Model Types</a>
<a href="prompts.html">Prompt Guide</a>
<a href="math.html">Math & Concepts</a>
</div>
</div>
</nav>
<div class="hero">
<h1>Model Types</h1>
<p>Architectures and families of AI models — what they are and what they do.</p>
</div>
<div class="container">
<h2 class="section-title">Language Models</h2>
<div class="def-card">
<span class="category">Transformer</span>
<h3>LLM (Large Language Model)</h3>
<p>Neural networks based on the transformer architecture, trained on massive text corpora. They predict the next token given the preceding sequence, and this single objective yields fluent performance across a wide range of language tasks.</p>
<div class="example"><strong>Examples:</strong> GPT-4, Claude, Gemini, Llama 3, Mistral, Qwen</div>
</div>
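Next-token prediction in miniature: the sketch below stands a hand-built bigram table in for a trained transformer. Everything here (the five-word vocabulary and the probabilities) is invented for illustration; real LLMs learn these distributions over vocabularies of ~100k tokens.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
# bigram[i, j] = P(next token is vocab[j] | current token is vocab[i])
bigram = np.array([
    [0.0, 0.5, 0.0, 0.0, 0.5],  # after "the": "cat" or "mat"
    [0.0, 0.0, 1.0, 0.0, 0.0],  # after "cat": always "sat"
    [0.0, 0.0, 0.0, 1.0, 0.0],  # after "sat": always "on"
    [1.0, 0.0, 0.0, 0.0, 0.0],  # after "on": always "the"
    [0.2, 0.2, 0.2, 0.2, 0.2],  # after "mat": uniform
])

def generate(start, steps, rng):
    tokens = [start]
    for _ in range(steps):
        probs = bigram[vocab.index(tokens[-1])]
        tokens.append(str(rng.choice(vocab, p=probs)))  # sample the next token
    return tokens

print(generate("cat", 3, np.random.default_rng(0)))  # ['cat', 'sat', 'on', 'the']
```

An LLM does exactly this loop at inference time, just with a neural network producing the probabilities instead of a lookup table.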
<div class="def-card">
<span class="category">Transformer</span>
<h3>Encoder-Only Models</h3>
<p>Transformers designed to understand input (not generate text). Used for classification, sentiment analysis, and embedding generation.</p>
<div class="example"><strong>Examples:</strong> BERT, RoBERTa, DeBERTa</div>
</div>
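What an encoder-only model actually outputs: one vector per input token, commonly mean-pooled into a single sentence embedding. The token vectors below are random stand-ins for real BERT-style hidden states; only the shapes are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
token_vectors = rng.normal(size=(6, 768))  # 6 tokens, 768 dims (BERT-base width)

sentence_embedding = token_vectors.mean(axis=0)  # mean pooling over tokens
print(sentence_embedding.shape)  # (768,)
```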
<div class="def-card">
<span class="category">Transformer</span>
<h3>Decoder-Only Models</h3>
<p>Transformers designed to generate text autoregressively — the dominant architecture for modern LLMs.</p>
<div class="example"><strong>Examples:</strong> GPT series, Claude, Llama, Mistral</div>
</div>
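The mechanism that makes decoding autoregressive is the causal attention mask: position i may attend only to positions ≤ i, so the model can never peek at future tokens. A minimal sketch:

```python
import numpy as np

seq_len = 4
# Lower-triangular boolean mask: row i marks which positions token i can see.
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
print(mask.astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```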
<div class="def-card">
<span class="category">Transformer</span>
<h3>Encoder-Decoder Models</h3>
<p>Transformers with both encoder and decoder, used for tasks that transform input to output (translation, summarization).</p>
<div class="example"><strong>Examples:</strong> T5, BART, Flan-T5</div>
</div>
<h2 class="section-title">Vision Models</h2>
<div class="def-card">
<span class="category">Vision</span>
<h3>CNN (Convolutional Neural Network)</h3>
<p>Neural networks with layers that scan images with small filters, detecting edges, textures, and patterns hierarchically. The backbone of computer vision for years.</p>
<div class="example"><strong>Examples:</strong> ResNet, EfficientNet, VGG</div>
</div>
<div class="def-card">
<span class="category">Vision</span>
<h3>ViT (Vision Transformer)</h3>
<p>Transformers applied to images by splitting each image into fixed-size patches and treating the patches as tokens. Often matches or outperforms CNNs when trained at sufficient scale.</p>
<div class="example"><strong>Examples:</strong> CLIP, DINOv2, ViT-Base</div>
</div>
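The "patches as tokens" step is just a reshape. The sketch below uses the standard ViT-Base configuration (224×224 RGB input, 16×16 patches), which yields 14×14 = 196 tokens of length 16×16×3 = 768; the blank image is a placeholder.

```python
import numpy as np

image = np.zeros((224, 224, 3))  # stand-in for a real RGB image
p = 16                           # patch size

# Carve the image into a (14, 14) grid of (16, 16, 3) patches,
# then flatten each patch into one 768-dim token.
patches = image.reshape(224 // p, p, 224 // p, p, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * 3)
print(patches.shape)  # (196, 768)
```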
<div class="def-card">
<span class="category">Vision</span>
<h3>Diffusion Models</h3>
<p>Models that generate images by iteratively denoising random noise. The architecture behind most state-of-the-art image generators.</p>
<div class="example"><strong>Examples:</strong> Stable Diffusion, DALL-E 3, Midjourney</div>
</div>
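The forward (noising) half of diffusion can be written in one line: blend the clean data with Gaussian noise so that by the final step it is almost pure noise; training then teaches a network to reverse this. The schedule values below are invented for illustration, and a length-4 vector stands in for an image.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = np.ones(4)                           # stand-in for a clean image
alpha_bar = np.linspace(0.999, 0.01, 10)  # toy noise schedule, 10 steps

# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
for t in [0, 9]:
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * rng.normal(size=4)
    print(t, np.round(xt, 2))  # t=0: close to x0; t=9: mostly noise
```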
<div class="def-card">
<span class="category">Vision</span>
<h3>Multimodal Models</h3>
<p>Models that process multiple input types — text, images, audio — and can generate outputs across modalities.</p>
<div class="example"><strong>Examples:</strong> GPT-4V (vision), Claude 3, Gemini, Qwen-VL</div>
</div>
<h2 class="section-title">Generative Models</h2>
<div class="def-card">
<span class="category">Generative</span>
<h3>GAN (Generative Adversarial Network)</h3>
<p>Two networks compete: a generator creates fake data, and a discriminator tries to detect fakes. Over time both improve, until the generator's outputs are indistinguishable from real data.</p>
<div class="example"><strong>Example:</strong> Creating photorealistic faces that don't exist (StyleGAN).</div>
</div>
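The adversarial objective, computed on made-up discriminator outputs. D(x) is the discriminator's probability that x is real: the discriminator wants D(real) → 1 and D(fake) → 0, while the generator wants D(fake) → 1 (the "non-saturating" generator loss shown here).

```python
import numpy as np

d_real = np.array([0.9, 0.8])  # discriminator scores on real samples (invented)
d_fake = np.array([0.3, 0.1])  # discriminator scores on generated samples (invented)

d_loss = -np.mean(np.log(d_real)) - np.mean(np.log(1 - d_fake))  # discriminator loss
g_loss = -np.mean(np.log(d_fake))                                # generator loss
print(round(d_loss, 3), round(g_loss, 3))
```

At equilibrium D outputs 0.5 everywhere and neither loss can improve further.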
<div class="def-card">
<span class="category">Generative</span>
<h3>VQ-VAE (Vector Quantized VAE)</h3>
<p>Combines autoencoders with discrete codebooks to learn compressed representations. Used as a foundation for autoregressive generation.</p>
<div class="example"><strong>Example:</strong> MusicGen (music generation), SoundStream (audio compression)</div>
</div>
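The "vector quantized" step is a nearest-neighbor lookup: snap each encoder output to its closest entry in a learned codebook, turning continuous vectors into discrete token indices. The tiny codebook and inputs below are invented for illustration.

```python
import numpy as np

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])  # 3 codes, 2-dim
z = np.array([[0.9, 1.1], [1.8, 0.2]])                     # encoder outputs

# Squared distance from every output to every code, then pick the nearest.
dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
indices = dists.argmin(axis=1)
print(indices)            # [1 2]: which code each vector snapped to
print(codebook[indices])  # the quantized vectors
```

Those integer indices are what an autoregressive model is then trained to predict, exactly like text tokens.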
<div class="def-card">
<span class="category">Generative</span>
<h3>Flow Models</h3>
<p>Models that learn a reversible transformation between data and noise, enabling exact likelihood computation and fast generation.</p>
<div class="example"><strong>Examples:</strong> DALL-E 2 uses flow matching, Glow, RealNVP</div>
</div>
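The "exact likelihood" claim comes from the change-of-variables formula: for an invertible map z = f(x) with a standard normal base distribution, log p(x) = log p_z(f(x)) + log|det df/dx|. For the simple affine flow f(x) = (x − μ)/σ used in this sketch, the Jacobian term is just −log σ.

```python
import numpy as np

mu, sigma = 2.0, 0.5
x = np.array([1.5, 2.0, 2.5])

z = (x - mu) / sigma                            # forward transform
log_pz = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)  # standard normal log-density
log_px = log_pz - np.log(sigma)                 # + log|det df/dx| = -log(sigma)
print(np.round(log_px, 3))
```

This recovers exactly the log-density of N(μ, σ²); deep flows stack many such invertible layers and sum their log-determinants.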
<h2 class="section-title">Other Architectures</h2>
<div class="def-card">
<span class="category">Architecture</span>
<h3>RNN / LSTM</h3>
<p>Recurrent networks that process sequences step-by-step, maintaining a hidden state. Largely replaced by transformers but still used in some applications.</p>
<div class="example"><strong>Use case:</strong> Time series prediction, speech recognition</div>
</div>
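The entire recurrence fits in one line per step: the hidden state is updated from the current input and the previous state. Weights here are random stand-ins; a trained RNN would have learned them.

```python
import numpy as np

rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(8, 3)), rng.normal(size=(8, 8)), np.zeros(8)

h = np.zeros(8)                     # hidden state
for x in rng.normal(size=(5, 3)):   # a sequence of 5 inputs, 3 features each
    h = np.tanh(W @ x + U @ h + b)  # h_t = tanh(W x_t + U h_{t-1} + b)
print(h.shape)  # (8,)
```

The step-by-step dependence is why RNNs cannot be parallelized across the sequence during training, a key reason transformers displaced them.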
<div class="def-card">
<span class="category">Architecture</span>
<h3>Mixture of Experts (MoE)</h3>
<p>A model with multiple "expert" subnetworks. A routing mechanism selects which experts to activate for each token, so only a fraction of the total parameters run per input, giving large capacity at modest inference cost.</p>
<div class="example"><strong>Examples:</strong> Mixtral 8x7B, Google's PaLM-E</div>
</div>
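Top-k routing in a few lines: a gating network scores every expert per token, but only the k best are actually executed and their outputs mixed. The scores below are made up; real routers are small learned linear layers.

```python
import numpy as np

def top_k_route(gate_logits, k=2):
    top = np.argsort(gate_logits)[-k:]  # indices of the k highest-scoring experts
    w = np.exp(gate_logits[top])
    return top, w / w.sum()             # softmax over the selected experts only

logits = np.array([0.1, 2.0, -1.0, 1.5])  # gating scores for 4 experts (invented)
experts, weights = top_k_route(logits)
print(experts, np.round(weights, 3))      # experts 3 and 1 selected
```

With k=2 of 4 experts active, only half the expert parameters are touched per token, which is where the inference savings come from.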
<div class="def-card">
<span class="category">Architecture</span>
<h3>Retrieval Models</h3>
<p>Models designed specifically for semantic search — finding the most relevant documents for a query from a large corpus.</p>
<div class="example"><strong>Examples:</strong> BGE, E5, Cohere embed models</div>
</div>
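Retrieval in miniature: embed the query and every document, then rank by cosine similarity. The embeddings below are random stand-ins (with the answer planted near document 42); real systems use learned models like BGE or E5 plus an approximate nearest-neighbor index.

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 64))             # 100 document embeddings (invented)
query = docs[42] + 0.1 * rng.normal(size=64)  # a query close to document 42

def cosine(a, b):
    return (a @ b.T) / (np.linalg.norm(a) * np.linalg.norm(b, axis=-1))

scores = cosine(query, docs)
print(scores.argmax())  # 42: the planted nearest document ranks first
```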
<div class="def-card">
<span class="category">Architecture</span>
<h3>Small Language Models (SLMs)</h3>
<p>Compact language models (typically under 7B parameters) optimized for edge devices and low-latency applications, and increasingly capable despite their size.</p>
<div class="example"><strong>Examples:</strong> Phi-3, Gemma 2B, Qwen 1.5B, MicroLlama</div>
</div>
<h2 class="section-title">Model Comparison</h2>
<table class="glossary-table">
<thead>
<tr><th>Model</th><th>Type</th><th>Best For</th></tr>
</thead>
<tbody>
<tr><td>GPT-4 / GPT-4o</td><td>Decoder LLM</td><td>General-purpose reasoning, coding, multimodal</td></tr>
<tr><td>Claude 3.5</td><td>Decoder LLM</td><td>Long-context analysis, coding, writing</td></tr>
<tr><td>Gemini 1.5 Pro</td><td>Decoder LLM</td><td>Massive context windows, multimodal</td></tr>
<tr><td>Llama 3</td><td>Decoder LLM</td><td>Open-source, self-hosting, fine-tuning</td></tr>
<tr><td>Mixtral 8x7B</td><td>MoE LLM</td><td>Efficient inference, multilingual</td></tr>
<tr><td>Stable Diffusion</td><td>Diffusion</td><td>Image generation, open-source</td></tr>
<tr><td>CLIP</td><td>Encoder (Vision+Text)</td><td>Image-text matching, embeddings</td></tr>
<tr><td>BERT</td><td>Encoder</td><td>Text classification, search, NLU</td></tr>
<tr><td>Whisper</td><td>Encoder-Decoder</td><td>Speech recognition, transcription</td></tr>
<tr><td>TTS models (e.g. VALL-E, Bark)</td><td>Decoder</td><td>Text-to-speech, voice synthesis</td></tr>
</tbody>
</table>
</div>
<footer>AI Cheat Sheet &mdash; A learning reference for artificial intelligence</footer>
</body>
</html>