🤔 What is a Large Language Model (LLM)?

A Large Language Model is an artificial intelligence system trained on vast amounts of text data to understand and generate human-like text. Think of it as a sophisticated pattern recognition system that has "read" billions of pages of text and learned the statistical relationships between words and concepts.

Key Concepts:

  • Neural Networks: LLMs are built on artificial neural networks loosely inspired by the human brain, with billions of tunable connection weights (parameters)
  • Training: Models learn by processing massive datasets, adjusting their internal parameters to predict the next word in a sequence
  • Inference: When you ask a question, the model uses its learned patterns to generate a relevant response word by word

🔒 Why Run AI Locally?

  • Privacy: Your data never leaves your computer - no cloud servers, no logging
  • Cost: No API fees or subscription costs after initial setup
  • Customization: Full control over model behavior and system prompts
  • Offline: Works without internet connection
  • Learning: Understand how AI systems actually work

📊 Model Sizes and Parameters

The "size" of a model refers to the number of parameters (weights) it contains:

| Model Size | Parameters | RAM Required | Example Models |
|------------|------------|--------------|----------------|
| Tiny | 1-2B | 4-8 GB | tinyllama:1.1b, gemma:2b |
| Small | 3-7B | 8-16 GB | phi3:mini, llama3.2:3b, mistral:7b |
| Medium | 13-34B | 24-48 GB | llama2:13b, codellama:34b |
| Large | 70B+ | 80+ GB | llama3.1:70b, mixtral:8x7b |

More parameters ≠ Always better: Smaller, well-trained models can outperform larger ones for specific tasks.
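
A back-of-the-envelope way to see where these RAM figures come from: the weights take roughly parameters × bits-per-weight ÷ 8 bytes, plus runtime overhead for activations and context. A sketch assuming 4-bit quantization and ~30% overhead (both assumptions; actual usage varies with quantization level and context length):

# Rough RAM estimate for a quantized model (rule of thumb, not a guarantee)
def estimate_ram_gb(params_billion, bits_per_weight=4, overhead=1.3):
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bit = 1 GB
    return weights_gb * overhead

for size in (1.1, 3, 7, 70):
    print(f"{size}B parameters ≈ {estimate_ram_gb(size):.1f} GB")
# e.g. 7B ≈ 4.6 GB, in line with the ~4-5 GB downloads of 7B models above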

⚙️ How Does Text Generation Work?

  1. Tokenization: Your input text is split into "tokens" (roughly words or word parts)
  2. Embedding: Each token is converted into a numerical vector
  3. Processing: These vectors flow through multiple neural network layers
  4. Prediction: The model calculates probability scores for the next token
  5. Sampling: A token is selected based on these probabilities (with some randomness for creativity)
  6. Repeat: Steps 3-5 continue until the response is complete

Example:

Input: "The capital of France is"
Model thinks: "Paris" (95%), "Lyon" (2%), "Marseille" (1%)...
Output: "Paris"
🎯 Temperature and Sampling

Temperature controls the randomness of responses:

  • Temperature 0.0: Deterministic, always picks the most likely token (good for factual answers)
  • Temperature 0.7: Balanced creativity and coherence (default for most tasks)
  • Temperature 1.0+: More creative/random (good for storytelling, brainstorming)

Top-K & Top-P Sampling: Additional techniques to control output quality by limiting which tokens can be selected.

🏋️ CPU vs GPU Processing

Why GPUs are faster:

  • Parallel Processing: GPUs have thousands of cores that can process many calculations simultaneously
  • Matrix Operations: AI models require massive matrix multiplications, which GPUs excel at
  • VRAM: GPU memory is faster than system RAM for neural network operations

CPU Processing:

  • Uses system RAM
  • Processes calculations sequentially (or with limited parallelism)
  • Works perfectly fine but slower (seconds vs milliseconds per token)

Practical Impact:

  • CPU: 3-10 tokens/second (small models)
  • GPU (8GB): 20-50 tokens/second
  • GPU (16GB+): 50-100+ tokens/second
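
Actual throughput is easy to measure: each Ollama response reports how many tokens were generated (eval_count) and how long generation took (eval_duration, in nanoseconds). A quick benchmark sketch:

import ollama

# One non-streamed generation; timing fields come back with the response
result = ollama.generate(model="tinyllama:1.1b", prompt="Explain what a token is.")
tps = result["eval_count"] / result["eval_duration"] * 1e9
print(f"{tps:.1f} tokens/second on this machine")
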
🎓 Common Misconceptions

"AI understands like humans do"
✅ AI recognizes patterns in text but doesn't "understand" meaning in a human sense

"Bigger models are always better"
✅ Smaller specialized models often perform better for specific tasks

"AI needs GPU to work"
✅ CPU-only operation is perfectly viable, just slower

"AI is always accurate"
✅ Models can "hallucinate" (generate plausible-sounding but incorrect information)

Ready to get started? Check the next tabs for installation and setup instructions!

Before You Begin

This guide provides an introduction to running AI models locally on your computer. You'll learn how to work with Large Language Models (LLMs) using tools like Ollama, Jupyter Notebooks, and Python virtual environments.

Running AI models locally offers several advantages: full control over your data, enhanced privacy, no dependency on internet connectivity, and no recurring cloud service costs. This approach is particularly valuable for academic work, research projects, and learning the fundamentals of AI implementation.

| Component | Minimal | Basic | Enthusiast |
|-----------|---------|-------|------------|
| RAM | 4-8 GB | 16 GB | 32 GB+ |
| GPU VRAM | Optional / CPU only | 8 GB | 16 GB+ |
| Storage | 10 GB free | 50 GB SSD | 1 TB+ SSD |
| CPU | 2+ Cores | 4+ Cores | 8+ Cores |
| Suitable Models | tinyllama:1.1b, gemma:2b | phi3:mini, llama3.2:3b, mistral:7b, codellama:7b | llama3.1:70b, mixtral:8x7b, command-r:35b |

Ollama serves as the primary interface for managing and running AI models locally. It simplifies the process of installing, configuring, and executing various language models. This guide demonstrates how to configure a development environment using Visual Studio Code (VS Code) and Jupyter Notebooks for interactive AI experimentation.

The following sections cover:

  • Software installation and configuration
  • Development environment setup with proper isolation using virtual environments
  • Initial model deployment and execution

Each step includes detailed instructions and code examples. Prior programming experience is helpful but not required, as all necessary commands and configurations are provided.

A Quick Look at the Hardware

The hardware requirements table provides recommended specifications for different use cases. Performance of AI models is primarily determined by available Random Access Memory (RAM) and, optionally, Graphics Processing Unit (GPU) capabilities.

  • RAM: The system's primary memory allocation directly affects which model sizes can be loaded and executed. Larger models require proportionally more RAM.
  • GPU: NVIDIA graphics cards with CUDA support can significantly accelerate inference times. GPU acceleration is optional and provides performance benefits but is not required for basic operation.
  • CPU-Only Operation: GPU hardware is not mandatory. Ollama functions on CPU-only systems with standard configurations. Smaller models (e.g., tinyllama:1.1b, phi3:mini, gemma:2b) operate efficiently on systems with 4-8GB RAM. Response generation is slower compared to GPU-accelerated setups but remains practical for learning and development purposes.

These specifications serve as guidelines rather than strict requirements. Entry-level hardware configurations are sufficient for experimentation with compact models and learning fundamental concepts. CUDA installation is only necessary for leveraging NVIDIA GPU acceleration and can be omitted for CPU-based workflows.
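
To check whether an NVIDIA GPU and driver are actually visible to the system (and therefore usable by Ollama), the standard test is:

nvidia-smi

If the command is missing or reports an error, the CPU-only path described above applies.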

🚀 Quick Start with A1-Terminal

Want to skip manual setup? The A1-Terminal project includes automatic installation scripts that handle everything for you!

The installation scripts automatically install:

  • Python 3.11+ (if not already present)
  • Ollama service and API
  • All required Python packages (customtkinter, ollama, PyYAML, requests, pyperclip)
  • Test model (tinyllama:1.1b) to get started immediately
  • ⚠️ CUDA must be installed manually (only needed for NVIDIA GPU acceleration)

Perfect for: Beginners who want a working setup immediately, or anyone who prefers using a modern GUI instead of command-line tools.

📖 See the "A1-Terminal" tab above for complete installation instructions and features.

📦 Required Software Components

🐍 Python 3.8+

  • Download from python.org
  • During installation: ✅ Check "Add Python to PATH"
  • Verify installation: python --version
  • Includes pip (Python package manager)

🦙 Ollama

  • Download from ollama.com
  • Runs as a background service on localhost:11434 (see the reachability check below)
  • Verify: ollama --version
  • Start service: ollama serve (automatic on Windows/macOS)
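
Because the service listens on localhost:11434, reachability can also be checked from Python; a small sketch using the requests package (also part of A1-Terminal's requirements):

import requests

# /api/tags lists locally installed models; a 200 response means Ollama is up
r = requests.get("http://localhost:11434/api/tags", timeout=5)
print(r.status_code, r.json())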

💻 Visual Studio Code

  • Download from code.visualstudio.com
  • Required Extensions (Ctrl+Shift+X):
      • Python (ms-python.python) - Python IntelliSense & debugging
      • Jupyter (ms-toolsai.jupyter) - Interactive notebooks

CUDA Toolkit (Optional)

  • Download from NVIDIA CUDA Downloads
  • Required only for GPU acceleration with NVIDIA graphics cards
  • Significantly improves inference speed
  • Skip if using CPU-only or AMD GPUs

🔧 Development Environment Setup

Step 1: Install VS Code Extensions

  1. Open VS Code
  2. Press Ctrl+Shift+X (Extensions)
  3. Search and install: Python and Jupyter

Step 2: Create Virtual Environment

  1. Open your project folder in VS Code
  2. Press Ctrl+Shift+P (Command Palette)
  3. Type: Python: Create Environment
  4. Select Venv
  5. Choose your Python interpreter

Why virtual environments? Isolates project dependencies, prevents version conflicts, and keeps your system Python clean.

Step 3: Install Jupyter Kernel

Open the integrated terminal (Ctrl+`) and run:

pip install ipykernel

This enables Jupyter notebooks to use your virtual environment.

Step 4: Verify Ollama Service

Check if Ollama is running:

ollama list

If not running, start it:

ollama serve

Step 5: Download Your First Model

ollama pull tinyllama:1.1b
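
Once the download finishes, the model can be tried directly from the same terminal (one-shot prompt shown; omit the quoted text for an interactive chat):

ollama run tinyllama:1.1b "Why is the sky blue?"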

tinyllama:1.1b ✅
Size: 600 MB | RAM: 4 GB
Use: Quick tests, learning basics
Speed: Very fast, CPU-friendly

💡 Quick Start: Begin with tinyllama:1.1b (included) or phi3:mini for the best balance. The full, categorized model catalog (ultra-lightweight, balanced, code specialists, German-optimized, system & tools) is in the "A1-Terminal" tab; download models there or via CLI: ollama pull model-name

✅ Verification Checklist
  • ✅ Python installed and in PATH: python --version
  • ✅ Ollama service running: ollama list
  • ✅ At least one model downloaded: ollama list
  • ✅ VS Code extensions installed: Python + Jupyter
  • ✅ Virtual environment created and activated
  • ✅ Jupyter kernel installed: pip list | grep ipykernel
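
To confirm the whole chain at once (virtual environment → packages → Ollama service → model), a short notebook cell like this sketch can be run; it assumes the ollama Python package is installed in the active environment:

import ollama

# Fails here if the Ollama service isn't running (start it with: ollama serve)
print(ollama.list())

# Round trip through the model itself
reply = ollama.generate(model="tinyllama:1.1b", prompt="Say hello in five words.")
print(reply["response"])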

🎯 Next Steps: Your environment is ready! Start experimenting with Jupyter notebooks or launch A1-Terminal for a full-featured chat interface.

A1-Terminal: H-Term for AI Models

🖥️ A1-Terminal v1.0
Desktop GUI application for local AI models. Features automatic installation, session management, and complete offline privacy. Perfect for beginners and local development.
🌐 Open WebUI
Web-based interface for local AI models with Docker support, multi-user capabilities, RAG functionality, and enterprise-grade features. Ideal for advanced users and team environments.

The installation script handles everything automatically - perfect for beginners or blank systems!

Windows Installation

# 1. Clone repository
git clone https://github.com/Nr44suessauer/A1-Terminal.git
cd A1-Terminal

# 2. Run as Administrator (Right-click → "Run as Administrator")
.\scripts\install.bat

What the install.bat script does:

  • Python Installation - Checks for Python 3.11+, downloads and installs if missing
  • pip Update - Updates Python package manager to latest version
  • Python Packages - Installs all required packages from requirements.txt
  • Ollama Installation - Downloads (~500 MB) and installs Ollama service
  • Test Model - Downloads tinyllama:1.1b (~600 MB) for immediate use

After installation:

cd a1_terminal_modular
.\start.bat

Linux/macOS Installation

# 1. Clone repository
git clone https://github.com/Nr44suessauer/A1-Terminal.git
cd A1-Terminal

# 2. Make executable and run
chmod +x scripts/install.sh
./scripts/install.sh

What the install.sh script does:

  • Python Installation - Uses system package manager (apt/dnf/yum/brew)
  • pip Update - Updates pip to latest version
  • Python Packages - Installs all dependencies
  • Ollama - Downloads and configures Ollama + test model

After installation:

cd a1_terminal_modular
./start.sh

⏱️ Installation takes 5-10 minutes. Everything is automatic!


🔧 Manual Installation (Advanced)

For full control over each installation step:

Windows Manual Installation

# 1. Install Ollama manually
# Visit https://ollama.com/download

# 2. Clone repository
git clone https://github.com/Nr44suessauer/A1-Terminal.git
cd A1-Terminal/a1_terminal_modular

# 3. Install Python dependencies
pip install -r requirements.txt

# 4. Download a model (optional)
ollama pull tinyllama:1.1b

# 5. Start application
python main.py
# Or use start script:
# .\start.bat

Linux/macOS Manual Installation

# 1. Install Ollama manually
# Visit https://ollama.com/download
# Or use: curl -fsSL https://ollama.com/install.sh | sh

# 2. Clone repository
git clone https://github.com/Nr44suessauer/A1-Terminal.git
cd A1-Terminal/a1_terminal_modular

# 3. Install Python dependencies
pip install -r requirements.txt

# 4. Download a model (optional)
ollama pull tinyllama:1.1b

# 5. Start application
python3 main.py
# Or use start script:
# ./start.sh

After installation, tinyllama:1.1b is ready. Browse models by category:

Ultra-Lightweight Models (1-2B)

Perfect for quick tests, learning, and low-resource systems. Runs smoothly on any CPU with 4-6 GB RAM.

tinyllama:1.1b ✅
Size: 600 MB | RAM: 4 GB
Use Case: Quick tests, learning basics, simple conversations
Speed: Very fast, CPU-friendly, instant responses
Status: Already installed and ready to use!

Size: 1.4 GB | RAM: 4-6 GB
Use Case: General purpose, chat, Q&A
Quality: Google's efficient model, great size/performance ratio
Best For: Upgrading from tinyllama while staying lightweight

Size: 930 MB | RAM: 4 GB
Use Case: Multilingual tasks, fast responses
Quality: Modern architecture, efficient and capable
Best For: Users needing multilingual support in tiny size

Size: 2.2 GB | RAM: 6 GB
Use Case: Advanced reasoning, multilingual support
Best For: Microsoft's latest compact model, great performance

Size: 1 GB | RAM: 4 GB
Use Case: Efficient inference, mobile-friendly
Best For: Edge devices, lightweight applications

Balanced Models (3-7B)

Best quality-to-size ratio. Great for most tasks with 8-12 GB RAM. Industry standard performance.

Size: 2.3 GB | RAM: 8 GB
Use Case: General chat, reasoning tasks, analysis
Quality: Microsoft model, excellent value for size
Best For: Most users upgrading from lightweight models

Size: 2 GB | RAM: 8 GB
Use Case: Latest Meta model, versatile for all tasks
Quality: Top tier performance in 3B class
Best For: Users wanting cutting-edge capabilities

Size: 4.1 GB | RAM: 12 GB
Use Case: High-quality responses, complex tasks
Quality: Industry standard, proven in production
Best For: Professional use, detailed analysis

Size: 4.7 GB | RAM: 12 GB
Use Case: Advanced reasoning, multilingual support
Quality: State-of-the-art performance, highly capable
Best For: Complex reasoning and multi-language projects

Size: 4.7 GB | RAM: 12 GB
Use Case: General purpose, tool calling, function use
Best For: Meta's latest with extended context (128K tokens)

Size: 5.4 GB | RAM: 14 GB
Use Case: Research, creative writing, detailed analysis
Best For: Google's powerful open model with high quality output

Code Specialist Models

Optimized for programming tasks: code generation, debugging, completion, and refactoring.

Size: 1.6 GB | RAM: 6 GB
Languages: Python, JavaScript, Java, C++, and more
Use Case: Lightweight code helper, quick snippets
Best For: Learning to code, simple automation scripts

Size: 3.8 GB | RAM: 12 GB
Languages: Python, Java, C++, JS, and more
Use Case: Code generation, debugging, documentation
Best For: Professional development, Meta's reliable coding model

Size: 4.7 GB | RAM: 12 GB
Languages: 92+ programming languages
Use Case: Advanced code tasks, architecture design
Best For: Top coding model in 7B class, complex projects

Size: 3.8 GB | RAM: 10 GB
Languages: Specialized for popular languages
Use Case: Code completion, refactoring, optimization
Best For: Development workflows, IDE integration

Size: 1.7 GB | RAM: 6 GB
Languages: 600+ programming languages
Use Case: Code completion, faster lightweight coding
Best For: Lightweight code assistant, quick suggestions

Size: 4.6 GB | RAM: 12 GB
Languages: 116 programming languages
Use Case: Enterprise coding, bug fixing, code explanation
Best For: IBM's enterprise-grade coding model

German Language Optimized

Models with excellent German language support for professional and casual German text generation.

Size: 2 GB | RAM: 8 GB
Languages: Multilingual with strong German support
Use Case: German chat, Q&A, content creation
Best For: Best 3B model for German language tasks

Size: 4.1 GB | RAM: 12 GB
Languages: Excellent German language capabilities
Use Case: Professional German text, business communication
Best For: High-quality German output, formal writing

Size: 4.8 GB | RAM: 12 GB
Languages: 101 languages including German
Use Case: Multilingual projects, German + other languages
Best For: Specialized multilingual model with German focus

Size: 5.4 GB | RAM: 14 GB
Languages: Strong German language capabilities
Use Case: High-quality German content, formal writing
Best For: Professional German text with Google quality

System & Tools

Specialized models for system administration, CLI workflows, scripting, and technical problem-solving.

Size: 4.1 GB | RAM: 12 GB
Use Case: Command generation, CLI assistance, system tasks
Best For: Terminal workflows, Bash/PowerShell scripting

Size: 7 GB | RAM: 16 GB
Use Case: Complex system analysis, advanced troubleshooting
Best For: DevOps, infrastructure management, logs analysis

Size: 6.4 GB | RAM: 14 GB
Use Case: Function calling, structured outputs, tool use
Best For: API integration, automation scripts, agents

Size: 4.1 GB | RAM: 12 GB
Use Case: Technical support, IT troubleshooting
Best For: Fast technical assistance, problem diagnosis
💡 Quick Start: Begin with tinyllama:1.1b (included) or phi3:mini for best balance. Download models in A1-Terminal's "Models" tab or via CLI: ollama pull model-name

Project Structure
A1-Terminal/
├── scripts/
│   ├── install.bat          # Windows auto-installer
│   └── install.sh           # Linux/macOS auto-installer
├── start.bat                # Quick start (from root)
└── a1_terminal_modular/
    ├── main.py                  # Entry point
    ├── start.bat                # Windows start
    ├── requirements.txt         # Dependencies
    ├── a1_terminal_config.yaml  # Config
    ├── sessions/                # Saved chats
    └── src/
        ├── core/
        │   ├── a1_terminal.py     # Main app
        │   └── ollama_manager.py  # API client
        └── ui/
            ├── ultimate_ui.py     # Modern UI
            ├── chat_bubble.py     # Messages
            ├── session_card.py    # Session list
            ├── model_selector.py  # Model picker
            └── color_wheel.py     # Color picker

Configuration

Auto-created a1_terminal_config.yaml:

# Colors
user_bg_color: "#003300"
user_text_color: "#00FF00"
ai_bg_color: "#1E3A5F"
ai_text_color: "white"

# Fonts
user_font: "Courier New"
ai_font: "Consolas"

# UI
ui_window_width: 1400
ui_window_height: 900

# Options
show_system_messages: true
auto_scroll_chat: true
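
The file is plain YAML, so it can also be edited by hand or read programmatically; a minimal sketch using PyYAML (installed via requirements.txt):

import yaml

# Load the auto-created config; keys mirror the sample above
with open("a1_terminal_config.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)
print(config["ui_window_width"], config["user_font"])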

Troubleshooting

Ollama Not Running:

ollama list      # Check status
ollama serve     # Start manually

App Won't Start:

pip install -r requirements.txt --upgrade
python --version  # Needs 3.8+

Model Download Failed:

  • Check internet connection
  • Try: ollama pull <model_name>
  • Check disk space
  • Verify Ollama is running (see the version check below)
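
A model-independent way to confirm the service itself is reachable is the version endpoint:

curl http://localhost:11434/api/version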

Documentation & Support

Ready to start? Run the installation script!

A1-Manual: Complete User Guide

🎯 Key Features Illustrated
📸 Visual Walkthrough

🖥️ Console Initial Prompt

[Screenshot: Console Initial Prompt]

The console provides real-time process monitoring through log output; the screenshot above shows the ideal workflow.

🎯 Console Features:
  • Process Overview: Real-time log outputs show system status and operations
  • Error Detection: Errors appear in the log output and stand out as deviations from the ideal workflow shown above
  • Model Status: Terminal displays whether the AI model is currently processing
  • System Monitoring: Complete visibility into application state and performance

🔄 Model Switching in Session

[Screenshot: Model Change in Session]

Switch between different AI models mid-conversation to compare responses and capabilities.

🎯 Key Features:
  • Multi-Session Support: Switch between multiple sessions with preserved content
  • Model Flexibility: Change models between or within sessions
  • Context Preservation: Automatic context retention when reopening sessions
  • Smart Management: Ollama handles conversation context automatically (see the sketch after this list)
  • Auto-Save: Sessions automatically saved after AI responses
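
Conceptually, switching models works because the full message history is sent with each request; replaying the same history against another model is all it takes. A rough sketch with the Python client (model names are examples from this guide):

import ollama

# Start a conversation on one model
history = [{"role": "user", "content": "Summarize the plot of Hamlet."}]
first = ollama.chat(model="tinyllama:1.1b", messages=history)

# Keep the context and continue on a different model
history.append({"role": "assistant", "content": first["message"]["content"]})
history.append({"role": "user", "content": "Now give it as three bullet points."})
second = ollama.chat(model="phi3:mini", messages=history)
print(second["message"]["content"])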

💬 Session Management

[Screenshot: Session Example - Astronaut]

🎯 Session Features:
  • Visual Session Identification: The chat window frame is colored in the session color
  • Customizable Colors & Names: Color and name are set using the gear icon next to the session
  • BIAS Settings: The session BIAS setting is located in the bottom left; if no BIAS is set, this field is empty
  • JSON Storage: Sessions are stored in JSON format in the sessions folder, which opens when clicking the session-folder button
  • Session Preservation: All your conversations are preserved and can be restored anytime

🎭 Professional BIAS System

[Screenshot: Professional BIAS Configuration]

Configure system prompts (BIAS) to define AI behavior and personality for each session.

🎯 BIAS System Features:
  • Professional Conversations: BIAS enables specialized technical discussions with AI - response quality depends on the model and hardware used
  • Basic Queries: Fundamental questions can be easily asked and answered
  • Critical Verification: Always remain skeptical and verify the received answers - AI responses should be fact-checked
  • Session-Specific: Each session can have its own BIAS configuration for targeted conversations (see the sketch below)
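
Under the hood, a BIAS corresponds to a system prompt sent along with the conversation. With the Ollama Python client this looks roughly like the following (the BIAS text is illustrative):

import ollama

# The "system" message plays the role of the session BIAS
messages = [
    {"role": "system", "content": "You are a concise embedded-systems expert."},
    {"role": "user", "content": "Explain what a watchdog timer does."},
]
reply = ollama.chat(model="tinyllama:1.1b", messages=messages)
print(reply["message"]["content"])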

⚙️ Configuration Screen

[Screenshot: Configuration Screen]

Customize your A1-Terminal experience with colors, fonts, and UI preferences in the configuration screen.

🎯 Configuration Options:
  • Console Appearance: Adjust colors, shape, and size of the console interface
  • Debug/System Outputs: Toggle debug and system message visibility on/off
  • Auto-Scroll: Enable/disable automatic scrolling to the latest message
  • Apply Changes: Clicking "Apply" triggers a restart script that closes and reopens the software automatically

📁 Session Folder Structure (JSON)

[Screenshot: Session JSON Storage]

🔒 Privacy & Data Protection:
  • Local Session Logs: Complete conversation histories for retrieval, sharing, and analysis
  • Valuable Personal Data: These datasets reveal thinking patterns and personality traits - highly valuable to platforms like OpenAI
  • Full Local Control: Your conversations remain on your machine, never sent to external servers
  • Data Sovereignty: You decide what happens with your data - share, analyze, or keep private
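
For illustration only - the actual schema is defined by A1-Terminal and the field names here are hypothetical - a saved session file is JSON along these lines:

{
  "_note": "illustrative example - A1-Terminal's real schema may differ",
  "name": "Astronaut",
  "color": "#1E3A5F",
  "bias": "You are a helpful assistant.",
  "messages": [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"}
  ]
}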

📥 Download and Install

[Screenshot: Download and Install]

The model download interface allows you to browse and install AI models directly from the application.

🎯 Download Features:
  • Model Browser: Browse available models categorized by size and performance
  • Progress Tracking: Real-time download progress with speed indicators
  • Space Requirements: Clear disk space and RAM requirements for each model
  • Model Folder Access: Direct access to the local model storage folder for file management
  • One-Click Install: Simple installation process directly from the interface

🎨 Color Wheel & Session Settings

[Screenshot: Color Wheel Interface]

Access this customization window by clicking the gear icon next to any session in the session list.

🎯 Session Customization Features:
  • Session Name: Set or change the session name in the text field at the top
  • Visual Color Picker: Intuitive color wheel interface for precise color selection
  • Real-Time Preview: See color changes instantly as you adjust the wheel
  • Session Theming: Customize individual session colors for easy identification
  • Persistent Settings: Both color and name choices are saved and restored when reopening sessions