Before You Start

Component     Basic        Enthusiast
RAM           16 GB        32 GB+
GPU VRAM      8 GB         16 GB+
Storage       50 GB SSD    1 TB+ SSD
CPU           4+ Cores     8+ Cores

Before You Begin: A Warm Welcome for Beginners

Hello and welcome! If you are opening this guide, you have probably heard a lot about artificial intelligence and now want to get your hands on it yourself. But perhaps you also feel a little overwhelmed by terms like "local LLMs," "Jupyter Notebooks," and "Virtual Environments." Don't worry, that is completely normal! This tutorial was written specifically for absolute beginners like you.

Our goal is to show you how to run powerful AI models directly on your own computer. This is not only incredibly fascinating, but it also offers you full control over your data, maximum privacy, and the freedom to use AI models without an internet connection or expensive cloud services.

What to Expect: The Path to Local AI

Imagine you have a brand new toolkit and are ready to build something big. Ollama is the most important tool in this kit. It takes the complexity out of the equation and makes it easy to install, manage, and run various AI models. We will use this tool to set up an AI "playground" in Visual Studio Code (VS Code), using a kind of digital lab book called a Jupyter Notebook.

In the following steps, we will together:

  • Install the necessary software.
  • Set up your development environment so that everything is clean and organized.
  • Download your first AI model and get it running with a few lines of code.

You don't need to be a programming expert to get started. We will guide you through every single step and provide you with all the code snippets you need.

A Quick Look at the Hardware

You saw a table with hardware recommendations in the previous overview. It is important to understand that the performance of AI models heavily depends on your Random Access Memory (RAM) and, if you have one, on your Graphics Processing Unit (GPU).

  • RAM: This is your computer's short-term memory. The larger the AI model, the more RAM is needed to run it.
  • GPU: An NVIDIA graphics card can massively accelerate these computations. If you have one, the AI experience will be significantly faster. But even without a powerful GPU, you can get started; it will just take a bit longer.

See these recommendations as a guide, not as rigid rules. Even with a basic setup, you can experiment with smaller, yet impressive, models and learn the fundamentals.
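
If you are not sure what your machine offers, a few lines of Python can tell you. The sketch below is optional and only illustrative: the RAM check assumes the psutil package is installed (pip install psutil), and the GPU check simply looks for the nvidia-smi tool that ships with NVIDIA drivers.

    import shutil
    import subprocess

    import psutil  # assumption: installed separately via `pip install psutil`

    # Total installed RAM in gigabytes
    total_ram_gb = psutil.virtual_memory().total / (1024 ** 3)
    print(f"RAM: {total_ram_gb:.1f} GB")

    # Simple heuristic: if nvidia-smi exists, an NVIDIA GPU driver is installed
    if shutil.which("nvidia-smi"):
        # Print the GPU name(s) and VRAM reported by the driver
        subprocess.run(["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"])
    else:
        print("No NVIDIA GPU driver found - models will run on the CPU.")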

The Most Important Thing: Be Patient and Be Curious!

This is a learning process. It's possible you will encounter an error message or that something doesn't work right away. Don't worry! This is why we have created a special Troubleshooting section that provides solutions for the most common problems. The greatest reward awaits those who remain curious and enjoy experimenting.

Installation Process

  1. Install VS Code extensions: Python and Jupyter (Ctrl+Shift+X in VS Code).
  2. Create a virtual Python environment: Open your project folder in VS Code, run Python: Create Environment from the Command Palette (Ctrl+Shift+P), and select Venv.
  3. Install the Jupyter kernel and the Ollama client: Open a terminal with the virtual environment active and run pip install ipykernel ollama.

After installing Ollama, start it and check that the service is running, for example by running ollama list in a terminal. If you have an NVIDIA GPU, make sure your drivers and CUDA are set up correctly so models can use GPU acceleration; without a GPU, Ollama simply runs on the CPU. A default installation usually needs no special environment variables. Then test the installation as shown in the next section.
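
If you prefer to check from Python whether the server is reachable, a minimal sketch like the one below works with the standard library alone. It assumes Ollama is listening on its default port 11434 on your own machine.

    import urllib.request

    # Ollama listens on port 11434 by default; the root endpoint answers with a short status text
    try:
        with urllib.request.urlopen("http://localhost:11434", timeout=3) as resp:
            print("Ollama server reachable:", resp.read().decode().strip())
    except OSError as e:
        print("Ollama server not reachable - is 'ollama serve' running?", e)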

Testing and Experimentation

Compare model requirements (download size, recommended RAM, typical use):

  • gemma3:1b – 815 MB, 4 GB RAM, Multimodal/Chat
  • mistral:7b – 4.1 GB, 8 GB RAM, Fast Chat
  • llama2:7b – 3.8 GB, 8 GB RAM, General Chat
  • codellama:7b – 3.8 GB, 8 GB RAM, Code Generation
  • deepseek-coder:33b – 18 GB, 22 GB RAM, Complex Coding
  • llama3.1:70b – 40 GB, 48 GB RAM, Advanced Reasoning
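
To connect this list with the hardware table above, here is a small, purely illustrative helper that suggests a model based on how much RAM you have. The pick_model function is hypothetical and the thresholds simply mirror the list above (picking one representative model per RAM tier).

    # Purely illustrative: map the RAM recommendations above to model names.
    MODEL_BY_MIN_RAM_GB = [
        (48, 'llama3.1:70b'),
        (22, 'deepseek-coder:33b'),
        (8, 'mistral:7b'),
        (4, 'gemma3:1b'),
    ]

    def pick_model(available_ram_gb: float) -> str:
        """Return the largest model from the list whose RAM recommendation fits."""
        for min_ram, model in MODEL_BY_MIN_RAM_GB:
            if available_ram_gb >= min_ram:
                return model
        return 'gemma3:1b'  # smallest option as a fallback

    print(pick_model(16))  # -> 'mistral:7b'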

To run your first model:

  1. Start the Ollama server in a separate terminal: ollama serve
  2. Download a model: ollama pull gemma3:1b
  3. Run the following in a Jupyter Notebook cell (VS Code):

        import ollama

        # Ensure the Ollama server is running!
        model_name = 'gemma3:1b'

        try:
            ollama.list()
            print("Ollama server reachable.")

            response = ollama.chat(
                model=model_name,
                messages=[
                    {
                        'role': 'user',
                        'content': 'Why is the sky blue? Explain simply.',
                    },
                ]
            )
            print("\nFull response:")
            print(response)

            # Robust: only print the answer if the 'message' field is present
            try:
                answer = response['message']['content']
                print("\nModel response:")
                print(answer)
            except (KeyError, TypeError):
                print("\nNo valid model response found in 'message' field.")
        except Exception as e:
            print(f"\nError: {e}")
            print("Make sure 'ollama serve' is running in a separate terminal.")


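Once the basic call works, you can also stream the answer piece by piece instead of waiting for the complete response, which feels much more interactive. The sketch below uses the stream=True option of ollama.chat from the ollama Python package; treat it as a minimal illustration and check the package documentation for the version you have installed.

    import ollama

    # Ask the same small model, but print the answer as it is generated.
    # Assumes 'ollama serve' is running and gemma3:1b has been pulled.
    stream = ollama.chat(
        model='gemma3:1b',
        messages=[{'role': 'user', 'content': 'Explain in two sentences what a GPU is.'}],
        stream=True,
    )

    for chunk in stream:
        # Each chunk carries a small piece of the generated text
        print(chunk['message']['content'], end='', flush=True)
    print()
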
Common Issues and Solutions

  • ModuleNotFoundError: No module named 'ollama'
    Install the ollama Python package in your active virtual environment: pip install ollama.
  • Slow performance or high CPU usage
    Try smaller models (e.g., gemma3:1b). For NVIDIA GPUs, ensure CUDA Toolkit is installed.
  • Connection to Ollama server failed
    Make sure ollama serve is running in a separate terminal. Check firewall settings for port 11434.
  • CUDA conflicts or GPU not detected
    Update NVIDIA drivers, check CUDA version with nvidia-smi, and restart your system if needed.

Refer to the official Ollama documentation or community forums for further help.