Raspberry Pi AI Server Running DeepSeek R1

This guide provides a comprehensive overview of setting up and running the DeepSeek R1 large language model (LLM) on the Raspberry Pi 5. With recent improvements in LLM efficiency, running these models on small, low-power devices like the Raspberry Pi has become feasible. Below, we break down the steps and key considerations in detail:

1. Core Features of the Project

Offline Large Model Inference

Supports running lightweight distilled models such as DeepSeek R1 1.5B/7B, enabling text generation, code writing, and Q&A interactions.

Fully local execution with no internet requirement, ensuring privacy protection.

Portable Hardware Design

Raspberry Pi 5 + PiSugar 3 Plus battery provides 2-3 hours of runtime with a lightweight design.

Custom 3D-printed case integrates cooling and power management.

Multi-Modal Interaction Interface

Open WebUI: Provides a ChatGPT-like web-based interface.

Command Line Interface: Allows developers to chat from a shell or invoke the HTTP API directly (see the sketch after this list).

Local Network Sharing: Enables multi-user access via a web browser.
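
For example, once Ollama and the model are installed (Steps 3-4 below), a one-shot completion can be requested over Ollama's local HTTP API (default port 11434); a minimal sketch:

curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:1.5b", "prompt": "Why is the sky blue?", "stream": false}'
# Returns a JSON object whose "response" field holds the generated text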

2. Project Architecture

[PiSugar 3 Plus Battery]  
       ↓ Power Supply  
[Raspberry Pi 5]  
├─ Ollama Service (DeepSeek R1 Model)  
├─ Open WebUI (Frontend Interaction)  
├─ Power Management Module (Battery Monitoring / Power Saving Strategy)  
└─ Cooling Control (Temperature-Controlled Fan + Metal Case)  
       ↓ Output  
[User Device] → Browser/SSH/API Call  

3. Hardware and Software Configuration

Raspberry Pi 5 Performance

8GB RAM: Sufficient for running models with 1.5B/7B parameters (a quantized 7B model needs about 4-6GB of RAM).

CPU: Quad-core Cortex-A76 at 2.4GHz, a significant step up from the Raspberry Pi 3B+/4B that translates directly into faster inference.

32GB TF Card: Enough for the OS plus the models (the quantized 1.5B model is roughly 1GB, the 7B model roughly 5GB); a quick sanity check follows below.
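
Before pulling a model, it is worth confirming that memory and disk headroom match these figures:

free -h    # "Mem:" row should show roughly 8GB total
df -h /    # free space remaining on the SD card's root partition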

PiSugar 3 Plus Battery

5000mAh Capacity: Powers the Raspberry Pi 5 for 2-3 hours under high-load LLM inference and 6-8 hours in low-power mode; the charge level can also be read in software (see the sketch below).

Portability: Combined with a custom case, it creates a true "pocket-sized AI server."
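
If PiSugar's separate power manager software is installed (an assumption; it is not one of this build's required steps), the battery level can be read over its local TCP interface, which listens on port 8423 by default:

# Query the battery percentage from the PiSugar power manager
# (requires netcat; install it with sudo apt install netcat-openbsd if missing)
echo "get battery" | nc -q 1 127.0.0.1 8423
# Prints a reply along the lines of: battery: 84.25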

System Image

Raspberry Pi OS (64-bit)

4. Implementation Steps

Step 1: Flash the System Image

Preparation

Required materials: Card reader, TF card.

Download the flashing tool: Get Raspberry Pi Imager from the official site at https://www.raspberrypi.com/software/.

Flashing Process

Click "CHOOSE DEVICE" and select Raspberry Pi 5.

Click "CHOOSE OS" and select Raspberry Pi OS (64-bit).

Insert the microSD card into the card reader, connect it to your computer, and select "CHOOSE STORAGE".


Customize OS settings:

Username & Password: Set administrator credentials for SSH login.

WiFi Credentials: Ensure the Raspberry Pi connects to the correct network.

Device Hostname: The name the Pi announces on the network; the SSH step below relies on it (e.g., raspberrypi.local).

Region Settings: Configure time zone and keyboard layout.


Enable Remote Access: Under "Services", enable SSH and select "Password Authentication".


Click "SAVE", then confirm OS customization settings.

Click "YES" to write the image to the storage device.

Once flashing is complete, eject the SD card and insert it into the Raspberry Pi to boot.

Step 2: Connect Raspberry Pi via SSH

After assembling the Raspberry Pi 5, PiSugar 3 Plus, cooling fan, and TF card, power on the device.

On your host machine, open a terminal (on Windows, press Win + R, type cmd, and press Enter). Then run:

ssh pi@raspberrypi.local
# Or specify the IP: ssh pi@192.168.x.x
# "pi" and "raspberrypi" are the username and hostname set during OS customization

Step 3: Install Ollama and Download the Model

Ollama is a tool designed for running and customizing large language models in local environments. It provides a simple, efficient interface for managing models, making AI deployment easier for developers and end users alike.

Install Ollama

sudo apt update && sudo apt install curl -y
# One-click installation (automatically configures the systemd service)
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation status
sudo systemctl status ollama
# Should display "Active: active (running)"
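
Once the service is up, the ollama CLI manages local models; a few commands you will likely use:

ollama pull deepseek-r1:1.5b   # download a model without starting a chat
ollama list                    # show locally installed models and their sizes
ollama rm deepseek-r1:1.5b     # remove a model to free disk space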

Step 4: Launch the DeepSeek R1 Model

Run the following command to start the AI chatbot on your Raspberry Pi:

ollama run deepseek-r1:1.5b
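
The interactive prompt above runs inside your SSH session. To let other devices on the LAN reach the model through Ollama's HTTP API (the "API Call" path in the architecture diagram), the service must listen on all interfaces instead of only 127.0.0.1; per Ollama's documentation, this is set through the service environment:

sudo systemctl edit ollama
# In the override file that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama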

Step 5: Deploy Open WebUI

For a ChatGPT-like experience, install Open WebUI to turn your Raspberry Pi into a small AI chat server.

Set Up Python Virtual Environment

cd
# python3 -m venv creates the ~/webui directory itself
python3 -m venv ~/webui
source ~/webui/bin/activate

# Install Open WebUI
pip install open-webui

Start the Service

open-webui serve

A pip installation does not register a systemd service by default, so open-webui serve runs in the foreground. If you set it up as a service (a minimal unit sketch follows below), you can verify it with:

sudo systemctl status open-webui
# Should display "Active: active (running)"

Open a browser and navigate to http://<raspberrypi_ip>:8080 to access the web interface. If the page does not load, refresh, or confirm that your device and the Pi are on the same network.
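
To have Open WebUI start on boot, here is a minimal unit sketch, assuming the user pi and the virtual environment at /home/pi/webui created in Step 5 (adjust both to your setup):

# /etc/systemd/system/open-webui.service
[Unit]
Description=Open WebUI
After=network-online.target
[Service]
User=pi
ExecStart=/home/pi/webui/bin/open-webui serve
Restart=on-failure
[Install]
WantedBy=multi-user.target

Reload and enable it with sudo systemctl daemon-reload followed by sudo systemctl enable --now open-webui.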

Q&A

1. Unable to SSH using Hostname

Ensure the Raspberry Pi and host machine are on the same network.

Make sure SSH is enabled during system flashing.

Use PiSugar’s WiFi Configuration Tool to manage connections:

curl https://cdn.pisugar.com/PiSugar-wificonfig/script/install.sh | sudo bash

2. Speeding Up Ollama Model Downloads via Proxy

Find your host machine's IP (on Windows: cmd → ipconfig → IPv4 Address).

Enable LAN proxy in Clash (default port: 7890).

Configure the proxy environment variables on the Raspberry Pi, pointing them at the host machine's IP rather than 127.0.0.1 (which would be the Pi itself):

# Replace 192.168.x.x with the host machine's IPv4 address from step 1
export https_proxy=http://192.168.x.x:7890
export http_proxy=http://192.168.x.x:7890
export all_proxy=socks5://192.168.x.x:7890
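
Note that ollama pull downloads are performed by the Ollama systemd service, which does not inherit shell variables. Per Ollama's documentation, the proxy can be added to the service environment using the same override mechanism shown in Step 4:

sudo systemctl edit ollama
# Add under [Service]:  Environment="HTTPS_PROXY=http://192.168.x.x:7890"
sudo systemctl restart ollama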

3. Resolving Open WebUI Installation Timeout

If pip times out when installing Open WebUI in your region, switch pip to a closer mirror:

# Open the global pip configuration file
sudo nano /etc/pip.conf

# Add the following two lines, then save:
[global]
index-url=https://pypi.tuna.tsinghua.edu.cn/simple

After configuration, Open WebUI should install smoothly. The process may take some time, so please be patient.
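
Alternatively, the mirror can be passed for a single install without touching the global config:

pip install open-webui -i https://pypi.tuna.tsinghua.edu.cn/simple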


With these steps, you can turn a Raspberry Pi 5 into a portable AI server capable of running advanced language models entirely offline!
