
Running a Free, Private AI Assistant — Gemma 4 + SearXNG + OpenClaw Setup Guide

A complete guide to building a fully self-hosted AI assistant using Google Gemma 4 open source models, SearXNG private search, and OpenClaw for agentic tool calling — all running locally with zero data leaving your machine.

Overview

Google just released Gemma 4, a family of open source models closely related to the Gemini 3 paid service and the Nano Banana image generation system. When combined with SearXNG for private web search and OpenClaw for agentic orchestration, you get a fully self-hosted AI assistant that rivals cloud offerings — completely free and with zero data leaving your machine.

This post walks through the full setup: which Gemma 4 model to pick, how to run SearXNG locally, and how to wire everything into OpenClaw for an agentic AI workflow with web search capabilities.

The Gemma 4 Model Family

Google released four new open source models under the Gemma 4 umbrella. They split into two tiers based on size and modality support:

Small Models (Mobile-Capable)

Model   Parameters   Modalities                    Target Hardware
E2B     ~2B          Text, Image, Video, Audio     Mobile phones
E4B     ~4B          Text, Image, Video, Audio     Mobile phones

Large Models (Desktop/Server)

Model   Parameters   Modalities     Target Hardware
26B     ~26B         Text, Image    Desktop GPU, server
31B     ~31B         Text, Image    Desktop GPU, server

The smaller E2B and E4B models are remarkable for their multimodal breadth — text, image, video, and audio processing in a package small enough for a phone. The larger 26B and 31B models trade audio/video support for deeper reasoning on text and image tasks.

For OpenClaw’s agentic tool-calling workflow, the E4B model stands out. Despite its small size, it handles structured function calls and multi-step reasoning with surprising competence. If you have the VRAM for the 26B or 31B, those will give better results on complex reasoning, but E4B is the sweet spot for most setups.

Architecture: How the Pieces Fit Together

The flow is straightforward:

  1. User sends a query to OpenClaw
  2. OpenClaw routes the query to the Gemma 4 model running locally
  3. Gemma 4 decides whether it needs web search and issues tool calls to SearXNG
  4. SearXNG executes the search entirely locally — scraping results from search engines without sending your query to any third-party API
  5. Results feed back into Gemma 4 for synthesis
  6. The final response returns to the user

At no point does your data leave your machine. SearXNG acts as a meta-search engine proxy, and Gemma 4 runs entirely on local hardware.
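The loop above can be sketched in a few lines of Python. This is an illustrative stub, not OpenClaw's actual internals: `model_step` and `searxng_search` are placeholder functions standing in for the Gemma 4 call and the SearXNG call, so only the control flow is shown.

```python
# Illustrative sketch of the agentic loop. model_step and searxng_search
# are placeholders, not real OpenClaw, Ollama, or SearXNG APIs.

def model_step(query, search_results=None):
    """Stand-in for a Gemma 4 call: request a search first, then answer."""
    if search_results is None:
        return {"tool_call": {"name": "web_search", "query": query}}
    return {"answer": f"Based on {len(search_results)} result(s): ..."}

def searxng_search(query):
    """Stand-in for the SearXNG JSON API; returns canned results."""
    return [{"title": "Example result", "url": "https://example.org"}]

def run_agent(query):
    step = model_step(query)
    while "tool_call" in step:                   # model wants a tool
        call = step["tool_call"]
        results = searxng_search(call["query"])  # executed locally
        step = model_step(query, search_results=results)
    return step["answer"]

print(run_agent("latest gemma 4 benchmarks"))
```

The key property of the loop is that every hop — model call, tool dispatch, search — stays on localhost.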

Step 1: Install and Run a Local Gemma 4 Model

You need a local inference server. The most common options are Ollama and llama.cpp. Ollama is simpler to set up:

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the E4B model (recommended for most setups)
ollama pull gemma4:e4b

# Or pull the 27B model if you have sufficient VRAM (16GB+)
ollama pull gemma4:27b

# Verify it's running
ollama list

Ollama exposes an OpenAI-compatible API at http://localhost:11434 by default. OpenClaw can connect to this directly.
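As a quick smoke test of that endpoint, the sketch below builds an OpenAI-style chat request against Ollama's `/v1/chat/completions` route. The payload construction runs as-is; the actual HTTP call (commented out) requires Ollama to be running locally.

```python
import json
from urllib import request

# Ollama's OpenAI-compatible endpoint (default port 11434)
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model, prompt):
    """Build an OpenAI-style chat completion request for Ollama."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return request.Request(OLLAMA_URL, data=body,
                           headers={"Content-Type": "application/json"})

req = build_chat_request("gemma4:e4b", "Say hello in one word.")
print(req.full_url)
# With Ollama running, send it:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint speaks the OpenAI wire format, any client that targets that format — OpenClaw included — can point at this URL unchanged.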

VRAM Requirements

Model   Quantization   Minimum VRAM
E2B     Q4_K_M         ~2 GB
E4B     Q4_K_M         ~3 GB
26B     Q4_K_M         ~16 GB
31B     Q4_K_M         ~20 GB

For Apple Silicon Macs, unified memory counts as VRAM. A 16GB M-series Mac can comfortably run E4B and potentially the 26B model with aggressive quantization.
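These figures follow from simple arithmetic: Q4_K_M stores roughly 4.5–5 bits per weight, plus overhead for the KV cache, activations, and runtime buffers. The sketch below makes that estimate explicit — the 0.57 bytes/weight and 10% overhead figures are rough assumptions for back-of-the-envelope purposes, not official numbers.

```python
def estimate_vram_gb(params_billion, bytes_per_weight=0.57, overhead=1.1):
    """Rough VRAM estimate for a Q4-quantized model.

    bytes_per_weight ~0.57 approximates Q4_K_M (~4.5 bits/weight);
    overhead covers KV cache, activations, and runtime buffers.
    """
    return params_billion * bytes_per_weight * overhead

for params in (2, 4, 26, 31):
    print(f"{params}B -> ~{estimate_vram_gb(params):.1f} GB")
```

The estimate undershoots slightly for the tiny models, where fixed runtime overhead is proportionally larger, which is why the table rounds E2B/E4B up.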

Step 2: Install and Run SearXNG

SearXNG is a free, open source meta-search engine. It aggregates results from Google, Bing, DuckDuckGo, and dozens of other engines while acting as a proxy, so none of those services sees your queries tied to a persistent, trackable identity.

The easiest deployment method is Docker:

# Clone the SearXNG Docker setup
git clone https://github.com/searxng/searxng-docker.git
cd searxng-docker

# Edit the .env file to set your hostname
# For local-only use, localhost is fine
cp .env.example .env

# Start SearXNG
docker compose up -d

SearXNG will be available at http://localhost:8080. You can verify it works by opening it in a browser and running a test search.
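SearXNG's JSON API is the same `/search` route with a `format=json` parameter, and it returns a `results` array of objects with `title`, `url`, and `content` fields. The sketch below builds the request URL and parses that shape — the sample response is hardcoded so the snippet runs without a live instance.

```python
import json
from urllib.parse import urlencode

SEARXNG_URL = "http://localhost:8080/search"  # default searxng-docker port

def build_search_url(query, categories="general"):
    """URL for SearXNG's JSON API (requires 'json' in search.formats)."""
    params = urlencode({"q": query, "format": "json", "categories": categories})
    return f"{SEARXNG_URL}?{params}"

def top_results(response_text, n=3):
    """Extract (title, url) pairs from a SearXNG JSON response."""
    data = json.loads(response_text)
    return [(r["title"], r["url"]) for r in data.get("results", [])[:n]]

# Sample response in SearXNG's shape (real responses carry more fields).
sample = '{"results": [{"title": "Gemma", "url": "https://ai.google.dev", "content": "..."}]}'
print(build_search_url("gemma 4"))
print(top_results(sample))
```

Note that the JSON format must be enabled in `settings.yml` first, as shown in the next section — out of the box SearXNG serves HTML only.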

Key SearXNG Configuration

Edit searxng/settings.yml to enable the JSON API, which OpenClaw needs:

server:
  secret_key: "your-random-secret-key"
  limiter: false  # Disable rate limiting for local use

search:
  formats:
    - html
    - json  # Required for API access

Restart the container after editing:

docker compose restart

Step 3: Wire Everything into OpenClaw

OpenClaw is an agentic framework that connects local LLMs with tools. Configure it to use your local Gemma 4 instance and SearXNG:

# openclaw config
llm:
  provider: ollama
  model: gemma4:e4b
  base_url: http://localhost:11434

tools:
  web_search:
    provider: searxng
    base_url: http://localhost:8080
    format: json
    categories:
      - general
      - news
      - science

Once configured, launch OpenClaw and you have a fully functional AI assistant with web search — entirely self-hosted.
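Under a config like this, agentic frameworks typically advertise the search tool to the model as an OpenAI-style function schema. OpenClaw's exact schema isn't documented here, so the following is a representative sketch of what such a tool definition looks like, with the `categories` enum mirroring the config above.

```python
import json

# Hypothetical web_search tool schema in the OpenAI function-calling style;
# the exact schema OpenClaw sends to the model may differ.
web_search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web via a local SearXNG instance.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "categories": {
                    "type": "string",
                    "enum": ["general", "news", "science"],
                },
            },
            "required": ["query"],
        },
    },
}

print(json.dumps(web_search_tool, indent=2))
```

When Gemma 4 decides a question needs fresh information, it emits a call against this schema, which the framework translates into the SearXNG request from Step 2.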

Performance Observations

After running this setup, a few things stand out:

E4B Tool Calling is Surprisingly Good. For a 4B parameter model, E4B handles agentic workflows well. It correctly decides when to search, formulates reasonable queries, and synthesizes results coherently. It is not at the level of GPT-4o or Claude for complex multi-step reasoning, but for a free, private, local model, the quality is impressive.

SearXNG Latency is Acceptable. Search queries typically return in 1-3 seconds. The bottleneck is usually the LLM inference, not the search.

Privacy is Genuine. Running tcpdump during a session confirms that no query data is sent to external AI APIs. SearXNG does make outbound requests to search engines, but these are standard web requests without persistent identifiers tied to your queries.

The 26B/31B Models Are Noticeably Better for complex reasoning tasks, but the E4B model is the right default for most people. The jump from E4B to 26B requires significantly more hardware but doesn’t always produce proportionally better results for straightforward Q&A with search.

When to Use This vs. Cloud AI

This setup is ideal when:

  • Privacy is non-negotiable — legal, medical, or financial queries you don’t want logged by any third party
  • You want zero recurring costs — no API fees, no subscriptions
  • You’re on a restricted network — environments where cloud AI services are blocked
  • You enjoy self-hosting — the tinkering is part of the appeal

Stick with cloud AI when:

  • You need state-of-the-art reasoning on complex tasks
  • You’re working with very long documents that exceed local model context windows
  • Uptime and reliability matter more than privacy

Conclusion

The Gemma 4 + SearXNG + OpenClaw stack represents a meaningful milestone for self-hosted AI. A year ago, running a capable agentic AI assistant with web search locally would have required expensive hardware and produced mediocre results. Today, a laptop with 8GB of RAM can run E4B with SearXNG and get genuinely useful results — for free, with complete privacy.

The setup takes about 15 minutes if you already have Docker and a package manager. For anyone who has been waiting for local AI to reach a practical threshold, this combination is worth trying.
