Chapter 2

Installation

LocalAI can be installed in multiple ways depending on your platform and preferences.

Installation Methods

Choose the installation method that best suits your needs:

  1. Containers (Recommended) - Works on all platforms, supports Docker and Podman
  2. macOS - Download and install the DMG application
  3. Linux - Install on Linux using binaries
  4. Kubernetes - Deploy LocalAI on Kubernetes clusters
  5. Build from Source - Build LocalAI from source code

Quick Start

Recommended: Containers (Docker or Podman)

# With Docker
docker run -p 8080:8080 --name local-ai -ti localai/localai:latest

# Or with Podman
podman run -p 8080:8080 --name local-ai -ti localai/localai:latest

This will start LocalAI. The API will be available at http://localhost:8080. For images with pre-configured models, see All-in-One images.
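
Once the container is running, you can quickly check that the API is up by listing the models it exposes (the same endpoint is used in Next Steps later):

curl http://localhost:8080/v1/models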

For other platforms, see the dedicated guides in the list above.

For detailed instructions, see the Containers installation guide.

Subsections of Installation

Containers

LocalAI supports Docker, Podman, and other OCI-compatible container engines. This guide covers the common aspects of running LocalAI in containers.

Prerequisites

Before you begin, ensure you have a container engine installed, such as Docker or Podman.
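
You can verify that your engine is working from a terminal:

docker --version
# Or with Podman:
podman --version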

Quick Start

The fastest way to get started is with the CPU image:

docker run -p 8080:8080 --name local-ai -ti localai/localai:latest
# Or with Podman:
podman run -p 8080:8080 --name local-ai -ti localai/localai:latest

This will:

  • Start LocalAI (you’ll need to install models separately)
  • Make the API available at http://localhost:8080
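
To check readiness from the command line, you can poll the same endpoint the Compose healthcheck below uses:

curl http://localhost:8080/readyz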

Image Types

LocalAI provides several image types to suit different needs. These images work with both Docker and Podman.

Standard Images

Standard images don’t include pre-configured models. Use these if you want to configure models manually.

CPU Image

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 localai/localai:latest

GPU Images

NVIDIA CUDA 13:

docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-gpu-nvidia-cuda-13

NVIDIA CUDA 12:

docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-gpu-nvidia-cuda-12

AMD GPU (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device rocm.com/gpu=all localai/localai:latest-gpu-hipblas

Intel GPU:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device gpu.intel.com/all localai/localai:latest-gpu-intel

Vulkan:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

NVIDIA Jetson (L4T ARM64):

CUDA 12 (for Nvidia AGX Orin and similar platforms):

docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64

CUDA 13 (for Nvidia DGX Spark):

docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13

All-in-One (AIO) Images

Recommended for beginners - These images come pre-configured with models and backends, ready to use immediately.

CPU Image

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu

GPU Images

NVIDIA CUDA 13:

docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-aio-gpu-nvidia-cuda-13

NVIDIA CUDA 12:

docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-aio-gpu-nvidia-cuda-12

AMD GPU (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device rocm.com/gpu=all localai/localai:latest-aio-gpu-hipblas

Intel GPU:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device gpu.intel.com/all localai/localai:latest-aio-gpu-intel

Using Compose

For a more manageable setup, especially with persistent volumes, use Docker Compose or Podman Compose:

version: "3.9"
services:
  api:
    image: localai/localai:latest-aio-cpu
    # For GPU support, use one of:
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-13
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-11
    # image: localai/localai:latest-aio-gpu-hipblas
    # image: localai/localai:latest-aio-gpu-intel
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=false
    volumes:
      - ./models:/models:cached
    # For NVIDIA GPUs, uncomment:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

Save this as compose.yaml and run:

docker compose up -d
# Or with Podman:
podman-compose up -d
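
You can then follow the startup logs while the healthcheck defined above waits for /readyz:

docker compose logs -f api
# Or with Podman:
podman-compose logs -f api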

Persistent Storage

To persist models and configurations, mount a volume:

docker run -ti --name local-ai -p 8080:8080 \
  -v $PWD/models:/models \
  localai/localai:latest-aio-cpu
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 \
  -v $PWD/models:/models \
  localai/localai:latest-aio-cpu

Or use a named volume:

docker volume create localai-models
docker run -ti --name local-ai -p 8080:8080 \
  -v localai-models:/models \
  localai/localai:latest-aio-cpu
# Or with Podman:
podman volume create localai-models
podman run -ti --name local-ai -p 8080:8080 \
  -v localai-models:/models \
  localai/localai:latest-aio-cpu

What’s Included in AIO Images

All-in-One images come pre-configured with:

  • Text Generation: LLM models for chat and completion
  • Image Generation: Stable Diffusion models
  • Text to Speech: TTS models
  • Speech to Text: Whisper models
  • Embeddings: Vector embedding models
  • Function Calling: Support for OpenAI-compatible function calling

The AIO images use OpenAI-compatible model names (like gpt-4, gpt-4-vision-preview) but are backed by open-source models. See the container images documentation for the complete mapping.
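
For example, a chat completion request against the gpt-4 alias on an AIO image; the request shape mirrors the curl example in the build section later in this chapter:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-4",
  "messages": [{"role": "user", "content": "How are you?"}]
}'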

Next Steps

After installation:

  1. Access the WebUI at http://localhost:8080
  2. Check available models: curl http://localhost:8080/v1/models
  3. Install additional models
  4. Try out examples

Troubleshooting

Container won’t start

  • Check container engine is running: docker ps or podman ps
  • Check port 8080 is available: netstat -an | grep 8080 (Linux/Mac)
  • View logs: docker logs local-ai or podman logs local-ai

GPU not detected

  • Ensure Docker has GPU access: docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
  • For Podman, see the Podman installation guide
  • For NVIDIA: Install NVIDIA Container Toolkit
  • For AMD: Ensure devices are accessible: ls -la /dev/kfd /dev/dri

Models not downloading

  • Check internet connection
  • Verify disk space: df -h
  • Check container logs for errors: docker logs local-ai or podman logs local-ai

macOS Installation

The easiest way to install LocalAI on macOS is using the DMG application.

Download

Download the latest DMG from GitHub releases:

Download LocalAI for macOS

Installation Steps

  1. Download the LocalAI.dmg file from the link above
  2. Open the downloaded DMG file
  3. Drag the LocalAI application to your Applications folder
  4. Launch LocalAI from your Applications folder

Known Issues

Note: The DMGs are not signed by Apple and may show as quarantined.

Workaround: See this issue for details on how to bypass the quarantine.

Fix tracking: The signing issue is being tracked in this issue.

Next Steps

After installing LocalAI, you can:

  • Access the WebUI at http://localhost:8080
  • Check available models: curl http://localhost:8080/v1/models
  • Install additional models and try out examples

Docker Installation

See Containers for the complete guide to running LocalAI with Docker and Podman.

Linux Installation

Manual Installation

Download Binary

You can manually download the appropriate binary for your system from the releases page:

  1. Go to GitHub Releases
  2. Download the binary for your architecture (amd64, arm64, etc.)
  3. Make it executable:
chmod +x local-ai-*
  4. Run LocalAI:
./local-ai-*

System Requirements

Hardware requirements vary based on:

  • Model size
  • Quantization method
  • Backend used

For performance benchmarks with different backends like llama.cpp, visit this link.

Configuration

After installation, you can:

  • Access the WebUI at http://localhost:8080
  • Configure models in the models directory
  • Customize settings via environment variables or config files
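
For example, you can point the binary at a custom models directory and turn on debug logging, using the same flags that appear in the build example later in this chapter:

./local-ai --models-path=./models --debug=true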

Run with Kubernetes

For installing LocalAI in Kubernetes, the deployment file from the examples can be used and customized as preferred:

kubectl apply -f https://raw.githubusercontent.com/mudler/LocalAI-examples/refs/heads/main/kubernetes/deployment.yaml

For Nvidia GPUs:

kubectl apply -f https://raw.githubusercontent.com/mudler/LocalAI-examples/refs/heads/main/kubernetes/deployment-nvidia.yaml

Alternatively, the helm chart can be used as well:

helm repo add go-skynet https://go-skynet.github.io/helm-charts/
helm repo update
helm show values go-skynet/local-ai > values.yaml

# Edit values.yaml as needed, then install:
helm install local-ai go-skynet/local-ai -f values.yaml
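
To reach the API from your machine, you can port-forward the service; the service name below is an assumption based on the example deployment, so adjust it to match your cluster:

kubectl port-forward service/local-ai 8080:8080
curl http://localhost:8080/readyz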

Build LocalAI

Build

LocalAI can be built as a container image or as a single, portable binary. Note that some model architectures might require Python libraries, which are not included in the binary.

LocalAI’s extensible architecture allows you to add your own backends, which can be written in any language. Because of this, the container images also include the Python dependencies needed to run all of the available backends (for example, to run backends like Diffusers, which generates images and videos from text).

This section contains instructions on how to build LocalAI from source.

Build LocalAI locally

Requirements

In order to build LocalAI locally, you need the following requirements:

  • Golang >= 1.21
  • GCC
  • GRPC

To install the dependencies, follow the instructions below for your platform.

On macOS, install Xcode from the App Store, then:

brew install go protobuf protoc-gen-go protoc-gen-go-grpc wget

On Debian/Ubuntu:

apt install golang make protobuf-compiler-grpc

After you have Go installed and working, install the binaries required for compiling the Go protobuf components:

go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af

Build

To build LocalAI with make:

git clone https://github.com/go-skynet/LocalAI
cd LocalAI
make build

This should produce the local-ai binary.
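
As a quick smoke test, print the available flags:

./local-ai --help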

Container image

Requirements:

  • Docker, Podman, or another OCI-compatible container engine

To build the LocalAI container image locally, you can use docker, for example:

docker build -t localai .
docker run -p 8080:8080 localai

Example: Build on mac

Building on a Mac (M1, M2, or M3) works, but you may need to install some prerequisites using brew.

The following has been tested by one Mac user and found to work. Note that this doesn’t use Docker to run the server:

Install Xcode from the App Store (needed for MetalKit)

brew install abseil cmake go grpc protobuf wget protoc-gen-go protoc-gen-go-grpc

git clone https://github.com/go-skynet/LocalAI.git

cd LocalAI

make build

wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q2_K.gguf -O models/phi-2.Q2_K

cp -rf prompt-templates/ggml-gpt4all-j.tmpl models/phi-2.Q2_K.tmpl

./local-ai backends install llama-cpp

./local-ai --models-path=./models/ --debug=true

curl http://localhost:8080/v1/models

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "phi-2.Q2_K",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9 
   }'

Troubleshooting mac

  • If you encounter errors regarding a missing utility metal, install Xcode from the App Store.

  • After installing Xcode, if you still receive the error xcrun: error: unable to find utility "metal", not a developer tool or in PATH, you may have installed the Xcode command line tools before Xcode itself; they point to an incomplete SDK. Check the active developer directory and switch it to the full Xcode installation:

xcode-select --print-path

sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
  • If completions are slow, ensure that gpu-layers in your model yaml matches the number of layers from the model in use, or simply use a high number such as 256 (a model YAML sketch follows at the end of this section).

  • If you get a compile error such as error: only virtual member functions can be marked 'final', reinstall the necessary brew packages, clean the build, and try again:

brew reinstall go grpc protobuf wget

make clean

make build
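
Referencing the gpu-layers tip above, here is a minimal sketch of a model YAML placed in your models directory; treat the field names as assumptions to check against the model configuration documentation for your LocalAI version:

name: phi-2
parameters:
  model: phi-2.Q2_K
gpu_layers: 256 # match the model's layer count, or simply use a high number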

Build backends

LocalAI has several backends available for installation in the backend gallery. Backends can also be built from source. Since backends vary in the languages and dependencies they require, this documentation provides generic guidance for a few of the backends, which can be applied with slight modifications to the others.

Manually

Typically, each backend includes a Makefile that packages the backend.

In the LocalAI repository, for instance, you can build a backend by running:

git clone https://github.com/go-skynet/LocalAI.git

make -C LocalAI/backend/python/vllm

With Docker

Building with Docker is simpler, as it abstracts away all the requirements and focuses on building the final OCI images that are available in the gallery. This also allows you, for instance, to build a backend locally and install it with LocalAI. You can refer to Backends for general guidance on how to install and develop backends.

In the LocalAI repository, you can build a backend by doing:

git clone https://github.com/go-skynet/LocalAI.git

make docker-build-<backend-name>

Note that make is used only for convenience; under the hood it just runs a plain docker command such as:

docker build --build-arg BUILD_TYPE=$(BUILD_TYPE) --build-arg BASE_IMAGE=$(BASE_IMAGE) \
  -t local-ai-backend:<backend-name> -f LocalAI/backend/Dockerfile.golang \
  --build-arg BACKEND=<backend-name> .

Note:

  • BUILD_TYPE can be one of: cublas, hipblas, sycl_f16, sycl_f32, metal.
  • BASE_IMAGE defaults to ubuntu:24.04 (on which it is tested); use quay.io/go-skynet/intel-oneapi-base:latest for Intel/SYCL builds.
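
As a concrete example, this builds the llama-cpp backend image for NVIDIA (cublas) with the default base image, substituting values into the generic command above; adjust the backend name and build type to your needs:

docker build --build-arg BUILD_TYPE=cublas --build-arg BASE_IMAGE=ubuntu:24.04 \
  -t local-ai-backend:llama-cpp -f LocalAI/backend/Dockerfile.golang \
  --build-arg BACKEND=llama-cpp .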