Requirements
- Windows or Linux operating system
- NVIDIA GPU from our supported hardware list
- Docker Desktop (Windows) or Docker Engine (Linux)
- NVIDIA drivers and container toolkit
Linux Installation
Prerequisites
- Install Docker Engine: Follow the official Docker Engine installation guide for Linux.
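On Ubuntu or Debian, one common path is Docker's convenience script; the official guide covers distribution-specific packages as well:

```bash
# Download and run Docker's installation script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Optional: allow running docker without sudo (log out and back in afterwards)
sudo usermod -aG docker $USER
```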
- Install NVIDIA Drivers
Option A: Automatic installation
Option B: Manual installation. Find your recommended driver version at NVIDIA’s driver download page.
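For example, on Ubuntu (the driver package version below is illustrative; use the version recommended for your GPU):

```bash
# Option A: let Ubuntu detect and install the recommended driver
sudo ubuntu-drivers autoinstall

# Option B: install a specific driver version manually
# (550 is a placeholder -- substitute the version from NVIDIA's download page)
sudo apt-get install -y nvidia-driver-550

# Reboot so the new driver loads
sudo reboot
```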
- Install NVIDIA Container Toolkit: Follow the NVIDIA Container Toolkit installation guide.
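Assuming an apt-based distribution with NVIDIA's package repository already configured per that guide:

```bash
# Install the toolkit
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```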
Verify Installation
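Confirm that containers can see your GPU by running nvidia-smi inside a throwaway container:

```bash
# Should print your GPU model, driver version, and memory
docker run --rm --gpus all ubuntu nvidia-smi
```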
Start Node
- Register an account at https://devnet.inference.net/register
- Verify your email address
- Navigate to the Workers tab in your dashboard
- Click Create Worker in the top-right corner
- Enter a worker name, ensure Docker is selected, and click Create Worker
- On the Worker Details page, click Launch Worker
- Run the provided Docker command with your worker code:
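The exact command, including the image name and your unique worker code, comes from the Worker Details page. It generally takes this shape (the image name and --code value below are illustrative placeholders, not canonical):

```bash
# Copy the real command from your dashboard; this is a sketch
docker run -d \
  --gpus all \
  --restart unless-stopped \
  inferencedevnet/amd64-nvidia-inference-node:latest \
  --code <your-worker-code>
```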
Windows Installation
Prerequisites
- Install Docker Desktop: Download and install Docker Desktop for Windows.
- Install NVIDIA Drivers: Download and install the appropriate NVIDIA driver for your GPU.
- Install NVIDIA Container Toolkit: Follow the NVIDIA Container Toolkit guide for Windows.
Start Node
- Register an account at https://devnet.inference.net/register
- Verify your email address
- Navigate to the Workers tab in your dashboard
- Click Create Worker in the top-right corner
- Enter a worker name, ensure Docker is selected, and click Create Worker
- On the Worker Details page, click Launch Worker
- Run the provided Docker command with your worker code in PowerShell:
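As on Linux, copy the exact command from the Worker Details page; the image name and worker code below are illustrative placeholders (PowerShell uses backticks for line continuation):

```powershell
# Copy the real command from your dashboard; this is a sketch
docker run -d `
  --gpus all `
  --restart unless-stopped `
  inferencedevnet/amd64-nvidia-inference-node:latest `
  --code <your-worker-code>
```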
Advanced Configuration
Running Multiple Containers on Multi-GPU Systems
If you have multiple GPUs, you can run separate containers, each utilizing a different GPU. This maximizes your hardware utilization by running multiple workers simultaneously.
Understanding GPU Selection
GPUs are numbered starting from 0. You can specify which GPU(s) each container should use:
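The <image> and <args> placeholders below stand in for the worker image and arguments from your dashboard:

```bash
# All GPUs
docker run --gpus all <image> <args>

# A single GPU by index
docker run --gpus '"device=0"' <image> <args>

# Two specific GPUs
docker run --gpus '"device=0,1"' <image> <args>
```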
Example: Multiple GPU Setup
Run separate containers for each GPU:
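Each worker gets its own container, name, and worker code (the image name and codes below are illustrative):

```bash
# Container 1 (GPU 0):
docker run -d --name worker-gpu0 --gpus '"device=0"' \
  inferencedevnet/amd64-nvidia-inference-node:latest --code <worker-code-1>

# Container 2 (GPU 1):
docker run -d --name worker-gpu1 --gpus '"device=1"' \
  inferencedevnet/amd64-nvidia-inference-node:latest --code <worker-code-2>
```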
Resource Management
Set memory and CPU limits to prevent resource contention:
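Docker's --memory and --cpus flags cap each container; the values below are examples to size against your own hardware:

```bash
# Cap this container at 16 GB of RAM and 4 CPU cores
docker run -d --gpus '"device=0"' \
  --memory=16g --cpus=4 \
  <image> <args>
```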
Complete Multi-GPU Example
Running two containers with resource limits:
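Putting the pieces together (image name, worker codes, and limits are all illustrative):

```bash
# Worker on GPU 0, capped at 16 GB RAM and 4 CPUs
docker run -d --name worker-gpu0 --gpus '"device=0"' \
  --memory=16g --cpus=4 --restart unless-stopped \
  inferencedevnet/amd64-nvidia-inference-node:latest --code <worker-code-1>

# Worker on GPU 1, same limits
docker run -d --name worker-gpu1 --gpus '"device=1"' \
  --memory=16g --cpus=4 --restart unless-stopped \
  inferencedevnet/amd64-nvidia-inference-node:latest --code <worker-code-2>
```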
Docker Compose Configuration
For easier management of multiple containers, you can use Docker Compose. This is particularly useful when running multiple GPU instances or managing complex deployments. Create a docker-compose.yml file (see the sketch after this list) that provides:
- GPU device assignment for each container
- Persistent volume mounting for configuration
- Automatic restart on failure
- Proper environment variable setup
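A minimal sketch of such a file, reusing the illustrative image and worker codes from above (service names, volume paths, and the WORKER_NAME variable are placeholders, not a documented schema):

```yaml
services:
  worker-gpu0:
    image: inferencedevnet/amd64-nvidia-inference-node:latest  # illustrative
    command: ["--code", "<worker-code-1>"]
    restart: unless-stopped
    environment:
      - WORKER_NAME=worker-gpu0            # hypothetical variable
    volumes:
      - ./config/worker0:/root/.inference  # hypothetical config path
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]

  worker-gpu1:
    image: inferencedevnet/amd64-nvidia-inference-node:latest
    command: ["--code", "<worker-code-2>"]
    restart: unless-stopped
    environment:
      - WORKER_NAME=worker-gpu1
    volumes:
      - ./config/worker1:/root/.inference
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]
              capabilities: [gpu]
```

Start both workers with docker compose up -d and stop them with docker compose down.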
Troubleshooting
Container Startup Issues
When your Docker container fails to start or stops unexpectedly, these commands help diagnose and resolve Docker daemon-related issues. Use them to check if Docker is running properly, restart the service if needed, or examine system logs for error messages.
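On a systemd-based Linux distribution, for example:

```bash
# Is the Docker daemon running?
sudo systemctl status docker

# Restart the daemon if it is stopped or misbehaving
sudo systemctl restart docker

# Inspect daemon logs for recent errors
journalctl -u docker --since "1 hour ago"
```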
GPU Access Problems
These commands help troubleshoot GPU visibility and accessibility issues within Docker containers. Use them to verify that the NVIDIA runtime is properly configured, list available GPUs on your system, or reset a GPU that may be in an error state.
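For example:

```bash
# Confirm containers can reach the GPU through the NVIDIA runtime
docker run --rm --gpus all ubuntu nvidia-smi

# List the GPUs the driver can see
nvidia-smi -L

# Reset a GPU stuck in an error state (no processes may be using it)
sudo nvidia-smi --gpu-reset -i 0
```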
GPU Monitoring
Monitor GPU performance and resource utilization in real time. These commands are essential for tracking GPU memory usage, identifying bottlenecks, and ensuring your inference workloads are running efficiently. Use them to detect memory leaks or verify that your containers are properly utilizing GPU resources.
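For example:

```bash
# Refresh nvidia-smi output every second
watch -n 1 nvidia-smi

# Stream per-GPU utilization and memory statistics
nvidia-smi dmon

# Script-friendly memory usage query
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```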
Container Management
Essential commands for managing Docker containers running your inference nodes. Use these to check container health, view logs for debugging, follow real-time output during operations, or monitor resource consumption to ensure containers aren’t exceeding system limits.
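For example (replace <container> with your container's name or ID):

```bash
# List running containers and their status
docker ps

# View a container's logs
docker logs <container>

# Follow log output in real time
docker logs -f <container>

# Live CPU, memory, and I/O usage per container
docker stats
```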
Handling Failures
Commands for managing failed or problematic containers. Use these when you need to stop unresponsive containers, force-terminate hung processes, remove containers for a fresh start, or clean up disk space by removing unused Docker resources.
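For example:

```bash
# Gracefully stop a container
docker stop <container>

# Force-terminate a hung container
docker kill <container>

# Remove a stopped container for a fresh start
docker rm <container>

# Reclaim disk space from unused images, containers, and networks
docker system prune
```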
Restart Policies
Configure automatic container restart behavior to ensure high availability of your inference nodes. These policies help maintain uptime by automatically restarting containers after crashes, system reboots, or failures, reducing the need for manual intervention.
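For example, unless-stopped restarts a container after crashes and reboots but respects a manual stop:

```bash
# Set the policy when starting a container
docker run -d --restart unless-stopped <image>

# Change the policy on an existing container
docker update --restart unless-stopped <container>
```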
Performance Monitoring
Advanced monitoring commands for tracking both container and GPU performance metrics. Use these to identify performance bottlenecks, monitor power consumption for efficiency optimization, or track temperature to prevent thermal throttling during intensive inference workloads.
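For example:

```bash
# Container-level resource usage
docker stats

# GPU power draw, temperature, and utilization, sampled every 5 seconds
nvidia-smi --query-gpu=power.draw,temperature.gpu,utilization.gpu --format=csv -l 5
```

Tip: Always check container logs first when troubleshooting issues. They often contain valuable error messages and diagnostic information.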