Jarvis gives you visibility into every layer of the stack — from raw node metrics to per-agent task activity. This guide covers what to watch, how to check system health, and how to get notified when something needs attention.

What metrics are available

Jarvis exposes metrics across three categories:

Node health
  • CPU, memory, and disk usage per node
  • GPU utilization and VRAM consumption on inference nodes
  • Network throughput between nodes
  • Container uptime and restart counts
Model performance
  • Requests per second per model
  • Average and p95 inference latency
  • Token throughput (tokens/sec)
  • Error rates and timeout counts
Agent activity
  • Tasks submitted, in-progress, and completed
  • Task duration and delegation depth
  • Model usage per agent
  • Failed task rate
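As an illustration, a p95 latency figure like the one listed above can be derived from raw per-request samples. This is a hypothetical helper, not part of Jarvis, using the nearest-rank percentile method:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

# Ten hypothetical request latencies in milliseconds:
latencies_ms = [120, 95, 310, 88, 450, 102, 97, 130, 2050, 115]
p95 = percentile(latencies_ms, 95)
```

The p95 matters because a handful of slow requests can dominate the tail while the average still looks healthy.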

Check system health

The fastest way to get a health snapshot is the LiteLLM gateway health endpoint:
curl http://your-jarvis-host:4000/health \
  -H "Authorization: Bearer your-litellm-master-key"
To confirm the gateway process itself is responsive (a lighter-weight liveness probe):
curl http://your-jarvis-host:4000/health/liveliness \
  -H "Authorization: Bearer your-litellm-master-key"
To check a specific node directly, SSH in and inspect container status:
ssh your-user@your-node-hostname "docker ps && docker stats --no-stream"
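If you poll these endpoints from a script, you need to interpret the response. The sketch below classifies a gateway health response; it assumes the `/health` body is JSON with `healthy_endpoints` and `unhealthy_endpoints` lists, so verify the shape against your LiteLLM version before relying on it:

```python
import json

def summarize_health(status_code: int, body: str) -> str:
    """Classify a LiteLLM /health response.

    Assumes a JSON body like
    {"healthy_endpoints": [...], "unhealthy_endpoints": [...]} --
    check this against your gateway version.
    """
    if status_code != 200:
        return "gateway unreachable or erroring"
    data = json.loads(body)
    unhealthy = data.get("unhealthy_endpoints", [])
    if unhealthy:
        return f"{len(unhealthy)} model endpoint(s) unhealthy"
    return "all model endpoints healthy"

# Example with a canned response body instead of a live request:
example = '{"healthy_endpoints": [{"model": "ollama/llama3"}], "unhealthy_endpoints": []}'
print(summarize_health(200, example))
```

Wiring the actual HTTP request (and your master key) is left out so the logic stays testable offline.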

Health indicator checklist

Run through this checklist when you want to confirm your deployment is healthy:
  • All nodes respond to ping your-node-hostname
  • docker ps on each node shows all expected containers in Up state
  • /health endpoint on the LiteLLM gateway returns 200 OK
  • The /models endpoint lists your expected models, and a test completion request returns successfully
  • GPU nodes show non-zero GPU utilization when a model is loaded
  • n8n is accessible and workflows show recent successful runs
  • No containers show restart counts above zero (check with docker inspect --format '{{.RestartCount}}' container-name)
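The restart-count check is easy to script. This hypothetical helper only does the pure parsing step; gathering the counts themselves (for example via `docker inspect --format '{{.Name}} {{.RestartCount}}' $(docker ps -q)`) is left to a shell wrapper:

```python
def flag_restarted(counts: dict[str, int]) -> list[str]:
    """Return names of containers whose restart count is above zero."""
    return [name for name, n in counts.items() if n > 0]

# Hypothetical counts as gathered from docker inspect on one node:
observed = {"litellm": 0, "n8n": 0, "ollama": 3}
flagged = flag_restarted(observed)
```

A non-empty result is worth investigating even if every container currently shows as Up, since repeated restarts often precede a hard failure.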

Alerting and notifications via n8n

Jarvis uses n8n to run monitoring workflows that check node and service health on a schedule and send alerts when something is wrong.
1. Open n8n

Navigate to your n8n instance in a browser:
http://your-jarvis-host:5678
2. Find the monitoring workflows

Look for workflows with names like “Node Health Check”, “Model Latency Monitor”, or “Agent Error Alert”. These run on a cron schedule and check the health endpoints described above.
3. Configure notification channels

Each monitoring workflow ends with a notification node. Edit it to set your preferred destination — for example, a Slack webhook, a Telegram bot, or an email address. Replace the placeholder credentials in the node with your own, then save the workflow.
4. Test the alert

Trigger the workflow manually using the Execute Workflow button to confirm the notification arrives. Then enable the workflow to run on its scheduled interval.
You can create custom monitoring workflows in n8n by combining the HTTP Request node (to poll health endpoints) with any notification node. See the n8n integration guide for more detail.

Common issues and how to address them

If a node doesn’t respond to ping or SSH:
  1. Check whether the machine is powered on
  2. Verify it’s connected to your LAN
  3. If it’s reachable but SSH is down, try rebooting via your hypervisor or management interface
  4. Once you’re in, run docker ps to check whether services restarted automatically
Jarvis does not automatically reroute traffic away from an offline node. Update your LiteLLM router config to temporarily remove the node if you need inference to continue uninterrupted.
High latency usually points to resource contention:
  1. SSH into the GPU node and run nvidia-smi — check VRAM usage and GPU utilization
  2. If VRAM is saturated, you may have too many models loaded simultaneously. Unload unused models via Ollama
  3. Check CPU usage with docker stats — a non-GPU node doing inference will be slow
  4. Review LiteLLM logs for timeout errors: docker logs litellm --tail 50
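The VRAM check in step 2 can be automated. This sketch parses one line of `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` output; the parsing function and threshold are illustrative, not part of Jarvis:

```python
def vram_pct(csv_line: str) -> float:
    """Percent VRAM used, from one line of:
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
    e.g. "20480, 24576" (values in MiB).
    """
    used, total = (float(x) for x in csv_line.split(","))
    return 100 * used / total

# Hypothetical reading from a 24 GiB card:
usage = vram_pct("20480, 24576")
if usage > 90:
    print("VRAM nearly saturated: consider unloading unused models via Ollama")
```

Running this per GPU node on a schedule (e.g. from an n8n workflow) turns a manual SSH check into an alert.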
For failed or hanging tasks:
  1. Check the agent container logs: docker logs paperclip --tail 50 or docker logs hermes --tail 50
  2. Confirm the model the agent is using is responding — verify it appears in /models, then send it a test completion through the gateway
  3. If the task is hanging, restart the agent container: docker restart paperclip
  4. For persistent failures, check whether the MCP tool server is reachable if the task involves tool use
Docker images, model weights, and logs accumulate over time. To reclaim space:
# Remove unused images
docker image prune -a

# Remove stopped containers
docker container prune

# Remove unused volumes
docker volume prune
Model weights stored by Ollama live outside Docker volumes — check /usr/share/ollama or your configured Ollama data directory.
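To catch disk pressure before it breaks a node, you can alert on free space rather than waiting to prune reactively. A minimal sketch using only the Python standard library; the 15% threshold is an arbitrary example:

```python
import shutil

def disk_free_pct(path: str = "/") -> float:
    """Percent of disk space free at the given path."""
    usage = shutil.disk_usage(path)
    return 100 * usage.free / usage.total

# Alert when free space drops below an example threshold:
if disk_free_pct("/") < 15:
    print("low disk: consider docker image/container/volume prune and clearing old model weights")
```

Pointing this at the filesystem that holds your Docker data directory and Ollama model store is what matters; the root path here is just a placeholder.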
If monitoring or automation workflows stop running:
  1. Open n8n and check the Executions tab for error messages
  2. Confirm the n8n container is running: docker ps | grep n8n
  3. Re-enter any expired credentials (API keys, webhook tokens) in the Credentials section
  4. For webhook-triggered workflows, verify the webhook URL is still reachable from the triggering service

Next steps

n8n workflows

Build custom monitoring and alerting workflows with n8n.

Docker operations

Restart services and inspect logs when health checks fail.

Mesh nodes

Understand the role of each node and what to expect from each.

API reference

Query health and metrics programmatically via the REST API.