Jarvis gives you visibility into every layer of the stack — from raw node metrics to per-agent task activity. This guide covers what to watch, how to check system health, and how to get notified when something needs attention.

What metrics are available

Jarvis exposes metrics across three categories:

Node health
  • CPU, memory, and disk usage per node
  • GPU utilization and VRAM consumption on inference nodes
  • Network throughput between nodes
  • Container uptime and restart counts
Model performance
  • Requests per second per model
  • Average and p95 inference latency
  • Token throughput (tokens/sec)
  • Error rates and timeout counts
Agent activity
  • Tasks submitted, in-progress, and completed
  • Task duration and delegation depth
  • Model usage per agent
  • Failed task rate
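As an illustration, a p95 latency figure like the one listed above can be derived from raw per-request samples. This is a hypothetical helper, not part of Jarvis, using the nearest-rank percentile method:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

# Ten hypothetical request latencies in milliseconds:
latencies_ms = [120, 95, 310, 88, 450, 102, 97, 130, 2050, 115]
p95 = percentile(latencies_ms, 95)
```

The p95 matters because a handful of slow requests can dominate the tail while the average still looks healthy.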

Check system health

The fastest way to get a health snapshot is the LiteLLM gateway health endpoint:
curl http://your-jarvis-host:4000/health \
  -H "Authorization: Bearer your-litellm-master-key"
To confirm the gateway process itself is responsive (a lighter-weight liveness probe):
curl http://your-jarvis-host:4000/health/liveliness \
  -H "Authorization: Bearer your-litellm-master-key"
To check a specific node directly, SSH in and inspect container status:
ssh your-user@your-node-hostname "docker ps && docker stats --no-stream"
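If you poll these endpoints from a script, you need to interpret the response. The sketch below classifies a gateway health response; it assumes the `/health` body is JSON with `healthy_endpoints` and `unhealthy_endpoints` lists, so verify the shape against your LiteLLM version before relying on it:

```python
import json

def summarize_health(status_code: int, body: str) -> str:
    """Classify a LiteLLM /health response.

    Assumes a JSON body like
    {"healthy_endpoints": [...], "unhealthy_endpoints": [...]} --
    check this against your gateway version.
    """
    if status_code != 200:
        return "gateway unreachable or erroring"
    data = json.loads(body)
    unhealthy = data.get("unhealthy_endpoints", [])
    if unhealthy:
        return f"{len(unhealthy)} model endpoint(s) unhealthy"
    return "all model endpoints healthy"

# Example with a canned response body instead of a live request:
example = '{"healthy_endpoints": [{"model": "ollama/llama3"}], "unhealthy_endpoints": []}'
print(summarize_health(200, example))
```

Wiring the actual HTTP request (and your master key) is left out so the logic stays testable offline.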

Health indicator checklist

Run through this checklist when you want to confirm your deployment is healthy:
  • All nodes respond to ping your-node-hostname
  • docker ps on each node shows all expected containers in Up state
  • /health endpoint on the LiteLLM gateway returns 200 OK
  • The /models endpoint lists your expected models, and a test completion request returns successfully
  • GPU nodes show non-zero GPU utilization when a model is loaded
  • n8n is accessible and workflows show recent successful runs
  • No containers show restart counts above zero (check with docker inspect --format '{{.RestartCount}}' container-name)
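The restart-count check is easy to script. This hypothetical helper only does the pure parsing step; gathering the counts themselves (for example via `docker inspect --format '{{.Name}} {{.RestartCount}}' $(docker ps -q)`) is left to a shell wrapper:

```python
def flag_restarted(counts: dict[str, int]) -> list[str]:
    """Return names of containers whose restart count is above zero."""
    return [name for name, n in counts.items() if n > 0]

# Hypothetical counts as gathered from docker inspect on one node:
observed = {"litellm": 0, "n8n": 0, "ollama": 3}
flagged = flag_restarted(observed)
```

A non-empty result is worth investigating even if every container currently shows as Up, since repeated restarts often precede a hard failure.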

Alerting and notifications via n8n

Jarvis uses n8n to run monitoring workflows that check node and service health on a schedule and send alerts when something is wrong.
1. Open n8n

Navigate to your n8n instance in a browser:
http://your-jarvis-host:5678
2. Find the monitoring workflows

Look for workflows with names like “Node Health Check”, “Model Latency Monitor”, or “Agent Error Alert”. These run on a cron schedule and check the health endpoints described above.
3. Configure notification channels

Each monitoring workflow ends with a notification node. Edit it to set your preferred destination — for example, a Slack webhook, a Telegram bot, or an email address. Replace the placeholder credentials in the node with your own, then save the workflow.
4. Test the alert

Trigger the workflow manually using the Execute Workflow button to confirm the notification arrives. Then enable the workflow to run on its scheduled interval.
You can create custom monitoring workflows in n8n by combining the HTTP Request node (to poll health endpoints) with any notification node. See the n8n integration guide for more detail.

Common issues and how to address them

If a node doesn’t respond to ping or SSH:
  1. Check whether the machine is powered on
  2. Verify it’s connected to your LAN
  3. If it’s reachable but SSH is down, try rebooting via your hypervisor or management interface
  4. Once you’re in, run docker ps to check whether services restarted automatically
Jarvis does not automatically reroute traffic away from an offline node. Update your LiteLLM router config to temporarily remove the node if you need inference to continue uninterrupted.
High latency usually points to resource contention:
  1. SSH into the GPU node and run nvidia-smi — check VRAM usage and GPU utilization
  2. If VRAM is saturated, you may have too many models loaded simultaneously. Unload unused models via Ollama
  3. Check CPU usage with docker stats — a non-GPU node doing inference will be slow
  4. Review LiteLLM logs for timeout errors: docker logs litellm --tail 50
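The VRAM check in step 2 can be automated. This sketch parses one line of `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` output; the parsing function and threshold are illustrative, not part of Jarvis:

```python
def vram_pct(csv_line: str) -> float:
    """Percent VRAM used, from one line of:
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
    e.g. "20480, 24576" (values in MiB).
    """
    used, total = (float(x) for x in csv_line.split(","))
    return 100 * used / total

# Hypothetical reading from a 24 GiB card:
usage = vram_pct("20480, 24576")
if usage > 90:
    print("VRAM nearly saturated: consider unloading unused models via Ollama")
```

Running this per GPU node on a schedule (e.g. from an n8n workflow) turns a manual SSH check into an alert.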
For failed or hanging tasks:
  1. Check the agent container logs: docker logs paperclip --tail 50 or docker logs hermes --tail 50
  2. Confirm the model the agent is using is responding — verify it appears in /models, then send it a test completion through the gateway
  3. If the task is hanging, restart the agent container: docker restart paperclip
  4. For persistent failures, check whether the MCP tool server is reachable if the task involves tool use
Docker images, model weights, and logs accumulate over time. To reclaim space:
# Remove unused images
docker image prune -a

# Remove stopped containers
docker container prune

# Remove unused volumes
docker volume prune
Model weights stored by Ollama live outside Docker volumes — check /usr/share/ollama or your configured Ollama data directory.
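To catch disk pressure before it breaks a node, you can alert on free space rather than waiting to prune reactively. A minimal sketch using only the Python standard library; the 15% threshold is an arbitrary example:

```python
import shutil

def disk_free_pct(path: str = "/") -> float:
    """Percent of disk space free at the given path."""
    usage = shutil.disk_usage(path)
    return 100 * usage.free / usage.total

# Alert when free space drops below an example threshold:
if disk_free_pct("/") < 15:
    print("low disk: consider docker image/container/volume prune and clearing old model weights")
```

Pointing this at the filesystem that holds your Docker data directory and Ollama model store is what matters; the root path here is just a placeholder.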
If monitoring or automation workflows stop running:
  1. Open n8n and check the Executions tab for error messages
  2. Confirm the n8n container is running: docker ps | grep n8n
  3. Re-enter any expired credentials (API keys, webhook tokens) in the Credentials section
  4. For webhook-triggered workflows, verify the webhook URL is still reachable from the triggering service

Next steps

n8n workflows

Build custom monitoring and alerting workflows with n8n.

Docker operations

Restart services and inspect logs when health checks fail.

Mesh nodes

Understand the role of each node and what to expect from each.

API reference

Query health and metrics programmatically via the REST API.