As a data engineer, I SSH into servers constantly. Checking CPU load, watching memory usage, tailing logs, running htop in one terminal while grepping pipeline output in another. It works, but it's fragmented. I wanted a single dashboard I could open in a browser and immediately know what's happening on the machine. Not a heavyweight tool like Datadog or New Relic. Something I could install in one command and run alongside the services I'm already monitoring.

So I built ServerMind.

What I Wanted

The idea was simple: take the most useful parts of htop, combine them with pipeline monitoring, add anomaly detection that catches problems before they page someone, and throw in an AI chat agent that actually understands the system context. Basically htop meets a lightweight Datadog, but self-hosted and zero-config.

I didn't want to install agents, configure exporters, or set up Prometheus. I wanted to clone a repo, run a script, and have a dashboard. That was the bar.

How It Works

The backend is FastAPI with psutil doing the heavy lifting. Every 5 seconds, it collects a full system snapshot: CPU usage (overall and per-core), memory, disk, swap, network I/O, load averages, and the top 20 processes with their resource consumption. All of this streams to the frontend over WebSocket. No polling. The browser gets updates the instant they're collected.
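The collection side is mostly a thin wrapper around psutil. Here's a minimal sketch of what a 5-second snapshot might look like — field names and the top-N shape are my assumptions, not ServerMind's exact schema:

```python
import time

import psutil

def collect_snapshot(top_n: int = 20) -> dict:
    """Gather one full system snapshot, roughly what streams over the WebSocket."""
    procs = []
    for p in psutil.process_iter(["pid", "username", "name", "cpu_percent", "memory_percent"]):
        try:
            procs.append(p.info)
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue  # process exited or is protected; skip it
    procs.sort(key=lambda p: p["cpu_percent"] or 0, reverse=True)

    return {
        "ts": time.time(),
        "cpu_total": psutil.cpu_percent(interval=None),
        "cpu_per_core": psutil.cpu_percent(interval=None, percpu=True),
        "memory": psutil.virtual_memory()._asdict(),
        "swap": psutil.swap_memory()._asdict(),
        "disk": psutil.disk_usage("/")._asdict(),
        "net": psutil.net_io_counters()._asdict(),
        "load_avg": psutil.getloadavg(),
        "top_processes": procs[:top_n],
    }
```

One snapshot, one dict — easy to serialize straight onto a WebSocket as JSON.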

APScheduler runs three background jobs. One collects metrics every 5 seconds. Another runs anomaly detection every 15 seconds, checking if CPU, memory, or disk have crossed configurable thresholds. The third auto-triggers the simulated ETL pipelines every 90 seconds.
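The 15-second job boils down to a threshold sweep over the latest snapshot. A minimal sketch — the threshold values and field names here are illustrative, not ServerMind's actual config; in the app this function would be wired up with something like `scheduler.add_job(..., "interval", seconds=15)`:

```python
# Illustrative defaults; the real thresholds are configurable.
THRESHOLDS = {"cpu": 90.0, "memory": 85.0, "disk": 90.0}

def detect_anomalies(snapshot: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return one alert dict per metric that crossed its threshold."""
    alerts = []
    for metric, limit in thresholds.items():
        value = snapshot.get(metric)
        if value is not None and value >= limit:
            alerts.append({"metric": metric, "value": value, "threshold": limit})
    return alerts
```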

On the AI side, I'm using NVIDIA's free LLM API through the OpenAI-compatible SDK. When anomaly detection fires an alert, the AI generates a 1-2 sentence summary explaining what's happening. There's also a floating chat panel where you can ask questions like "why is CPU high right now?" and the AI responds with full context — it sees the current metrics, all pipeline states, and recent alerts.
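The alert-summary call is a standard chat completion against NVIDIA's OpenAI-compatible endpoint. A sketch under some assumptions: the model choice and `NVIDIA_API_KEY` env var name are hypothetical, and I import the SDK lazily to mirror the fact that the AI features are optional:

```python
import os

def alert_prompt(alert: dict) -> list:
    """Chat messages asking for a 1-2 sentence alert summary."""
    return [
        {"role": "system",
         "content": "You are a server monitoring assistant. "
                    "Explain this alert in 1-2 sentences."},
        {"role": "user", "content": str(alert)},
    ]

def summarize_alert(alert: dict) -> str:
    # Imported lazily: the AI layer is optional, just like the API key.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",  # NVIDIA's OpenAI-compatible endpoint
        api_key=os.environ["NVIDIA_API_KEY"],            # hypothetical env var name
    )
    resp = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",  # illustrative model choice
        messages=alert_prompt(alert),
        max_tokens=80,
    )
    return resp.choices[0].message.content
```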

The frontend is React 19 with Vite, TailwindCSS v4, and Recharts. Dark theme, because this is a server monitoring tool and nobody wants to be blinded at 2am.

The htop-Style Process Table

This was the feature I cared about most. I wanted the most useful part of htop — the process list — available in the browser. Not a dumbed-down version. A proper sortable table showing the top 50 processes with PID, user, process name, CPU%, memory%, memory in MB, and status.

Each row has visual CPU and memory bars that change color based on load — green under 50%, yellow up to 80%, red above that. You can click any column header to sort ascending or descending. There's a live search box to filter by process name, user, or PID. And an auto-refresh toggle that updates every 5 seconds, with a pause button when you need to study a specific snapshot.
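The sort-and-search behavior is simple enough to sketch as a pure function. The real filtering happens client-side in React, but the logic is the same — this is an illustrative mirror, not the actual code, and it assumes numeric sort columns like `cpu_percent`:

```python
def filter_and_sort(procs, query="", sort_key="cpu_percent", descending=True):
    """Mirror of the table's search box and sortable column headers."""
    q = query.lower()
    rows = [
        p for p in procs
        if q in p["name"].lower()
        or q in p["username"].lower()
        or q in str(p["pid"])
    ]
    rows.sort(key=lambda p: p[sort_key] or 0, reverse=descending)
    return rows[:50]  # the table shows the top 50
```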

Above the table, there's a per-core CPU visualization — vertical bars for each core, color-coded by load. On a 10-core machine, you can instantly see if one core is pegged while others are idle. It also shows load averages, swap usage, and the top 3 CPU consumers at a glance.
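The color-banding rule behind both the row bars and the per-core bars is just the three thresholds mentioned above. A small sketch, taking per-core readings as input (in the app they'd come from something like `psutil.cpu_percent(percpu=True)`):

```python
def core_color(pct: float) -> str:
    """Color band for a load bar: green under 50%, yellow up to 80%, red above."""
    if pct < 50:
        return "green"
    if pct <= 80:
        return "yellow"
    return "red"

def core_bars(per_core):
    """Pair each core index and reading with its bar color."""
    return [(i, pct, core_color(pct)) for i, pct in enumerate(per_core)]
```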

It's the kind of thing where you open the dashboard, glance at the process table for two seconds, and know exactly which process is eating your resources.

One-Command Setup

I spent time making the setup experience as painless as possible. You clone the repo, run ./setup.sh, and it walks you through everything interactively. It checks for Python and Node, tells you exactly how to install them if they're missing, then asks for your preferred ports and an optional NVIDIA API key.

The API key is completely skippable. Everything works without it — you just don't get AI summaries and chat. You can always add it later by editing one line in backend/.env.

After setup, ./start.sh boots both the backend and frontend with one command. It even auto-detects if you haven't run setup yet and triggers it for you. Ctrl+C cleanly shuts down both servers. No Docker, no complex config files, no YAML. Works on any Linux or macOS box with Python and Node installed.

What I Learned

WebSocket is underrated for monitoring. The difference between polling every few seconds and getting instant pushes is night and day. The dashboard feels alive. Metrics update smoothly, pipeline status changes appear instantly, and alerts pop in the moment they're detected. Once you build with WebSocket for this kind of thing, polling feels broken.

psutil gives you everything htop has and more. Per-process CPU, memory, status, username, network counters, per-core utilization, boot time, swap — it's all there in a clean Python API. I was surprised how little code it took to replicate the htop experience.

Building for yourself eliminates ambiguity. Every feature decision was instant because I just had to ask myself what I'd want to see when SSHing into a server at midnight. No spec documents, no stakeholder alignment. Just "does this help me understand what's happening on this machine faster?"

AI chat is surprisingly useful when you feed it full context. A generic chatbot doesn't know your system. But when you pass in the current metrics, pipeline states, and recent alerts as context, it can actually correlate things. "CPU spiked to 92% around the same time the Payment Reconciliation pipeline failed" — that's the kind of insight that saves you from digging through logs manually.
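The context-passing is the whole trick, and it can be sketched as a message builder. The prompt wording and JSON shape here are my assumptions, not the exact prompt ServerMind uses:

```python
import json

def build_chat_messages(question: str, metrics: dict, pipelines: list, alerts: list) -> list:
    """Prepend the live system state so the model can correlate events."""
    context = json.dumps(
        {"current_metrics": metrics, "pipelines": pipelines, "recent_alerts": alerts},
        indent=2,
    )
    return [
        {"role": "system",
         "content": "You are a server monitoring assistant. "
                    "Current system state:\n" + context},
        {"role": "user", "content": question},
    ]
```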

What's Next

Right now everything is in-memory. Restarting the backend resets all history. I'd like to add persistent storage — probably SQLite to keep it lightweight — so you can look at metrics from the last 24 hours, not just the last 5 minutes. I'm also thinking about connecting it to real ETL pipelines instead of simulated ones. Airflow DAG status, cron job monitoring, that kind of thing.
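Since this feature isn't built yet, here's only a speculative sketch of what the SQLite history store might look like — schema, file location, and function names are all hypothetical:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # a real store would use a file like metrics.db
conn.execute("CREATE TABLE IF NOT EXISTS snapshots (ts REAL PRIMARY KEY, data TEXT)")

def save_snapshot(snapshot: dict) -> None:
    """Persist one snapshot as JSON, keyed by its timestamp."""
    conn.execute("INSERT INTO snapshots VALUES (?, ?)",
                 (snapshot["ts"], json.dumps(snapshot)))
    conn.commit()

def history(since: float) -> list:
    """Return all snapshots at or after `since`, oldest first."""
    rows = conn.execute(
        "SELECT data FROM snapshots WHERE ts >= ? ORDER BY ts", (since,))
    return [json.loads(r[0]) for r in rows]
```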

The project is open source. If you spend time SSHing into servers and want something better than raw terminal output, give it a try. Clone it, run the setup, and you'll have a monitoring dashboard in under two minutes.

Check it out on GitHub.